Archive for Research

On Rotation, Part 1: The Effects of Spin on the Flight of a Pitch

My last article was a look at the effects of pitch location on batted balls. While it ended on a somewhat disappointing note, showing that the results couldn’t really be applied to individual pitchers, it did make me think more about which components of a pitch affect its outcome, and in which ways.

So I decided to examine spin. PITCHf/x captures spin in two measurements: rate (in revolutions per minute) and direction (the angle in degrees). As it turns out, the spin of a pitch has quite an effect on its outcome, much like location. Different spin rates make the pitch move differently (obviously) and get hit differently. (For a look at this topic from a physics standpoint, check out this infographic and this much more complicated article, both from the excellent Alan Nathan. And, to make sure everybody knows: I know little about the actual physics of this beyond what I can infer from my experience playing and watching baseball. I am just looking at the PITCHf/x data.)

Before we get to the graphs, a quick note about my methodology. I grouped each pitch from 2009 onward (2009 is the year PITCHf/x started recording spin rate consistently) into buckets based on spin rate, rounded to the nearest 50 RPM, and pitch type (I included four-seam fastballs, curveballs, changeups, two-seam fastballs, cutters, knuckleballs, and sliders). I then calculated a range of stats for each bucket: contact rate, average speed, average movement, ground-ball rate, and many more. I did the same with spin angle, rounding to the nearest 20 degrees, but those results weren’t particularly meaningful.

I also combined two-seam fastballs and sinkers. There has been some discussion in the past about whether there is a difference between those two pitches. While PITCHf/x classifies them separately, they are more or less indistinguishable: when I first ran this without combining them, they overlapped on nearly every graph.
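A minimal sketch of the bucketing step, assuming each pitch is a Python dict with hypothetical keys (`pitch_type`, `spin_rate`, `speed`, `swing`, `contact`); the article does not publish its code. The pitch-type codes are the standard PITCHf/x abbreviations, with FT (two-seam) and SI (sinker) folded into one group as described above. Rounding is half-up to the nearest step, so the same helper covers both the 50 RPM spin-rate buckets and the 20-degree spin-angle buckets.

```python
import math
from collections import defaultdict
from statistics import mean

# Standard PITCHf/x pitch-type codes for the seven groups in the study.
# FT (two-seam) and SI (sinker) are merged into a single group.
PITCH_GROUPS = {
    "FF": "four-seam", "FT": "two-seam/sinker", "SI": "two-seam/sinker",
    "FC": "cutter", "CU": "curveball", "CH": "changeup",
    "KN": "knuckleball", "SL": "slider",
}


def nearest(value, step):
    """Round half-up to the nearest multiple of step (2237 RPM -> 2250 for step=50)."""
    return math.floor(value / step + 0.5) * step


def bucket_stats(pitches, spin_step=50):
    """Group pitches by (pitch group, rounded spin rate) and summarize each bucket."""
    buckets = defaultdict(list)
    for p in pitches:
        group = PITCH_GROUPS.get(p["pitch_type"])
        if group is None:
            continue  # pitch types outside the study are skipped
        buckets[(group, nearest(p["spin_rate"], spin_step))].append(p)

    summary = {}
    for key, rows in buckets.items():
        swings = [r for r in rows if r["swing"]]
        summary[key] = {
            "n": len(rows),
            "avg_speed": mean(r["speed"] for r in rows),
            # Contact rate is contacts per swing; None if nobody swung.
            "contact_rate": sum(r["contact"] for r in swings) / len(swings) if swings else None,
        }
    return summary


# Hypothetical sample: a two-seamer and a sinker land in the same bucket.
pitches = [
    {"pitch_type": "FT", "spin_rate": 2130, "speed": 92.0, "swing": True, "contact": True},
    {"pitch_type": "SI", "spin_rate": 2140, "speed": 91.0, "swing": True, "contact": False},
    {"pitch_type": "FF", "spin_rate": 2237, "speed": 94.0, "swing": False, "contact": False},
]
summary = bucket_stats(pitches)
```

Other per-bucket stats (movement, ground-ball rate and so on) would be added to the same dictionary in the same way.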

Read the rest of this entry »


Batted-Ball Rates vs. Velocity Changes

Last year, I revisited Mike Fast’s “Lose a Tick, Gain a Tick” article and estimated how much a pitcher should expect his ERA, FIP and xFIP to change with a velocity decline. Additionally, I found the rate at which strikeouts and walks change. An interesting finding from that work was that FIP and ERA change by the same amount with a velocity decline, while xFIP doesn’t track the other two. So I decided to examine some batted-ball stats to see which ones change when a pitcher’s velocity changes.
Read the rest of this entry »


Modeling Salary Arbitration: Stat Components

This post is part of an ongoing arbitration research project and is coauthored by Alex Chamberlain and Sean Dolinar.

April 24: Modeling Salary Arbitration: Introduction

Feb. 25: 2015 MLB Arbitration Visualized

* * *

A couple of weeks ago, we introduced a couple of regressions that modeled arbitration results using basic formulas predicated on wins above replacement (WAR). Ultimately, the models estimated that an arbitration-eligible pitcher could expect his salary to increase by 14 percent, and his raise in salary to increase by 56 percent, for each additional WAR. A hitter could expect increases of 13 percent and 46 percent, respectively.

The models, however, were incomplete: they did not incorporate any other stats aside from WAR. This was by design, as we wanted to introduce simple one-variable equations for the sake of demonstration. WAR is, conveniently, a comprehensive variable that attempts to summarize a player’s worth in one easily digestible number. But what about the effects of a player’s age or arbitration year?

Moreover, the r-squared statistic (a quick-and-easy measure of how well a model fits the data) for each specification is not especially strong, clocking in anywhere between .30 and .56. This is partly a result of specifying only one explanatory variable, so including more variables, which we have done in this post, should improve the goodness of fit of the models, assuming the variables are relevant.
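Plain r-squared never goes down when a variable is added to an OLS model, even an irrelevant one, which is why the caveat about relevance matters. A small sketch of r-squared alongside the standard adjusted r-squared, which charges a penalty for each predictor; the article reports plain r-squared, and the function names here are illustrative.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` explained by `predicted`: 1 - SS_res / SS_tot."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalize r-squared for each extra predictor, so an irrelevant variable can lower it."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)
```

With many observations and only a handful of predictors the two values stay close; the gap grows as predictors are added relative to the sample size.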

With that said, we have new-and-improved models to share with you: one built on composite statistics and another built on traditional statistics. They are all vanilla, linear ordinary least squares (OLS) regression models, and it is important to remember that the coefficient for each stat can only be interpreted in the context of that specific model.

Non-Traditional Statistics

For each player, we specify…

  • a composite statistic, such as wins above replacement (WAR) for batters and RA9-WAR for pitchers, to measure overall performance (RA9-WAR uses runs allowed per nine innings rather than FIP);
  • a service statistic, such as plate appearances (PA) and innings pitched (IP), to measure playing time;
  • a “glory” statistic, such as home runs (HR) and saves (SV), to account for baseball’s affinity for traditional statistics and social constructs;
  • arbitration year (for pitchers*), indicating a player’s total service time;
  • and his age (for hitters*), to measure as best we can the number of years for which he has inhabited the earth.

We identify these particular stats not only to cover as much analytical ground as possible but also to minimize the use of stats that are highly correlated with one another (multicollinearity). We want to isolate different aspects of player performance or value as best we can.
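The article doesn’t share code or data, so here is a generic sketch, using NumPy, of fitting a multi-variable OLS model and of one common multicollinearity check, the variance inflation factor (VIF). VIF is a standard diagnostic swapped in for illustration; the article doesn’t say which check, if any, the authors ran. All data below is randomly generated, and the column roles and coefficients are made up.

```python
import numpy as np


def fit_ols(X, y):
    """OLS with an intercept column; returns (coefficients, r-squared)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2


def vif(X):
    """Variance inflation factor per column: 1 / (1 - R^2) from regressing that
    column on all the others. Large values flag multicollinearity."""
    return np.array([
        1.0 / (1.0 - fit_ols(np.delete(X, j, axis=1), X[:, j])[1])
        for j in range(X.shape[1])
    ])


# Made-up stand-in data: columns play the roles of WAR, PA, HR and age.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(2, 1.5, n),    # composite stat
    rng.normal(450, 120, n),  # playing time
    rng.normal(15, 8, n),     # "glory" stat
    rng.normal(28, 3, n),     # age
])
true_beta = np.array([13.0, 0.10, 0.002, 0.01, -0.02])  # intercept, then slopes
y = true_beta[0] + X @ true_beta[1:] + rng.normal(0, 0.1, n)

beta, r2 = fit_ols(X, y)
print(np.round(beta, 3), round(r2, 3))
print(np.round(vif(X), 2))  # near 1.0: these columns were drawn independently

# Add a column that is nearly a copy of the HR column to see VIF blow up.
X_bad = np.column_stack([X, X[:, 2] + rng.normal(0, 0.5, n)])
print(np.round(vif(X_bad), 1))
```

A VIF near 1 means a predictor carries information the others don’t; a very large VIF, as with the near-duplicate column, means its coefficient can’t be cleanly separated from the others, which is the problem the stat selection above tries to avoid.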

Read the rest of this entry »


A Look at Quality of Contact Profiles

It seems like it should matter how hard you hit the baseball. That statement probably seems self-evident, but until this year we haven’t really had a whole lot of evidence to demonstrate whether that’s true. We have an old month of HITf/x data from 2009 and there’s non-public data about exit velocity, but until StatCast data arrived this year, we didn’t really have the tools to determine how much quality of contact matters.

Last week, FanGraphs launched quality of contact statistics courtesy of Baseball Info Solutions to add to this effort. The methodology isn’t based solely on raw exit velocity, but the data stretches back to 2002, and it’s publicly available and easy to work with. As soon as people realized the data was available, the sabermetric masses went to work running preliminary tests. One of the interesting things that showed up right away was that the data didn’t predict itself very well from year to year, and stats like Hard% didn’t correlate with stats like BABIP or LD% as strongly as we might have expected.
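One simple way to check whether a stat “predicts itself” is a year-to-year correlation: pair each player’s value in one season with his value in the next and compute Pearson’s r. A sketch, assuming the data is a dict keyed by hypothetical (player, season) tuples; the values below are invented purely to exercise the code, and in practice you would also require a minimum number of batted balls per season.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def year_to_year(stat_by_player_season):
    """Pair each player's value in season t with his value in season t + 1.

    `stat_by_player_season` maps (player, season) -> stat, e.g. Hard%.
    """
    xs, ys = [], []
    for (player, season), value in stat_by_player_season.items():
        nxt = stat_by_player_season.get((player, season + 1))
        if nxt is not None:
            xs.append(value)
            ys.append(nxt)
    return xs, ys


# Hypothetical Hard% values, for illustration only.
hard_pct = {
    ("A", 2013): 0.30, ("A", 2014): 0.32,
    ("B", 2013): 0.25, ("B", 2014): 0.27,
    ("C", 2012): 0.33, ("C", 2013): 0.35, ("C", 2014): 0.37,
}
xs, ys = year_to_year(hard_pct)
r = pearson(xs, ys)
```

The same `pearson` helper also works for same-season checks, such as Hard% against BABIP, by pairing the two stats for each player-season instead.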

Read the rest of this entry »


How Contact Ability Might Influence a Hitter’s Transition to the Majors

Back in February, there was some discussion about the transition from Triple-A to the majors, and whether that jump was getting any more difficult. It certainly seemed that way. Several highly regarded minor leaguers completely flopped in their first taste of big league action last year. Gregory Polanco, Jon Singleton, Xander Bogaerts, Jackie Bradley Jr. and the late Oscar Taveras all failed to hit a lick after tearing it up in the minors. And perhaps worst of all, Javier Baez, a consensus top-10 prospect heading into the year, hit a putrid .169/.227/.324 with an unsightly 41% strikeout rate.

Jeff Sullivan and Ben Lindbergh both looked into the validity of this phenomenon, and wrote response articles more or less debunking it. Both concluded that the gap between Triple-A and the majors wasn’t growing after all, or at least not in any meaningful way. So much for that.

However, after thinking about it for a while, I started to wonder if there might be other ways to explain the initial failures of guys like Baez. Perhaps it might be more informative to look at these transitions from a different angle: Not across time, but across skill sets.

Baez’s flaws were easily identifiable. He struggled to make contact and showed a tendency to chase pitches out of the zone. But perhaps his rough transition wasn’t unique to him. Maybe his skill set (his poor plate discipline and/or poor bat-to-ball ability) just doesn’t play well against major league pitching. If that’s the case, it might help us be wary of the next Javier Baez.

Read the rest of this entry »


Batted Balls: It’s All About Location, Location, Location

BABIP is a really hard thing to predict for pitchers. There have been plenty of attempts, sure, but nothing all that conclusive — probably because pitchers have a negligible amount of control over it. So naturally, when I found something that I thought might be able to model and estimate pitcher BABIP to a high degree of accuracy, I was very excited.

My original idea was to figure out the BABIP — as well as other batted ball stats — of individual pitches from details about the pitch itself. Velocity, movement, sequencing, and a multitude of other factors that are within the pitcher’s control play into the likelihood that a pitch will fall for a hit (even if to a very small degree). But much more than all of those, pitch location seems to be the most important factor (as well as one of the easiest to measure).

I got impressively meaningful results by plotting BABIP, GB%, FB%, wOBA on batted balls, and other stats based on the horizontal and vertical location of the pitch. So I came up with models to find the probability that any batted ball would fall for a hit, with the only inputs being the horizontal and vertical location (the models worked very well). I even gave different pitch types different models, since there were differences between, for example, fastballs and breaking balls. I found the “expected” BABIP of each of a pitcher’s pitches and then averaged all of those expected BABIPs; theoretically, this is the BABIP the pitcher should have allowed.
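The article doesn’t spell out the form of its location models, so this sketch swaps in the simplest possible stand-in: a binned lookup table. League batted balls are grouped by pitch type and by a grid cell built from the PITCHf/x `px` (horizontal) and `pz` (vertical) coordinates, in feet; the hit rate in each cell serves as the expected BABIP for a ball in play in that cell, and the pitcher’s expected BABIP is the mean over his batted balls. The 0.5-foot cell size and the fallback to the overall league rate for empty cells are assumptions.

```python
import math
from collections import defaultdict

CELL_FT = 0.5  # hypothetical grid size, in feet


def cell(px, pz):
    """Map a pitch location (feet) to a grid cell."""
    return math.floor(px / CELL_FT), math.floor(pz / CELL_FT)


def fit_location_model(batted_balls):
    """Hit rate per (pitch type, cell) from league batted balls.

    Each batted ball is (pitch_type, px, pz, is_hit) with is_hit in {0, 1}.
    Returns the per-cell rates plus the overall rate used as a fallback.
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for ptype, px, pz, is_hit in batted_balls:
        key = (ptype, cell(px, pz))
        totals[key] += 1
        hits[key] += is_hit
    overall = sum(hits.values()) / sum(totals.values())
    rates = {key: hits[key] / totals[key] for key in totals}
    return rates, overall


def expected_babip(model, pitcher_batted_balls):
    """Average the expected BABIP of each of a pitcher's batted balls."""
    rates, overall = model
    preds = [rates.get((ptype, cell(px, pz)), overall)
             for ptype, px, pz in pitcher_batted_balls]
    return sum(preds) / len(preds)


# Tiny made-up league sample and pitcher sample, for illustration only.
league = [("FF", 0.1, 2.6, 1), ("FF", 0.2, 2.7, 0),
          ("SL", -0.8, 1.2, 0), ("SL", -0.9, 1.1, 0)]
model = fit_location_model(league)
xbabip = expected_babip(model, [("FF", 0.0, 2.55), ("SL", -0.75, 1.3), ("CU", 0.0, 1.0)])
```

With real data, small cells get noisy, which is one reason a smooth fitted model (like the ones the article describes) is preferable to a raw lookup.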

Read the rest of this entry »


The Non-Speed Components of Double Plays

Last week, we rolled out some minor tweaks to WAR, one of which was the addition of wGDP. If you haven’t read the primer, wGDP is a measure of double play runs above average and captures how many runs you save your team by staying out of double plays.

In general, it’s a minor piece of the overall puzzle with the best and worst players separated by less than a win of value over the course of a full season. Staying out of double plays helps your team, but even the best players don’t stay out of a large enough number to swing their value in a big way. Introducing wGDP makes WAR a better reflection of reality and that’s a good thing, but it also allows us to better measure the GIDP column we’ve all seen for years because it puts double plays in the context of double play opportunities.
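Measuring double plays against opportunities can be written as expected double plays minus actual double plays, converted to runs. A rough sketch under stated assumptions: an “opportunity” is taken to be a plate appearance with a runner on first and fewer than two outs, and `runs_per_gdp`, the run cost of one double play, is a hypothetical parameter rather than the value FanGraphs uses. The official wGDP calculation may differ in its details.

```python
def wgdp(gdp, opportunities, lg_gdp_per_opp, runs_per_gdp):
    """Double-play runs above average (positive = runs saved).

    gdp:             double plays the player grounded into
    opportunities:   his double-play opportunities
    lg_gdp_per_opp:  league double plays per opportunity
    runs_per_gdp:    hypothetical run value of one double play
    """
    expected_gdp = opportunities * lg_gdp_per_opp
    return (expected_gdp - gdp) * runs_per_gdp


# Illustrative inputs only: 100 opportunities, 5 GIDP, an 11% league rate.
print(wgdp(gdp=5, opportunities=100, lg_gdp_per_opp=0.11, runs_per_gdp=0.4))
```

Because the baseline scales with opportunities, the same five GIDP earn more credit for a hitter who came up with a runner on first often than for one who rarely did, which the raw GIDP column can’t show.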

Dave and August have already looked at some surprising and obvious players who are great at staying out of double plays, but I wanted to consider this new statistic from another angle. For the most part, it seems like staying out of double plays should be a baserunning issue, since you have to be fast enough to reach first before the infield turns it.

Read the rest of this entry »


Investigating the Idea of Scarce Right-Handed Power

I want to put to rest the discussion about the lack of right-handed power in Major League Baseball today. There has been a lot of anecdotal commentary about how scarce right-handed power has become, but there haven’t been many analytical articles supporting this idea. If anything, the handful of articles that have been written question whether the problem even exists in the first place. There are two different arguments on this topic: the first is that right-handed power is scarce (left-handed power is plentiful, but right-handed power is not), while the second, which I won’t address today, is that right-handed power hitters have declined in number relative to left-handed power hitters.

In a hypothetical choice between players of equal talent, you would almost always prefer a left-handed power hitter to a right-handed power hitter, since the lefty will have the platoon advantage more often and should be more productive as a result. There are valid arguments about rounding out lineups, but right-handed batters are not scarce; good left-handed hitters are actually the scarce commodity.

For reference, the general population is estimated to be about 10% left-handed, while about 33% of batters in baseball hit left-handed; lefties are overrepresented in baseball.

This is a box plot of the various player-seasons from 2010 until 2014. I’ve chosen this time span since it’s recent and it falls after the implementation of PITCHf/x, which improved the measurement of the strike zone. I’ve excluded switch hitters for simplicity, and set a floor at 200 plate appearances.

2010-2014 Single Season HR
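A sketch of the filtering and of the summary a box plot draws, assuming player-season records as dicts with hypothetical keys (`season`, `bats`, `pa`, `hr`) and “S” marking switch hitters. It returns the five-number summary (min, Q1, median, Q3, max) per batting side; plotting libraries often draw whiskers at 1.5 times the interquartile range rather than at the min and max, so a plotted box may differ at the extremes.

```python
from statistics import median, quantiles


def five_number(values):
    """Min, Q1, median, Q3, max (quartiles by linear interpolation)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


def hr_summary_by_side(player_seasons, first=2010, last=2014, min_pa=200):
    """Five-number HR summary for left- and right-handed batters."""
    groups = {"L": [], "R": []}
    for s in player_seasons:
        if not first <= s["season"] <= last:
            continue
        if s["pa"] < min_pa or s["bats"] not in groups:  # drops switch hitters ("S")
            continue
        groups[s["bats"]].append(s["hr"])
    # quantiles() needs at least two data points
    return {side: five_number(hrs) for side, hrs in groups.items() if len(hrs) >= 2}


# Made-up player-seasons, for illustration only.
seasons = [
    {"season": 2010 + i, "bats": "R", "pa": 300, "hr": hr}
    for i, hr in enumerate([10, 20, 30, 40, 50])
] + [
    {"season": 2012, "bats": "L", "pa": 400, "hr": 5},
    {"season": 2013, "bats": "L", "pa": 400, "hr": 15},
    {"season": 2009, "bats": "R", "pa": 600, "hr": 45},  # outside 2010-2014
    {"season": 2012, "bats": "S", "pa": 600, "hr": 30},  # switch hitter
    {"season": 2013, "bats": "R", "pa": 150, "hr": 12},  # under 200 PA
]
hr_summary = hr_summary_by_side(seasons)
```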

Read the rest of this entry »


On the Consistency of ERA

We know that ERA isn’t a perfect indicator of a pitcher’s talent level. It depends a lot on the defense behind the pitcher in question. It depends a lot on luck in getting balls in play to fall where the fielders are. It depends a lot on luck in getting fly balls to land in front of the fence. It depends a lot on luck in sequencing — getting hits and walks at times where it doesn’t hurt too much.

That’s why we have DIPS. Stats like FIP, xFIP, SIERA, my recent SERA, and Jonathan Judge’s even more recent cFIP all attempt to more accurately measure a pitcher’s talent by stripping those things out. But what if there was an easy way to figure out how much ERA actually can vary? How likely a pitcher’s ERA was? What the spread of possible outcomes is? The aforementioned ERA estimators do not address that issue. They can tell you what the pitcher’s ERA should have been with all the luck taken away (or at least what they think the ERA should have been), but they can’t answer any of the questions I just posed.

Read the rest of this entry »


Examining SERA’s Predictive Powers

SERA, my attempt to estimate ERA with simulation, started off as an estimator. Then, later, I laid out ways to make it more predictive. Well, here’s the new SERA: a more predictive, more accurate and better ERA estimator altogether.

First, a refresher: The first SERA worked by inputting a pitcher’s K%, BB%, HR% (or HR/TBF), GB%, FB%, LD% and IFFB%. The simulator would then simulate as many innings as specified, with the outcome of each plate appearance drawn with the probabilities given by those inputs. A strikeout, walk or home run was simple; on a ground ball, fly ball, line drive or popup, the runners advanced, scored or were put out with the same frequency as they would in real life.
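SERA’s real-life base-running tables aren’t in this excerpt, so the following is only a crude sketch of the same kind of simulation. Assumptions, all hypothetical: K, BB and HR are per-plate-appearance rates; GB and LD are shares of balls in play, with FB as the remainder; IFFB is the share of fly balls; every hit is a single that moves each runner up one base; there are no sacrifice flies, double plays or errors; and the `HIT_PROB` values are placeholders. Simulating many short seasons also gives a rough sense of how widely ERA can vary for one fixed set of inputs.

```python
import random

# Hypothetical hit probabilities by batted-ball type (placeholders, not
# SERA's real-life tables).
HIT_PROB = {"GB": 0.24, "LD": 0.68, "FB": 0.14, "IFFB": 0.02}


def plate_appearance(rates, rng):
    """Draw one outcome: K, BB, HR, or a ball in play that is a HIT or an OUT."""
    u = rng.random()
    for outcome in ("K", "BB", "HR"):
        if u < rates[outcome]:
            return outcome
        u -= rates[outcome]
    v = rng.random()
    if v < rates["GB"]:
        kind = "GB"
    elif v < rates["GB"] + rates["LD"]:
        kind = "LD"
    else:
        kind = "IFFB" if rng.random() < rates["IFFB"] else "FB"
    return "HIT" if rng.random() < HIT_PROB[kind] else "OUT"


def simulate_inning(rates, rng):
    """Play one inning and return the runs scored."""
    outs, runs = 0, 0
    bases = [False, False, False]  # first, second, third
    while outs < 3:
        result = plate_appearance(rates, rng)
        if result in ("K", "OUT"):
            outs += 1
        elif result == "HR":
            runs += 1 + sum(bases)
            bases = [False, False, False]
        elif result == "BB":
            # Runners move only when forced.
            if bases[0]:
                if bases[1]:
                    if bases[2]:
                        runs += 1
                    bases[2] = True
                bases[1] = True
            bases[0] = True
        else:  # HIT, treated as a single: every runner advances one base
            runs += bases[2]
            bases = [True, bases[0], bases[1]]
    return runs


def simulated_era(rates, innings, seed=0):
    """ERA over a long run of simulated innings."""
    rng = random.Random(seed)
    return 9 * sum(simulate_inning(rates, rng) for _ in range(innings)) / innings


def season_eras(rates, seasons=500, innings=180, seed=0):
    """Sorted ERAs for many simulated seasons of `innings` innings each."""
    rng = random.Random(seed)
    return sorted(9 * sum(simulate_inning(rates, rng) for _ in range(innings)) / innings
                  for _ in range(seasons))


typical = {"K": 0.22, "BB": 0.08, "HR": 0.025, "GB": 0.45, "LD": 0.20, "IFFB": 0.10}
print(round(simulated_era(typical, 20_000), 2))
eras = season_eras(typical)
print(eras[25], eras[474])  # rough 5th and 95th percentiles
```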

To make SERA a better predictor of future ERA, I outlined a few major changes: drop home runs as an input (since they depend so heavily on HR/FB rate, over which pitchers have almost no control), drop IFFB% for the same reason (it is extremely volatile, and pitchers also have very little control over it), and regress K%, BB%, GB%, FB% and LD% based on the last three years of available data, or two or one if the player hadn’t been playing for three years. There were some other minor changes, too.
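The post doesn’t give the exact regression recipe, so here is one common way to do it, borrowed from Marcel-style projections: weight the most recent seasons more heavily, then add a fixed number of “ghost” opportunities at the league-average rate to pull small samples toward the mean. The 5/4/3 weights and the 200-opportunity ghost sample are assumptions, not SERA’s actual values. For K% and BB% the opportunities are batters faced; for GB%, FB% and LD% you would pass batted balls instead.

```python
def regressed_rate(seasons, league_rate, weights=(5, 4, 3), ghost=200):
    """Weighted multi-season rate, regressed toward the league average.

    seasons: (events, opportunities) pairs, most recent first, e.g. (strikeouts,
             batters faced). Up to three are used; fewer simply means less data.
    ghost:   hypothetical weighted opportunities added at the league rate.
    """
    used = list(zip(weights, seasons[:3]))
    events = sum(w * e for w, (e, _) in used)
    opps = sum(w * o for w, (_, o) in used)
    return (events + ghost * league_rate) / (opps + ghost)


# A pitcher with one 200-batter season at a 25% strikeout rate, league 20%:
print(regressed_rate([(50, 200)], league_rate=0.20))
```

A pitcher with no history gets exactly the league rate, and the more (and more recent) data a pitcher has, the less his rate is pulled toward it.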

Read the rest of this entry »