On the Consistency of ERA

We know that ERA isn’t a perfect indicator of a pitcher’s talent level. It depends a lot on the defense behind the pitcher in question. It depends a lot on luck in getting balls in play to fall where the fielders are. It depends a lot on luck in getting fly balls to land in front of the fence. It depends a lot on luck in sequencing — getting hits and walks at times where it doesn’t hurt too much.

That’s why we have DIPS. Stats like FIP, xFIP, SIERA, my recent SERA, and Jonathan Judge’s even more recent cFIP all attempt to more accurately measure a pitcher’s talent by stripping those things out. But what if there was an easy way to figure out how much ERA actually can vary? How likely a pitcher’s ERA was? What the spread of possible outcomes is? The aforementioned ERA estimators do not address that issue. They can tell you what the pitcher’s ERA should have been with all the luck taken away (or at least what they think the ERA should have been), but they can’t answer any of the questions I just posed.

But SERA, which simulates ERA instead of using a formula, can be modified a little bit to help us out. If, instead of simulating hundreds of thousands of innings at once, we break the simulation up into 50-, 100-, or 200-inning parts, we can find the distribution of outcomes for that pitcher. This is what I think the real advantage of SERA is. It’s a decent ERA estimator (not quite as predictive as xFIP), but its biggest asset is the ability to tell us about the variability in a pitcher’s ERA.
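To make the idea of chunked simulation concrete, here is a minimal Python sketch. It is not the SERA script: the outcome probabilities are hypothetical placeholders, every run is treated as earned, and the base-advancement rule (each runner moves exactly as many bases as the batter) is a deliberate simplification. It only illustrates simulating many fixed-length samples of innings and collecting one ERA per sample.

```python
import random
import statistics

# Hypothetical per-plate-appearance outcome probabilities (placeholders only;
# these are NOT the article's inputs and NOT real league data). They sum to 1.
OUTCOME_PROBS = {
    "out": 0.68,
    "walk": 0.08,
    "single": 0.155,
    "double": 0.045,
    "triple": 0.005,
    "homer": 0.035,
}
OUTCOMES = list(OUTCOME_PROBS)
WEIGHTS = list(OUTCOME_PROBS.values())
BASES_GAINED = {"single": 1, "double": 2, "triple": 3, "homer": 4}


def simulate_inning(rng):
    """Return the runs scored in one inning under a crude advancement model."""
    outs, runs = 0, 0
    bases = [False, False, False]  # first, second, third
    while outs < 3:
        event = rng.choices(OUTCOMES, WEIGHTS)[0]
        if event == "out":
            outs += 1
        elif event == "walk":
            # Only forced runners advance on a walk.
            if bases[0] and bases[1] and bases[2]:
                runs += 1
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        else:
            n = BASES_GAINED[event]
            new_bases = [False, False, False]
            # Simplification: every runner advances exactly n bases.
            for i, occupied in enumerate(bases):
                if occupied:
                    if i + n >= 3:
                        runs += 1
                    else:
                        new_bases[i + n] = True
            if n == 4:
                runs += 1  # the batter scores on a home run
            else:
                new_bases[n - 1] = True
            bases = new_bases
    return runs


def simulate_era(innings, rng):
    """ERA over one simulated sample of `innings` innings (all runs earned)."""
    runs = sum(simulate_inning(rng) for _ in range(innings))
    return 9 * runs / innings


def era_distribution(innings=180, iterations=1000, seed=0):
    """Simulate `iterations` samples of `innings` innings; return their ERAs."""
    rng = random.Random(seed)
    return [simulate_era(innings, rng) for _ in range(iterations)]


eras = era_distribution()
print(round(statistics.mean(eras), 2), round(statistics.stdev(eras), 2))
```

Lowering `innings` widens the spread of the resulting ERAs, and raising `iterations` only smooths the estimate of that spread.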

With a new script, we can now set the IP to a lower number, iterate that hundreds or thousands of times, and see the distribution of outcomes. The distribution is generally not Normal, but luckily its shape is about the same most of the time. It usually looks like this (the dotted line is the mean):



Of course, the spread changes based on the inputs (this was made using league average as the inputs), the innings per simulation, and even the number of iterations, but the shape is usually pretty similar. The 10th percentile is generally between 1.05 and 1.2 standard deviations below the mean, and the distribution is usually skewed slightly right (which is pretty intriguing; I don’t know why that is, but my guess is that really bad values are easier to get than really good values).

For an average pitcher, the 10th percentile is just under 3.00, the 25th percentile is just over 3.25, the median is about 3.63, the mean is about 3.66, the 75th percentile is just about 4.00, and the 90th percentile is just under 4.40. These are kind of like PECOTA’s percentile projections: they estimate what the best- and worst-case scenarios are. I will later combine this with pSERA to look at what each pitcher’s uncertainty, as well as plain mean projection, is for next year.
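For anyone who wants to pull the same kind of percentiles out of their own simulated ERAs, a sketch along these lines should work. The `summarize` helper is a hypothetical name, and the demo values are synthetic right-skewed draws standing in for simulation output; they are not results from SERA.

```python
import random
import statistics


def summarize(eras):
    """Return percentile-style summary statistics for a list of ERAs."""
    # quantiles(n=20) returns 19 cut points at 5%, 10%, ..., 95%.
    cuts = statistics.quantiles(eras, n=20)
    mean = statistics.mean(eras)
    sd = statistics.stdev(eras)
    return {
        "p10": cuts[1],
        "p25": cuts[4],
        "median": cuts[9],
        "p75": cuts[14],
        "p90": cuts[17],
        "mean": mean,
        "sd": sd,
        # How many standard deviations the 10th percentile sits below the mean.
        "p10_in_sds": (mean - cuts[1]) / sd,
    }


# Synthetic, right-skewed stand-in values (NOT simulation output).
rng = random.Random(0)
fake_eras = [rng.lognormvariate(1.28, 0.15) for _ in range(1000)]
for name, value in summarize(fake_eras).items():
    print(f"{name}: {value:.2f}")
```

A mean above the median, as in the figures quoted above, is the usual sign of a right-skewed distribution.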

The next step is trying to figure out what makes a pitcher’s ERA more stable or unstable. Intuitively, you (or at least I) would think that strikeouts and walks both work to decrease volatility because they take out batted ball luck. However, that is not the case:

K and BB Heatmap


The StdDev is the standard deviation of the 1000 simulations of 180 innings each I ran using the given K% and BB% (with batted ball inputs all set to league average of 44.8 GB%, 34.4 FB%, 20.8 LD%, 9.6 IFFB%, and 9.5 HR/FB%). You can see that as K% increases and as BB% decreases, the standard deviation gets lower. That means that yes, a higher strikeout rate does work to decrease the variability in ERA, but a higher walk rate does not. This is probably because the sequencing luck involved with putting more runners on base has a greater effect than the batted ball luck involved with allowing more contact.
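One rough way to see why extra baserunners add volatility, under a strong simplifying assumption: if every plate appearance independently ends with the batter reaching base with probability p, and no outs are ever made on the bases, then the number of batters who reach before the third out follows a negative binomial distribution. The sketch below (with a hypothetical helper name and made-up on-base rates) shows that both the average and the spread of baserunners per inning rise with p.

```python
import math


def baserunners_per_inning(on_base_prob):
    """Mean and standard deviation of batters reaching base before three outs.

    Assumes independent plate appearances and no outs made on the bases, so
    the count is negative binomial with r = 3 outs and per-PA out probability
    q = 1 - on_base_prob.
    """
    q = 1 - on_base_prob
    mean = 3 * on_base_prob / q
    variance = 3 * on_base_prob / q ** 2
    return mean, math.sqrt(variance)


# Hypothetical on-base rates allowed, for illustration only.
for p in (0.28, 0.32, 0.36):
    mean, sd = baserunners_per_inning(p)
    print(f"p={p:.2f}: mean={mean:.2f}, sd={sd:.2f}")
```

This says nothing about runs directly, but it does fit the pattern in the heatmap: the more runners a pitcher allows, the more room there is for sequencing to swing the outcome.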

So, then, visiting the K-BB% leaderboards should give us a good sense of whose ERA last year was a pretty good indicator of their true talent. The higher the K-BB%, the lower variability there is in that pitcher’s ERA.

What about batted balls? Which batted ball types make ERA the most unstable? Here’s a chart similar to the one above, using the same numbers of 1000 iterations of 180 IP, showing GB% and FB% and the resulting standard deviation of the ERAs. LD% isn’t shown on the graph, because graphing four variables is hard, but it’s just 100-GB%-FB%; it’s lower towards the top-right and higher towards the bottom-left.

BIP Heatmap


The LD% here is sometimes negative and sometimes crazy high, which is obviously unrealistic. But the overall trend is consistent throughout: the higher the GB% and FB% — and in turn, the lower the LD% — the more consistent and less variable the ERA is. It’s the same thing as strikeouts and walks. Since line drives turn into hits more often, allowing more of them means more runners on base and the ERA is more dependent on sequencing luck.

But LD% is usually unreliable and doesn’t carry over very much year-to-year. GB% and FB% are much more stable, so we should look at those to see which is more important to reduce variability in ERA — that will tell us more about pitchers’ future ability to maintain a consistent ERA.



Another clear trend. This one shows that a higher GB% and a lower FB% decrease the variability. I had expected that a higher FB% would lead to a more stable ERA, since more fly balls gives IFFB% and HR/FB% a better chance to normalize and be closer to the league average. But that isn’t the case. When you think more about it, it makes sense, too.

A pitcher with a 50% fly ball rate (which is very high) would allow about 285 fly balls over 800 batters, assuming average K%, BB%, and HBP%. Using sampling techniques, we can figure out that one standard deviation of HR/FB% would be roughly 1.8 percentage points. For a pitcher with a 25% FB% (which is pretty low), the standard deviation is about 2.5 percentage points. That’s a huge gap in FB%, but not a huge difference in the variation of HR/FB rate. (All this is assuming that pitchers do not have control over their HR/FB and IFFB rates, which isn’t totally true, but is an assumption that holds well enough to be able to generalize safely.)
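The arithmetic here can be checked with a simple binomial model. This is an assumption on my part about what the sampling calculation looks like: each fly ball is treated as an independent trial that leaves the park with the league-average 9.5% probability quoted earlier, and the 25% FB pitcher is assumed to allow half as many fly balls as the 50% FB pitcher.

```python
import math

LEAGUE_HR_PER_FB = 0.095  # league-average HR/FB quoted earlier in the article


def hr_fb_std_dev(fly_balls, hr_rate=LEAGUE_HR_PER_FB):
    """Standard deviation of observed HR/FB, in percentage points.

    Assumes each fly ball is an independent trial with the same home run
    probability (a binomial model).
    """
    return 100 * math.sqrt(hr_rate * (1 - hr_rate) / fly_balls)


high_fb = hr_fb_std_dev(285)      # 50% FB pitcher: fly-ball count from the text
low_fb = hr_fb_std_dev(285 / 2)   # 25% FB pitcher: assumed half as many
print(f"{high_fb:.1f} {low_fb:.1f}")  # prints "1.7 2.5"
```

Under this model the results land close to the rough figures above. Halving the number of fly balls only inflates the standard deviation by a factor of about 1.41 (the square root of 2), which is why such a large gap in FB% produces such a modest gap in HR/FB variation.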

Much more important than allowing HR/FB and IFFB rates to stabilize is preventing big hits like home runs and doubles altogether, since they put more runners on and create more variability in sequencing. Ground balls do that very well; almost no ground ball ever ends up as something other than an out or a single, save the occasional exception. Fly balls, on the other hand, end up in extra-base hits quite often (over 20% of outfield fly balls go for extra bases).

In a nutshell, I think that the most important takeaway is this: good pitching = more stable ERA. A better pitcher will have an ERA that is more indicative of their actual skill, while a worse pitcher who puts more runners on will have an ERA that could be very different from what their actual talent level is. More runners on leads to more uncertainty. The very close correlation between K%, BB%, GB%, FB%, and LD% and the ERA variance tells us that those things have a very tangible effect. A higher strikeout and ground ball rate and a lower walk, fly ball, and line drive rate not only lead to a lower ERA, but also to a much more consistent one.

Next week, I’ll take a look at pSERA for next year and use it to find how much the ERA of each pitcher can be expected to vary.

Jonah is a baseball analyst and Red Sox fan. He would like it if you followed him on Twitter @japemstein, but can't really do anything about it if you don't.


“Intuitively, you (or at least I) would think that strikeouts and walks both work to decrease volatility because they take out batted ball luck.”

I don’t think that’s true for a per inning type estimator like ERA. Pitchers still have to get three outs per inning, so walks shouldn’t have much effect on how much contact is allowed per inning. We measure hitters on a per PA basis, so for them walks will take out batted ball luck.


I should have added that I really enjoyed the article and look forward to more.