ERA, probably the single most cited reference for evaluating the performance of a pitcher, comes with a lot of problems. Neil does a good job outlining why in this FanGraphs Library entry. Over the last decade, plenty of research has cast a light on the variables within ERA that often have very little to do with the pitcher himself.
But what is the best way to use fielding-independent stats to estimate ERA? FIP is probably the most popular metric of this ilk, using only strikeouts, walks, hit batters, and home runs to create a linear equation that can be scaled to look like an expected ERA. Then there’s xFIP, which is based on the idea that pitchers have very little control over their HR/FB rate; to account for this, it estimates the number of home runs that a pitcher should have allowed by multiplying their fly balls allowed by the league-average HR/FB rate.
For many people, however, these are too simple. FIP more or less ignores all balls in play completely; xFIP treats all fly balls equally. Neither one correctly accounts for the effects that any ball in play can have; we know that the wOBA on line drives is much higher than the wOBA on pop ups, but we don’t see that reflected in many ERA estimators. The estimators we use also are fully linear, and may break down at the extreme ends; FIP tells us that a pitcher who strikes out every batter should have an ERA around -5.70, which is, well you know, not going to happen.
This is where simulations can help. I’m a big fan of simulations and think that they can be tremendously powerful and accurate tools when used correctly. So what I have done is create a Markov-esque* simulation to estimate a pitcher’s ERA with the following inputs: K%, BB% (which I will refer to a lot throughout this article, and which every time will mean BB+HBP%, since walks and hit batsmen are for our purposes the same thing), and HR% (these three are the FIP inputs); and GB%, FB%, LD%, and IFFB%. The goal is to produce a more accurate ERA estimator that still only takes into account the pitcher’s fielding-independent stats.
*I say Markov-esque because in the technical definition of a Markov chain, each state is a result only of the state that preceded it. This is not really the case in this simulator, as you will see.
Here’s how I did this. First, I assigned each of the 7 inputs a range between 0 and 1. For example, if the pitcher had a 20% K%, an 8% BB%, and a 2% HR%, this is what those ranges would look like:
0 – 0.2: K
0.2 – 0.28: BB
0.28 – 0.3: HR
And then if they had a 50% GB%, a 35% FB%, a 15% LD%, and a 10% IFFB%, here’s what those ranges would look like:
0.3 – 0.65: GB
0.65 – 0.8705: OFFB
0.8705 – 0.895: IFFB
0.895 – 1: LD
These were calculated using the fact that GB%, FB%, and LD% are grounders, fly balls, and line drives per ball in play, not per batter, like K%, BB%, and HR%. So for our made-up pitcher, who allowed a ball in play 70% of the time, each of his GB%, FB%, and LD% had to be multiplied by 0.7. Then IFFB (infield fly balls — pop ups) were separated from OFFB (outfield fly balls) by multiplying FB% by IFFB% and 1-IFFB%, respectively. (Remember that IFFB% is not pop ups per ball in play, but rather pop ups per fly ball. So GB%+FB%+LD%+IFFB% doesn’t equal 1, GB%+FB%+LD% does.)
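The range arithmetic above can be sketched in a few lines of Python 3. This is only an illustration, not the script linked later in the article; the function names `outcome_ranges` and `pick_outcome` are made up for this example.

```python
def outcome_ranges(k, bb, hr, gb, fb, ld, iffb):
    """Return a list of (outcome, low, high) ranges that together cover 0 to 1.

    k, bb, hr are per batter; gb, fb, ld are per ball in play (home runs
    excluded); iffb is pop ups per fly ball.
    """
    bip = 1.0 - (k + bb + hr)  # share of batters who put the ball in play
    shares = [
        ("K", k),
        ("BB", bb),
        ("HR", hr),
        ("GB", gb * bip),
        ("OFFB", fb * (1.0 - iffb) * bip),
        ("IFFB", fb * iffb * bip),
        ("LD", ld * bip),
    ]
    ranges, low = [], 0.0
    for name, share in shares:
        ranges.append((name, low, low + share))
        low += share
    return ranges


def pick_outcome(ranges, r):
    """Map a random number r in [0, 1) to the outcome whose range contains it."""
    for name, low, high in ranges:
        if low <= r < high:
            return name
    return ranges[-1][0]  # guard against floating-point rounding near 1


# The made-up pitcher from the text: 20% K, 8% BB, 2% HR,
# 50% GB, 35% FB, 15% LD, 10% IFFB.
example = outcome_ranges(0.20, 0.08, 0.02, 0.50, 0.35, 0.15, 0.10)
for name, low, high in example:
    print(name, round(low, 4), round(high, 4))
```

The boundaries it prints match the ranges listed above (0.65, 0.8705, 0.895, and 1 for the batted ball types).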
Note: I realize that home runs can be considered balls in play, and are included in fly ball rates. So when inputting numbers, you’ll have to use the fly ball rate that doesn’t include home runs. Don’t worry about calculating that for yourself; I’ve done it for you.
Then I defined three variables: the outs, the runs that had scored, and the runners. The beginning of the simulation, naturally, is a situation where there are no outs, no runs in, and no runners on base. From there, I generated a random number between 0 and 1. This number would fall into the range of one of the outcomes.
Then it got interesting. If the random number fell within the range for a strikeout, walk, or home run, what happened next was simple: a strikeout added one out to the current number of outs, and if that made 3 outs, the bases reset. A walk added a runner to the next available base or added one run if the bases were loaded. A home run cleared the bases and added the appropriate number of runs. If the random number fell within the range for one of the batted balls, things were considerably more complex. Here are the outcome distributions for each of the batted ball types (home runs excluded):
|Ball in play||Out||1B||2B||3B|
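The strikeout, walk, and home run rules described before the table are simple enough to sketch directly. This is a minimal Python 3 illustration, assuming the bases are stored as a tuple of three booleans (first, second, third); that representation and the name `apply_event` are choices made for this example, not necessarily how the original script does it.

```python
EMPTY = (False, False, False)


def apply_event(event, outs, runs, bases):
    """Apply a strikeout ("K"), walk ("BB"), or home run ("HR") to the game state.

    Returns the updated (outs, runs, bases). Starting the next inning once
    outs reaches 3 is left to the caller.
    """
    first, second, third = bases
    if event == "K":
        outs += 1
        if outs == 3:
            bases = EMPTY  # third out: the bases reset
    elif event == "BB":
        if first and second and third:
            runs += 1  # bases loaded: the walk forces in a run
        elif first and second:
            bases = (True, True, True)
        elif first:
            bases = (True, True, third)  # runner on first is forced to second
        else:
            bases = (True, second, third)  # nobody is forced
    elif event == "HR":
        runs += 1 + first + second + third  # batter plus every runner scores
        bases = EMPTY
    return outs, runs, bases


# A walk with the bases loaded scores one run and leaves them loaded.
print(apply_event("BB", 0, 0, (True, True, True)))
```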
So if the first random number dictated some sort of ball in play, a second random number was used to determine what type of hit the ball in play would be, which would of course depend on what type of batted ball it was in the first place. But wait, there’s more! How do the runners advance on different types of batted balls? Well, as one would expect, runners advance bases differently on singles than they do on doubles, but they also advance differently on, say, ground ball doubles than they do on fly ball doubles. So I had to find out how runners move on the basepaths for different types of balls in play and different types of hits. Here’s what I found:
|Hit Type||BIP||xxx -> xxx||xxx -> 1xx||xxx -> 12x||xxx -> 123||xxx -> 1×3||xxx -> x2x||xxx -> x23||xxx -> xx3|
|Hit Type||BIP||1xx -> xxx||1xx -> 1xx||1xx -> 12x||1xx -> 123||1xx -> 1×3||1xx -> x2x||1xx -> x23||1xx -> xx3|
|Hit Type||BIP||12x -> xxx||12x -> 1xx||12x -> 12x||12x -> 123||12x -> 1×3||12x -> x2x||12x -> x23||12x -> xx3|
|Hit Type||BIP||123 -> xxx||123 -> 1xx||123 -> 12x||123 -> 123||123 -> 1×3||123 -> x2x||123 -> x23||123 -> xx3|
|Hit Type||BIP||1×3 -> xxx||1×3 -> 1xx||1×3 -> 12x||1×3 -> 123||1×3 -> 1×3||1×3 -> x2x||1×3 -> x23||1×3 -> xx3|
|Hit Type||BIP||x2x -> xxx||x2x -> 1xx||x2x -> 12x||x2x -> 123||x2x -> 1×3||x2x -> x2x||x2x -> x23||x2x -> xx3|
|Hit Type||BIP||x23 -> xxx||x23 -> 1xx||x23 -> 12x||x23 -> 123||x23 -> 1×3||x23 -> x2x||x23 -> x23||x23 -> xx3|
|Hit Type||BIP||xx3 -> xxx||xx3 -> 1xx||xx3 -> 12x||xx3 -> 123||xx3 -> 1×3||xx3 -> x2x||xx3 -> x23||xx3 -> xx3|
(xxx = bases empty, 123 = bases loaded, x2x = runner on second, etc.). If you’re interested in those numbers, here is the download link for the Excel file, and here is the download link for the .csv file.
Anyways, I would generate a third random number to determine how the runners advanced. Say the first two random numbers dictated a single on a ground ball, and there were runners on first and third. If you look at the table above, you’ll see that in that situation and with a ground ball single, the baserunner situation changes to first and second about 75% of the time, to first and third (which isn’t really a change) about 21.5% of the time, and to other various things about 3.5% of the time. So if my third random number was below .75, the base state would change to first and second; if the number was between .75 and .965, the base state wouldn’t change; and so on. (Actually, to preserve my sanity and to avoid having to monotonously type so many things into a program, I rounded a little and removed events that almost never happened; here, I went with a 77-23 split and eliminated all the other small possibilities because they were so rare anyways.)
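Choosing the new base state from a random number works the same way as choosing the plate appearance outcome: walk through the cumulative probabilities until the random number is covered. Here is a small Python 3 sketch using only the rounded 77-23 split quoted above for a ground-ball single with runners on first and third. The names `GB_SINGLE_FROM_1x3` and `sample_state` are invented for this example, and a full simulator would need one such table per hit type, batted ball type, and starting base state.

```python
import random

# Rounded split from the text: ground-ball single, runners on first and third.
GB_SINGLE_FROM_1x3 = [("12x", 0.77), ("1x3", 0.23)]


def sample_state(distribution, r):
    """Return the base state whose cumulative probability range contains r."""
    cumulative = 0.0
    for state, prob in distribution:
        cumulative += prob
        if r < cumulative:
            return state
    return distribution[-1][0]  # guard against floating-point rounding near 1


rng = random.Random(0)  # fixed seed so the check is repeatable
draws = [sample_state(GB_SINGLE_FROM_1x3, rng.random()) for _ in range(100000)]
share_12x = draws.count("12x") / len(draws)
print(share_12x)  # should land close to 0.77
```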
And of course, a run scored there. So I would add a run to the number of runs that had scored. But sometimes yet more random numbers were needed, in cases where it was ambiguous whether runners who were taken off the basepaths scored or were tagged/forced out. Another example: on fly ball outs where the base state goes from “xx3” to “xxx”, it’s clear that a runner tried to tag up and score on a sacrifice fly. But how do we decide if the runner made it or not? I found the proportion of times where there was one run scored on the play and one out, and the proportion of times where there were no runs scored and two outs. (In this case, the split was actually a surprising 97.15% success rate for the runner tagging up, in a sample of 738 tries!) I then used my fourth random number to determine how many runs scored and how many outs were made on each play where it may have been unclear.
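The tag-up example can be sketched the same way. This Python 3 snippet uses the 97.15% success rate quoted above; the function name `resolve_tag_up` is made up, and it assumes the caller only uses it with fewer than two outs (with two outs, the fly out ends the inning before anyone can tag up).

```python
SAC_FLY_SUCCESS = 0.9715  # success rate quoted in the text (738 tries)


def resolve_tag_up(r, outs, runs):
    """Resolve a fly-ball out where the base state goes from xx3 to xxx.

    r is a random number in [0, 1). Assumes fewer than two outs beforehand.
    """
    if r < SAC_FLY_SUCCESS:
        return outs + 1, runs + 1  # batter is out, runner scores
    return outs + 2, runs  # batter is out and the runner is thrown out too


print(resolve_tag_up(0.50, 0, 0))  # runner scores
print(resolve_tag_up(0.99, 1, 3))  # double play, no run
```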
That’s pretty much how my simulator works. It runs until the desired number of innings pitched has gone by and then gives an ERA, which is just the number of runs that scored divided by the innings pitched, multiplied by nine. But you’d think that with all the randomness that goes into the simulation, it has to be run many, many times in order to get a meaningful and stable result. And that’s precisely the point.
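In code, that final step is a loop and one line of arithmetic. The Python 3 sketch below is not the real simulator: `simulated_era` takes any function that plays one inning and returns the runs allowed, and `toy_inning` is a purely hypothetical stand-in (one run 40% of the time) so the example can run on its own.

```python
import random


def simulated_era(simulate_inning, innings, seed=None):
    """Run `innings` simulated innings and scale runs allowed to nine innings."""
    rng = random.Random(seed)
    total_runs = sum(simulate_inning(rng) for _ in range(innings))
    return 9.0 * total_runs / innings


def toy_inning(rng):
    # Hypothetical placeholder, NOT the real inning logic:
    # one run with probability 0.4, otherwise none.
    return 1 if rng.random() < 0.4 else 0


print(simulated_era(toy_inning, 100000, seed=1))
```

Swapping `toy_inning` for a function built from the outcome ranges and base-runner tables described above would give the actual estimate.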
In a normal season, pitchers nowadays will get a maximum of roughly 250 innings pitched, and almost always fewer, especially if they are relievers. That’s part of what makes ERA so volatile; there’s so much randomness and luck in that relatively small number of innings. This simulator, however, can simulate hundreds of thousands of innings in just seconds. That is enough to strip almost all of the luck out of the result, because eventually all of the random numbers will average out, something which they do not have time to do in a pitcher’s season.
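As a rough check on how much the extra innings help (this is a standard sampling result, not something taken from the script): if innings are treated as independent and $\sigma$ stands for the standard deviation of runs allowed per inning (a symbol introduced here; its value isn't given in this article), the standard error of an ERA estimated from $n$ innings is approximately

```latex
\mathrm{SE}(\mathrm{ERA}_n) \approx \frac{9\,\sigma}{\sqrt{n}},
\qquad
\frac{\mathrm{SE}(\mathrm{ERA}_{250})}{\mathrm{SE}(\mathrm{ERA}_{100000})}
  = \sqrt{\frac{100000}{250}} = 20
```

So under those assumptions, a 100,000-inning simulation should carry roughly one-twentieth of the random noise of a 250-inning season.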
Of course, this is all resting on one assumption, and that’s that pitchers don’t have control over their balls in play past what type they are. This we know not to be entirely true, and it really doesn’t make any sense, either: if pitchers can control the kind of balls that get put into play (which they can, something that this nifty tool shows us), who’s to say that they can’t control the quality of contact, at least to some extent? But until we find a way to quantify that, we’re going to have to go with what we know. My next article is going to discuss how to figure out how much pitchers can control what happens on their balls in play, and from there I will try to incorporate that into this model.
Additionally, this method entirely ignores the instability of HR/FB rate, and is more like FIP in that way — it doesn’t think about home runs being somewhat luck-driven, and instead assumes that the pitcher has complete control over them. Maybe in the future I’ll create another version of this simulator that is more similar to xFIP.
Ok, finally: here’s the Python script (in Python 2.7) for you to be able to run the simulator. If you don’t know how to use that, you can copy and paste the code into something like Evaluzio, but just know that that’s a lot slower. (Hit the “Try it now” button towards the right on the Evaluzio homepage to get to the code editor.)
If you want to be able to download the script but you don’t know how: download Python 2.7.9 (or whatever the latest version starting with 2.7 is) from here. Open Idle (which was downloaded as part of the Python download) and create a new window (command/control + N, depending on whether you’re using Mac/Windows). Copy the Python file above and paste it into that window. Run the script (F5 button) and put in the inputs. It works pretty fast: like, simulating 100,000 innings in under 2 seconds fast.
And here is a table of pitchers with each of the stats needed for this simulator:
When you’re running the simulation, don’t input the player’s normal batted ball profile, because that includes home runs — this simulation regards home runs as totally separate from other fly balls, which is reflected in the table above. Also, I would advise running at least 100,000 innings for each simulation — that way, it will be fairly stable without taking too much time.
And as a reference, here is a table of pitchers with at least 340 total batters faced and what the simulator — do I need a name for this? I’ll call it SERA, for Simulated ERA — says their ERA should be. Each one has had 500,000 innings simulated:
|114||Jorge de la Rosa||4.16||4.34||4.10|
|150||Rubby de la Rosa||4.59||4.30||4.43|
For the most part, SERA, FIP, and ERA are all fairly close. But you can see that FIP and SERA are much more closely correlated than ERA and SERA:
Which makes sense, because what’s going into FIP is also going into SERA. There are, however, some pitchers whom SERA likes a lot more than FIP does…
- Edinson Volquez
- Alex Cobb
- Chris Young
- Danny Duffy
- Clay Buchholz
And also those whom FIP likes more:
- Edwin Jackson
- Masahiro Tanaka
- Ervin Santana
- Brandon McCarthy
- Tim Lincecum
- Adam Wainwright
- Hyun-Jin Ryu
- Phil Hughes
- Stephen Strasburg…
This list goes on much longer than that; generally, I think FIP tends to be more favorable towards better pitchers than SERA does, and specifically I think it is more favorable to low-walk pitchers (maybe we are overstating the negative impacts of walks? Worth thinking about). The average SERA among the pitchers in my 173-count sample was 3.89; for FIP it was 3.79 and for ERA 3.77. We can chalk some of the differences up to random variation in the simulation, because each simulation that gets run is going to be different, but over such a large sample (86.5 million total IP simulated), the difference can’t be all chance. For pitchers who have a large ERA-FIP divide, their SERA almost always comes between the two. The pitchers who had the largest split and didn’t have their SERA fall between their ERA and FIP were:
- Miguel Gonzalez (SERA lower than both)
- Carlos Villanueva (lower)
- Robbie Ross (lower)
- Josh Beckett (lower)
- Justin Masterson (lower)
- Clay Buchholz (lower)
- Nathan Eovaldi (lower)
- Dan Otero (higher)
- Henderson Alvarez (higher)
- Alfredo Simon (higher)
All but Villanueva and Ross are pitching, or pitched at one point in their careers, on the East Coast, which can’t possibly be a coincidence and must have some sort of meaning. But other than that, there’s no real obvious explanation. I would guess that there’s no underlying trend here (coastal sea breeze aside), and that this is all random.
Also for reference, here are the year-to-year correlations (r) for pitchers for each of SERA’s components (obtained with the article linked to earlier):
I think this table reinforces the fact that this simulator is more descriptive than it is predictive, just as FIP is. HR%, LD%, and IFFB% all have pretty low year-to-year correlations, meaning that a pitcher with, for example, a high LD% one year is nearly as likely to have a low one the next year as a high one. Again, I plan on looking into the predictive capabilities of this model and how it can be adjusted to become more predictive.
Jonah is a baseball analyst and Red Sox fan. He would like it if you followed him on Twitter @japemstein, but can't really do anything about it if you don't.