You’ve surely heard the sentiment: that pitcher is boom-or-bust. When he’s dialed in, he’s unhittable, but sometimes he just doesn’t “have it.” It’s a non-falsifiable claim, of course. It’s nearly impossible to say what constitutes having it or not, and harder still to know if it’s predictive. For the most part, your talent level is your talent level. Great pitcher? You’ll have fewer blowup games. Bad pitcher? Random chance is going to give you your fair share of crooked numbers.
This unprovable claim, however, set me onto an interesting train of thought. What if run clustering isn’t a purely random process? What if some pitchers, not through any innate streakiness but merely by virtue of the outcomes they allow, give up runs in interesting patterns? Take a groundball-heavy pitcher, for example. When a run scores against him, it’s almost certainly due to a series of groundball singles and walks. If one run scores, there’s often another runner in scoring position right away. The state of the world upon giving up one run, for this Zack Britton-wannabe pitcher, is such that he’s immediately threatened with more runs.
Contrast that with a different type of pitcher, a Nick Anderson-style strikeouts-and-dingers fly ball pitcher. When our punch-outs-and-fly-balls pitcher gives up a run, it’s often on a solo shot. When that’s the case, one run is in, but the resulting situation isn’t threatening anymore. The bases are empty, the damage done in a single instant. Wouldn’t it be reasonable to wonder whether the two allow runs in different bunches?
Still, those are a lot of words with no real evidence behind them. Who’s to say which of those pitchers allows more big innings? Who’s to say if they’re even equally good pitchers? The guy who allows a lot of home runs sounds like he might allow a lot of big innings, just by virtue of being someone who allows a lot of home runs. We need to be more precise to say anything with conviction.
Lots of research has been done on groundball and fly ball tendencies. The Book introduced the concept of a groundball/fly ball platoon split, with pitchers performing better against hitters with shared tendencies. Neil Weinberg tried to determine which archetype is better in the abstract. Our own Alex Chamberlain dove into what separates the two groups. Still, this robust body of research focuses on what makes groundball and fly ball pitchers different, which isn’t what we’re looking for. For this study, we want to find two pitchers with the same skill level overall, net of all of these effects.
Sadly, the amount of data we can collect on pitchers just isn’t going to cut it. We need pitchers with the same true talent run prevention — if they’re not equally good, our test will tell us more about which pitcher is better than which is uniquely suited to each situation. There simply aren’t enough innings to be sure what we’re actually measuring. Luckily, I had a solution in mind, a solution without which this article wouldn’t have been possible.
Focusing on the signal in real-life pitching lines is tremendously difficult. Pitchers don’t have one consistent talent level. Sometimes they’re hurt. Sometimes they’re pitching on the wrong amount of rest. Sometimes they develop a new pitch, or just play differently from year to year. So instead of looking at real-life lines, I decided to invent some players who never experienced hot or cold streaks, who never got hurt or had a tired arm.
I wrote a Python script, the details of which I’ll cover in an appendix, to create fake players to help me test my hypothesis. In a nutshell, though, each pitcher is just a bundle of plate appearance outcomes. Plug in groundball, line drive, fly ball, walk, and strikeout rates, and the code will randomly determine the outcome of a plate appearance. It generates random plate appearances (and moves runners and scores runs and keeps track of outs) until the inning is over. Then, it repeats this process five million times.
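As a rough illustration of "a bundle of plate appearance outcomes," here is a minimal sketch of the random draw. It is not the script itself: the rates in `EXAMPLE_RATES` are made-up placeholders rather than the values used for any pitcher in this article, and the name `sample_outcome` is hypothetical.

```python
import random

# Hypothetical per-plate-appearance rates (placeholders, not the article's values).
# They are shares of all plate appearances and must sum to 1.
EXAMPLE_RATES = {
    "groundball": 0.35,
    "line_drive": 0.15,
    "fly_ball": 0.20,
    "walk": 0.08,
    "strikeout": 0.22,
}


def sample_outcome(rates, rng=random):
    """Pick one plate appearance outcome, weighted by the pitcher's rates."""
    outcomes = list(rates)
    weights = [rates[name] for name in outcomes]
    # random.choices does the weighted draw; k=1 returns a one-element list.
    return rng.choices(outcomes, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(2018)  # seeded so the draw is repeatable
    print([sample_outcome(EXAMPLE_RATES, rng) for _ in range(5)])
```

Passing a seeded `random.Random` instance keeps runs repeatable, which helps when comparing two simulated pitchers.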
There are disadvantages to abstracting baseball into such stark terms (I did away with steals, bunts, and baserunning outs, for example), but there are serious advantages, too. Our hypothetical pitchers have rubber arms: five million innings is about as many innings as have been thrown in baseball history. With this much data, we can be pretty confident that the talent level of our pitchers is accurately captured by how many runs they allow. After five million innings, you pretty much are what your ERA says you are.
Let’s take a look at the two pitchers I created, Gregory Groundball and Marty McFly. One note on these and all the distributions to follow: the groundball, line drive, and fly ball rates are as a percentage of all plate appearances, not per batted ball.
Both of these pitchers fare almost exactly the same in terms of run prevention (to simplify the code, I turned all errors into base hits, which is why I’m reporting RA/9), but they get there in extremely different ways. Gregory Groundball is an archetypical sinkerballer; his 2.87 GB/FB ratio would have ranked 11th among relievers last year. Marty McFly, on the other hand, is the very model of a modern reliever. Picture Chad Green or Josh Hader — his 0.5 GB/FB ratio would be third-lowest among 2018 relievers.
Neither of these pitchers is getting abnormally good or bad results on balls in play — their batted ball outcomes, by definition, exactly mirror the league-wide distribution of outcomes for each batted ball type in 2018. They don’t fare differently with runners on, or pitch to contact with big leads. The simulation doesn’t need to worry about any of that. It just selects a random outcome from the same table, over and over again, until we have our results.
Now, about those results. We’ve already seen that the two players have almost exactly identical ERAs. How about streakiness, though? Is one type of pitcher more blowup-prone? Here’s the distribution of runs scored over those five million innings.
|Runs||Greg Groundball||Marty McFly|
These might not look very different. The grounder-heavy approach results in a clean inning only 0.6 percentage points more often than the fly ball approach. A groundball-focused pitcher also gives up big innings more often, but three or more runs cross the plate only 0.3 percentage points more often against Gregory Groundball than they do against McFly. These differences aren’t huge; we’re talking a handful of wins a season at most. They’re very much real, though, and the results track with intuition.
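For readers who want to build this kind of comparison themselves, here is a small sketch of how a runs-per-inning distribution could be tallied from simulated innings. The function name `run_distribution` and the toy input list are assumptions for illustration; no figures from the study are reproduced here.

```python
from collections import Counter


def run_distribution(runs_per_inning, cap=3):
    """Return the share of innings with 0, 1, ..., cap-or-more runs.

    The last bucket (key == cap) lumps together every inning with
    `cap` or more runs, i.e. the "big innings".
    """
    counts = Counter(min(runs, cap) for runs in runs_per_inning)
    total = len(runs_per_inning)
    return {bucket: counts.get(bucket, 0) / total for bucket in range(cap + 1)}


if __name__ == "__main__":
    # Toy data: eight made-up innings, not simulation output.
    toy_innings = [0, 0, 1, 4, 2, 0, 3, 0]
    print(run_distribution(toy_innings))
    # {0: 0.5, 1: 0.125, 2: 0.125, 3: 0.25}
```

Subtracting one pitcher's share from the other's, bucket by bucket, gives the percentage-point gaps in clean innings and big innings.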
Pitching to ground contact leads to more clean innings — most groundballs, after all, result in outs, and you can get away with one or two bad outcomes without giving up a run. It also leads to more big innings — putting a lot of balls in play on the ground often results in a lot of baserunners, and chaining baserunners together is the blueprint for a big inning. Pitching for fly balls, on the other hand, gives up more one-spots. Most fly balls become outs, so baserunners aren’t quite as common. When they do become hits, however, they often become home runs.
This is interesting, but it’s not the end of the story. Even if you buy the premise that this model gives an accurate count of how likely two 4.00-ERA true talent pitchers are to hold a lead, why are you bringing in league-average-level relievers to hold a lead? Let’s run the same experiment again, only with excellent hypothetical pitchers instead of average ones.
Meet our new contenders, Zack Robritton and Josh Haderbot. These robotic and idealized hurlers have similarly sterling ERAs, but they get there in markedly different ways.
These two pitchers are ludicrous. They have a true-talent 2.00 ERA. If your team has an elite closer like this, the vast majority of your leads are safe. The sheer number of zero-run outings, in fact, flattens the total distribution of runs allowed. Get this good, in other words, and it matters much less how you record your outs. The table below shows the run distribution, and it’s far more compressed than that of the average relievers above.
|Runs||Zack Robritton||Josh Haderbot|
We’ve described league-average pitchers with offsetting tendencies, and we’ve described elite pitchers cut from the same cloth. The rules of thumb they create (grounders for a one-run lead, fly balls for a three-run lead) are useful in direction but not enormous in magnitude. Just to complete the cycle, let’s look at what happens if your team brings a minor-leaguer to the table. We’ll run the experiment one last time, this time with poor pitchers.
Our new creations are really bad. No, like, really bad. They wouldn’t be good pitchers at Double-A, most likely. A true-talent 6.00 ERA is a gruesome thing to behold.
These pitchers are both terrible, but they’re both terrible in pretty similar ways. Their table of outcomes shows almost no difference in run distribution.
So, what’s the upshot of all of this? At the extremes, pitcher archetype doesn’t seem to matter much. Terrible pitchers allow so many baserunners and runs that there isn’t much difference in how they get on base. You can’t give up nearly a run an inning without allowing plenty of walks and singles, and at that point the home runs are largely incidental to the real problem, namely, the constant traffic on bases. On the other hand, dominant pitchers get so many outs by strikeout that their distributions also look quite similar. When your ERA is so low that you almost never allow a multi-run inning, it hardly matters how you do it.
In the middle, though, where the vast majority of major league pitchers exist, the conditions are just right for the two types of pitchers to distinguish themselves. Want to protect a slim lead in the sixth or seventh inning with a middle reliever? The groundball pitcher is more likely to get out unscathed. Up three in the fifth with a tiring starter? Johnny Fly Ball is just the ticket. The talent level of the pitchers is always going to matter more than their tendencies, but if you need a tiebreaker, look no further than this study. The way you pitch affects your runs-allowed distribution, even if you’re an inhuman, 5-million-inning-throwing robot.
The principle of the code I used to generate these results is that a pitcher has control over batted ball types, but not over what happens after those balls are in play. I took 2018 league-level batted ball data and applied it to each batted ball result, with some modifications (groundball errors became singles; line drive and fly ball errors became doubles). I then created functions for each result (one-base single, two-base single, double, triple, home run, ground out, line out, fly out, walk, and strikeout). To keep the model from getting too unwieldy, I eliminated the following events: double plays, steals, caught stealing, baserunning outs, sacrifice bunts, three-base-advancement doubles, and hit by pitches. Some of these results are defined very simply (a double advances everyone by two bases), and some are slightly more complex (a one-base single will score a runner from second, but a runner from first will only advance to second; a fly out can advance runners on second and third, but not on first).
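The two "slightly more complex" rules can be sketched as follows. This is one reading of the description, not the original code: the function names are hypothetical, bases are a `[first, second, third]` list of 0s and 1s, and the fly out is assumed to always advance the runners on second and third when there are fewer than two outs, where the text says only that it "can."

```python
def one_base_single(bases, runs, outs):
    """Runners on second and third score; a runner on first stops at second."""
    first, second, third = bases
    runs += second + third
    # Batter takes first, the old runner on first moves up one base.
    return [1, first, 0], runs, outs


def fly_out(bases, runs, outs):
    """Record an out; runners on second and third may move up, first holds.

    Assumption: with fewer than two outs the runners always advance.
    With two outs the inning ends, so nobody moves or scores.
    """
    first, second, third = bases
    if outs >= 2:
        return list(bases), runs, outs + 1
    runs += third
    return [first, 0, second], runs, outs + 1


if __name__ == "__main__":
    print(one_base_single([1, 1, 0], 0, 1))  # ([1, 1, 0], 1, 1)
    print(fly_out([1, 1, 1], 0, 0))          # ([1, 0, 1], 1, 1)
```

Keeping every result function to the same three inputs and three outputs means the simulation loop can call whichever one the random draw selects without special cases.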
These simplifications definitely make baseball a little less rich, but that’s a price I’m willing to pay to get a usable model. Each plate appearance is simulated by generating a random number and comparing it to the likelihood of each batted ball type. The script then calls the appropriate result function. Each result function takes a base state, the runs scored so far in the inning, and the number of outs. It then updates those three values and returns them. For example, if you call the “Double” function with a man on first, one run in so far, and no one out, that looks like this: Double([1,0,0],1,0). The result of that would be ([0,1,1],1,0): runners on second and third with one run in and no one out.
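A minimal sketch of what such a function might look like, following the rule that a double advances everyone by two bases. The lowercase name `double` and the exact body are assumptions; only the inputs and outputs of the example come from the description above.

```python
def double(bases, runs, outs):
    """Advance every runner two bases and put the batter on second.

    bases is [first, second, third], each 0 (empty) or 1 (occupied).
    """
    first, second, third = bases
    runs += second + third  # runners on second and third both score
    # The runner from first reaches third; the batter stands on second.
    return [0, 1, first], runs, outs


if __name__ == "__main__":
    print(double([1, 0, 0], 1, 0))  # ([0, 1, 1], 1, 0)
```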
There are a few flourishes that I felt I could add while keeping the code relatively simple — half of groundball singles with two outs become two-base singles, for example, while all groundball singles with one or fewer outs are one-base. Some of these assumptions can be tuned further; this is just a first pass at a model. At its heart, though, there’s a simple principle at work. Take a random result, adjust the state of the game, and iterate. While the exact contours of what I’ve modeled don’t precisely match baseball, they come close enough that I’m happy with the results.
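The "take a random result, adjust the state, and iterate" loop can be sketched with a deliberately tiny outcome set. Everything here is illustrative: the toy pitcher has only three possible outcomes with invented probabilities, and the names `simulate_inning` and `TOY_PITCHER` are hypothetical.

```python
import random


def strikeout(bases, runs, outs):
    return bases, runs, outs + 1


def walk(bases, runs, outs):
    """Batter takes first; runners move up only when forced."""
    first, second, third = bases
    if first:
        if second:
            if third:
                runs += 1  # bases loaded: the runner on third is forced home
            third = 1
        second = 1
    return [1, second, third], runs, outs


def home_run(bases, runs, outs):
    return [0, 0, 0], runs + sum(bases) + 1, outs


# Invented probabilities for a toy pitcher; they sum to 1.
TOY_PITCHER = [(strikeout, 0.70), (walk, 0.25), (home_run, 0.05)]


def simulate_inning(pitcher, rng):
    """Draw results until three outs are recorded; return runs allowed."""
    bases, runs, outs = [0, 0, 0], 0, 0
    while outs < 3:
        draw = rng.random()
        cumulative = 0.0
        # Compare the draw against the running total of probabilities.
        for result, prob in pitcher:
            cumulative += prob
            if draw < cumulative:
                break
        bases, runs, outs = result(bases, runs, outs)
    return runs


if __name__ == "__main__":
    rng = random.Random(5_000_000)
    innings = [simulate_inning(TOY_PITCHER, rng) for _ in range(10_000)]
    print("runs per nine innings:", 9 * sum(innings) / len(innings))
```

Swapping in a fuller set of result functions and real rates changes the table of outcomes but not the loop.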
Ben is a contributor to FanGraphs. A lifelong Cardinals fan, he got his start writing for Viva El Birdos. He can be found on Twitter @_Ben_Clemens.