Simulating the Impact of Pitcher Inconsistency

by Steve Staude

August 21, 2013

I thought Matt Hunter’s FanGraphs debut article last week was really interesting. So interesting, in fact, that I’m going to rip it off right now. The difference is I’ll be using a Monte Carlo simulator I made for this sort of situation, which I’ll let you play with after you’re done reading (it’s at the bottom).

Matt posed the question of whether inconsistency could be a good thing for a pitcher. He brought up the example of Jered Weaver vs. Matt Cain in 2012 — two pitchers with nearly identical overall stats, except that Weaver was a lot less consistent. However, Weaver had a bit of an advantage in Win Probability Added (WPA), Matt points out. WPA factors in a bunch of things, e.g. how close the game is and how many outs are left in the game when events occur. Because of that, it’s a pretty noisy stat, heavily influenced by factors the pitcher doesn’t control much. It’s not a predictive stat. For that reason, I figured simulations might be fun and enlightening on the subject. They sort of accomplish the same thing that WPA does, except that they allow you to base conclusions off of a lot more possible conditions and outcomes than you’d see in a handful of starts (i.e., they can help de-noise the situation).

Weaver vs. Cain

[Photos: Weaver, wearing a hat and smiling; Cain, eating a mean watermelon]

The following chart breaks down how often each of them posted a given single-game ERA in 2012:

What you see here is that Weaver’s ERA was two or less in a third of his starts, a feat Cain accomplished in fewer than sixteen percent of his starts. Weaver’s problem, of course, is that he got shelled for an ERA over ten in ten percent of his games. Yet, overall, their ERAs were nearly identical: 2.81 for Weaver and 2.79 for Cain.

So, to look into which pitcher should theoretically be more useful to a team, I set up a few different types of simulations. One simply randomly matches up performances by Cain directly against Weaver’s results, assigns a run contribution to each of their bullpens for the games, and tells us who comes out on top more often. The next simulator matches up each of their performances against some normally-distributed level of offensive runs scored, to give each a win percentage. The last is a more generalizable way of comparing pitchers based on means and standard deviations.

Assumptions (of the simulation)

Bad assumptions can ruin a model. Garbage in, garbage out, as the saying goes. Hopefully mine aren’t too bad, but you’ll be able to change them in the web app lower in the article if you disagree.

One of the main things I figured I should factor in was the effect of placing a larger workload on the bullpen. I guessed that if the ‘pen only had to pitch an inning or two in a game, it would disproportionately be the closer and set-up man doing the pitching, and therefore the ‘pen’s runs allowed per 9 innings (RA/9) would be lower in those games. Well, here’s what I found for 2011-2012 games that went nine innings or fewer:

IP (by Bullpen)	Bullpen RA/9	Bullpen ERA	Starter RA/9	Starter ERA
0-2	4.27	3.88	2.72	2.48
2.1-4	4.00	3.66	5.23	4.80
>4	4.38	4.10	13.06	12.15

Yeah, it turns out that in those games where the starter leaves after 7+ innings (typically after having given up fewer than three runs), the bullpen doesn’t do so well. I don’t know if that’s because many of those games are blowouts being pitched by mop-up guys, or because opposing teams and managers tend to pull out all the stops at the end of the game. But there you have it.

If you don’t exclude extra-inning games, the Bullpen’s RA/9 in games where they pitch over four innings drops to 3.95 (while the other numbers stay pretty much the same). But since my simulators limit games to nine innings, in the name of keeping things simple, I’ll stick with the numbers above.

What about the effect of tiring out relievers so that they can’t pitch the next day, or at least not pitch as well? OK, that’s a lot harder to figure out. I just had to make a guess when I decided what RA/9 to use for the various IP levels. Ultimately, I chose a RA/9 of 4 for when the bullpen pitched four innings max, and a RA/9 of 4.5 when they had to pitch more than that.

So, I had the bullpen average RA/9 out of the way, but now I had to work on standard deviations (how spread-out, or inconsistent, the results are). There’s a minimum expected level of standard deviation that’s predicted by the formula: SQRT(Chances * Rate * (1 – Rate)) / Chances. “SQRT” means square root. “Rate” is the run-allowing rate, and “Chances” are determined by how many innings are left for the bullpen to pitch (but also by how many batters they let on base… long story). Using that, I figured the minimum RA/9 standard deviation for a typical bullpen would be about 5.87 over 1 IP, 4.15 for 2 IP, 3.26 for 3 IP, and 2.85 for 4 IP. It looks like in actuality (in 2011-2012), the standard deviations were around 10%-25% higher than that (with the formula being more accurate for 1 or 2 IP). So I decided to adjust the standard deviations up 15% from the minimum expected level.
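As a rough illustration of that formula, here is a minimal Python sketch. The function names and the example inputs (4 chances, a 10% rate) are hypothetical, picked only to make the arithmetic easy to follow; how "Chances" is actually derived from innings and baserunners is not spelled out here, so the sketch simply takes it as an input.

```python
import math

def min_rate_sd(chances: float, rate: float) -> float:
    """Minimum expected SD of an observed rate: SQRT(Chances * Rate * (1 - Rate)) / Chances."""
    return math.sqrt(chances * rate * (1.0 - rate)) / chances

def adjusted_sd(chances: float, rate: float, inflation: float = 0.15) -> float:
    """Bump the minimum SD up by a flat percentage (15% is the adjustment described above)."""
    return min_rate_sd(chances, rate) * (1.0 + inflation)

# Hypothetical inputs, for illustration only: 4 chances at a 10% rate.
base = min_rate_sd(4, 0.10)       # sqrt(0.36) / 4
bumped = adjusted_sd(4, 0.10)     # base * 1.15
# Quadrupling the chances halves the minimum SD.
halved = min_rate_sd(16, 0.10)
```

Because the standard deviation shrinks with the square root of the number of chances, doubling the chances divides it by about 1.41, which lines up with the drop from 5.87 at 1 IP to 4.15 at 2 IP.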

For the third simulation type, I had to look into the relationship between how many runs a starter allows vs. how soon until he gets the hook. I thought the results were surprisingly clean:

The trend here is extremely clear. It shows that a starter is probably going to get yanked by the manager if he gives up 5 runs around the first inning, or by the second if he allows around 6 runs total, etc. (the scale is runs per inning here). If he makes it deep into the 9th inning, he’s probably only allowed about 1 run total. Part of this is a function of pitch counts, and part of it is a function of performance, but it’s all pretty predictable, apparently. In the third sim type, after assigning a RA/9 rate to a starter (based on a randomly generated percentile of their individual bell-curve), I used the formula on this chart as the primary determinant of how many innings they’d likely pitch that game.

Still, there are undoubtedly some pitchers who have a tendency towards racking up higher pitch-counts over a given number of innings, or whose managers tend to let them rack up high pitch counts, for example. To address that, I also added the option of adjusting the starter’s IP towards some “base” IP level, weighted as strongly as you’d like.
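One simple way to picture that adjustment is a weighted average between the innings predicted from runs allowed and the pitcher's "base" innings. The Python sketch below is an assumption about how such a weighting could work, not a transcription of the spreadsheet; `blend_innings` and its parameters are hypothetical names.

```python
def blend_innings(predicted_ip: float, base_ip: float, weight: float) -> float:
    """Pull a predicted innings total toward a pitcher-specific base level.

    weight = 0 keeps the prediction untouched; weight = 1 always returns base_ip.
    The result is capped to the 0-9 inning range used by the simulators.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    blended = (1.0 - weight) * predicted_ip + weight * base_ip
    return max(0.0, min(9.0, blended))

# Example: the run total suggests 4 innings, the pitcher's base level is 6,
# and the base gets a 25% weight.
example_ip = blend_innings(4.0, 6.0, 0.25)
```

With a low weight the model mostly trusts the runs-allowed trend; with a high weight it treats the pitcher as someone whose manager sticks to a routine regardless of results.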

Results

Simulation Type 1: Weaver vs. Cain, head-to-head

With the above assumptions, over a million simulations, Weaver’s team beat Cain’s 52.7% of the time (excluding extra-inning games, which happened 12.8% of the time overall). This assumes the two are on a level playing field, with the same defenses behind them, and that they’re both facing a completely average offense.
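For readers who want to see the shape of a head-to-head simulation like this, here is a minimal Python sketch. It is not the simulator used for the numbers above: the two start logs are made-up placeholders rather than real 2012 data, and bullpen runs are drawn from a Poisson distribution as a stand-in for the standard-deviation approach described in the assumptions. The bullpen RA/9 rule (4.0 for up to four innings, 4.5 beyond that) does follow the assumptions section.

```python
import math
import random

# Hypothetical (innings pitched, runs allowed) start logs, NOT real 2012 data.
STEADY = [(7.0, 2), (6.0, 2), (7.0, 3), (6.0, 2), (7.0, 2), (6.0, 3)]
STREAKY = [(9.0, 0), (8.0, 0), (8.0, 1), (7.0, 1), (7.0, 2), (2.0, 8)]

def bullpen_ra9(innings):
    """Rule of thumb from the assumptions: 4.0 RA/9 up to four innings, 4.5 beyond."""
    return 4.0 if innings <= 4.0 else 4.5

def poisson(rng, mean):
    """Knuth's Poisson sampler (an assumed stand-in for the bullpen run model)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def team_runs_allowed(rng, starts):
    """Pick one start at random, then let the bullpen cover the rest of nine innings."""
    ip, runs = rng.choice(starts)
    pen_ip = 9.0 - ip
    return runs + poisson(rng, bullpen_ra9(pen_ip) * pen_ip / 9.0)

def head_to_head(starts_a, starts_b, n_sims=100_000, seed=0):
    """Return (A's share of decided games, share of games tied after nine)."""
    rng = random.Random(seed)
    wins_a = wins_b = ties = 0
    for _ in range(n_sims):
        a = team_runs_allowed(rng, starts_a)
        b = team_runs_allowed(rng, starts_b)
        if a < b:
            wins_a += 1
        elif b < a:
            wins_b += 1
        else:
            ties += 1  # would go to extra innings, so it is left out of the win share
    return wins_a / (wins_a + wins_b), ties / n_sims

win_share, tie_share = head_to_head(STREAKY, STEADY)
print(f"Streaky pitcher's team wins {win_share:.1%} of decided games; {tie_share:.1%} tied after nine")
```

Swapping real game logs into the two lists is the obvious next step; the tie share plays the role of the extra-inning games that are excluded from the win percentage.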

Simulation Type 2: Weaver and Cain, with various levels of run support

This piggybacks off of Sim#1. From that, I obtained the following breakdown of win percentage according to how many runs each pitcher received in support:

Runs Scored by Offense	Weaver Win%	Cain Win%
0	0.00%	0.00%
1	18.35%	13.45%
2	35.91%	30.56%
3	56.90%	52.38%
4	72.64%	68.48%
5	82.28%	82.20%
6	88.01%	91.94%
7	91.09%	97.02%
8	92.77%	99.15%
9	94.05%	99.79%
10	95.32%	99.92%
11	96.74%	99.94%
12	98.06%	99.94%
13	99.05%	99.94%
14	99.59%	99.94%
15	99.83%	99.94%

What you’re seeing has to do not only with Weaver’s inconsistency in runs allowed, but also in innings pitched (transferring more weight to the performance of the bullpen).

From there, it was a matter of coming up with the likelihood of the pitchers getting each of those levels of run support. Based on 2012 MLB averages, I assumed they’d each get an average of 4.3 runs of support, with a standard deviation of 3 runs. Results of a million simulations: Weaver wins 63.42% of his non-extra-inning games; Cain wins 63.17%. After that many trials, the difference is statistically significant, but it’s of course a pretty minor advantage. Weaver’s greater success in low-scoring games gave him the edge in head-to-heads against fellow elites, but his propensity to lose high-scoring games counters that overall.
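The weighting step can also be approximated without random draws: multiply each row of the win-percentage table by the probability of getting that many runs, then add up the products. The Python sketch below does that with the table values above and the stated 4.3-run mean and 3-run standard deviation. How the normal curve gets turned into whole runs is an assumption here (round to the nearest run, with everything below half a run counted as 0 and everything above 14.5 counted as 15), so expect results close to, but not necessarily identical to, the simulated figures.

```python
import math

# Win% by runs of support (0 through 15), copied from the table above.
WEAVER = [0.0000, 0.1835, 0.3591, 0.5690, 0.7264, 0.8228, 0.8801, 0.9109,
          0.9277, 0.9405, 0.9532, 0.9674, 0.9806, 0.9905, 0.9959, 0.9983]
CAIN = [0.0000, 0.1345, 0.3056, 0.5238, 0.6848, 0.8220, 0.9194, 0.9702,
        0.9915, 0.9979, 0.9992, 0.9994, 0.9994, 0.9994, 0.9994, 0.9994]

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def run_support_probs(mean=4.3, sd=3.0, max_runs=15):
    """Assumed discretization: probability that a normal draw rounds to each run total,
    with the tails lumped into 0 and max_runs."""
    probs = []
    for k in range(max_runs + 1):
        lo = -math.inf if k == 0 else k - 0.5
        hi = math.inf if k == max_runs else k + 0.5
        probs.append(normal_cdf(hi, mean, sd) - normal_cdf(lo, mean, sd))
    return probs

def overall_win_pct(win_by_runs, probs):
    """Probability-weighted average of the per-run-total win percentages."""
    return sum(w * p for w, p in zip(win_by_runs, probs))

probs = run_support_probs()
print(f"Weaver: {overall_win_pct(WEAVER, probs):.2%}")
print(f"Cain:   {overall_win_pct(CAIN, probs):.2%}")
```

Changing `mean` or `sd` shows how the comparison shifts for a pitcher backed by a stronger or weaker offense.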

Simulation Type 3: General Simulation

OK, here’s the part where you get to play along at home. The default I’ve entered here approximates the Weaver vs. Cain battle (“Team 1” representing Weaver’s team). I found in my testing that Team 1 wins around 50.5% of the time. That’s not as dramatic as the 52.7% I found in Sim#1, but that may be because the actual distributions weren’t exactly normal, as the assumption is in this model. Anyway, you can change around the assumptions (white boxes with red borders) within the app here, or you can download the spreadsheet with the green icon at the bottom:

What you’re seeing here are the results of only 2,000 simulations at a time (trying to keep the file size down), but if you download it, you can copy the rows of the “Calculations” tab downwards to do a lot more at a time. At only 2,000 sims, there’s a bit of a margin of error, as you’ll probably see if you start changing blank cells (which will come up with new simulations each time).
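To put a rough number on that margin of error, the standard error of a simulated win percentage can be estimated with the usual binomial approximation. The Python sketch below assumes each simulated game is an independent trial and uses a 95% confidence level (z = 1.96); the function names are illustrative and are not part of the spreadsheet.

```python
import math

def margin_of_error(win_pct, n_sims, z=1.96):
    """Approximate 95% margin of error for a simulated win percentage."""
    return z * math.sqrt(win_pct * (1.0 - win_pct) / n_sims)

def sims_needed(margin, win_pct=0.5, z=1.96):
    """Number of simulations needed to shrink the margin of error to `margin`."""
    return math.ceil((z / margin) ** 2 * win_pct * (1.0 - win_pct))

# Margin of error around a 50.5% result after 2,000 simulations.
print(f"{margin_of_error(0.505, 2_000):.1%}")
# Simulations needed to get the margin down to half a percentage point.
print(sims_needed(0.005))
```

At 2,000 simulations the margin works out to roughly 2.2 percentage points either way, which is bigger than the half-point edge in the default setup, so a single refresh can easily show either team ahead. At a million simulations the margin drops to about a tenth of a percentage point.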

One thing I think you might notice is that for an inconsistent pitcher who sometimes gets taken out of games very early, having a bad bullpen behind him (especially long relievers) hurts more.

Well, hopefully this hasn’t been too confusing. I’ll be around to answer questions, just in case. Have fun!
