Commanding the Zone

December 7, 2011

Pitching coaches have long extolled the virtues of getting ahead of batters. And while this advice is wise, it’s also usually given in ambiguous terms. Be aggressive. Attack the hitter. The general notion of commanding the zone and being aggressive isn’t hard to conceptualize, but it is difficult to state in concrete, quantitative terms.

Statistics like K/PA and BB/PA tell us important information; after all, strikeouts and walks make up two of the three components of FIP. But they give us no insight into the process of the at bat: they ignore everything except the result. And this is where pitch-type linear weights can help us.

Linear weights measure the average change in run expectancy for a given event. For example, home runs increase run expectancy by an average of about 1.4 runs. But linear weights can be calculated at a pitch-by-pitch level, as well. Just as a single, double, triple or home run increases run expectancy, so does going from a 0-0 count to a 1-0 count, where batters perform better. Of course, the run value of an additional ball is often small, but over the course of a season, these pitch-by-pitch run values add up.
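As a minimal sketch of the idea, the run value of a single pitch can be treated as the value of the count after the pitch minus the value of the count before it. The count values below are made-up placeholders chosen only to show the mechanics (they are not the weights used in this article), and `pitch_run_value` is a hypothetical helper name.

```python
# Illustrative only: these count values are placeholders, not real linear weights.
# Each value is the assumed run expectancy of a count relative to 0-0,
# from the batting team's point of view (positive favors the hitter).
COUNT_RUN_VALUE = {
    (0, 0): 0.000,
    (1, 0): 0.035,
    (0, 1): -0.040,
    (1, 1): -0.015,
}


def pitch_run_value(before, after):
    """Run value of one pitch: value of the new count minus the old count."""
    return COUNT_RUN_VALUE[after] - COUNT_RUN_VALUE[before]


# A first-pitch ball moves the count from 0-0 to 1-0.
ball_value = pitch_run_value((0, 0), (1, 0))
# A first-pitch strike moves the count from 0-0 to 0-1.
strike_value = pitch_run_value((0, 0), (0, 1))
```

With these placeholder numbers the ball comes out positive (good for the hitter) and the strike negative (good for the pitcher), which is why the best pitchers show large negative totals.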

If we only look at the run values of pitches that are not put into play (just strikes and balls), we get a good grasp of how well a pitcher is attacking the zone. We can thank Jeremy Greenhouse for creating this method back in 2010.

If we add up the pitch-by-pitch linear weights of all pitches not put into play, the top 10 for 2011 are:

Roy Halladay      -52.5
Cliff Lee         -51.5
Justin Verlander  -46.6
Clayton Kershaw   -39.9
CC Sabathia       -36.7
Dan Haren         -36.2
Cole Hamels       -35.0
Madison Bumgarner -31.2
Matt Garza        -31.0
James Shields     -31.0

There are no surprises anywhere on this list. The top four players include both Cy Young award winners and the NL runner-up. Another way to look at this method is that it’s also the total run value of a pitcher, minus the run value of all pitches that are put into play.

If we turn the stat into a rate stat — linear weights per 100 pitches — relievers dominate the leaderboard:

Sergio Romo         -2.8
Koji Uehara         -2.1
Jonathan Papelbon   -1.9
Rafael Betancourt   -1.9
Craig Kimbrel       -1.7
Mariano Rivera      -1.6
Kenley Jansen       -1.5
Tyler Clippard      -1.5
Cliff Lee           -1.5
Octavio Dotel       -1.4

Sergio Romo’s score is amazing, and I’m sure it contributed to his incredible peripheral stats: 40% K/PA and 2.9% BB/PA. Cliff Lee’s showing here is also impressive, as he is the only starter in the top 10.
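For readers who want to reproduce the two leaderboards, the bookkeeping might look something like the sketch below. The `Pitch` record, its field names, and the sample values are assumptions for illustration. The sketch also assumes the rate uses every pitch thrown as its denominator, which the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class Pitch:
    run_value: float  # pitch-level linear weight (hypothetical field)
    in_play: bool     # True if the ball was put into play


def count_linear_weights(pitches):
    """Return (total, per_100) run value over pitches not put into play."""
    total = sum(p.run_value for p in pitches if not p.in_play)
    # Assumption: the rate is per 100 pitches thrown, in play or not.
    per_100 = 100 * total / len(pitches) if pitches else 0.0
    return total, per_100


# Made-up sample: two strikes, one ball, one ball in play.
sample = [
    Pitch(-0.04, False),
    Pitch(0.03, False),
    Pitch(-0.05, False),
    Pitch(0.10, True),
]
total, per_100 = count_linear_weights(sample)
```

The ball in play is skipped when summing run values, so only the strikes and the ball contribute to the total.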

This method clearly passes the eye test: the leaderboards contain few, if any, surprises. But the metric also holds up under more rigorous analysis. The correlation between count linear weights per 100 pitches and FIP is a robust .65. This means that the rate stat explains about 43% of the variation in FIP, which is extremely impressive. This actually beats K/BB ratio, which explains 33% of the variation in FIP. However, it does fall a little short of (K-BB)/PA. You can view the relationship between count linear weights per 100 pitches and FIP below:

The reason why this metric is so powerful is easy to understand. The bulk of count linear-weight run values come from the final pitch in each at bat. And while we aren’t looking at pitches that are put into play, the pitches with the largest absolute run values are the pitches that result in a walk or a strikeout. This makes this metric very similar to (K-BB)/PA, with a very strong correlation of -.94. I also should note that I’ve looked at this relationship for all pitchers in 2011 who faced at least 100 batters.
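The "variation explained" figures follow from squaring the correlation coefficient. Here is a quick check using the rounded correlations quoted above (the variable names are only for illustration):

```python
# Share of variance explained is the square of the correlation coefficient.
r_fip = 0.65       # count linear weights per 100 pitches vs. FIP
r_kbb_pa = -0.94   # count linear weights per 100 pitches vs. (K-BB)/PA

explained_fip = r_fip ** 2        # roughly 0.42
explained_kbb_pa = r_kbb_pa ** 2  # roughly 0.88
```

Squaring the rounded .65 gives about 42%, so the 43% quoted in the text presumably comes from the unrounded correlation.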

But until now, we have ignored balls that are put into play. It turns out that getting ahead in the count also affects the value of balls in play.

Effect of count on BABIP

We know that pitchers change their approaches depending on the count. They expand the zone when they are ahead in the count, and they throw more strikes with fastballs when they are behind. Given these changes, we should see some effect on BABIP. If we average BABIP based on count, we do see an effect*:

 count babip
   0-0 0.287
   0-1 0.285
   0-2 0.274
   1-0 0.296
   1-1 0.288
   1-2 0.279
   2-0 0.310
   2-1 0.307
   2-2 0.287
   3-0 0.293
   3-1 0.296
   3-2 0.298

*These values are calculated from my personal database and don’t match exactly with the values in B-ref — but they are pretty close. The difference probably stems from them handling sacrifices and me being lazy and including them as outs.
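A minimal sketch of the per-count averaging, assuming each ball in play is stored as a (count, is_hit) pair with home runs already excluded and sacrifices counted as outs, as in the footnote above. The record layout, the function name, and the sample data are hypothetical.

```python
from collections import defaultdict


def babip_by_count(balls_in_play):
    """Map each count to hits / balls in play for that count."""
    hits = defaultdict(int)
    total = defaultdict(int)
    for count, is_hit in balls_in_play:
        total[count] += 1
        hits[count] += int(is_hit)
    return {count: hits[count] / total[count] for count in total}


# Made-up sample, not real data.
sample = [
    ("0-2", False), ("0-2", False), ("0-2", True), ("0-2", False),
    ("2-0", True), ("2-0", False),
]
rates = babip_by_count(sample)
```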

But these values likely fall prey to selection bias. As we saw before, better pitchers definitely get ahead in the count more often. If better pitchers also have lower BABIPs, then we’re also implicitly measuring the effect of being a good pitcher on BABIP. To test this assumption, I found the average FIP* of pitchers in each count. Here are the averages, sorted in ascending order:

 count   fip
   0-2 3.878
   1-2 3.896
   2-2 3.917
   0-1 3.923
   1-1 3.943
   0-0 3.944
   3-2 3.946
   2-1 3.961
   1-0 3.963
   2-0 3.987
   3-1 3.996
   3-0 4.011

A selection bias clearly exists, but it’s pretty small. To deal with this bias, I looked at every pitcher who faced at least 100 batters in 2011, and then I found the difference between their BABIP in each count and their overall BABIP. I then found the average difference, weighted by the number of balls in play for each pitcher. I should also note that the 3-0 count has a sample size problem; fewer than 300 pitches were put into play on 3-0 counts in 2011. Down to three decimals, the bias-adjusted BABIPs were identical to the non-adjusted values reported earlier.
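The adjustment described above could be sketched roughly as follows. The data layout and the function name are hypothetical, and the sketch assumes the weight for each pitcher is his number of balls in play in the count being examined; the text could also be read as weighting by his total balls in play.

```python
def weighted_babip_diff(pitchers, count):
    """Weighted mean of (BABIP in `count` - overall BABIP) across pitchers.

    `pitchers` is a list of dicts with hypothetical keys:
      "overall":  (hits, balls_in_play) across all counts
      "by_count": {count: (hits, balls_in_play)}
    """
    numerator = 0.0
    weight_sum = 0
    for p in pitchers:
        hits, bip = p["by_count"].get(count, (0, 0))
        if bip == 0:
            continue  # pitcher had no balls in play in this count
        all_hits, all_bip = p["overall"]
        diff = hits / bip - all_hits / all_bip
        # Assumption: weight by this pitcher's balls in play in this count.
        numerator += diff * bip
        weight_sum += bip
    return numerator / weight_sum if weight_sum else 0.0


# Made-up sample, not real data.
sample = [
    {"overall": (30, 100), "by_count": {"0-2": (2, 10)}},
    {"overall": (56, 200), "by_count": {"0-2": (9, 30)}},
]
avg_diff = weighted_babip_diff(sample, "0-2")
```

Because each pitcher is compared only with himself, differences in pitcher quality across counts drop out of the average.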

*I calculated FIP from my own database using the equation (13 * HR + 3 * BB – 1.93 * SO) / (PA * .23) + 3.10. It’s traditionally calculated with innings pitched in the denominator, and the constant I used may be a little off.
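Written out as code, the footnote's equation looks like this. The function name and argument names are illustrative; the coefficients, the .23 factor and the 3.10 constant are taken directly from the footnote.

```python
def fip_estimate(hr, bb, so, pa, constant=3.10):
    """FIP approximation from the footnote, with PA * 0.23 standing in for innings."""
    return (13 * hr + 3 * bb - 1.93 * so) / (pa * 0.23) + constant


# Made-up season line: 20 HR, 50 BB, 200 SO over 800 batters faced.
example = fip_estimate(hr=20, bb=50, so=200, pa=800)
```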

It appears that count definitely has an effect on BABIP, even after accounting for selection bias. So not only do count linear weights tell us a lot about a pitcher’s ability to command the zone, as measured by (K – BB)/PA, but they also might tell us a small amount about BABIP ability. Indeed, if I run a regression of BABIP on count linear weights per 100 pitches for all pitchers who faced at least 100 batters in 2011, I find that the relationship is significant at a 99.9% level, albeit with a low level of explanatory power. If I eliminate all relievers, the relationship is also significant, but this time at a 95% level.
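A one-variable regression of this kind can be sketched with nothing but the standard library. The data below are invented, and `simple_ols` is a hypothetical helper, not the tool used for the article. The returned t-statistic would be compared against a t-distribution with n - 2 degrees of freedom to judge significance.

```python
import math


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x, plus r and the slope t-statistic."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    # t-statistic for the slope with n - 2 degrees of freedom
    # (undefined when the fit is perfect, i.e. r is exactly 1 or -1).
    t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
    return slope, intercept, r, t_stat


# Made-up data: count linear weights per 100 pitches (x) and BABIP (y).
x = [-2.0, -1.0, 0.0, 1.0]
y = [0.280, 0.292, 0.288, 0.300]
slope, intercept, r, t_stat = simple_ols(x, y)
```

A relationship can be statistically significant and still explain little of the variation: with several hundred pitchers, even a small r produces a large t-statistic.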

All this is to say that pitching coaches are right. The best pitchers in the league — regardless of batted-ball profile — rate very well by count-based linear weights. It’s accurate to say that the ability to command the zone — as measured in this post — represents the single most important skill for a major league pitcher.

References and Resources

*PITCHf/x data from MLBAM via Darrel Zimmerman’s pbp2 database. Scripts by Joseph Adler/Mike Fast/Darrel Zimmerman

*Pitch type linear weights calculated by Harry Pavlidis
