One Last Refresher (On Strikeouts and Walks)

This is the last of a set of articles I’ve written over the past few weeks. Each one tries to determine what’s real and what’s noise when it comes to the outcome of a plate appearance. For the batted ball articles, the conclusions generally tracked. Variations in home run rate are largely due to the batter. Pitchers and batters both show skill in groundball rate. And line drives and popups are somewhere in between — batters exhibit a little more persistence in variation than pitchers, though neither does so strongly.

Strikeouts and walks are a different beast. It’s pretty clear that pitchers and batters can be good or bad at them. No one looks at Chris Davis or Tyler O’Neill and thinks “eh, that’s pretty unlucky to have all those strikeouts, I bet they’re average at it overall.” Likewise, Josh Hader isn’t just preternaturally lucky — he’s good at striking batters out.

So rather than attempt to prove that pitchers can be good or bad at striking out batters and vice versa, I’m interested in whether one side has the upper hand. I’m adapting a method laid out by Tom Tango here, but I’ll also repeat the same methodology I used in the previous pieces in this series.

First, let’s take a look at how pitchers did from year one to year two. As I did before, I divided every pitcher and batter into quartiles based on their 2018 strikeout rates, then used those quartiles to group 2019 plate appearances. I then weighted each batter by the fewest plate appearances they had against any single pitcher quartile (the “least-n” weight that comes up again later). Pitchers’ strikeout rates in year one do a reasonable job of predicting year two:

Batters vs. Different K% Pitchers
| Year | Quartile 1 | Quartile 2 | Quartile 3 | Quartile 4 |
| --- | --- | --- | --- | --- |
| 2018 | 15.4% | 19.3% | 23.2% | 30.1% |
| 2019 | 18.4% | 20.0% | 24.0% | 27.7% |

Of course, the same could be said for batters. Those who struck out a lot in 2018 still did so in 2019:

Pitchers vs. Different K% Batters
| Year | Quartile 1 | Quartile 2 | Quartile 3 | Quartile 4 |
| --- | --- | --- | --- | --- |
| 2018 | 15.1% | 21.5% | 26.1% | 35.7% |
| 2019 | 16.7% | 21.6% | 25.0% | 31.7% |
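For anyone who wants to reproduce this kind of table, here is a minimal sketch of the grouping and weighting for the batters-versus-pitcher-quartiles view. It is my reading of the description above, not the exact code behind these numbers: the function names, the rank-based quartile split, and the `(batter, pitcher, struck_out)` input format are all assumptions, and batters who never faced one of the quartiles are simply dropped.

```python
from collections import defaultdict

QUARTILES = (1, 2, 3, 4)

def assign_quartiles(rates):
    """Rank players by year-one rate and split them into quartiles (1 = lowest)."""
    ranked = sorted(rates, key=rates.get)
    n = len(ranked)
    return {player: (i * 4) // n + 1 for i, player in enumerate(ranked)}

def least_n_weighted_rates(plate_appearances, pitcher_quartile):
    """Year-two K rate against each pitcher quartile, weighted by batter least-n.

    plate_appearances: iterable of (batter, pitcher, struck_out) tuples.
    pitcher_quartile: dict mapping pitcher -> year-one quartile.
    """
    # counts[batter][quartile] = [plate appearances, strikeouts]
    counts = defaultdict(lambda: {q: [0, 0] for q in QUARTILES})
    for batter, pitcher, struck_out in plate_appearances:
        if pitcher not in pitcher_quartile:
            continue  # no year-one rate, so no quartile to assign
        cell = counts[batter][pitcher_quartile[pitcher]]
        cell[0] += 1
        cell[1] += int(struck_out)

    weighted_sum = {q: 0.0 for q in QUARTILES}
    total_weight = 0
    for by_quartile in counts.values():
        # least-n: the fewest plate appearances against any one quartile
        least_n = min(pa for pa, _ in by_quartile.values())
        if least_n == 0:
            continue  # batter never faced one of the quartiles
        total_weight += least_n
        for q, (pa, strikeouts) in by_quartile.items():
            weighted_sum[q] += least_n * strikeouts / pa

    if total_weight == 0:
        return {}
    return {q: weighted_sum[q] / total_weight for q in QUARTILES}
```

Swapping the roles of batter and pitcher gives the other table. The point of the least-n weight is that every batter contributes the same weight to all four quartile columns, so differences between columns are not driven by which batters happened to face which pitchers.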

Don’t focus on the rates themselves; they can be skewed because I’m equal-weighting each group rather than taking the real total rate of strikeouts. Focus instead on the pattern between years. Pitchers kept roughly 63% of their year-one spread between the bottom and top quartiles, and batters kept roughly 73%. Before we get into Tango’s method, here’s a grid of how each 2018 quartile did when facing each other in 2019. The batter quartiles go down the left side, and the pitcher quartiles across the top:

Batter/Pitcher Result Grid, K%
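| Batter quartile | Pitcher Q1 | Pitcher Q2 | Pitcher Q3 | Pitcher Q4 |
| --- | --- | --- | --- | --- |
| 1 | 13.7% | 14.5% | 17.4% | 20.3% |
| 2 | 18.2% | 19.7% | 23.0% | 27.0% |
| 3 | 20.5% | 23.1% | 27.6% | 31.3% |
| 4 | 27.1% | 29.6% | 33.8% | 38.8% |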

Now let’s use Tango’s method to regress these rates. First, we look at how much smaller the spread between the first and fourth quartiles is in year two as compared to year one. For the pitcher quartiles, the spread shrinks from 14.7 percentage points to 9.3, a decline of 37%. The spread declines, but it’s still quite large.
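If you want to check that 37% yourself, the arithmetic is short. This sketch just reads the first and fourth quartile values off the two strikeout tables above; the function name is mine, and the batter figure is my own arithmetic on the table rather than a number quoted in the text.

```python
def spread_regression(year_one, year_two):
    """Share of the Q1-to-Q4 spread that disappears between year one and year two."""
    spread_one = year_one[-1] - year_one[0]
    spread_two = year_two[-1] - year_two[0]
    return 1 - spread_two / spread_one

# Quartile K% values copied from the tables above
pitcher_k_2018 = [15.4, 19.3, 23.2, 30.1]
pitcher_k_2019 = [18.4, 20.0, 24.0, 27.7]
batter_k_2018 = [15.1, 21.5, 26.1, 35.7]
batter_k_2019 = [16.7, 21.6, 25.0, 31.7]

pitcher_regression = spread_regression(pitcher_k_2018, pitcher_k_2019)
batter_regression = spread_regression(batter_k_2018, batter_k_2019)

print(f"Pitchers: {pitcher_regression:.0%}")  # Pitchers: 37%
print(f"Batters: {batter_regression:.0%}")    # Batters: 27%
```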

From there, we need the average least-n weight in our dataset, which comes out to 34. Chuck both numbers into the formula, which is [(Regression%)/(1-Regression%)]*(Average Weight), and we get (0.37/0.63)*34, or about 20. What does this mean? Well, let’s say you have a pitcher and want to work out how much to regress their skill to the mean. The math is pretty easy; take that pitcher’s least-n weight and divide it by that weight plus 20. Jeff Hoffman’s least-n was 30, so if you want to decide how much of his skill to retain, you can simply take 30 and divide by 50 (30+20). Want to predict his strikeout rate in 2020 using 2019 data? Retain 60% of his difference from the mean.
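Written out as code, the two steps look like this. It’s only a sketch of the formula as stated above, with function names I made up, using the 37% regression, the average weight of 34, and the least-n of 30 from the text.

```python
def ballast(regression, average_weight):
    """[(Regression%) / (1 - Regression%)] * (Average Weight)."""
    return regression / (1 - regression) * average_weight

def retained_share(least_n, ballast_weight):
    """Fraction of a player's gap from the mean to keep when projecting."""
    return least_n / (least_n + ballast_weight)

pitcher_ballast = ballast(0.37, 34)
print(round(pitcher_ballast))  # 20

# A pitcher with a least-n of 30, using the rounded ballast of 20
print(retained_share(30, 20))  # 0.6
```

Note the grouping in `ballast`: the division happens before the multiplication, which matches the bracketed version of the formula.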

On the batter side, we can do the same thing. Batters regress to the mean a bit less, and they did it in a smaller average sample size. Their ballast weight (the number you use to regress the data to the mean) is 8.5. Take Travis Shaw, with a least-n of 30. Want to project him in 2020? Retain 78% of his variation from the mean.
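To make “retain 78% of his variation from the mean” concrete, here’s a small sketch of the projection step. The ballast numbers and the least-n of 30 come from the text; the 30% observed rate and 23% league rate are hypothetical placeholders, not anyone’s real numbers, and the function names are my own.

```python
BATTER_K_BALLAST = 8.5
PITCHER_K_BALLAST = 20

def retained_share(least_n, ballast_weight):
    """Fraction of a player's gap from the mean to keep."""
    return least_n / (least_n + ballast_weight)

def project_rate(observed_rate, league_rate, least_n, ballast_weight):
    """Pull an observed rate toward the league rate by the regressed amount."""
    share = retained_share(least_n, ballast_weight)
    return league_rate + share * (observed_rate - league_rate)

# Same least-n of 30, different ballast
batter_share = retained_share(30, BATTER_K_BALLAST)
pitcher_share = retained_share(30, PITCHER_K_BALLAST)
print(f"{batter_share:.0%} vs {pitcher_share:.0%}")  # 78% vs 60%

# Hypothetical batter: 30% observed K rate, 23% league K rate
example = project_rate(0.30, 0.23, 30, BATTER_K_BALLAST)
print(f"{example:.1%}")  # 28.5%
```

With identical sample sizes, the smaller ballast is what lets the batter keep more of his observed rate.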

What this means is that you can believe a batter’s strikeout numbers, for a given observation, more than you’d believe a pitcher’s. That’s borne out by looking at the grid of batter-pitcher matchups. On average, moving up one batter quartile added 5.3 percentage points to the 2019 strikeout rate. Moving up one pitcher quartile added 3.2 percentage points on average.
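Those per-tier averages fall straight out of the grid. Here’s a quick sketch using the grid values above; it averages each row (or column) and then takes the average step between adjacent tiers, which works out to the gap between the first and last tier divided by three. The function name is mine.

```python
# 2019 K% grid from above: rows are batter quartiles, columns are pitcher quartiles
k_grid = [
    [13.7, 14.5, 17.4, 20.3],
    [18.2, 19.7, 23.0, 27.0],
    [20.5, 23.1, 27.6, 31.3],
    [27.1, 29.6, 33.8, 38.8],
]

def average_step(tiers):
    """Average change per tier: (mean of last tier - mean of first tier) / steps."""
    first = sum(tiers[0]) / len(tiers[0])
    last = sum(tiers[-1]) / len(tiers[-1])
    return (last - first) / (len(tiers) - 1)

batter_step = average_step(k_grid)
# Transpose the grid so each "tier" is a pitcher quartile column
pitcher_step = average_step([list(column) for column in zip(*k_grid)])

print(f"{batter_step:.1f}")   # 5.3
print(f"{pitcher_step:.1f}")  # 3.2
```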

Let’s do the same for walks. First, we’ll look at how pitchers varied in 2018 and 2019:

Batters vs. Different BB% Pitchers
| Year | Quartile 1 | Quartile 2 | Quartile 3 | Quartile 4 |
| --- | --- | --- | --- | --- |
| 2018 | 4.9% | 7.3% | 9.3% | 13.4% |
| 2019 | 6.3% | 7.7% | 8.9% | 10.4% |

Okay, there’s some spread there. Let’s do the same for batters:

Pitchers vs. Different BB% Batters
| Year | Quartile 1 | Quartile 2 | Quartile 3 | Quartile 4 |
| --- | --- | --- | --- | --- |
| 2018 | 4.5% | 6.9% | 9.5% | 12.5% |
| 2019 | 6.5% | 8.0% | 10.0% | 11.8% |

Neat! Batters have more explanatory power again: they kept about two-thirds of their year-one spread, while pitchers kept about half. Let’s get it in grid form:

Batter/Pitcher Result Grid, BB%
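| Batter quartile | Pitcher Q1 | Pitcher Q2 | Pitcher Q3 | Pitcher Q4 |
| --- | --- | --- | --- | --- |
| 1 | 4.0% | 4.6% | 5.2% | 7.1% |
| 2 | 5.0% | 6.4% | 7.2% | 8.7% |
| 3 | 6.3% | 7.8% | 8.9% | 10.8% |
| 4 | 8.8% | 10.5% | 12.1% | 13.2% |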

We’ve got the same story as before: batters and pitchers both have input. I’ll save you the grunt work of doing the Tango-style math here; you should use ballast numbers of 37 for pitchers and 14 for batters. In other words, you can start trusting a batter’s walk rate sooner than a pitcher’s. That makes some sense to me; pitchers mostly sit in a narrow band of walk rates long-term, while batters really don’t. Mike Trout has a 19% walk rate over the last three years, while Dee Gordon is at 3.1%. The biggest gap among pitchers is between Tyler Chatwood’s 14.5% and Mike Leake’s 4.1%. Batters just vary more.
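Here’s a rough sketch of what those two walk-rate ballast numbers imply at different sample sizes. The ballasts of 37 and 14 are from the text; the least-n values I plug in are arbitrary illustrations, and the function name is my own.

```python
PITCHER_BB_BALLAST = 37
BATTER_BB_BALLAST = 14

def retained_share(least_n, ballast_weight):
    """Fraction of a player's gap from the mean to keep."""
    return least_n / (least_n + ballast_weight)

# Illustrative least-n values, not taken from the dataset
shares = {
    least_n: (
        retained_share(least_n, PITCHER_BB_BALLAST),
        retained_share(least_n, BATTER_BB_BALLAST),
    )
    for least_n in (14, 37, 100)
}

for least_n, (pitcher_share, batter_share) in shares.items():
    print(f"least-n {least_n}: pitcher {pitcher_share:.0%}, batter {batter_share:.0%}")
```

A handy way to read a ballast number: it’s the least-n at which you keep exactly half of a player’s gap from the mean. Batters get there at 14, pitchers not until 37.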

If you’re still with me after all this time, thank you. It was a bit of a slog to write, but hopefully not to read. Today’s conclusion isn’t particularly surprising: you should care about batter and pitcher strikeout and walk rates, because both can tell you something. Maybe you can learn about batters’ rates a little faster, but you would be well served to use both.

In no time we’ll be back to our regularly scheduled articles talking about new pitches and new performance levels, or perhaps even teams’ competitive aspirations. But I’m glad I had a chance to go over some of the basics before games start up again and make sure that what we expect still roughly holds.

We hoped you liked reading One Last Refresher (On Strikeouts and Walks) by Ben Clemens!


Ben is a contributor to FanGraphs. A lifelong Cardinals fan, he got his start writing for Viva El Birdos. He can be found on Twitter @_Ben_Clemens.

kbpms2

I would rewrite (Regression%)/(1-Regression%)*(Average Weight) with additional brackets to make clearer the order of operations, because it’s ambiguous – in some journals multiplication has preference over division, so this formula would be interpreted incorrectly. [Regression%/(1-Regression%)]*Average Weight is clearer.