Lies, Damn Lies, and Year-to-Year Correlation

I’m going to spend about half this article trying to deceive you. I just thought you should know that upfront. Why? Alex Chamberlain wrote a wonderful article about Kyle Hendricks yesterday, and it reminded me of one of my favorite paradoxical findings about pitching. Now, I’m going to use that finding to bamboozle you — or at least, that’s the plan.

Hendricks, you see, is spectacular at throwing pitches in the shadow zone, the boundaries of the strike zone and the area just outside of it. That’s an obviously useful skill. When you watch a pitcher painting the corners, it doesn’t just feel like hitters are unlikely to make solid contact, it’s actually true. Batters have worse outcomes on the borderline part of the plate than in any other zone.

Here’s a statement that I don’t think is at all controversial: pitchers exercise a lot of control over where they throw the ball. It’s not like exit velocity — mostly batter-controlled — or walk rate, which depends on myriad factors that both pitcher and batter affect. Where a pitcher throws the ball should be up to, well, the pitcher.

It stands to reason that if pitchers control where the ball goes, there will be leaders and laggards at it. As it turns out, there are! As Chamberlain pointed out, no one has thrown a higher percentage of their pitches in the shadow zone (since 2015) than Kyle Hendricks; Marco Gonzales is narrowly behind in second place. Sure sounds like a skill to me.

With that in mind, here’s a 2018 leaderboard:

2018 Shadow Zone% Leaders
Pitcher            2018 Shadow%  2018 Rank
Jordan Zimmermann  47.1%         2
Aroldis Chapman    46.8%         3
CC Sabathia        46.8%         4
Marco Gonzales     46.5%         7
Kyle Hendricks     46.5%         8
Martín Pérez       46.5%         10
Ian Kennedy        46.4%         11
Kevin McCarthy     46.4%         13
Blake Treinen      46.1%         14
Dallas Keuchel     46.0%         15

I’ve only included pitchers who threw at least 750 pitches in both 2018 and 2019, for reasons that will become clear. There’s a healthy mix of pitchers on there, everyone from literally Aroldis Chapman down to Hendricks and Dallas Keuchel.

Let’s see how this cohort did on the same leaderboard in 2019. A total of 373 pitchers qualified, just so you know how to read the ranks:

2018 Shadow Zone% Leaders
Pitcher            2018 Shadow%  2018 Rank  2019 Shadow%  2019 Rank
Jordan Zimmermann  47.1%         2          47.2%         4
Aroldis Chapman    46.8%         3          41.7%         206
CC Sabathia        46.8%         4          44.9%         31
Marco Gonzales     46.5%         7          46.0%         15
Kyle Hendricks     46.5%         8          47.5%         2
Martín Pérez       46.5%         10         44.9%         33
Ian Kennedy        46.4%         11         38.9%         344
Kevin McCarthy     46.4%         13         40.8%         286
Blake Treinen      46.1%         14         41.3%         237
Dallas Keuchel     46.0%         15         46.1%         12

Ian Kennedy went from excellent to horrible. Hendricks went from excellent to excellent. Clearly, being good at shadowing up (I’m working on the terminology, okay? I know this one isn’t great) isn’t necessarily repeatable.

In fact, the year-to-year correlation between shadow rates is awful. It’s a piddling 0.35, which is shocking for something that pitchers can exert so much control over. For comparison’s sake, I’m a big fan of this Matt Klaassen article, which shows year-to-year correlation for many pitching statistics. Shadow rate has the same year-to-year correlation as BABIP allowed. What a disaster.

Okay then. We’ve conclusively established that being good at throwing the ball in the shadow zone in year one tells us as much about the future as suppressing BABIP does. That’s awful. Have you heard of defense-independent pitching theory? How about FIP? Have you read anything by Voros McCracken? Do you go to FanGraphs at all?

Now we’re at an impasse. Hendricks is really good at throwing the ball in the shadow zone. Locating the ball in the shadow zone is a great way to prevent runs. Great, great, following so far. Being good at throwing the ball in the shadow zone in one year tells us almost nothing about whether a pitcher will do it again next year. I think I have a headache.

Okay, let’s go about this differently. Pitchers are good at this. I’m confident of it! Something, somewhere, is lying. Let’s start peeling away layers until we find some deception, some telltale sign that pitchers really do know where the ball is going.

How about shadow zone pitches in two-strike counts? One thing that could totally be happening, behind the scenes, is that a pitcher who finds himself in a huge number of hitters’ counts might not want to throw in the shadow zone quite so much. Get behind a few times, throw a few pitches over the heart of the plate, and you might find yourself slipping down the shadow leaderboard despite unchanged talent.

The problem with that theory? I ran the query again, this time only in 0-2, 1-2, and 2-2 counts — 3-2 feels like its own animal, and I didn’t want to introduce any extra confusion there. This time, I got a year-to-year correlation of 0.4. It’s better, but still quite poor. The only good news about it is that Hendricks was in the top 10 both years.

New theory: what if 2018 was just a fluke? We’re looking at all this data for 2018 vs. 2019, but there were some weird names on top of the list for 2018. Let’s run it again, this time for 2019 vs. 2020. You’ll be shocked (read: not shocked) to discover that this didn’t solve our problem, either. The correlation was roughly the same, at 0.39. In other words, our issue here isn’t some flukey year. The problem is that knowing that a pitcher was really good at hitting the shadow zone in year one just doesn’t say much about year two. Drat!

If you’re a statistically minded person, you may have already seen where I’m deceiving you. If you aren’t, though, it’s time to reveal the lie. This will sound like an oversimplification, and to some degree it is, but here’s the deal: year-over-year correlation is a poor way of teasing out subtle skills.

I’ve been quoting you correlation coefficients, and those sound really powerful. Square the correlation coefficient, and you get r-squared, which certainly sounds legit. Per Wikipedia, r-squared is “the proportion of the variance in the dependent variable that is predictable from the independent variable(s).” Predictable! That sounds great. And here I am, telling you that not much of the variance in the dependent variable (pitch location in year two) can be predicted from the independent variable (pitch location in year one).
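To put numbers on that, here is a minimal sketch that squares the three one-year correlations quoted above. The labels and the helper name `r_squared` are mine, chosen for illustration. The correlations themselves come straight from the text.

```python
# One-year correlations quoted in the article.
quoted_r = {
    "2018 vs. 2019": 0.35,
    "2018 vs. 2019, two-strike counts (no 3-2)": 0.40,
    "2019 vs. 2020": 0.39,
}


def r_squared(r: float) -> float:
    """Share of variance in one variable explained by a linear fit on the other."""
    return r * r


for label, r in quoted_r.items():
    print(f"{label}: r = {r:.2f}, r-squared = {r_squared(r):.3f}")
```

Every one of them lands somewhere between roughly 12% and 16% of the variance, which is why the one-year numbers look so damning at first glance.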

Here’s the problem: the fainter the signal relative to the noise, the more you’ll see little or nothing when there really is an underlying signal. Imagine a skill with real variation in the population: the worst pitcher is 10% below average at our mystery skill, the best pitcher is 10% above average, and everyone else falls in between in a clean linear fashion. This skill doesn’t change at all from year to year, so if a pitcher is the best at it this year, he’ll also be the best at it next year, by definition.

Here’s the kicker: random variation moves each pitcher’s observed results by many multiples of the skill difference. If you want a tongue-in-cheek example, let’s say we’re measuring fastball velocity, but each pitcher is throwing only once, in a wind tunnel whose power level and direction are determined randomly. Jordan Hicks might throw hard, but if he happened to throw into a 30 mph headwind, he’ll look like a soft tosser. Dan Haren with a monsoon at his back might throw 120 mph, not 88. It’s silly season.
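If you’d rather see the wind tunnel than imagine it, here is a small simulation sketch of the thought experiment. Everything numeric in it is an assumption for illustration: 300 hypothetical pitchers, and noise with a standard deviation three times the entire best-to-worst skill gap. The skill never changes between the two "seasons"; only the noise does.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(2018)  # fixed seed so the run is repeatable

N_PITCHERS = 300  # hypothetical population size
NOISE_SD = 0.60   # hypothetical: three times the full best-to-worst gap of 0.20

# True skill runs linearly from 10% below average to 10% above, and never changes.
skill = [-0.10 + 0.20 * i / (N_PITCHERS - 1) for i in range(N_PITCHERS)]

# Each "season" is the same skill plus a fresh draw of random noise.
year_one = [s + rng.gauss(0, NOISE_SD) for s in skill]
year_two = [s + rng.gauss(0, NOISE_SD) for s in skill]

r = pearson(year_one, year_two)
print(f"year-to-year correlation: {r:.2f}")
```

With noise this loud, the printed correlation should sit close to zero, even though the skill ranking underneath is identical in both seasons by construction.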

Plenty of baseball skills fall into this category, because baseball is a noisy game. Take line drive rate, for example. From year to year, it’s mostly noise. Max Muncy had the second-lowest line drive rate in baseball last year, a puny 13.8%. In 2019, he checked in at 23.5%, with league average between 21% and 22%. He didn’t suddenly get terrible; the random variation, the noise, is simply much bigger than the signal.

How do you deal with the problem of faint but real signal? Bulk it up! Noise is, by definition, random. Back to our wind tunnel example: if you had each pitcher throw in the tunnel 200 times, or 2,000 times, you’d get a far more accurate reading. Over a multi-year sample, line drive rates really are a thing; it’s no accident that Freddie Freeman checks in at a career 27.8%. That’s why you always hear people talk about small sample sizes — more data means more reliable data — but sample size issues don’t magically disappear with a full year’s sample.
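The bulking-up effect can be sketched with a standard result: if skill is fixed and each observation carries independent noise, averaging n observations cuts the noise variance by a factor of n, and the correlation between two such averages is skill variance divided by skill variance plus the leftover noise variance. The inputs below are hypothetical, matching the thought experiment above rather than any real pitcher data: skill spread evenly from -10% to +10%, and single-throw noise with a standard deviation of 0.60.

```python
def expected_correlation(skill_var: float, noise_var: float, n_obs: int) -> float:
    """Population correlation between two independent n_obs-observation averages.

    Assumes a fixed true skill per pitcher and independent noise on every
    observation, so averaging n_obs of them leaves noise_var / n_obs.
    """
    return skill_var / (skill_var + noise_var / n_obs)


# Hypothetical inputs: a uniform spread over a 0.20-wide range has variance
# 0.20**2 / 12, and the noise standard deviation is assumed to be 0.60.
SKILL_VAR = 0.20 ** 2 / 12
NOISE_VAR = 0.60 ** 2

for n_obs in (1, 200, 2000):
    r = expected_correlation(SKILL_VAR, NOISE_VAR, n_obs)
    print(f"{n_obs:>5} throws per sample -> expected correlation {r:.2f}")
```

Under those assumptions the correlation climbs from about 0.01 with one throw, to about 0.65 with 200, to about 0.95 with 2,000. The skill never changed; only the amount of noise left in each measurement did.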

What does that mean for shadow rate? Let’s cut our data into bigger chunks and see what we get. This time, we’ll create two sets: 2015-2017 and 2018-2020. Here’s a similar table to the one from before, where we look at the top pitchers from the first half and see how they did in the second half:

’15-’17 Shadow Zone% Leaders
Player            ’15-’17 Shadow%  ’15-’17 Rank  ’18-’20 Shadow%  ’18-’20 Rank
Jason Vargas      45.8%            1             41.5%            111
Kyle Hendricks    45.6%            2             46.8%            2
Zach Davies       45.5%            3             46.1%            3
Tommy Milone      45.5%            4             44.9%            12
Alex Wood         45.5%            5             44.8%            15
Noah Syndergaard  45.3%            6             45.4%            9
Josh Tomlin       45.3%            7             42.9%            61
CC Sabathia       45.2%            8             46.0%            5
Aaron Nola        45.2%            9             45.0%            11
Yusmeiro Petit    45.2%            10            44.7%            17

Now we’re talking. Eight of the top 10 players were still in the top 20 over the next three years. The overall correlation is a relatively strong 0.59. In other words, we can predict roughly 35% of the variance in 2018-2020 shadow zone rate by using 2015-2017 shadow zone rate. It’s starting to sound like what we expected all along: a skill, not a fluke.
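As a rough sanity check, the Spearman-Brown formula from test theory predicts how a correlation should grow when each measurement is built from k times as much data. It isn’t part of the analysis above, and treating a three-year window as exactly three one-year samples is an assumption (the 2020 season was shortened, for one thing), so take this as a sketch rather than a result.

```python
def spearman_brown(r_single: float, k: float) -> float:
    """Predicted correlation when each measurement uses k times as much data."""
    return k * r_single / (1 + (k - 1) * r_single)


ONE_YEAR_R = 0.35    # one-year correlation quoted in the article
THREE_YEAR_R = 0.59  # three-year-window correlation quoted in the article

predicted = spearman_brown(ONE_YEAR_R, k=3)
print(f"predicted three-year correlation: {predicted:.2f}")
print(f"observed three-year correlation:  {THREE_YEAR_R:.2f}")
print(f"observed r-squared:               {THREE_YEAR_R ** 2:.2f}")
```

The prediction comes out around 0.62, close to the observed 0.59, and squaring 0.59 gives the roughly 35% quoted above. A stable skill buried in one-year noise would look a lot like this.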

Year-to-year correlation is a useful tool in any analyst’s belt, but it’s worth remembering: not every skill in baseball shows a strong correlation from one year to the next. The noisier the data and the more randomness comes into play, the more observations you’ll need to measure variations in skill level.

I’m not breaking new ground here, to be clear. Projection systems incorporate multi-year data for skills that exist but aren’t obvious in one-year samples. Tom Tango has been writing about hard-to-measure skills for 20 years. Platoon splits are so noisy that you might need half a career’s data to learn that a particular batter is a true lefty (or righty) killer, but those skillsets really do exist — it’s simply hard to tell because the noise is so loud relative to the skill.

My point, I guess, is that you should be really skeptical when someone claims a particular statistic is governed exclusively by randomness. It might take years to see any skill. It frequently does take years to see any skill. Line drive rate, platoon edge, BABIP, the shadow rate we looked at today, and even ERA-FIP gaps are all skill-influenced, even if they fluctuate wildly from one year to the next. The next time someone quotes you a tiny year-to-year correlation, take note: they’re not lying to you when they tell you that number, but if you think it means there’s no skill involved, you’re drawing the wrong conclusion.





Ben is a writer at FanGraphs. He can be found on Twitter @_Ben_Clemens.

Anon
3 years ago

The final table is just a rerun of the prior 18-19 one. Looks like you inserted the wrong table.

Good article BTW