Basic Pitching Metric Correlation 1955-2012, 2002-2012

Last week, I took a look at year-to-year correlations for hitting metrics. This post follows up by doing the same thing with pitching metrics. Here, with a bit of commentary, are the results.

I am not presenting this as terribly original or groundbreaking. I do think this can at least be a helpful reference. I made a number of qualifications in the post on hitters, so if you have not read that post, I recommend you take a look at it.

Some metrics correlate better than others, but that does not necessarily tell us that one is “better” than the other, as I have included various metrics that have different uses. It is more precise to write that this tells us about relative sample size in relation to true talent. A metric with lower year-to-year correlation likely needs more regression to the mean when we are trying to estimate a player’s true talent. Finally, keep in mind that year-to-year correlation is not necessarily the only or always the best way to establish this sort of thing. It is, however, relatively easy to do for a basic study.

I neglected to include even a basic explanation of correlation in last week’s post, probably because I typically assume I am the least mathematically knowledgeable person in any group. Better explanations are certainly Google-able, but for the lazy and non-picky: the range of possibilities for correlation is between 1 and -1. Results closer to 1 or -1 indicate that the two sets of numbers are strongly correlated (in the case of negative numbers, inversely correlated). Results closer to 0 indicate less of a relationship.
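
For readers who prefer to see the arithmetic, here is a minimal sketch of the usual (Pearson) correlation coefficient in Python. The function name and the strikeout rates are made up for illustration; nothing here is taken from the data set used in this post.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized because the 1/n factors cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: this divides by zero if either sequence is constant.
    return cov / sqrt(var_x * var_y)

# Made-up strikeout rates (SO/TBF) for five pitchers in consecutive seasons.
year_one = [0.15, 0.18, 0.22, 0.25, 0.30]
year_two = [0.16, 0.17, 0.23, 0.24, 0.28]
r = pearson(year_one, year_two)
print(f"year-to-year correlation: {r:.3f}")
```

Pitchers who rank high in the first list also rank high in the second, so the result lands close to 1.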

Deciding on what limits to put on my data was a bit more complicated with pitchers than with hitters. Last week I simply used batter seasons with at least 400 plate appearances. For pitchers, part of the problem was that many of them switch roles in and between seasons. We know that relieving almost always improves a pitcher’s across-the-board performance, so if a pitcher sees significant time relieving one season and not another, it could mess with the sample.

Without going through all of my thought processes and the options I considered, I ended up simply setting the minimum innings for a single season at 140, and placing a strict limit on the relative proportion of non-starting appearances the pitcher made. Yes, that means that for all practical purposes, this is about starting pitchers, but we already know that relievers present their own set of issues (primarily around sample size). I wanted to keep this basic without shrinking my sample by excluding starters who made a few relief appearances during a season. As I did last week, I matched on team to mitigate the effect of home park switches.
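
A rough sketch of that kind of filtering and pairing might look like the following. The record layout (keys such as 'pitcher', 'team', 'ip'), the function name, and especially the `max_relief_share` value are assumptions for illustration: the exact relief cutoff used for this post is not stated here, so the default below is only a placeholder. The sketch also assumes one row per pitcher, team, and year.

```python
def paired_seasons(seasons, min_ip=140.0, max_relief_share=0.1):
    """Return (year_one, year_two) record pairs for the same pitcher and team.

    `seasons` is a list of dicts with the hypothetical keys
    'pitcher', 'team', 'year', 'ip', 'games', 'starts'.
    `max_relief_share` is a placeholder, not the cutoff used in the post.
    """
    def qualifies(s):
        if s["ip"] < min_ip:
            return False
        relief_share = (s["games"] - s["starts"]) / s["games"]
        return relief_share <= max_relief_share

    # Index qualifying seasons so the following year can be looked up directly.
    by_key = {(s["pitcher"], s["team"], s["year"]): s
              for s in seasons if qualifies(s)}
    pairs = []
    for pitcher, team, year in sorted(by_key):
        following = by_key.get((pitcher, team, year + 1))
        if following is not None:
            pairs.append((by_key[(pitcher, team, year)], following))
    return pairs

# Made-up records: only pitcher A has two qualifying seasons with one team.
seasons = [
    {"pitcher": "A", "team": "X", "year": 2010, "ip": 200.0, "games": 33, "starts": 33},
    {"pitcher": "A", "team": "X", "year": 2011, "ip": 190.0, "games": 32, "starts": 32},
    {"pitcher": "B", "team": "X", "year": 2010, "ip": 150.0, "games": 30, "starts": 28},
    {"pitcher": "B", "team": "Y", "year": 2011, "ip": 160.0, "games": 30, "starts": 30},
    {"pitcher": "C", "team": "Z", "year": 2010, "ip": 90.0, "games": 60, "starts": 5},
    {"pitcher": "C", "team": "Z", "year": 2011, "ip": 180.0, "games": 31, "starts": 31},
]
pairs = paired_seasons(seasons)
```

Pitcher B drops out because he changed teams, and pitcher C because his first season falls short of the innings minimum.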

People naturally will want to compare the correlations of metrics held in common between this post on pitchers and the hitters post from last week. That is fine, but be cautious. The samples are not exactly equivalent: I somewhat arbitrarily picked the minimums (400 plate appearances for hitters, 140 innings pitched for pitchers), so the two sets of results are not mathematically equivalent.

Okay, let’s look at some numbers, starting with some basic metrics from 1955 through 2012. As with last week, some of these metrics will look redundant, but I wanted to see how much difference the small differences make. “TBF” is “total batters faced,” the pitcher equivalent of plate appearances.

Pitching Metric Year-to-Year Correlation
SO/9 0.829
SO/TBF 0.823
uBB/TBF 0.721
BB/9 0.688
K/BB 0.674
FIP 0.620
HBP/TBF 0.513
HR/(TBF-BB-HBP-SO) 0.480
WP/TBF 0.474
HR/TBF 0.471
HR/9 0.470
WHIP 0.442
ERA 0.409
IBB/TBF 0.382
BABIP 0.351
LOB% 0.226

For some, this will contain few surprises. As we found with hitters, so with pitchers: strikeouts are the most consistent year-to-year. I guess it all starts with making contact (or some other empty platitude). However, while home runs followed strikeouts pretty closely in correlation for hitters, that is not the case for pitchers. This sort of thing is sometimes parsed as meaning hitters have “more control” over home runs (or whatever event) than pitchers, but that is not quite correct. Hopefully, I can explain this in a way that is not too distorting or confusing. It is more accurate to say that there is less variation in skill among pitchers than among hitters in this respect (demonstrating this requires a different and more complex sort of mathematical investigation, one that is certainly above my pay grade). In a given matchup, the hitter and the pitcher contribute equally to the possibility of, say, a home run happening on contact. However, there is probably more variation in this skill among hitters than among pitchers. That shows up here as lower year-to-year correlation for pitchers.

FIP correlating better year-to-year than ERA or WHIP is pretty much what we would expect, and seeing strikeouts, walks, home runs, and BABIP broken down separately just highlights that on the component level. Wild pitches and hit by pitches per plate appearance correlating more strongly than BABIP is at least kind of funny to me for some reason.

I wanted to get a nice big sample for the basic metrics, and thus went back to 1955 (the data set has some gaps before that), but I also wanted to compare more recently available metrics involving stuff like batted ball data to each other and the older metrics. As with last week, these results should not be taken as a commentary on their quality either way. As with last week, the different sample naturally means that the results for some metrics found in both tables will be different. Here are the 2002-2012 results.

Pitching Metric Year-to-Year Correlation
GB/FB 0.871
GB% 0.839
FB% 0.817
SwStr% 0.804
SO/TBF 0.803
SO/9 0.803
Contact% 0.789
O-Contact% 0.782
Swing% 0.747
Zone% 0.744
O-Swing% 0.730
uBB/TBF 0.711
Z-Swing% 0.701
xFIP 0.699
BB/9 0.692
Z-Contact% 0.664
F-Strike% 0.663
K/BB 0.630
tRA 0.589
FIP 0.584
WP/TBF 0.458
WHIP 0.430
IFFB% 0.422
HBP/TBF 0.404
HR/TBF 0.390
HR/9 0.390
ERA 0.373
IBB/TBF 0.358
HR/(TBF-BB-HBP-SO) 0.349
LOB% 0.238
BABIP 0.235
LD% 0.088
HR/FB -0.029

In what might be a bit of an upset, strikeout and contact rates are dethroned not just by ground ball and fly ball rates, but by ground ball/fly ball ratio. A lot of people love swinging strike percentage, and stuff like this shows why, although further steps would be needed to show whether and how swinging strike rate would help better predict strikeout rate. Another point of interest might be that ground ball and fly ball rates correlate more strongly for pitchers than we found for hitters, while line drive rate correlates even less strongly; indeed, it can barely be said to correlate at all for pitchers. Much has been written about this sort of issue. I will simply ask that people remember the distinction discussed above between how much variation there is in a population and how much control each participant in a plate appearance has over the result.

The other metric that seems to basically not correlate at all is home runs per fly ball. This is another issue that has been long discussed, and is also the reason Studes came up with xFIP, as well as a reason for xFIP's high correlation relative to other pitching metrics. In addition to the control issue, remember that just because something does not correlate does not necessarily mean no control is involved. Correlation is just one way of trying to get at this issue. Furthermore, just about everything measurable in baseball involves some skill; there is just more variation between players in some skills than in others. Some individuals may have this skill, but we simply cannot measure it (yet?) in the population as a whole.

This is a simple study, but there is much more that can be said and discussed. I hope that these findings are helpful and interesting to some.

Matt Klaassen reads and writes obituaries in the Greater Toronto Area. If you can't get enough of him, follow him on Twitter.

9 years ago

Can you also report on the correlation for BB/K and GB/FB (and compare them to K/BB and FB/GB, respectively)?

9 years ago
Reply to  Tangotiger

Tango- Would you mind explaining why you are interested in those differences (or what this comparison will tell us)? Just curious. Thanks

9 years ago
Reply to  Nayr

It’s the ratio v rate discussion that I have all the time. What you choose as the denominator makes a huge difference, especially if it’s something that’s at a 10:1 ratio (or 1:10).

This is why you should not do a/b but a/(a+b). Because correlating a/(a+b) to a/(a+b) in two different sets will give you IDENTICAL results to correlating b/(a+b) to b/(a+b) in two different data sets.

But, a/b to a/b and b/a to b/a will not give you the same results, especially if a:b ratio is extreme.

9 years ago
Reply to  Nayr