Last week, I took a look at year-to-year correlations for hitting metrics. This post follows up by doing the same thing with pitching metrics. Here, with a bit of commentary, are the results.
I am not presenting this as terribly original or groundbreaking. I do think this can at least be a helpful reference. I made a number of qualifications in the post on hitters, so if you have not read that post, I recommend you take a look at it.
Some metrics correlate better than others, but that does not necessarily tell us that one is “better” than the other, as I have included various metrics that have different uses. It is more precise to write that this tells us about relative sample size in relation to true talent. A metric with lower year-to-year correlation likely needs more regression to the mean when we are trying to estimate a player’s true talent. Finally, keep in mind that year-to-year correlation is not necessarily the only or always the best way to establish this sort of thing. It is, however, relatively easy to do for a basic study.
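To make the regression point concrete, here is a minimal sketch of one common shortcut: shrink the observed number toward the league average, using the year-to-year correlation as the weight. Treating the correlation as the weight is a simplifying assumption (it only really holds for samples of similar size to the ones used to compute it), and the function name and the numbers are made up for illustration.

```python
def regress_to_mean(observed, league_mean, r):
    """Shrink an observed rate toward the league mean.

    ASSUMPTION: the year-to-year correlation r is used directly as the
    weight on the observed value. This is a rough shortcut, not the
    method used in this post.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r is expected to be between 0 and 1 here")
    return league_mean + r * (observed - league_mean)

# Made-up example: the same observed rate, regressed with a high and a low r.
high_r_estimate = regress_to_mean(observed=0.25, league_mean=0.20, r=0.8)
low_r_estimate = regress_to_mean(observed=0.25, league_mean=0.20, r=0.2)
```

With the lower correlation, the estimate ends up much closer to the league mean, which is the sense in which such a metric "needs more regression."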
I neglected to include even a basic explanation of correlation in last week’s post, probably because I typically assume I am the least mathematically knowledgeable person in any group. Better explanations are certainly Google-able, but for the lazy and non-picky: the range of possibilities for correlation is between 1 and -1. Results closer to 1 or -1 indicate that the two sets of numbers are strongly correlated (in the case of negative numbers, inversely correlated). Results closer to 0 indicate less of a relationship.
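For anyone who would rather see it than read about it, here is a small sketch of the standard Pearson correlation coefficient in plain Python. The post does not say which tool or formula was actually used, so take this as an illustration of the concept only; the strikeout rates are invented.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations, and sums of squared deviations for each list.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)

# Invented strikeout rates for five hypothetical pitchers in back-to-back seasons.
year_one = [0.15, 0.18, 0.22, 0.25, 0.20]
year_two = [0.16, 0.17, 0.23, 0.24, 0.21]
r = pearson_r(year_one, year_two)
```

Two lists that rise and fall together give a value near 1, lists that move in opposite directions give a value near -1, and unrelated lists land near 0.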
Deciding on what limits to put on my data was a bit more complicated with pitchers than with hitters. Last week I simply used batter seasons with at least 400 plate appearances. For pitchers, part of the problem was that many of them switch roles in and between seasons. We know that relieving almost always improves a pitcher’s across-the-board performance, so if a pitcher sees significant time relieving one season and not another, it could mess with the sample.
Without going through all of my thought processes and the options I considered, I ended up simply setting the minimum innings for a single season at 140, and placing a strict limit on the relative proportion of non-starting appearances the pitcher made. Yes, that means that for all practical purposes, this is about starting pitchers, but we already know that relievers present their own set of issues (primarily around sample size). I wanted to keep this basic without shrinking my sample by excluding starters who made a few relief appearances during a season. As I did last week, I matched on team to mitigate the effect of home park switches.
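The selection rules above can be sketched roughly as follows. The 140-inning minimum and the same-team requirement come from the post; the exact relief limit is not stated, so `MAX_RELIEF_SHARE` is a placeholder assumption, as are the `Season` record, the function names, and the simplification of one row per pitcher per year.

```python
from typing import NamedTuple

class Season(NamedTuple):
    # Hypothetical record layout: one row per pitcher per year.
    pitcher: str
    year: int
    team: str
    innings: float
    games: int
    starts: int

MIN_INNINGS = 140        # minimum stated in the post
MAX_RELIEF_SHARE = 0.10  # ASSUMPTION: the post does not give the actual limit

def qualifies(season):
    """True if the season meets the innings and relief-share limits."""
    if season.games == 0:
        return False
    relief_share = (season.games - season.starts) / season.games
    return season.innings >= MIN_INNINGS and relief_share <= MAX_RELIEF_SHARE

def season_pairs(seasons):
    """Pair each qualifying season with the same pitcher's next season,
    keeping the pair only if both qualify and the team is unchanged."""
    by_key = {(s.pitcher, s.year): s for s in seasons if qualifies(s)}
    pairs = []
    for (pitcher, year), first in sorted(by_key.items()):
        second = by_key.get((pitcher, year + 1))
        if second is not None and second.team == first.team:
            pairs.append((first, second))
    return pairs
```

Each resulting pair supplies one year-one value and one year-two value for whatever metric is being correlated.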
People will naturally want to compare the correlations of metrics held in common between this post on pitchers and the hitters post from last week. That is fine, but be cautious. I picked the minimums somewhat arbitrarily (400 plate appearances for hitters, 140 innings pitched for pitchers), so the two samples are not mathematically equivalent.
Okay, let’s look at some numbers, starting with some basic metrics from 1955 through 2012. As with last week, some of these metrics will look redundant, but I wanted to see how much difference the small differences make. “TBF” is “total batters faced,” the pitcher equivalent of plate appearances.
|Pitching Metric||Year-to-Year Correlation
For some, this will contain few surprises. As we found with hitters, so with pitchers: strikeouts are the most consistent year-to-year. I guess it all starts with making contact (or some other empty platitude). However, while home runs followed strikeouts pretty closely in correlation for hitters, that is not the case for pitchers. This sort of thing is sometimes parsed as meaning hitters have “more control” over home runs (or whatever event) than pitchers, but that is not quite correct. Hopefully, I will explain this in a way that is not too distorting or confusing: it is more accurate to say (and this requires a different and more complex sort of mathematical investigation to demonstrate; certainly it is above my pay grade) that there is less variation in skill between pitchers than between hitters in this respect. In a given matchup, the hitter and the pitcher contribute equally to the possibility of, say, a home run happening on contact. However, there is probably more variation in this skill between hitters than between pitchers. That shows up here as less year-to-year correlation for pitchers.
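A toy simulation can illustrate the point about spread in skill. This is only a sketch under strong assumptions, not the more complex investigation mentioned above: each player gets a fixed "true talent" drawn from a normal distribution, each season adds independent normal noise of the same size for everyone, and all of the numbers are invented. Under those assumptions the expected year-to-year correlation is talent variance divided by talent variance plus noise variance.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def simulated_year_to_year_r(talent_sd, noise_sd, n_players=5000, seed=0):
    """Simulate two seasons per player and correlate them.

    Each observed season is the player's fixed talent plus independent noise.
    """
    rng = random.Random(seed)
    year_one, year_two = [], []
    for _ in range(n_players):
        talent = rng.gauss(0.0, talent_sd)
        year_one.append(talent + rng.gauss(0.0, noise_sd))
        year_two.append(talent + rng.gauss(0.0, noise_sd))
    return pearson_r(year_one, year_two)

# Same season-to-season noise, different spread in true talent.
wide_spread_r = simulated_year_to_year_r(talent_sd=2.0, noise_sd=1.0)
narrow_spread_r = simulated_year_to_year_r(talent_sd=0.5, noise_sd=1.0)
```

The group with the narrower spread in talent shows the lower correlation even though the noise is identical in both runs, which is why a low correlation by itself does not tell us who has "more control" over an event.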
FIP correlating better year-to-year than ERA or WHIP is pretty much what we would expect, and seeing strikeouts, walks, home runs, and BABIP broken down separately just highlights that on the component level. Wild pitches and hit by pitches per plate appearance correlating more strongly than BABIP is at least kind of funny to me for some reason.
I wanted to get a nice big sample for the basic metrics, and thus went back to 1955 (the data set has some gaps before that), but I also wanted to compare more recently available metrics involving stuff like batted ball data to each other and the older metrics. As with last week, these results should not be taken as a commentary on their quality either way. As with last week, the different sample naturally means that the results for some metrics found in both tables will be different. Here are the 2002-2012 results.
|Pitching Metric||Year-to-Year Correlation
In what might be a bit of an upset, strikeout and contact rates are dethroned by not just ground ball and fly ball rate, but by ground ball/fly ball ratio. A lot of people love swinging strike percentage, and stuff like this shows why, although further steps would be needed to show whether and how swinging strike rate would help better predict strikeout rate. Another point of interest might be that ground ball and fly ball rates correlate more strongly for pitchers than we found for hitters, while line drive rate correlates even less strongly; indeed, it can barely be said to correlate at all for pitchers. Much has been written about this sort of issue, so I will simply ask that people remember the distinction discussed above between the issues of how much variation there is in a population and how much control each participant in a plate appearance has over the result.
The other metric that seems to basically not correlate at all is home runs per fly ball. This is another issue that has been long discussed, and it is also the reason Studes came up with xFIP, which helps explain xFIP’s high correlation relative to other pitching metrics. In addition to the control issue, remember that just because something does not correlate does not necessarily mean no control is involved. Correlation is just one way of trying to get at this issue. Furthermore, just about everything measurable in baseball involves some skill; there is just more variation between players in some skills than in others. Some individuals may have this skill, but we simply cannot measure it (yet?) in the population as a whole.
This is a simple study, but there is much more that can be said and discussed. I hope that these findings are helpful and interesting to some.
Matt Klaassen reads and writes obituaries in the Greater Toronto Area. If you can't get enough of him, follow him on Twitter.