Recently, one of the hot topics in baseball statistics has been the appearance of a measurement for hard-hit balls: here at FanGraphs, we added hard-hit rate to our leaderboards before this season, and with it a wealth of opportunities for analysis. The trouble with any new statistic is that it can be cited before anyone fully understands what it measures or what it affects, and so hard-hit rate has been making the rounds in player analysis, generally cited as evidence of how well or how poorly a player has been performing.
For hitters, it might go without saying that hitting the ball harder is generally a good thing: the aim of hitting, in a certain sense, would seem to be to hit the ball as hard as possible as often as you can (except when bunting or in other situational circumstances). However, it hasn’t yet been clear how hitting the ball hard affects other rate and counting statistics, and that seems to be a hole in our understanding of a statistic that is enjoying a moment in the spotlight.
The aim today is, at the very least, to explore how hard-hit rate affects a few of those stats, as well as to begin a conversation that more astute statistical minds may be able to take to deeper and more exciting places. There are a couple of levels to this piece, but there are surely many more that I have not reached: I don’t intend to draw firm conclusions, but rather to provide a well-intentioned foray into the data. With that said, onward.
To begin, we should remind ourselves of some research that has already been done on these digital pages: the year-to-year correlation of hard-hit rate among pitchers and batters was the subject of an earlier piece. A brief summary of its results: hard-hit rate seems to be a skill for batters, but not so much for pitchers. That’s good news for us today, as we’ll strictly be looking at hitting statistics. Tony Blengino’s treatises on batted-ball data are also worth reading as a preface.
Today we’ll be looking at all of the hard-hit data we have before this year: 2002-2014. One caveat: hard-hit rate was graded visually prior to 2010, whereas afterward it was graded by batted-ball type, hang time, and distance. Further studies might look for differences between the pre-2010 and post-2010 data, but for now I’m assuming the two are roughly comparable. The sample consists of qualified batters in each season. Because the sample is large (almost 2,000 data points), the p-values associated with these models will be very low even for fairly weak relationships, so we should look primarily to the R-squared values, listed on the charts, to gauge the strength of the correlations.
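To illustrate why low p-values say little with a sample this size, here is a minimal sketch using the standard t-test for a Pearson correlation. This is an assumption: the article does not say which test produced its p-values, and the function name is hypothetical. With n around 2,000, a correlation whose R-squared is a fraction of a percent can still clear the usual |t| > 1.96 threshold for p < 0.05.

```python
import math

def correlation_t_stat(r: float, n: int) -> float:
    """t-statistic for testing whether a Pearson correlation r differs from zero.

    Standard formula t = r * sqrt(n - 2) / sqrt(1 - r**2), compared against a
    t distribution with n - 2 degrees of freedom.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    if n < 3:
        raise ValueError("need at least 3 observations")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Hypothetical n = 2000, roughly the number of qualified player-seasons here.
for r in (0.05, 0.10, 0.30):
    t = correlation_t_stat(r, 2000)
    print(f"r = {r:.2f}  R^2 = {r * r:.4f}  t = {t:.2f}")
```

The R-squared column is the one that tracks how much of the variation is actually explained; the t column mostly tracks how many observations there are.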
We’ll start with HR/FB%. Let’s take a look at the relationship between hard-hit rate and the percentage of fly balls that go for home runs:
There is certainly something here, even though our R-squared shows that using hard-hit rate to predict home run/fly ball rate could be problematic. Another interesting thing to note: we could fit an exponential regression to this dataset to slightly increase our predictive ability, but that brings a few other issues with it, so I went with a simple model instead. In case you were wondering, the three outliers at the top of the scatter are, from left to right, Jim Thome (2002) and Ryan Howard (2006 & 2007).
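One way to set up the linear-versus-exponential comparison mentioned above is sketched below in plain Python. It assumes "exponential regression" means fitting log(HR/FB) as a straight line in hard-hit rate; the function names are hypothetical, and this is not necessarily how the original fits were produced. Because the exponential curve is fit on the log scale, its R-squared on the original scale is not guaranteed to beat the linear fit.

```python
import math

def ols(xs, ys):
    """Ordinary least-squares fit of ys = a + b * xs; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def r_squared(ys, preds):
    """Share of the variance in ys explained by preds (1 - SSE / SST)."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst

def compare_fits(xs, ys):
    """Return (linear R^2, exponential R^2), both on the original y scale.

    The exponential model y = exp(a + b*x) is fit by regressing log(y) on x,
    so every y must be positive (a 0% HR/FB season would need handling).
    """
    a_lin, b_lin = ols(xs, ys)
    lin_preds = [a_lin + b_lin * x for x in xs]
    a_exp, b_exp = ols(xs, [math.log(y) for y in ys])
    exp_preds = [math.exp(a_exp + b_exp * x) for x in xs]
    return r_squared(ys, lin_preds), r_squared(ys, exp_preds)
```

Called as `compare_fits(hard_hit_rates, hr_fb_rates)` on paired lists (hypothetical names), it reports how much the curved fit gains, if anything, over the straight line.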
Next up we have ISO. Let’s check out the scatter of ISO vs. hard-hit rate:
Again, we have a significant relationship here, this time between increased hard-hit rate and increased ISO. However, we can also see that using the relationship as a predictive tool is again going to be problematic, which comes up often when looking at one baseball statistic in relation to another. Just as it is difficult to use a single statistic to measure a player’s entire worth (the reason we have metrics), there are simply too many variables beyond how hard a hitter hits the ball to treat hard-hit rate as a concrete way of predicting another statistic. That doesn’t mean we can’t identify relationships that exist, however, as one seems to here.
Now let’s try a more traditional statistic, and one that is at least related to ISO: slugging. Does increased hard-hit rate correlate to increased slugging?
The explained variance is lower than it was for ISO, as we might expect, since slugging is a noisier statistic than ISO. Still, there’s a relationship, and it gives us some hope that further studies might find at least some correlation between hard-hit rate and other traditional statistics.
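For reference, the link between the two stats is definitional: ISO is slugging percentage minus batting average, so slugging carries everything ISO does plus credit for singles, which cancel out of ISO. A minimal sketch of the standard formulas, with hypothetical function names:

```python
def slg(singles, doubles, triples, homers, at_bats):
    """Slugging percentage: total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * homers
    return total_bases / at_bats

def avg(singles, doubles, triples, homers, at_bats):
    """Batting average: hits per at-bat."""
    return (singles + doubles + triples + homers) / at_bats

def iso(singles, doubles, triples, homers, at_bats):
    """Isolated power: SLG minus AVG.

    Singles cancel, leaving (2B + 2*3B + 3*HR) / AB, i.e. extra bases per at-bat.
    """
    line = (singles, doubles, triples, homers, at_bats)
    return slg(*line) - avg(*line)
```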
Finally, we’ll look at a catch-all, and the correlation that might be the most interesting to us when evaluating overall offensive performance in relation to hard-hit rate: wRC+. Does hitting the ball hard more often lead to better overall offensive performance?
Once again we find a relationship, but our model explains just under 40% of the variance. Given the large sample and the type of data, we probably shouldn’t hope for very high R-squared values, and after staring at this for many hours, I’m increasingly happy with even a modest relationship between hard-hit rate and a metric that captures overall offensive performance.
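To put "just under 40%" on the more familiar correlation scale: assuming, as the charts suggest, a one-variable linear fit, R-squared is the square of Pearson's r, so

```latex
R^2 = r^2 < 0.40 \quad\Longrightarrow\quad r = \sqrt{R^2} < \sqrt{0.40} \approx 0.63
```

That is, the correlation between hard-hit rate and wRC+ is a little under 0.63.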
A final one, mostly for the sake of doing so: it tests the belief that a high hard-hit rate means a hitter tends to have more line drives in his batted-ball profile. Is that true? Let’s take a look:
Someone took a shotgun to the chart. Hitters with hard-hit rates under 15% can post line-drive rates of 26%, just as hitters with 45% hard-hit rates can post line-drive rates of 17%. There seems to be effectively zero predictive value in using hard-hit rate this way. Although it might seem obvious, hard-hit rate should not be confused with line-drive rate in any way, as ground balls and fly balls can be hit hard, just as line drives can be hit softly.
A Final Issue
One point that has to be raised about the data we have is the high year-to-year variability in league-average hard-hit rate. We can see jumps of almost 6 percentage points in back-to-back years, which makes me wonder whether a league adjustment could be beneficial, given the possibility of outside influences. Compare the season-to-season jumps in hard-hit rate for our data set with those in line-drive rate (with the scale adjusted for proper comparison):
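A minimal sketch of how those season-to-season jumps could be tabulated, assuming the league averages are available as a season-to-rate mapping (rates as fractions, e.g. 0.31 for 31%). The function names are hypothetical and no actual league averages are included here.

```python
def season_jumps(league_avgs):
    """Year-over-year change in a league-average rate, keyed by the later season.

    league_avgs maps season -> league-average rate. Seasons are sorted, so a
    missing year simply compares against the previous season present.
    """
    seasons = sorted(league_avgs)
    return {
        later: league_avgs[later] - league_avgs[earlier]
        for earlier, later in zip(seasons, seasons[1:])
    }

def largest_jump_points(league_avgs):
    """Largest absolute year-over-year move, in percentage points (0.0 if none)."""
    jumps = season_jumps(league_avgs)
    return max((abs(delta) for delta in jumps.values()), default=0.0) * 100
```

Running both functions over hard-hit rate and line-drive rate gives a direct comparison of how much each league average moves from one year to the next.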
Line-drive rate, much like other rate stats we use, is fairly stable between years; hard-hit rate does not seem to share that trait (at least in the data we have). With that in mind, I’ve performed a league adjustment for each season of both hard-hit rate and the statistics/metrics we’ve compared it to (it does not include park adjustments), to effectively create “Hard-Hit Rate+”, “ISO+”, “HR/FB%+”, etc. for each player’s individual seasons.
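The adjustment itself is not spelled out here, so the following is only a sketch under a common convention for "plus" stats: 100 times the player's value divided by that season's league average. It assumes the league average is the simple mean over the rows supplied for that season (for example, qualified hitters only) rather than a PA-weighted league total, and, as in the text, applies no park adjustment. The function and key names are hypothetical.

```python
from collections import defaultdict

def league_adjust(rows, stat):
    """Return copies of rows with a '<stat>+' index added.

    rows: list of dicts with at least a 'season' key and the given stat key.
    '<stat>+' = 100 * player value / that season's mean value across rows,
    so 100 is league average for the season and 110 is 10% above it.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["season"]] += row[stat]
        counts[row["season"]] += 1
    league_avg = {season: totals[season] / counts[season] for season in totals}
    return [
        {**row, stat + "+": 100.0 * row[stat] / league_avg[row["season"]]}
        for row in rows
    ]
```

Applying it once per column (hard-hit rate, ISO, HR/FB, and so on) puts every player-season on the same 100-centered scale, whichever year it came from.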
This, though it may be quick and dirty, puts each season’s performance in the frame of the rest of the league for that particular year. I then reran our regression models for each comparison with this data to find out if our correlations would get stronger. In the interests of saving space and not including each scatter plot again, I’ve created a table with each R-squared value of both the non-adjusted and adjusted data sets. Here are the findings:
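A minimal sketch of the before/after comparison, assuming each row already holds both the raw value and its league-adjusted "+" version (the key names, such as `hard_hit` and `hard_hit+`, are hypothetical). For a one-predictor linear fit, R-squared equals the squared Pearson correlation, which is what is computed here.

```python
def pearson_r_squared(xs, ys):
    """Squared Pearson correlation, equal to R^2 of a one-predictor linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)

def compare_r_squared(rows, x_key, y_key):
    """Return (raw R^2, league-adjusted R^2) for one stat pair.

    Expects rows to carry x_key, y_key, x_key + '+', and y_key + '+'.
    """
    raw = pearson_r_squared([r[x_key] for r in rows], [r[y_key] for r in rows])
    adjusted = pearson_r_squared([r[x_key + "+"] for r in rows],
                                 [r[y_key + "+"] for r in rows])
    return raw, adjusted
```

One call per statistic, e.g. `compare_r_squared(rows, "hard_hit", "iso")`, yields one row of a non-adjusted versus league-adjusted R-squared table.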
|Non-Adjusted R-squared||League Adjusted R-squared|
Here we see much stronger correlations, with the share of the variance explained rising substantially. We can still debate what level of R-squared we should accept as meaningful, but smoothing out some of the large year-to-year fluctuations in hard-hit rate data seemed to give us better final correlations. A lot of baseball data is inherently noisy, with large and random variance, which makes it fundamentally difficult to apply the statistical rules of thumb we might use for data sets in other fields. All in all, given the large sample size, I’m actually surprised that hard-hit data correlates with other statistics as well as it does, even if the correlations are only moderate.
There is certainly room for more study of how hard-hit rate influences other aspects of the offensive game. I hope this preliminary foray (this balestra, if you will) has provided at the very least food for thought and discussion. If I have made any statistical errors in my analysis, I apologize, and know that I set out to look into this topic with the purest of intentions. Given the overlap and noise inherent in comparing sets of baseball statistics, we’ve found some interesting and meaningful correlations here. Preliminarily, hard-hit rate’s impacts may be what we expected, and perhaps some we didn’t expect as well.
Owen Watson writes for FanGraphs and The Hardball Times. Follow him on Twitter @ohwatson.