Year-to-Year Predictability of Pitcher Ball-in-Play Data

by Tony Blengino
November 19, 2015

The introduction of batted-ball data, first to the clubs and then to the public, certainly has caused a revolution in player evaluation. While the entirety of HITf/x and now Statcast data isn’t likely to be available to the masses anytime soon, the portions that are, including fairly complete PITCHf/x data, have changed the way fans, analysts, club personnel, and, yes, even players look at the game.

As I have often written on these pages, this data needs to be placed into context to be fully understood. There are ongoing issues with data capture, and the simple fact that not all hard-hit or softly hit baseballs are created equal adds levels of nuance that must be understood before meaningful conclusions can be drawn. Another concern expressed by many is the uncertain predictive value of the batted-ball data, particularly with regard to pitchers. Today, let’s take a look at how this data correlates from year to year, from the pitcher’s perspective.

It’s been fairly well established over the years that a “ground ball” or “pop up” pitcher is a real thing: ball-in-play (BIP) type frequencies correlate quite well from year to year. The same applies to strikeout (K) and walk (BB) rates, from both the hitter’s and the pitcher’s perspective. How about batted-ball authority?

To examine this issue, I identified the 45 starting pitchers who qualified for the ERA title in either league in both 2014 and 2015. (Players who were traded mid-year and qualified overall but not in either league specifically were omitted.) For each of these pitchers, the following statistics were scaled to 100 for both the 2014 and 2015 seasons, with correlation coefficients calculated thereafter.
First, the rate stats:

- Strikeout rate
- Walk rate
- Pop-up rate
- Fly-ball rate
- Line-drive rate
- Ground-ball rate

Then metrics concerning projected production allowed (based on BIP authority):

- Fly ball/line drive combined
- Ground ball authority
- All ball-in-play authority

And, finally, runs-allowed measures/estimators:

- Earned run average
- Fielding independent pitching
- “Tru” ERA (based on BIP frequency and authority)

One note on the middle group above. Fly balls and line drives were combined when calculating and scaling the Projected Production Allowed data. This was done because of the different manner in which data was captured in 2014 (Sportvision) and 2015 (Statcast). In 2014, balls in play were classified as fly balls or line drives based on available vertical exit angle data. In 2015, such data was not available, so Statcast’s subjective classifications of BIP as either fly balls or liners had to be accepted.

The usage of relative BIP authority data, scaled to 100, was a necessity, as the average velocities under the Sportvision and Statcast systems are drastically different. Statcast didn’t get readings on over 23% of batted balls in 2015, and most of them were weakly hit; Sportvision’s “null” group was much smaller, close to 5%. Combined with the relatively “hot” measurements of the new system, the average velocity of a Sportvision batted ball is in the upper 70s, while Statcast’s is around 90. There is no perfect method to compare data between the two systems, but the best course is to project individual pitcher performance based on average MLB production given each pitcher’s BIP frequency and authority mix, and then scale it to 100.

Here, then, are the correlation coefficients for the aforementioned statistical categories for the 45 starting pitchers who qualified for the ERA title in both 2014 and 2015.
Correlation Coefficients, 2014-15 ERA Qualifiers

Metric      Coefficient
K%          0.81
BB%         0.66
Pop%        0.53
Fly%        0.76
LD%         0.14
GB%         0.86
FL/LD       0.37
GB AUTH     0.25
BIP AUTH    0.37
ERA         0.45
FIP         0.65
TRU ERA     0.72

Without getting too deep, a perfect correlation (1.00 in the above table) is obtained when the two sets of data move in perfect lockstep, as they would if they were identical. The closer to 1.00, the higher the degree of correlation between the two data sets.

Among the above statistics, these 45 pitchers’ grounder rates correlated the most closely from 2014 to 2015, with a 0.86 correlation coefficient. You’ll notice that the frequency statistics, with the exception of line drive rate (0.14), all correlate at 0.50 or higher.

Now look at the statistics that measure BIP authority: fly ball/line drive and overall BIP authority (which relies in part on BIP frequency mix) both correlate at 0.37, and ground ball authority correlates at 0.25. While these marks are clearly much lower than the frequency figures, 0.37 in particular represents at least a modest degree of correlation.

Lastly, look at the three measures of run prevention. ERA, the old warhorse of these statistics, correlates the least of the three from year to year (0.45 coefficient). The new run prevention statistic of choice, FIP, correlates quite a bit more from year to year, at 0.65, while my pet stat, “tru” ERA, which incorporates relative BIP authority into the mix, correlates even better at 0.72.

Bottom line: the lower the correlation coefficient, the more random the statistic, and the less that “true talent” can be credited or blamed for player performance. Line drive rate allowed is quite random, and largely based on luck. BIP authority management has some randomness to it, but there is a degree of true talent involved. BIP frequency management, excepting line drives, is heavily tied to true talent.

Having established these correlations, tomorrow I’ll examine a collection of free agent pitchers and what their own ball-in-play data reveal about their respective futures.
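
For readers who want to try the year-to-year comparison themselves, here is a minimal Python sketch of the two mechanical steps described above: scaling a statistic to 100 and calculating a correlation coefficient between two seasons. It assumes the coefficient in question is the standard Pearson correlation, which the article does not state explicitly. The function names (`scale_to_100`, `pearson_r`) and the five-pitcher sample are hypothetical; the numbers are made up for illustration and are not the 45-pitcher data set. The sketch also defaults to the sample mean as the baseline, whereas the article's scaling is relative to MLB average.

```python
from math import sqrt
from statistics import mean


def scale_to_100(values, baseline=None):
    """Express each value relative to a baseline, where 100 equals the baseline.

    Assumption: the baseline defaults to the sample mean. To scale against
    a league average instead, pass that average in explicitly.
    """
    if baseline is None:
        baseline = mean(values)
    return [100.0 * v / baseline for v in values]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# Made-up ground-ball rates for five hypothetical pitchers (NOT real data).
gb_2014 = [0.38, 0.45, 0.52, 0.41, 0.60]
gb_2015 = [0.40, 0.43, 0.55, 0.39, 0.57]

r_raw = pearson_r(gb_2014, gb_2015)
r_scaled = pearson_r(scale_to_100(gb_2014), scale_to_100(gb_2015))
print(f"raw r = {r_raw:.2f}, scaled r = {r_scaled:.2f}")
```

In this simplified version, scaling divides each season's values by a single constant, so the raw and scaled coefficients come out the same. The scaling here serves to put two differently calibrated seasons on a common, readable footing rather than to change the correlation itself.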