Which Pitcher Stats Have Relevance This Early (With a Note on Clayton Kershaw)
by Eno Sarris
April 14, 2017

It's very frustrating to do baseball analysis in the offseason: there's no actual baseball to analyze! It's very frustrating to do baseball analysis during spring training: the results don't matter, and we don't get all the same stats! It's very frustrating to do baseball analysis in the first few weeks: it's all a small sample size! The lesson overall? It's very frustrating to do baseball analysis. But it's also very rewarding, and so we make a go of it even when we've barely completed 10 games of a 162-game season.

One thing to which we turn at this point of the season is pitch velocity and movement. My personal sense is that these things become meaningful quickly. Very quickly. While there's research that has pushed me in that direction, I hadn't seen work that looked at precisely how quickly movement and velocity stats stabilize, or become meaningful. So I asked Brian Cartwright to run the numbers.

First, Cartwright selected players who had reached a certain sample threshold for a metric. For each of them, he then randomly selected a sample of events at each step (20, 40, 60…), cut that sample in half, and compared the two halves. This process ensured that, at each step, the exact same players, each with the exact same sample size, were used in the comparison. Then he summed the results for all players and used a Pearson test to find the correlation between the cumulative halves. Because these are random samples, he reran each test 20 times and took the mean. What you see below is the threshold for each stat at which the stat itself predicts more of the player's future than the league average does (the r-squared is over .5). In other words, that is the point at which the stat becomes meaningful for future projections. It's not immutable, and it may change in many cases, but it's stable and believable.
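For anyone who wants to tinker, here is a minimal Python sketch of a split-half test in the spirit of the process described above. It is not Cartwright's code, and every name in it is hypothetical. Assumptions: each player's stat is a plain average of per-event values (mph for each sinker, or 1/0 for contact on each swing); each step draws a fresh random sample instead of growing cumulative halves; the `fake_*` functions generate made-up players purely for illustration; and the .5 r-squared cutoff is taken from the text.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def split_half_r2(players, n, trials=20, seed=0):
    """Mean split-half r^2 at sample size n, averaged over `trials` reruns.

    `players` is a list of per-player event lists (mph per sinker, or 1/0
    for contact on each swing); every list must hold at least n events.
    """
    rng = random.Random(seed)
    half = n // 2
    r2_values = []
    for _ in range(trials):
        first_half, second_half = [], []
        for events in players:
            draw = rng.sample(events, 2 * half)  # fresh random draw, no repeats
            first_half.append(sum(draw[:half]) / half)
            second_half.append(sum(draw[half:]) / half)
        # Correlate the two halves across all players, then square it.
        r2_values.append(pearson_r(first_half, second_half) ** 2)
    return sum(r2_values) / trials

def stabilization_point(events_by_player, steps, threshold=0.5):
    """First step whose mean split-half r^2 exceeds `threshold`, else None."""
    # Keep only players who reach the largest step, so every step uses the
    # exact same players with the exact same sample size.
    pool = [events for events in events_by_player if len(events) >= max(steps)]
    for n in steps:
        if split_half_r2(pool, n) > threshold:
            return n
    return None

# Made-up players for illustration only; these are NOT real PITCHf/x data.
def fake_velocity_players(n_players=60, n_events=320, seed=1):
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        true_velo = rng.uniform(88, 95)  # hidden "true" sinker velocity, mph
        players.append([rng.gauss(true_velo, 1.0) for _ in range(n_events)])
    return players

def fake_contact_players(n_players=60, n_events=320, seed=2):
    rng = random.Random(seed)
    players = []
    for _ in range(n_players):
        true_rate = rng.uniform(0.70, 0.90)  # hidden "true" contact rate
        players.append([1 if rng.random() < true_rate else 0
                        for _ in range(n_events)])
    return players

STEPS = [10, 20, 40, 80, 160, 320]
print(stabilization_point(fake_velocity_players(), STEPS))
print(stabilization_point(fake_contact_players(), STEPS))
```

On the fake players, velocity clears the bar at the very first step, while the 1/0 contact stat needs many more events, the same ordering the stabilization table shows. The synthetic numbers themselves mean nothing; swap in real per-pitch data to get real thresholds.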
Stabilization Rates for PITCHf/x Stats

Stat                        Denominator              Stable At
Sinker Velocity             Sinkers                  10
Sinker Horizontal Move      Sinkers                  10
Sinker Vertical Move        Sinkers                  10
Changeup Velocity           Changeups                10
Changeup Horizontal Move    Changeups                10
Changeup Vertical Move      Changeups                10
Contact%                    Swings                   40
Changeup Contact%           Changeup swings          50
Sinker Contact%             Sinker swings            70
O-Zone Swing%               Pitches outside of zone  120
First Strike                First pitches            250
Zone%                       Pitches                  330
SOURCE: 2013 PITCHf/x

There are, of course, more than a few caveats with this sort of work. The spread on velocity stats (on the actual, most-seen velocity) runs from about 88 to 95 mph. Within such a narrow range, even a half-tick is meaningful, so narrowly missing the predictive mark might matter more with velocity and movement than it would with other stats.

There's no guarantee, of course, that a pitcher will maintain the same sinker velocity or contact rate after reaching the sample size at which those two metrics become stable. Rather, these thresholds mark the points at which the numbers a pitcher has produced so far describe more of the variance going forward than league-average numbers would. In other words, these stats have become meaningful at these benchmarks (rounded to the nearest ten). For analysis, I treat them as a simple nudge toward the types of stats we can look at in smaller samples, not some sort of bible.

It's also worth mentioning that different methods have produced different results. Jonah Pemstein looked at some of the stats above and found many similar results for velocity, movement, and spin rate. But he also found, for example, that pitcher swinging-strike rate stabilized around 440 pitches, which would be closer to 200 swings than to the 40 swings we found. Why is one different from the other? I don't know why; go ask my brother. Or, in this case, my cousin Brian.

Also: wow.
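To see why 440 pitches works out to roughly 200 swings, multiply by a swing rate. The 45 percent figure below is an assumed ballpark for the league-wide swing rate, not a number from this article or from Pemstein's work, and both helper names are hypothetical.

```python
# Ballpark league-wide Swing%; an assumption, not a figure from the article.
ASSUMED_SWING_RATE = 0.45

def pitches_to_swings(pitches, swing_rate=ASSUMED_SWING_RATE):
    """Expected number of swings in a sample of `pitches` pitches."""
    return round(pitches * swing_rate)

def swings_to_pitches(swings, swing_rate=ASSUMED_SWING_RATE):
    """Pitches needed, on average, to collect `swings` swings."""
    return round(swings / swing_rate)

print(pitches_to_swings(440))  # Pemstein's threshold in swings -> 198
print(swings_to_pitches(40))   # our 40-swing Contact% mark in pitches -> 89
```

At that assumed rate, the 40-swing mark arrives after roughly 90 pitches, comfortably inside two starts, while Pemstein's 440-pitch mark would take several starts to reach.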
Jeff Zimmerman once found that one game of velocity readings was meaningful for starting pitchers returning from the disabled list, so maybe we already knew this, but it's really remarkable to see it in this format. After five in-game sinkers, you can predict more than half of the variance in the rest of that pitcher's sinkers. Five fastballs, and you have a good sense of how hard a pitcher is throwing. Cartwright wanted to point out that sinker velocity technically reaches the stable level after just one pitch, but I was free to round. That's probably a good idea, if only to make sure you're actually looking at five sinkers and didn't mistakenly include a hard changeup in the group.

But when Jeff Sullivan reports that Shelby Miller has added more than two ticks, or that Jake Arrieta is down almost three ticks, that stuff is meaningful. Tyler Skaggs is down almost 2 mph; Dan Straily is up a tick-plus; Lance Lynn is down one to two ticks. It's not the sexiest content, and it might get repetitive, but it's meaningful and backed by the data.

Context is important, of course. You want to compare velocities from the same source, not readings from two different sources. You want to make sure the pitcher hasn't changed roles. You want to make sure he hasn't just hit the disabled list.

The movement numbers are a little messed up right now while our best minds recalibrate the different parks for the new pitch-tracking equipment. Still, calibration issues aside, the actual movement numbers become meaningful super early, so perhaps you can expect a post on the most improved changeup of the early going.

But look at contact rate. That might surprise you. A typical pitcher would elicit more than 42 swings in two starts. In fact, all 85 qualified starters have gotten more than 42 swings on their pitches so far.
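One way to put the 40-swing Contact% threshold to work is to refuse to read a pitcher's contact rate until he has cleared it. The helper below is a hypothetical sketch that assumes you already have two counts: total swings, and swings that produced contact (fouls plus balls in play). The example numbers in the usage lines are made up.

```python
STABLE_SWINGS = 40  # Contact% threshold from the stabilization table

def early_contact_rate(contacts, swings, min_swings=STABLE_SWINGS):
    """Contact% as a percentage once the sample is big enough, else None.

    contacts: swings that produced a foul or a ball in play.
    swings:   all swings against the pitcher so far.
    """
    if swings < min_swings:
        return None  # too few swings: lean on the league average instead
    return round(100 * contacts / swings, 1)

print(early_contact_rate(30, 35))  # None: only 35 swings so far
print(early_contact_rate(45, 50))  # 90.0
```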
Qualified Starters: Leaders & Laggards in Contact Rate Change

Name                2017 Contact%   2016 Contact%   Difference (pts)
Clayton Kershaw     82.6%           70.2%           +12.4
Jeremy Hellickson   88.1%           77.2%           +10.9
Corey Kluber        83.2%           73.9%           +9.3
Jhoulys Chacin      89.6%           81.2%           +8.4
Patrick Corbin      87.1%           78.8%           +8.3
Cole Hamels         81.9%           74.7%           +7.2
Yu Darvish          80.3%           73.4%           +6.9
Jon Gray            81.7%           75.1%           +6.6
Justin Verlander    81.0%           75.9%           +5.1
Madison Bumgarner   80.6%           75.9%           +4.7
Kendall Graveman    77.3%           83.4%           -6.1
Tyler Anderson      71.7%           78.7%           -7.0
Alex Cobb           76.6%           83.7%           -7.1
Derek Holland       75.0%           83.8%           -8.8
Miguel Gonzalez     74.5%           83.5%           -9.0
Jeff Samardzija     71.0%           80.9%           -9.9
Masahiro Tanaka     67.9%           78.3%           -10.4
Danny Duffy         63.6%           74.9%           -11.3
Brandon Finnegan    66.7%           78.7%           -12.0
Sean Manaea         60.9%           77.1%           -16.2

Now it's my duty to tell you that Clayton Kershaw has had the biggest jump in contact rate (+12 points), the worst change among qualified starting pitchers in baseball this year. He hasn't lost any velocity, but it's notable that his slider has lost more than three inches of drop relative to his four-seamer, a relationship that should hold up even with the early-season calibration issues. It still seems too early to take this to the bank, especially given the differing results on how quickly contact rate stabilizes. Let's all watch his next start. And those by Jeremy Hellickson (+11 points) and Corey Kluber (+9), too.

We've already pointed out how great Sean Manaea (tops with -16 points) and Brandon Finnegan (-12) have looked, but next on the list are Danny Duffy (-11) and Masahiro Tanaka (-10). Interesting deviations from their norms, in somewhat believable samples.

Maybe this isn't so frustrating, after all. There are a few things to explore, at least.
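The Difference column in the contact-rate table is simply 2017 Contact% minus 2016 Contact%, in percentage points, and the sign runs opposite to what you might expect: a positive number means hitters are making more contact, which is bad news for the pitcher. A short sketch (function and variable names are hypothetical) that rebuilds four rows from that table and ranks them:

```python
# (name, 2017 Contact%, 2016 Contact%) for four rows of the table
ROWS = [
    ("Clayton Kershaw", 82.6, 70.2),
    ("Jeremy Hellickson", 88.1, 77.2),
    ("Danny Duffy", 63.6, 74.9),
    ("Sean Manaea", 60.9, 77.1),
]

def contact_change(rate_now, rate_before):
    """Change in Contact%, in percentage points (positive = more contact allowed)."""
    return round(rate_now - rate_before, 1)

# Biggest increases (worst for the pitcher) first.
ranked = sorted(ROWS, key=lambda row: contact_change(row[1], row[2]), reverse=True)
for name, now, before in ranked:
    print(f"{name:<18} {contact_change(now, before):+.1f}")
```

Sorting in descending order puts the pitchers whose contact rate rose the most at the top, matching the "Leaders" half of the table.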