Which Pitcher Stats Have Relevance This Early (With a Note on Clayton Kershaw)

April 14, 2017

It’s very frustrating to do baseball analysis in the offseason — there’s no actual baseball to analyze! It’s very frustrating to do baseball analysis during spring training — the results don’t matter and we don’t get all the same stats! It’s very frustrating to do baseball analysis in the first few weeks — it’s all a small sample size! The lesson overall? It’s very frustrating to do baseball analysis.

But it’s also very rewarding, and so we make a go of it even when we’ve barely completed 10 games of a 162-game season. One thing to which we turn at this point of the season is pitch velocity and movement. My personal sense is that these things become meaningful quickly. Very quickly.

While there’s research that has pushed me in that direction, I hadn’t seen work that looked at precisely how quickly movement and velocity stats stabilize, or become meaningful. So I asked Brian Cartwright to run the numbers.

First, Cartwright selected players who had reached a certain sample threshold for a metric. For each, he then drew a random set of events at each step (20, 40, 60…), cut the sample in half, and compared the two halves. This process ensured that, at each step, the exact same players, each with the exact same sample size, were used in the comparison. Then he summed the results for all players, and used a Pearson test to find correlations between each cumulative half. These are random samples, so he reran each test 20 times and returned the mean.

What you see below is the threshold for each stat at which the stat itself predicts more of the player’s future than the league average (the r-squared is over .5). In other words, the relevant stat becomes meaningful for future projections. It’s not immutable, it may change in many cases, but it’s stable and believable.
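
For the curious, here is a minimal sketch of that split-half procedure. It is not Cartwright's actual code. The function names, the made-up pitcher data, and the choice to compare the average of each half (rather than a cumulative sum) are all my assumptions, there only to show the shape of the method.

```python
import random
from statistics import mean


def pearson_r(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def split_half_r2(players, n_events, rng):
    """Draw n_events at random per player, split in half, correlate the halves."""
    first_half, second_half = [], []
    for events in players:
        sample = rng.sample(events, n_events)  # random draw without replacement
        cut = n_events // 2
        first_half.append(mean(sample[:cut]))
        second_half.append(mean(sample[cut:]))
    # Squaring hides the sign of r; a negative r would need separate handling.
    return pearson_r(first_half, second_half) ** 2


def stable_at(players, steps, reruns=20, seed=0):
    """Return the first step whose mean r-squared over the reruns tops .5."""
    rng = random.Random(seed)
    for n_events in steps:
        r2 = mean(split_half_r2(players, n_events, rng) for _ in range(reruns))
        if r2 > 0.5:
            return n_events
    return None  # never cleared the bar within the steps tried


# Made-up data (an assumption, not PITCHf/x): 200 pitchers, each with a
# "true" sinker velocity plus per-pitch noise.
data_rng = random.Random(42)
players = []
for _ in range(200):
    true_velo = data_rng.gauss(91.5, 2.0)
    players.append([data_rng.gauss(true_velo, 1.0) for _ in range(200)])

result = stable_at(players, steps=[20, 40, 60])
print(result)
```

Because the spread between pitchers in this toy data is wide compared with the pitch-to-pitch noise, the very first step clears the bar, which is the same flavor of result the velocity rows show.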

Stabilization Rates for PITCHf/x Stats

Stat	Denominator	Stable At
Sinker Velocity	Sinkers	10
Sinker Horizontal Move	Sinkers	10
Sinker Vertical Move	Sinkers	10
Changeup Velocity	Changeups	10
Changeup Horizontal Move	Changeups	10
Changeup Vertical Move	Changeups	10
Contact%	Swings	40
Changeup Contact%	Changeup swings	50
Sinker Contact%	Sinker swings	70
O-Zone Swing%	Pitches outside of zone	120
First Strike	First pitches	250
Zone%	Pitches	330

SOURCE: 2013 PITCHf/x

There are, of course, more than a few caveats with this sort of work. The spread on velocity stats, on the actual velocities we see most often, runs from 88 to 95 or so. A half-tick is meaningful, and so narrowly missing the mark, predictively, might matter more with velocity and movement than it would with other stats.

There’s no guarantee, of course, that a pitcher is certain to maintain the same sinker velocity or contact rate after reaching the sample size at which those two metrics become stable. Rather, what these thresholds signify are points at which the numbers produced by that pitcher so far describe more of the variance going forward than league-average numbers would. In other words, these stats have become meaningful at these (rounded to the nearest ten) benchmarks. For analysis, I treat it as a simple nudge towards the types of stats we can look at in smaller samples, not some sort of bible.

It’s also worth mentioning that different methods have produced different results. Jonah Pemstein looked at some of the stats above and found many similar results for velocity and movement and spin rate. But he also found, for example, that pitcher swinging-strike rate stabilized around 440 pitches, which would be closer to 200 swings rather than the 40 swings we found. Why is one different than another? I don’t know why, go ask my brother — or, in this case, my cousin Brian.
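
The pitches-to-swings conversion there is back-of-the-envelope math. Here it is spelled out, assuming hitters swing at roughly 46% of pitches. That swing rate is my round-number assumption, not a figure from either study.

```python
pitches = 440        # Pemstein's stabilization point for swinging-strike rate
swing_rate = 0.46    # assumed league-wide swing rate, a round number
our_threshold = 40   # swings at which Contact% stabilized in the table above

swings = pitches * swing_rate
ratio = swings / our_threshold

print(round(swings))    # about 200 swings
print(round(ratio, 1))  # how many times larger than the 40-swing threshold
```

Any swing rate in the mid-40s lands near 200 swings, so the gap between the two methods is about fivefold no matter how you round.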

Also: wow. Jeff Zimmerman once found that one game of velocity readings was meaningful for starting pitchers returning from the disabled list, so maybe we already knew this, but it’s really remarkable to see it in this format. After five in-game sinkers, you can predict more than half of the variance in the remaining sample of sinkers from that player. Five fastballs, and you have a good sense of how hard a pitcher is throwing.

Cartwright wanted to point out that velocity technically reaches the stable level after even one pitch, but that I was free to round. Rounding is probably a good idea, if only to make sure you’re looking at five sinkers and haven’t mistakenly included a hard changeup in the group.

But when Jeff Sullivan reports that Shelby Miller has added more than two ticks, or that Jake Arrieta is down almost three ticks, that stuff is meaningful. Tyler Skaggs is down almost 2 mph, Dan Straily is up a tick-plus. Lance Lynn is down one to two ticks. It’s not the sexiest content, it might get repetitive, but it’s meaningful and backed by the data.

Context is important, of course. You want to compare like velocities, not velocities from two different sources. You want to make sure the pitcher hasn’t changed roles. You want to make sure he hasn’t just hit the disabled list.

The movement numbers are a little messed up right now, and our best minds are at work recalibrating the different parks for the new pitch-tracking equipment. Still, perhaps you can expect a post on the most improved changeup of the early going, considering that, calibration issues aside, the actual movement numbers become meaningful super early.

But look at contact rate. That might surprise you. A typical pitcher would elicit more than 42 swings in two starts. In fact, all 85 qualified starters have gotten more than 42 swings on their pitches so far.

Qualified Starters Leaders & Laggards in Contact Rate Change

Name	2017 Contact%	2016 Contact%	Difference
Clayton Kershaw	82.6%	70.2%	12.4%
Jeremy Hellickson	88.1%	77.2%	10.9%
Corey Kluber	83.2%	73.9%	9.3%
Jhoulys Chacin	89.6%	81.2%	8.4%
Patrick Corbin	87.1%	78.8%	8.3%
Cole Hamels	81.9%	74.7%	7.2%
Yu Darvish	80.3%	73.4%	6.9%
Jon Gray	81.7%	75.1%	6.6%
Justin Verlander	81.0%	75.9%	5.1%
Madison Bumgarner	80.6%	75.9%	4.7%
Kendall Graveman	77.3%	83.4%	-6.1%
Tyler Anderson	71.7%	78.7%	-7.0%
Alex Cobb	76.6%	83.7%	-7.1%
Derek Holland	75.0%	83.8%	-8.8%
Miguel Gonzalez	74.5%	83.5%	-9.0%
Jeff Samardzija	71.0%	80.9%	-9.9%
Masahiro Tanaka	67.9%	78.3%	-10.4%
Danny Duffy	63.6%	74.9%	-11.3%
Brandon Finnegan	66.7%	78.7%	-12.0%
Sean Manaea	60.9%	77.1%	-16.2%

Now it’s my duty to tell you that Clayton Kershaw has had the biggest jump in contact rate (+12 points) among qualified starting pitchers in baseball this year, which for a pitcher is a downturn. He hasn’t lost any velocity, but it’s notable that his slider has lost more than three inches of drop relative to his four-seamer, a relationship that should remain accurate even with the early-season calibration issues.

It seems too early to take this to the bank, especially with the differences in the results surrounding contact rate. Let’s all watch his next start. And those by Jeremy Hellickson (+11 points) and Corey Kluber (+9), too. We’ve already pointed out how great Sean Manaea (tops with -16 points) and Brandon Finnegan (-12) have looked, but next on the list are Danny Duffy (-11) and Masahiro Tanaka (-10). Interesting deviations from their norm, in somewhat believable samples.

Maybe this isn’t so frustrating, after all. There are a few things to explore, at least.
