Pitcher zStats at the Quarter-Mark

May 21, 2021

Not everyone is interested in projecting the future, but one common thread in much of modern analytics is the attempt to describe a volatile thing, such as a play in baseball, using something less volatile, such as an underlying ability. This era arguably began with Voros McCracken’s DIPS research, which he released 20 years ago to a wider audience than just us Usenet dorks. Voros’ thesis has since been modified with new information, and people tend to say (mistakenly) that he argued pitchers had no control over balls in play, but DIPS and BABIP changed how we look at the pitcher/defense interaction more than any peripheral-type number preceding them.

One of the things I want to try to project is what types of performance lead to the so-called Three True Outcomes (home run, walk, strikeout) rather than just tallying those outcomes. For example, what type of performances lead to strikeouts? I’m not just talking about velocity and stuff, but the batter-pitcher interactions at the plate: things like a pitcher’s contact percentage, which, for pitchers with 100 batters faced in consecutive years from 2002 on, has a similar or greater year-to-year r^2 with itself (0.53) than either walk rate (0.26) or strikeout rate (0.51) does. Contact rate alone has an r^2 of 0.37 with future strikeout rate.
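To make the "r^2 to itself" idea concrete, here is a minimal Python sketch of how a year-to-year check like this could be set up. It is not the code behind the numbers above; the `seasons` layout, the function names, and the sample rows are all hypothetical and made up for illustration.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r^2 is undefined when either sequence is constant")
    return cov * cov / (var_x * var_y)


def year_to_year_pairs(seasons, stat_now, stat_next=None, min_bf=100):
    """Pair each pitcher-season with the same pitcher's following season.

    `seasons` maps (pitcher, year) -> dict with a "bf" key plus rate stats.
    Both seasons must meet the batters-faced minimum to be included.
    """
    stat_next = stat_next or stat_now
    now, later = [], []
    for (pitcher, year), row in seasons.items():
        nxt = seasons.get((pitcher, year + 1))
        if nxt is None or row["bf"] < min_bf or nxt["bf"] < min_bf:
            continue
        now.append(row[stat_now])
        later.append(nxt[stat_next])
    return now, later


# Made-up rows for illustration only (not real pitcher data).
seasons = {
    ("A", 2019): {"bf": 400, "contact": 0.72, "k": 0.29},
    ("A", 2020): {"bf": 380, "contact": 0.74, "k": 0.27},
    ("B", 2019): {"bf": 500, "contact": 0.81, "k": 0.18},
    ("B", 2020): {"bf": 450, "contact": 0.80, "k": 0.20},
    ("C", 2019): {"bf": 90, "contact": 0.78, "k": 0.22},  # dropped: under 100 BF
    ("C", 2020): {"bf": 300, "contact": 0.77, "k": 0.23},
    ("D", 2019): {"bf": 350, "contact": 0.76, "k": 0.24},
    ("D", 2020): {"bf": 360, "contact": 0.77, "k": 0.22},
}

# Contact rate against itself one year later.
x, y = year_to_year_pairs(seasons, "contact")
print("pairs:", len(x), "contact -> contact r^2:", round(r_squared(x, y), 3))

# Contact rate against the following year's strikeout rate.
x2, y2 = year_to_year_pairs(seasons, "contact", "k")
print("contact -> next-year K r^2:", round(r_squared(x2, y2), 3))
```

With a real dataset, the same two calls would produce the self-correlation and the contact-to-strikeout comparison described above; the toy rows here are far too few to mean anything.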

As it turns out, a synthetic estimate built from these kinds of inputs (zSO) explains actual strikeout rate quite accurately, with an r^2 in the low 0.8 range.

Statcast-era data works slightly better; the version of zSO that uses that data is at 0.84, and the one that predates Statcast data is at 0.80. Cross-validating using repeated random subsampling (our data is limited, as there’s no “other” MLB to compare it to) yields the same results.
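For anyone unfamiliar with repeated random subsampling, the idea is to split the data into random training and test sets many times, fit on one, score on the other, and average. The sketch below shows the mechanics in Python with a one-variable line standing in for the model; the real zSO model is not described here, so the single predictor, the function names, and the synthetic data are all assumptions for illustration.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def out_of_sample_r2(slope, intercept, xs, ys):
    """1 - SSE/SST on data the line was not fitted to."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst


def repeated_random_subsampling(xs, ys, n_splits=200, test_fraction=0.3, seed=0):
    """Average held-out r^2 over many random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    n_test = max(2, int(len(idx) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            out_of_sample_r2(slope, intercept, [xs[i] for i in test], [ys[i] for i in test])
        )
    return sum(scores) / len(scores)


# Synthetic data: strikeout rate falls as contact rate rises, plus noise.
data_rng = random.Random(1)
contact = [data_rng.uniform(0.65, 0.85) for _ in range(300)]
k_rate = [0.9 - 0.9 * c + data_rng.gauss(0, 0.02) for c in contact]

mean_r2 = repeated_random_subsampling(contact, k_rate)
print("mean held-out r^2:", round(mean_r2, 3))
```

If the held-out average lands close to the in-sample figure, as reported above for zSO, that is a reasonable sign the fit is not just memorizing the sample.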

Like the various x measures in Statcast, these numbers shouldn’t be taken as projections in themselves. While zSO projects future strikeout rate slightly more accurately than the actual rate itself does, a mixture of both gets a better r^2 (0.59 for the sample outlined above) than either does on its own. Looking at zSO alone as a useful leading indicator, however, gives us an idea of which players may be outperforming or underperforming their strikeout rates so far this season. All numbers are through Wednesday night.
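The point about a mixture beating either number alone can be illustrated with a simple weight search. This is only a sketch under stated assumptions: the actual mix behind the 0.59 figure is not specified, the linear blend and grid search are my own stand-ins, and the data below is synthetic (a hidden "true" rate plus independent noise for each measure).

```python
import random


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


def best_blend_weight(z_rates, actual_rates, next_rates, steps=20):
    """Grid-search w in [0, 1] for the blend w * z + (1 - w) * actual.

    Returns the weight and the r^2 of that blend against next-year rates.
    """
    best_w, best_r2 = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        blend = [w * z + (1 - w) * a for z, a in zip(z_rates, actual_rates)]
        r2 = r_squared(blend, next_rates)
        if r2 > best_r2:
            best_w, best_r2 = w, r2
    return best_w, best_r2


# Synthetic data: both measures are noisy views of the same hidden skill.
rng = random.Random(7)
skill = [rng.gauss(0.22, 0.05) for _ in range(1000)]
z_so = [s + rng.gauss(0, 0.025) for s in skill]      # slightly less noisy
actual = [s + rng.gauss(0, 0.03) for s in skill]
next_year = [s + rng.gauss(0, 0.03) for s in skill]

w, blended_r2 = best_blend_weight(z_so, actual, next_year)
print("z alone:", round(r_squared(z_so, next_year), 3))
print("actual alone:", round(r_squared(actual, next_year), 3))
print("best weight on z:", w, "blend r^2:", round(blended_r2, 3))
```

Because the two measures carry partly independent noise, averaging them cancels some of it, which is the intuition for why the combination projects better than either piece.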

zSO Underperformers

Name	SO	zSO	Diff
Matthew Boyd	37	48.9	11.9
Zach Plesac	37	48.1	11.1
John Means	59	69.4	10.4
Matt Shoemaker	25	34.2	9.2
Julio Urías	60	69.2	9.2
Joe Ross	33	41.7	8.7
Josh Tomlin	14	22.5	8.5
Michael Fulmer	29	37.5	8.5
Kyle Gibson	44	52.1	8.1
Joel Payamps	10	18.0	8.0
Chase Anderson	26	33.9	7.9
Sandy Alcantara	51	58.6	7.6
Wandy Peralta	15	22.6	7.6
Bryse Wilson	11	18.6	7.6
J.P. Feyereisen	20	27.3	7.3

zSO Overperformers

Name	SO	zSO	Diff
Trevor Bauer	77	62.4	-14.6
Cristian Javier	53	38.5	-14.5
Blake Snell	60	49.8	-10.2
Zach Eflin	57	46.8	-10.2
Steven Matz	46	36.4	-9.6
Gerrit Cole	85	75.7	-9.3
Nick Pivetta	42	33.1	-8.9
Tyler Matzek	27	18.3	-8.7
Adrian Houser	34	25.5	-8.5
Drew Steckenrider	21	12.7	-8.3
David Peterson	46	37.7	-8.3
Caleb Thielbar	28	19.8	-8.2
Jake McGee	26	17.8	-8.2
Lance Lynn	46	37.8	-8.2
Eduardo Rodriguez	48	40.0	-8.0

For me, John Means is the most interesting name on this list, in that while he seems to be performing over his head on an overall basis (1.70 ERA versus a FIP just over 3.24), there may be strikeouts left to gain. He’s not a particularly hard thrower in terms of overall velocity (though he can hit the mid-90s far more than he did in his rookie season), but his swinging-strike and overall contact rates are right at the back of the top 10 in baseball among qualifying pitchers. You can also see why the Yankees were interested in acquiring Wandy Peralta a few weeks ago.

On the flip side, zSO sees Steven Matz’s strikeout rate coming back to earth. Matz throws harder than many appreciate, but he’s also not a particularly good swing-and-miss guy. Gerrit Cole’s strikeout rate also comes down in this measure, though in his case, “merely” to 12 strikeouts a game.

zBB Underperformers

Name	BB	zBB	Diff
Justin Dunn	22	13.0	-9.0
Lucas Giolito	21	12.7	-8.3
Nick Pivetta	22	13.9	-8.1
Triston McKenzie	25	17.0	-8.0
Trevor Williams	18	10.5	-7.5
John Gant	28	20.9	-7.1
Jose Quintana	19	11.9	-7.1
Corey Kluber	20	13.2	-6.8
Joe Jiménez	9	2.3	-6.7
Kenley Jansen	15	8.3	-6.7
Mitch Keller	20	14.1	-5.9
Nick Neidert	11	5.2	-5.8
Austin Gomber	22	16.4	-5.6
Erick Fedde	18	12.4	-5.6
Tyler Webb	14	8.6	-5.4

zBB Overperformers

Name	BB	zBB	Diff
Gerrit Cole	5	14.7	9.7
Zach Eflin	5	12.6	7.6
Max Scherzer	12	18.7	6.7
Corbin Burnes	2	8.7	6.7
Cristian Javier	15	21.6	6.6
Matt Peacock	3	8.8	5.8
Clay Holmes	6	11.8	5.8
Chad Green	3	8.6	5.6
Aaron Nola	9	14.5	5.5
Jack Flaherty	17	22.5	5.5
Yu Darvish	13	18.5	5.5
Johnny Cueto	5	10.4	5.4
Craig Stammen	4	9.2	5.2
Walker Buehler	7	12.2	5.2
Antonio Senzatela	12	17.2	5.2

As simplistic as it sounds, a lot of avoiding walks is simply getting off to 0–1 counts instead of 1–0. The percentage of times a pitcher gets a first-pitch strike has a significantly stronger relationship to walk rate (r^2 of 0.32) than something more traditionally associated with walks, zone rate (0.05). I suspect this is one of those instances, such as clutch performance, in which people too liberally apply the lessons of lower levels of baseball to the majors. To a 12-year-old, avoiding walks is basically just being good at throwing in the strike zone. At the major league level, every pitcher can hit the strike zone (except maybe Brad Pennington), and it becomes more a battle for initiative and timing.

Lucas Giolito isn’t having anywhere near as strong a season as he had the last two years, but zBB doesn’t think it has much to do with his walks continuing to drift in a negative direction. ZiPS is more negative than its very optimistic projection at the start of the season, but that’s more due to strikeout rate decline where ZiPS saw the potential for improvement. I’m a little perturbed that zBB dares to give Corbin Burnes a whopping eight walks, but that’s still the best walk rate in baseball.

Of note, just missing the list of overperformers is poor Sam Selman, who walked nearly 20% of the batters he faced over five appearances. zBB thinks he should actually have done worse, with eight walks. This is a small sample, but zBB was quite cruel in this instance.

zHR Underperformers

Name	HR	zHR	Diff
Cristian Javier	7	0.9	-6.1
Robbie Ray	11	5.3	-5.7
Kyle Hendricks	11	5.9	-5.1
Jorge López	8	3.4	-4.6
Adbert Alzolay	8	3.7	-4.3
Patrick Corbin	10	5.9	-4.1
Tarik Skubal	12	8.1	-3.9
Yusei Kikuchi	9	5.2	-3.8
Logan Allen	7	3.2	-3.8
Buck Farmer	6	2.3	-3.7
Griffin Canning	7	3.4	-3.6
Adam Wainwright	9	5.4	-3.6
Matt Shoemaker	10	6.5	-3.5
Dean Kremer	8	4.5	-3.5
Jameson Taillon	9	5.6	-3.4

zHR Overperformers

Name	HR	zHR	Diff
José Ureña	2	6.2	4.2
Trevor Rogers	3	6.8	3.8
Carlos Martinez	2	5.7	3.7
Zac Gallen	1	4.6	3.6
Madison Bumgarner	6	9.2	3.2
Corey Kluber	4	7.1	3.1
Matthew Boyd	2	5.1	3.1
Nathan Eovaldi	0	3.1	3.1
Nick Pivetta	3	6.1	3.1
Martín Pérez	2	5.0	3.0
Taijuan Walker	1	4.0	3.0
César Valdez	0	2.9	2.9
Scott Barlow	0	2.8	2.8
Dane Dunning	2	4.7	2.7
Will Vest	1	3.7	2.7

Projecting home runs for pitchers is notoriously difficult and will remain so. Without knowing each individual hit — I’m not trying to outStatcast Statcast — I can only get a pitcher’s zHR rate’s r^2 to around the 0.4 range. The usual stuff correlates here: hitting the ball hard and hitting the ball in the air, with more minor roles for things like velocity and pull tendency. The percentage of non–four-seam fastballs going up also tends to suppress home run rates. But these will always be hard to predict.

zFIP Underperformers

Name	ERA	zFIP	ER	zER	Diff
Luis Castillo	7.44	3.07	35	14.4	-20.6
Dylan Bundy	6.02	3.21	29	15.5	-13.5
José Quintana	8.53	3.96	24	11.1	-12.9
Jorge López	6.35	3.11	24	11.8	-12.2
Nathan Eovaldi	4.50	2.53	25	14.1	-10.9
Germán Márquez	5.56	3.49	28	17.6	-10.4
Yency Almonte	11.77	4.58	17	6.6	-10.4
Daniel Lynch	15.75	4.68	14	4.2	-9.8
Trevor Cahill	6.81	4.40	27	17.4	-9.6
Mitch Keller	7.16	4.57	26	16.6	-9.4
Patrick Corbin	6.10	4.08	28	18.7	-9.3
Jose De Leon	8.35	3.92	17	8.0	-9.0
Trevor Williams	6.27	3.84	23	14.1	-8.9
Joey Lucchesi	9.19	4.14	16	7.2	-8.8
Andrew Heaney	5.31	3.31	23	14.3	-8.7

zFIP Overperformers

Name	ERA	zFIP	ER	zER	Diff
Yu Darvish	1.81	3.57	11	21.7	10.7
Lance Lynn	1.55	3.89	7	17.6	10.6
Anthony DeSclafani	2.03	3.75	12	22.2	10.2
Trevor Rogers	1.74	3.49	10	20.0	10.0
Ryan Weathers	1.37	4.73	4	13.8	9.8
Trevor Bauer	2.20	3.72	14	23.7	9.7
Sam Selman	7.36	29.95	3	12.2	9.2
John Gant	2.04	4.10	9	18.1	9.1
Aaron Civale	3.30	4.61	22	30.8	8.8
John Means	1.70	3.03	11	19.7	8.7
Kevin Gausman	1.66	2.90	11	19.3	8.3
Jack Flaherty	2.53	3.90	15	23.1	8.1
Alex Reyes	0.39	3.56	1	9.1	8.1
Tyler Rogers	0.70	3.46	2	9.9	7.9
Kyle Gibson	2.32	3.55	14	21.5	7.5

(Poor Sam Selman.)

Combine it all together and you get zFIP, which is more stable long-term and projects itself better than either ERA or FIP does. zFIP shouldn’t be used alone to make projections, but by itself, it projects the future as well as SIERA does, with year-to-year r^2 in the 0.16 to 0.20 range. Of interest is that zFIP projects future FIP better than actual FIP does. To my mild surprise, it also projects Statcast’s future xERA better than actual xERA does.
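The ER, zER, and Diff columns in the zFIP tables appear to follow the usual earned-run arithmetic: back out innings from ERA and ER, then scale zFIP to that workload. That reading is inferred from the published numbers rather than stated anywhere, so treat the short Python sketch below as an assumption; the function names are mine, and the rows are copied from the tables above.

```python
def innings_from_era(earned_runs, era):
    """Back out innings pitched from ER and ERA, since ERA = 9 * ER / IP."""
    if era <= 0:
        raise ValueError("cannot recover innings from a zero ERA")
    return 9.0 * earned_runs / era


def expected_earned_runs(z_fip, innings):
    """Earned runs implied by a zFIP over the given innings."""
    return z_fip * innings / 9.0


# (name, ERA, zFIP, ER) as published in the zFIP tables.
rows = [
    ("Luis Castillo", 7.44, 3.07, 35),
    ("Dylan Bundy", 6.02, 3.21, 29),
    ("Yu Darvish", 1.81, 3.57, 11),
    ("Lance Lynn", 1.55, 3.89, 7),
]

results = {}
for name, era, z_fip, er in rows:
    ip = innings_from_era(er, era)
    z_er = expected_earned_runs(z_fip, ip)
    # Diff uses the same sign convention as the tables: estimate minus actual.
    results[name] = (round(z_er, 1), round(z_er - er, 1))
    print(f"{name}: IP ~ {ip:.1f}, zER ~ {z_er:.1f}, Diff ~ {z_er - er:+.1f}")
```

Because the published ERA and zFIP are rounded to two decimals, recomputed values can differ from the table by about a tenth of a run for some pitchers.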

Can these numbers be used as projections by themselves? They can, but it’s not recommended. Actual results do help us project future results more accurately when they’re part of the equation. But numbers like these should be looked at as leading indicators of what’s to come.
