Predicting 2012’s Strikeout Improvements

If heteroscedasticity lasts longer than three hours, consult your physician immediately.

“Honey, I think I’ve got heteroscedasticity,” I said to my wife when she walked in the door. As a writer who works at home, I spend the majority of my time locked away in my windowless home office, concocting ways to frighten my dear wife who works all day.

“And it’s ruining my spreadsheets,” I finally added, after she had stood wide-eyed and wordless for a few moments.

On Tuesday, we examined the fantastic and bizarre case of rookie right-hander Jeremy Hellickson, whose high swinging-strike rate has not translated into an equally high strikeout rate (K%). Today, let’s expand the scope of that investigation.

As we see above, the relationship between K% and swinging-strike rates is an interesting one. The above data includes pitchers — both starters and relievers — over the last four years who have pitched at least 150 innings.

I imagine readers with only the thickest of glasses will immediately note the data’s heteroscedasticity (the triangular or conical shape of the data). Heteroscedasticity sounds like an infectious disease, but it is merely a stats word that means the variance of the error term is not constant. As we can see above, there appears to be greater volatility in the higher swinging-strike range than in the low SwStr% range.
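For the stats-inclined, here is one way to write that down, as a rough sketch. It supposes K% is modeled as a straight line in SwStr% plus an error term; the symbols a, b, ε and σ are assumed notation for illustration, not anything taken from the chart.

```latex
\text{K\%}_i = a + b \cdot \text{SwStr\%}_i + \varepsilon_i

\text{Homoscedastic: } \operatorname{Var}(\varepsilon_i \mid \text{SwStr\%}_i) = \sigma^2

\text{Heteroscedastic: } \operatorname{Var}(\varepsilon_i \mid \text{SwStr\%}_i) = \sigma^2(\text{SwStr\%}_i)
```

In the first case the spread around the line is the same everywhere. In the second it depends on SwStr%, and the cone shape suggests it grows as SwStr% rises.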

There are some statistical tricks to solve a problem like Maria, er, heteroscedasticity, but let’s skip all the math and just cheat a little here (please don’t tattle to Tango).

Let’s just look at one of the worst possible regressions (and by “worst,” I mean “least spectacular”). Connecting the two orange dots (the dots of Paul Byrd and Michael Wuertz), we find an equation of y = 2.2619x + 0.0163.
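For anyone who wants to check the arithmetic, here is a minimal Python sketch of drawing a line through two dots. The actual SwStr% and K% coordinates for Byrd and Wuertz are not listed here, so the two points in the sketch are hypothetical stand-ins read straight off the published line, and `line_through` is a name made up for illustration.

```python
def line_through(p1, p2):
    """Return (slope, intercept) of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("points share an x value; slope is undefined")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


# Hypothetical stand-ins for the two orange dots: both are generated from
# the article's equation, y = 2.2619x + 0.0163, not from real pitcher data.
low_dot = (0.05, 2.2619 * 0.05 + 0.0163)
high_dot = (0.15, 2.2619 * 0.15 + 0.0163)

slope, intercept = line_through(low_dot, high_dot)
print(f"y = {slope:.4f}x + {intercept:.4f}")
```

Swapping in the real coordinates of any two pitchers gives the line through their dots in the same way.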

This is essentially the worst-case scenario for pitchers with a given swinging strike rate. Only the big exceptions will perform worse than ol’ Byrdie and Wuertz… Wuertzy…?

So, if we apply this formula to the 2011 season, say 100 IP minimum, we should be able to effectively predict a true talent floor, so to speak, and assemble a cache of players poised for more strikeouts (and thereby a lower FIP).
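Applying the formula is a one-liner. The sketch below runs it on three rows copied from the table further down (SwStr% and K% as decimals); the function and constant names are invented for this example.

```python
# Worst-case line from the article: y = 2.2619x + 0.0163
FLOOR_SLOPE = 2.2619
FLOOR_INTERCEPT = 0.0163


def expected_k_rate(swstr_rate):
    """Floor K% implied by a swinging-strike rate (both as decimals)."""
    return FLOOR_SLOPE * swstr_rate + FLOOR_INTERCEPT


# (name, SwStr%, actual K%) taken from the 2011 table in this post
pitchers = [
    ("Francisco Liriano", 0.114, 0.190),
    ("Tim Wakefield", 0.089, 0.137),
    ("Jeremy Hellickson", 0.097, 0.151),
]

for name, swstr, k_rate in pitchers:
    xk = expected_k_rate(swstr)
    # DIFF is the gap between the floor and what the pitcher actually did
    print(f"{name}: xK% {xk:.1%}, DIFF {xk - k_rate:+.1%}")
```

The xK% and DIFF values this prints should line up with the corresponding columns in the table, give or take rounding.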

Obviously, some fellas, such as old man Tim Wakefield, will be exceptions. Wakefield ranks among the top of this improvement group, but we all well know that he and his knuckling twirl have long passed their 20% K-rate days.

So do realize, these numbers lack a degree of precision: each player involved may or may not have legit reasons to beat or under-perform his expected K% (and likewise his adjusted FIP). Moreover, the Should FIP (adjusted FIP) below was made pretty much by multiplying each pitcher’s total batters faced (TBF) by his new strikeout rate and recalculating FIP with that strikeout total. Presumably, though, his TBF total would be different if he were indeed striking out more batters.
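The exact spreadsheet math is not spelled out here, so treat the following as a sketch under assumptions: it uses the standard FIP formula, (13×HR + 3×(BB + HBP) − 2×K) / IP + constant, and holds everything except strikeouts fixed, which is exactly the shortcut the caveat above warns about. The pitcher in the example is hypothetical, and `fip_change` is an illustrative name.

```python
def fip_change(tbf, innings, k_rate_old, k_rate_new):
    """Change in FIP when only the strikeout total moves.

    Assumes the standard FIP formula,
    (13*HR + 3*(BB + HBP) - 2*K) / IP + constant,
    with HR, BB, HBP, IP and TBF all held fixed.
    """
    extra_strikeouts = tbf * (k_rate_new - k_rate_old)
    # Each strikeout carries a weight of -2 in the FIP numerator.
    return -2.0 * extra_strikeouts / innings


# Hypothetical pitcher: 600 batters faced over 150 innings,
# moving from a 15% strikeout rate to a 20% rate.
delta = fip_change(tbf=600, innings=150.0, k_rate_old=0.15, k_rate_new=0.20)
print(f"FIP change: {delta:+.2f}")
```

A fuller version would also shrink TBF as strikeouts rise, since extra outs mean fewer batters per inning.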

So caveats all around. The main lesson here: This is not precise, but this should be fun.

NOTE: Because I am using a worst-case formula, I have only included the players whom we expect to improve next year. So even those pitchers with 0% expected change may see K% upticks in 2012.

Name SwStr% K% FIP xK% DIFF Should FIP FIP DIFF
Francisco Liriano 11.4% 19.0% 4.54 27.4% 8.4% 3.80 -0.74
Tim Wakefield 8.9% 13.7% 4.99 21.8% 8.1% 4.29 -0.70
Jeremy Hellickson 9.7% 15.1% 4.44 23.6% 8.5% 3.75 -0.69
Chris Narveson 10.4% 18.0% 4.06 25.2% 7.2% 3.44 -0.62
Carl Pavano 7.1% 10.7% 4.10 17.7% 7.0% 3.50 -0.60
Fausto Carmona 7.9% 13.1% 4.56 19.5% 6.4% 3.99 -0.57
Jaime Garcia 10.5% 18.9% 3.23 25.4% 6.5% 2.68 -0.55
Jeff Francis 7.0% 11.3% 4.10 17.5% 6.2% 3.56 -0.54
Dillon Gee 9.1% 16.2% 4.65 22.2% 6.0% 4.12 -0.53
Randy Wells 8.2% 14.1% 5.11 20.2% 6.1% 4.58 -0.53
Phil Coke 8.3% 14.6% 3.57 20.4% 5.8% 3.06 -0.51
Freddy Garcia 8.7% 15.3% 4.12 21.3% 6.0% 3.61 -0.51
John Lannan 7.6% 13.1% 4.28 18.8% 5.7% 3.78 -0.50
Hiroki Kuroda 10.3% 19.2% 3.78 24.9% 5.7% 3.31 -0.47
Daniel Hudson 9.9% 18.4% 3.28 24.0% 5.6% 2.81 -0.47
Shaun Marcum 10.3% 19.2% 3.73 24.9% 5.7% 3.26 -0.47
Edwin Jackson 9.2% 17.2% 3.55 22.4% 5.2% 3.10 -0.45
Edinson Volquez 10.9% 21.3% 5.29 26.3% 5.0% 4.84 -0.45
Josh Tomlin 7.7% 13.4% 4.27 19.0% 5.6% 3.82 -0.45
Ricky Nolasco 8.9% 16.6% 3.54 21.8% 5.2% 3.09 -0.45
Carlos Carrasco 8.5% 15.9% 4.28 20.9% 5.0% 3.85 -0.43
Philip Humber 9.1% 17.2% 3.58 22.2% 5.0% 3.16 -0.42
Luke Hochevar 8.2% 15.3% 4.29 20.2% 4.9% 3.88 -0.41
Jeff Karstens 7.7% 14.4% 4.29 19.0% 4.6% 3.91 -0.38
Chris Capuano 10.5% 21.0% 4.04 25.4% 4.4% 3.66 -0.38
Brett Cecil 8.4% 16.4% 5.10 20.6% 4.2% 4.73 -0.37
Charlie Morton 7.4% 14.3% 3.77 18.4% 4.1% 3.41 -0.36
Guillermo Moscoso 7.4% 14.1% 4.23 18.4% 4.3% 3.88 -0.35
John Danks 9.3% 18.5% 3.82 22.7% 4.2% 3.47 -0.35
Tom Gorzelanny 10.5% 21.3% 4.19 25.4% 4.1% 3.84 -0.35
Roy Oswalt 8.0% 15.7% 3.44 19.7% 4.0% 3.09 -0.35
Cole Hamels 11.3% 22.8% 3.05 27.2% 4.4% 2.71 -0.34
Jason Vargas 7.8% 15.3% 4.09 19.3% 4.0% 3.75 -0.34
R.A. Dickey 7.8% 15.3% 3.77 19.3% 4.0% 3.44 -0.33
Jair Jurrjens 7.4% 14.4% 3.99 18.4% 4.0% 3.66 -0.33
Ricky Romero 9.6% 19.4% 4.20 23.3% 3.9% 3.88 -0.32
Homer Bailey 9.3% 18.9% 4.06 22.7% 3.8% 3.74 -0.32
A.J. Burnett 10.0% 20.7% 4.77 24.2% 3.5% 4.46 -0.31
Jason Hammel 6.5% 12.7% 4.83 16.3% 3.6% 4.52 -0.31
Dan Haren 9.9% 20.2% 2.98 24.0% 3.8% 2.67 -0.31
Joel Pineiro 5.2% 9.8% 4.43 13.4% 3.6% 4.12 -0.31
Carlos Villanueva 7.5% 15.0% 4.10 18.6% 3.6% 3.79 -0.31
Derek Lowe 8.1% 16.5% 3.70 20.0% 3.5% 3.39 -0.31
Mark Buehrle 6.5% 12.7% 3.98 16.3% 3.6% 3.68 -0.30
Jason Marquis 6.5% 13.0% 4.05 16.3% 3.3% 3.75 -0.30
CC Sabathia 11.2% 23.4% 2.88 27.0% 3.6% 2.58 -0.30
Matt Garza 11.2% 23.5% 2.95 27.0% 3.5% 2.65 -0.30
Alexi Ogando 8.9% 18.2% 3.65 21.8% 3.6% 3.36 -0.29
Alfredo Simon 8.1% 16.6% 4.42 20.0% 3.4% 4.13 -0.29
Aaron Harang 8.4% 17.3% 4.17 20.6% 3.3% 3.88 -0.29
Michael Pineda 11.8% 24.9% 3.42 28.3% 3.4% 3.14 -0.28
Chris Volstad 7.9% 16.3% 4.32 19.5% 3.2% 4.04 -0.28
Bud Norris 10.5% 22.1% 4.02 25.4% 3.3% 3.74 -0.28
Chris Carpenter 9.2% 19.2% 3.06 22.4% 3.2% 2.79 -0.27
Josh Collmenter 7.9% 16.1% 3.80 19.5% 3.4% 3.53 -0.27
Brian Duensing 7.8% 16.2% 4.27 19.3% 3.1% 4.00 -0.27
Jo-Jo Reyes 6.6% 13.6% 4.90 16.6% 3.0% 4.63 -0.27
John Lackey 7.0% 14.5% 4.71 17.5% 3.0% 4.44 -0.27
Joe Saunders 6.2% 12.4% 4.78 15.7% 3.3% 4.51 -0.27
Jake Westbrook 6.3% 12.9% 4.25 15.9% 3.0% 3.98 -0.27
Tim Hudson 8.6% 17.9% 3.39 21.1% 3.2% 3.13 -0.26
Livan Hernandez 6.4% 13.2% 3.96 16.1% 2.9% 3.71 -0.25
Zach Britton 7.0% 14.6% 4.00 17.5% 2.9% 3.75 -0.25
Brad Penny 4.6% 9.2% 5.02 12.0% 2.8% 4.77 -0.25
Max Scherzer 9.8% 20.9% 4.14 23.8% 2.9% 3.89 -0.25
Kevin Correia 5.7% 11.7% 4.85 14.5% 2.8% 4.61 -0.24
Johnny Cueto 7.9% 16.5% 3.45 19.5% 3.0% 3.21 -0.24
Rick Porcello 6.3% 13.3% 4.06 15.9% 2.6% 3.83 -0.23
Ivan Nova 6.6% 13.9% 4.01 16.6% 2.7% 3.79 -0.22
Scott Baker 10.4% 22.5% 3.45 25.2% 2.7% 3.23 -0.22
Trevor Cahill 7.6% 16.3% 4.10 18.8% 2.5% 3.88 -0.22
James Shields 10.7% 23.1% 3.42 25.8% 2.7% 3.20 -0.22
Matt Harrison 7.6% 16.3% 3.52 18.8% 2.5% 3.31 -0.21
Josh Beckett 10.5% 22.8% 3.57 25.4% 2.6% 3.37 -0.20
Matt Cain 9.1% 19.7% 2.91 22.2% 2.5% 2.71 -0.20
Mat Latos 10.6% 23.2% 3.16 25.6% 2.4% 2.96 -0.20
Roy Halladay 10.8% 23.6% 2.20 26.1% 2.5% 2.00 -0.20
Jake Peavy 9.2% 20.2% 3.21 22.4% 2.2% 3.02 -0.19
Randy Wolf 6.8% 14.8% 4.29 17.0% 2.2% 4.11 -0.18
Bronson Arroyo 5.8% 12.6% 5.71 14.7% 2.1% 5.53 -0.18
Jhoulys Chacin 8.2% 18.1% 4.23 20.2% 2.1% 4.06 -0.17
Kyle McClellan 5.7% 12.5% 4.92 14.5% 2.0% 4.75 -0.17
Mike Leake 7.7% 17.0% 4.22 19.0% 2.0% 4.05 -0.17
Mike Pelfrey 5.5% 12.2% 4.47 14.1% 1.9% 4.30 -0.17
Bruce Chen 6.7% 14.8% 4.39 16.8% 2.0% 4.23 -0.16
Anibal Sanchez 10.9% 24.3% 3.35 26.3% 2.0% 3.19 -0.16
Alfredo Aceves 7.6% 16.9% 4.03 18.8% 1.9% 3.87 -0.16
Ervin Santana 8.4% 18.8% 4.00 20.6% 1.8% 3.84 -0.16
Wade Davis 5.9% 13.2% 4.67 15.0% 1.8% 4.52 -0.15
Brad Bergesen 6.0% 13.5% 4.92 15.2% 1.7% 4.77 -0.15
Gavin Floyd 8.4% 18.9% 3.81 20.6% 1.7% 3.67 -0.14
Anthony Swarzak 5.5% 12.5% 4.04 14.1% 1.6% 3.90 -0.14
Brandon Morrow 11.5% 26.1% 3.64 27.6% 1.5% 3.51 -0.13
Javier Vazquez 8.9% 20.3% 3.57 21.8% 1.5% 3.45 -0.12
James McDonald 8.2% 18.8% 4.68 20.2% 1.4% 4.56 -0.12
Tim Lincecum 10.7% 24.4% 3.17 25.8% 1.4% 3.05 -0.12
Jeremy Guthrie 6.3% 14.6% 4.48 15.9% 1.3% 4.37 -0.11
Nick Blackburn 4.8% 11.3% 4.84 12.5% 1.2% 4.74 -0.10
Justin Masterson 7.5% 17.4% 3.28 18.6% 1.2% 3.18 -0.10
Brandon McCarthy 7.7% 17.8% 2.86 19.0% 1.2% 2.76 -0.10
Felipe Paulino 9.6% 22.2% 3.69 23.3% 1.1% 3.59 -0.10
Ted Lilly 8.5% 19.8% 4.21 20.9% 1.1% 4.12 -0.09
Ryan Dempster 9.3% 21.7% 3.91 22.7% 1.0% 3.82 -0.09
Brett Myers 7.4% 17.5% 4.26 18.4% 0.9% 4.18 -0.08
Dustin Moseley 5.3% 12.7% 3.99 13.6% 0.9% 3.91 -0.08
Carlos Zambrano 6.7% 15.9% 4.59 16.8% 0.9% 4.52 -0.07
Kyle Kendrick 5.1% 12.3% 4.55 13.2% 0.9% 4.48 -0.07
Jered Weaver 9.1% 21.4% 3.20 22.2% 0.8% 3.13 -0.07
Jordan Zimmermann 7.9% 18.7% 3.16 19.5% 0.8% 3.10 -0.06
Danny Duffy 7.7% 18.4% 4.82 19.0% 0.6% 4.76 -0.06
Kyle Lohse 5.9% 14.3% 3.67 15.0% 0.7% 3.62 -0.05
Jonathan Sanchez 9.7% 23.0% 4.30 23.6% 0.6% 4.25 -0.05
Jake Arrieta 7.4% 17.8% 5.34 18.4% 0.6% 5.29 -0.05
Chad Billingsley 7.6% 18.3% 3.83 18.8% 0.5% 3.79 -0.04
Paul Maholm 5.7% 14.1% 3.78 14.5% 0.4% 3.75 -0.03
Travis Wood 6.7% 16.4% 4.06 16.8% 0.4% 4.03 -0.03
Tyler Chatwood 4.6% 11.7% 4.89 12.0% 0.3% 4.86 -0.03
Gio Gonzalez 9.5% 22.8% 3.64 23.1% 0.3% 3.61 -0.03
Wandy Rodriguez 8.5% 20.5% 4.15 20.9% 0.4% 4.12 -0.03
Jonathon Niese 8.2% 19.9% 3.36 20.2% 0.3% 3.33 -0.03
Derek Holland 7.9% 19.2% 3.94 19.5% 0.3% 3.92 -0.02
Doug Fister 6.7% 16.7% 3.02 16.8% 0.1% 3.01 -0.01
Colby Lewis 8.2% 20.1% 4.54 20.2% 0.1% 4.54 0.00
Jeff Niemann 7.4% 18.4% 4.13 18.4% 0.0% 4.13 0.00

I must again credit Mike Podhorzer, who got me thinking about this.





21 Comments
mike
12 years ago

really interesting; i’d love to see this on an xfip, sierra or tera basis, too. not sure how involved all of that is, from a math standpoint.