(Re) Introducing Hitter Volatility

December 17, 2012

I suspect many researchers and writers have their own white whale or unicorn: an idea or concept that they are always chasing, regardless of how fruitless or costly that search may ultimately be.

My unicorn is the concept of volatility. I spent a large part of my tenure at Beyond the Box Score exploring the topic for both hitters and pitchers. I even looked at the concept in relation to team performance earlier this year at FanGraphs and other outlets.

Essentially, the idea is to understand whether there are appreciable differences in how players distribute their daily performances over the course of a season. For example, if you have two hitters who are roughly equal in overall skill (e.g., both are 25% better offensively than the league average), is there a difference in how much each is likely to vary from his overall performance on a game-to-game basis? Is one hitter more consistent day in and day out, while the other mixes phenomenal performances with countless 0-for-4 days?

My initial work had some problems (as most initial work does), but thanks to some great feedback from readers and colleagues alike I am ready to roll out the new and improved version of Volatility (VOL), starting with hitters.

The biggest issue with my initial formulation was that it assumed that hitters’ daily performances (measured by weighted on-base average, or wOBA) were normally distributed.

As with team run scoring, it turns out that is not the case. To illustrate, here is the distribution of all daily performances for 2012 (note that I am including stolen bases and caught stealing in the wOBA calculation since I want overall offensive production, not just what hitters do at the plate):

This meant that simply looking at something like the standard deviation of daily performances risked creating a metric that was biased against hitters with a higher seasonal wOBA. I tried a few different things, but I still ended up with metrics that were highly correlated with seasonal wOBA.

Enter colleague and mathematical wizard Matt Swartz. Matt suggested an approach that he used in an older study on team-level run scoring where he transformed a team’s seasonal run scoring average exponentially until the correlation between run scoring and the new variable was close to zero.
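
To make the idea concrete, here is a minimal sketch of that kind of search: scale each hitter's daily standard deviation by his seasonal wOBA raised to a candidate exponent, and keep the exponent whose result is least correlated with seasonal wOBA. The function names, the grid bounds, and the sample numbers are all hypothetical, and a simple grid search is an assumption; the article does not say how the exponent was actually found.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def find_exponent(daily_stds, yearly_wobas, lo=0.0, hi=2.0, steps=200):
    """Grid-search the exponent k that brings corr(std / wOBA**k, wOBA) closest to zero.

    The grid bounds and step count are illustrative assumptions.
    Returns (best_k, absolute correlation at best_k).
    """
    best_k, best_abs_corr = lo, float("inf")
    for i in range(steps + 1):
        k = lo + (hi - lo) * i / steps
        scaled = [s / w ** k for s, w in zip(daily_stds, yearly_wobas)]
        abs_corr = abs(pearson(scaled, yearly_wobas))
        if abs_corr < best_abs_corr:
            best_k, best_abs_corr = k, abs_corr
    return best_k, best_abs_corr


# Hypothetical hitters: seasonal wOBA and standard deviation of daily wOBA.
wobas = [0.28, 0.31, 0.33, 0.35, 0.38, 0.42]
stds = [0.26, 0.29, 0.28, 0.31, 0.30, 0.34]
k, abs_corr = find_exponent(stds, wobas)
print(f"exponent={k:.2f}, |correlation|={abs_corr:.3f}")
```

On real data the inputs would be one pair per player-season, and the exponent that falls out depends entirely on the sample used.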

Using this technique I managed to come up with a metric that has a correlation of only .005 with a player’s seasonal wOBA:

VOL = STD(daily_wOBA)/Yearly_wOBA^.52

Where:

VOL = volatility

STD(daily_wOBA) = the standard deviation of a player’s daily batting performance, measured by wOBA

Yearly_wOBA^.52 = a player’s yearly wOBA raised to the .52 power
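
As a rough illustration of the formula, the sketch below computes VOL for one hitter from a list of daily wOBA values and a seasonal wOBA. The function name and the sample values are hypothetical. It also assumes the population standard deviation; the article does not say whether the population or sample form was used. The seasonal wOBA is passed in separately because it is computed over all plate appearances and is not simply the mean of the daily values.

```python
import statistics


def volatility(daily_wobas, yearly_woba, exponent=0.52):
    """VOL = STD(daily wOBA) / yearly wOBA ** exponent.

    Assumption: population standard deviation (statistics.pstdev).
    """
    if yearly_woba <= 0:
        raise ValueError("yearly wOBA must be positive")
    return statistics.pstdev(daily_wobas) / yearly_woba ** exponent


# Hypothetical three-game sample, far too small for real use.
daily = [0.0, 0.4, 0.8]
print(round(volatility(daily, yearly_woba=0.4), 3))
```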

Armed with this new metric we can now ask a whole slew of questions. I’ll start with some basic descriptive data and get into more inferential analysis in future articles.

Here are the players with the 25 lowest VOL scores for 2012 (min >= 300 plate appearances); VOL- is simply VOL indexed so that league average is 100 (not park adjusted):

Name	Plate Appearances	Yearly wOBA*	VOL	VOL-
Derek Jeter	740	0.343	0.380	76
Elvis Andrus	711	0.306	0.384	77
Jon Jay	502	0.342	0.388	77
Jose Reyes	716	0.335	0.389	78
Willie Bloomquist	338	0.303	0.400	80
Shane Victorino	666	0.316	0.400	80
Ryan Hanigan	371	0.323	0.403	80
Shin-Soo Choo	686	0.365	0.405	81
Martin Prado	690	0.347	0.408	81
Denard Span	568	0.331	0.409	81
Joey Votto	475	0.448	0.409	82
Carlos Lee	615	0.308	0.411	82
Alejandro De Aza	585	0.327	0.411	82
Mike Trout	639	0.416	0.412	82
Dustin Pedroia	623	0.347	0.414	83
David Wright	670	0.374	0.415	83
Alex Gordon	721	0.354	0.415	83
Chase Headley	699	0.375	0.417	83
Dustin Ackley	668	0.272	0.417	83
Angel Pagan	659	0.325	0.417	83
Michael Young	651	0.295	0.417	83
Chase Utley	362	0.347	0.418	83
Brett Lawrie	536	0.311	0.420	84
Jayson Werth	344	0.362	0.421	84
Jordan Pacheco	505	0.320	0.421	84
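
The indexing behind VOL- can be sketched as a simple ratio to the league average. The function name is hypothetical, and the league-average VOL of 0.50 used below is inferred from the table (for example, a VOL of 0.380 maps to 76); the article does not state the figure or how the average was weighted.

```python
def vol_minus(vol, league_avg_vol):
    """Index VOL so that the league average equals 100 (lower = less volatile)."""
    if league_avg_vol <= 0:
        raise ValueError("league average VOL must be positive")
    return 100 * vol / league_avg_vol


# League average of 0.50 is an inference from the table, not a stated value.
LEAGUE_AVG_VOL = 0.50
for name, vol in [("Derek Jeter", 0.380), ("Joey Votto", 0.409)]:
    print(name, round(vol_minus(vol, LEAGUE_AVG_VOL)))
```

Because the published VOL values are already rounded to three decimals, an index recomputed this way can differ from the table by a point.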

The least volatile player in 2012 was Derek Jeter. This shouldn’t be surprising: among hitters with at least ten seasons of 300 or more plate appearances since 1974, Jeter is the least volatile. Over 17 such seasons, Jeter posted an average VOL of .397, four points better than Brett Butler (.401 over 14 seasons).

For reference, here are the 30 least volatile hitters since 1974 (min 10 seasons with >= 300 PAs):

Rank	Name	# Seasons	Ave VOL	Ave wOBA*
1	Derek Jeter	17	0.397	0.362
2	Brett Butler	14	0.401	0.334
3	Chuck Knoblauch	12	0.404	0.345
4	Pete Rose	12	0.405	0.333
5	Luis Castillo	11	0.406	0.324
6	Willie Randolph	17	0.406	0.322
7	Ichiro Suzuki	12	0.410	0.344
8	Tony Gwynn	17	0.412	0.363
9	Wade Boggs	18	0.415	0.365
10	Steve Sax	11	0.419	0.305
11	Rickey Henderson	23	0.420	0.371
12	Jason Kendall	15	0.421	0.329
13	Ozzie Smith	17	0.424	0.296
14	Paul Molitor	19	0.424	0.356
15	Vince Coleman	10	0.426	0.302
16	Tony Phillips	14	0.427	0.334
17	Roberto Alomar	16	0.429	0.351
18	Rod Carew	12	0.429	0.358
19	David Eckstein	10	0.430	0.307
20	Mike Hargrove	12	0.430	0.344
21	Mark Grace	15	0.431	0.357
22	Rafael Furcal	12	0.431	0.321
23	Tim Raines	17	0.431	0.363
24	Todd Helton	14	0.433	0.405
25	Kenny Lofton	16	0.433	0.351
26	Buddy Bell	15	0.433	0.326
27	Bobby Abreu	14	0.434	0.380
28	Jose Offerman	10	0.435	0.320
29	Frank Thomas	15	0.435	0.418
30	Tom Herr	10	0.435	0.311

Joey Votto logged a little less than 500 plate appearances, but posted a .409 VOL. That’s incredible when you think about the fact that he had a .448 wOBA for the season. Basically, he was just as consistent as Denard Span, but with a wOBA that was 35% higher than Span’s.

The leader board should illustrate the general point that consistent doesn’t always mean better. For example, Michael Young was 17% more consistent than the league average last year, but he was abysmal at the plate overall. In his case, greater consistency meant that the Rangers didn’t benefit from as many “boom” type games as a less consistent hitter might have provided.

For completeness, here are the 25 most volatile hitters from 2012:

Name	Plate Appearances	Yearly wOBA*	VOL	VOL-
Alexi Casilla	326	0.277	0.595	119
James Loney	465	0.275	0.596	119
Alex Presley	370	0.285	0.604	120
Elliot Johnson	331	0.262	0.605	121
Gerardo Parra	430	0.302	0.608	121
Casey McGehee	352	0.288	0.611	122
Ty Wigginton	360	0.286	0.616	123
Matt Carpenter	340	0.340	0.617	123
Mitch Moreland	357	0.331	0.618	123
Jarrod Saltalamacchia	448	0.309	0.620	124
Tyler Colvin	452	0.345	0.627	125
Juan Rivera	339	0.287	0.629	125
Tyler Greene	330	0.281	0.637	127
Carlos Gomez	452	0.348	0.643	128
Gaby Sanchez	326	0.269	0.651	130
Bryan LaHair	380	0.328	0.654	130
Greg Dobbs	342	0.304	0.656	131
Logan Morrison	334	0.333	0.659	131
Nyjer Morgan	322	0.246	0.663	132
Brian Bogusevic	404	0.283	0.667	133
Eric Chavez	313	0.329	0.693	138
Alexi Amarista	300	0.287	0.713	142
Jesus Guzman	321	0.329	0.724	144
Scott Hairston	398	0.348	0.738	147
Justin Maxwell	352	0.347	0.787	157

The one bit of inferential analysis I’ve completed was a look at the year-to-year correlation of VOL. It turns out that this new formulation has a higher year-to-year correlation than my previous one (.39 vs. .23). Overall, that is still low (basically, VOL is about as reliable year to year as batting average), but there is a decent relationship, and we do see evidence in the data that, as with BABIP, players sort into generally higher or lower VOL over the course of a career. For example, the correlation between a hitter’s average VOL in years one and two and his VOL in year three is .42. This is something that definitely needs to be examined further, which brings me to next steps.
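
A year-to-year check of this kind can be sketched by pairing each hitter's VOL in one season with his VOL in the next and correlating the pairs. The data layout (a dictionary keyed by player and year), the function names, and the sample values are all hypothetical; the article does not describe how the seasons were paired or filtered.

```python
import math
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def consecutive_pairs(vol_by_player_year):
    """Return (VOL in year N, VOL in year N+1) for every player with both seasons."""
    pairs = []
    for (player, year), vol in vol_by_player_year.items():
        next_vol = vol_by_player_year.get((player, year + 1))
        if next_vol is not None:
            pairs.append((vol, next_vol))
    return pairs


def year_to_year_correlation(vol_by_player_year):
    """Correlate each qualifying season's VOL with the same player's next season."""
    pairs = consecutive_pairs(vol_by_player_year)
    return pearson([a for a, _ in pairs], [b for _, b in pairs])


# Hypothetical values chosen only to exercise the code.
sample = {
    ("A", 2011): 0.40, ("A", 2012): 0.42,
    ("B", 2011): 0.50, ("B", 2012): 0.52,
    ("C", 2011): 0.60, ("C", 2012): 0.62,
    ("D", 2012): 0.70,  # no adjacent season, so no pair
}
print(len(consecutive_pairs(sample)))
```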

Well, there is a lot to do.

First, I want to look at what traits might lead a hitter to be more or less volatile. From my earlier research and observations from others, my initial guess is that high on-base, low-strikeout, solid-contact hitters will tend to have lower volatility. The initial leader boards suggest these might still be the most significant variables, but of course that needs to be verified empirically.

Second, there is the larger question of whether the volatility of hitters matters all that much. How does it factor into team construction? There is evidence that more consistent offenses tend to perform better over the course of the year (i.e., they beat their Pythagorean expectation in terms of wins), but the relationship between individual-level volatility and team-level volatility still needs to be addressed.

I’ll turn to these questions (and I’m sure a few more) in the coming months. Until then, comments and suggestions are welcome.

Oh, and here’s the complete VOL and VOL- leader board for 2012 (min >= 300 PAs). And, yes, you are welcome, Eno Sarris:

——————

*I used average constants from 2002-2012 in order to conduct some of the year-to-year correlational analysis. Also, as I mentioned earlier in the article, I included stolen bases and caught stealing in the wOBA calculation.
