(Re) Introducing Hitter Volatility

I suspect many researchers and writers have their own white whale or unicorn: an idea or concept that they are always chasing, regardless of how fruitless or costly that search may ultimately be.

My unicorn is the concept of volatility. I spent a large part of my tenure at Beyond the Box Score exploring the topic for both hitters and pitchers. I even looked at the concept in relation to team performance earlier this year at FanGraphs and other outlets.

Essentially, the idea is to understand whether there are appreciable differences in how players distribute their daily performances over the course of a season. For example, if you have two hitters who are roughly equal in overall skill (i.e., both are 25% better offensively than the league average), is there a difference in how much each is likely to vary from his overall performance on a game-to-game basis? Is one hitter more consistent day in and day out, while the other mixes in phenomenal performances with countless 0-4 days?

My initial work had some problems (as most initial work does), but thanks to some great feedback from readers and colleagues alike, I am ready to roll out the new and improved version of Volatility (VOL), starting with hitters.

The biggest issue with my initial formulation was that it assumed that hitters’ daily performances (measured by weighted on-base average, or wOBA) were normally distributed.

As with team run scoring, it turns out that is not the case. To illustrate, here is the distribution of all daily performances for 2012 (note that I am including stolen bases and caught stealing in the wOBA calculation since I want overall offensive production, not just what hitters do at the plate):

This meant that simply looking at something like the standard deviation of daily performances risked creating a metric that was biased against hitters with a higher seasonal wOBA. I tried a few different things, but I still ended up with metrics that were highly correlated with seasonal wOBA.

Enter colleague and mathematical wizard Matt Swartz. Matt suggested an approach that he used in an older study on team-level run scoring, where he raised a team’s seasonal run scoring average to an exponent and adjusted that exponent until the correlation between run scoring and the new variable was close to zero.
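
One plausible way to carry out that kind of search is a simple grid over candidate exponents. The sketch below is only an illustration under stated assumptions: the function names, the grid step, and the synthetic data are all hypothetical, and nothing here is confirmed to be the procedure Matt or I actually ran.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def find_decorrelating_exponent(daily_sd, season_woba, step=0.01, max_exp=2.0):
    """Grid-search k in [0, max_exp] minimizing |corr(sd / woba**k, woba)|."""
    best_k, best_abs_r = 0.0, float("inf")
    for i in range(int(round(max_exp / step)) + 1):
        k = i * step
        metric = [sd / w ** k for sd, w in zip(daily_sd, season_woba)]
        abs_r = abs(pearson(metric, season_woba))
        if abs_r < best_abs_r:
            best_k, best_abs_r = k, abs_r
    return best_k, best_abs_r


def make_synthetic_players(n=500, true_exp=0.5, seed=0):
    """Made-up data: SD grows like woba**true_exp times independent noise."""
    rng = random.Random(seed)
    woba = [rng.uniform(0.25, 0.45) for _ in range(n)]
    sd = [0.8 * w ** true_exp * math.exp(rng.gauss(0, 0.1)) for w in woba]
    return sd, woba


# Synthetic players only; these are not real hitter seasons.
sd, woba = make_synthetic_players()
k, abs_r = find_decorrelating_exponent(sd, woba)
print(f"exponent ~ {k:.2f}, |correlation| = {abs_r:.4f}")
```

On real data the inputs would be each hitter’s standard deviation of daily wOBA and his seasonal wOBA; the search simply reports whichever exponent leaves the smallest leftover correlation.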

Using this technique, I managed to come up with a metric that has a correlation of just .005 with a player’s seasonal wOBA:

VOL = STD(daily_wOBA)/Yearly_wOBA^.52

Where:

VOL = volatility

STD(daily_wOBA) = the standard deviation of a player’s daily batting performance, measured by wOBA

Yearly_wOBA^.52 = a player’s yearly wOBA raised to the .52 power
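
As a rough sketch, the formula translates to a few lines of Python. Two details are assumptions on my part here, since the definition above leaves them open: the standard deviation is taken as a population standard deviation, and every game counts equally regardless of plate appearances. The function name and sample values are hypothetical.

```python
import statistics


def volatility(daily_woba, yearly_woba, exponent=0.52):
    """VOL = SD of daily wOBA divided by yearly wOBA raised to `exponent`."""
    if yearly_woba <= 0:
        raise ValueError("yearly wOBA must be positive")
    # Assumption: population SD, each game counted equally.
    daily_sd = statistics.pstdev(daily_woba)
    return daily_sd / yearly_woba ** exponent


# Hypothetical five-game stretch; not real player data.
games = [0.000, 0.900, 0.350, 0.000, 0.500]
print(round(volatility(games, yearly_woba=0.350), 3))
```

Yearly wOBA is passed in separately rather than averaged from the daily values, because a seasonal wOBA weights games by playing time while a plain mean of daily figures would not.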

Armed with this new metric we can now ask a whole slew of questions. I’ll start with some basic descriptive data and get into more inferential analysis in future articles.

Here are the players with the 25 lowest VOL scores for 2012 (min >= 300 plate appearances); VOL- is simply VOL indexed so that league average is 100 (not park adjusted):

| Name | Plate Appearances | Yearly wOBA* | VOL | VOL- |
|---|---|---|---|---|
| Derek Jeter | 740 | 0.343 | 0.380 | 76 |
| Elvis Andrus | 711 | 0.306 | 0.384 | 77 |
| Jon Jay | 502 | 0.342 | 0.388 | 77 |
| Jose Reyes | 716 | 0.335 | 0.389 | 78 |
| Willie Bloomquist | 338 | 0.303 | 0.400 | 80 |
| Shane Victorino | 666 | 0.316 | 0.400 | 80 |
| Ryan Hanigan | 371 | 0.323 | 0.403 | 80 |
| Shin-Soo Choo | 686 | 0.365 | 0.405 | 81 |
| Martin Prado | 690 | 0.347 | 0.408 | 81 |
| Denard Span | 568 | 0.331 | 0.409 | 81 |
| Joey Votto | 475 | 0.448 | 0.409 | 82 |
| Carlos Lee | 615 | 0.308 | 0.411 | 82 |
| Alejandro De Aza | 585 | 0.327 | 0.411 | 82 |
| Mike Trout | 639 | 0.416 | 0.412 | 82 |
| Dustin Pedroia | 623 | 0.347 | 0.414 | 83 |
| David Wright | 670 | 0.374 | 0.415 | 83 |
| Alex Gordon | 721 | 0.354 | 0.415 | 83 |
| Chase Headley | 699 | 0.375 | 0.417 | 83 |
| Dustin Ackley | 668 | 0.272 | 0.417 | 83 |
| Angel Pagan | 659 | 0.325 | 0.417 | 83 |
| Michael Young | 651 | 0.295 | 0.417 | 83 |
| Chase Utley | 362 | 0.347 | 0.418 | 83 |
| Brett Lawrie | 536 | 0.311 | 0.420 | 84 |
| Jayson Werth | 344 | 0.362 | 0.421 | 84 |
| Jordan Pacheco | 505 | 0.320 | 0.421 | 84 |
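
The VOL- indexing described above can be sketched as follows. The exact league-average VOL is not stated in this article (the published figures are consistent with a value close to .500), so the baseline is a parameter, and the simple unweighted mean used to build it is an assumption. All names and sample values are hypothetical.

```python
def league_average_vol(vols):
    """Assumption: a plain unweighted mean over the hitters supplied."""
    return sum(vols) / len(vols)


def vol_minus(vol, league_avg_vol):
    """Index VOL so the league average is 100; below 100 is less volatile."""
    return round(100 * vol / league_avg_vol)


# Hypothetical VOL values for a tiny three-hitter league.
sample_vols = [0.450, 0.500, 0.550]
baseline = league_average_vol(sample_vols)
for v in sample_vols:
    print(v, vol_minus(v, baseline))
```

A hitter with a VOL- of 90 under this indexing would be 10% less volatile than the chosen baseline.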

The least volatile player in 2012 was Derek Jeter. This shouldn’t be surprising: among hitters with at least ten seasons of 300 or more plate appearances, Jeter is the least volatile player since 1974. Over 17 seasons, Jeter posted an average .397 VOL, four points better than Brett Butler (.401 over 14 seasons).

For reference, here are the 30 least volatile hitters since 1974 (min 10 seasons with >= 300 PAs):

| Rank | Name | # Seasons | Ave VOL | Ave wOBA* |
|---|---|---|---|---|
| 1 | Derek Jeter | 17 | 0.397 | 0.362 |
| 2 | Brett Butler | 14 | 0.401 | 0.334 |
| 3 | Chuck Knoblauch | 12 | 0.404 | 0.345 |
| 4 | Pete Rose | 12 | 0.405 | 0.333 |
| 5 | Luis Castillo | 11 | 0.406 | 0.324 |
| 6 | Willie Randolph | 17 | 0.406 | 0.322 |
| 7 | Ichiro Suzuki | 12 | 0.410 | 0.344 |
| 8 | Tony Gwynn | 17 | 0.412 | 0.363 |
| 9 | Wade Boggs | 18 | 0.415 | 0.365 |
| 10 | Steve Sax | 11 | 0.419 | 0.305 |
| 11 | Rickey Henderson | 23 | 0.420 | 0.371 |
| 12 | Jason Kendall | 15 | 0.421 | 0.329 |
| 13 | Ozzie Smith | 17 | 0.424 | 0.296 |
| 14 | Paul Molitor | 19 | 0.424 | 0.356 |
| 15 | Vince Coleman | 10 | 0.426 | 0.302 |
| 16 | Tony Phillips | 14 | 0.427 | 0.334 |
| 17 | Roberto Alomar | 16 | 0.429 | 0.351 |
| 18 | Rod Carew | 12 | 0.429 | 0.358 |
| 19 | David Eckstein | 10 | 0.430 | 0.307 |
| 20 | Mike Hargrove | 12 | 0.430 | 0.344 |
| 21 | Mark Grace | 15 | 0.431 | 0.357 |
| 22 | Rafael Furcal | 12 | 0.431 | 0.321 |
| 23 | Tim Raines | 17 | 0.431 | 0.363 |
| 24 | Todd Helton | 14 | 0.433 | 0.405 |
| 25 | Kenny Lofton | 16 | 0.433 | 0.351 |
| 26 | Buddy Bell | 15 | 0.433 | 0.326 |
| 27 | Bobby Abreu | 14 | 0.434 | 0.380 |
| 28 | Jose Offerman | 10 | 0.435 | 0.320 |
| 29 | Frank Thomas | 15 | 0.435 | 0.418 |
| 30 | Tom Herr | 10 | 0.435 | 0.311 |
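
A career list like the one above can be built by keeping only qualifying seasons and averaging VOL per player. This is a minimal sketch under the assumption that the career figure is an unweighted mean of seasonal VOL; the function name, tuple layout, and sample rows are hypothetical.

```python
from collections import defaultdict


def career_vol_leaders(seasons, min_pa=300, min_seasons=10):
    """seasons: iterable of (player, year, plate_appearances, vol) tuples.

    Returns (player, qualifying_seasons, average_vol) rows, lowest VOL first.
    """
    by_player = defaultdict(list)
    for player, _year, pa, vol in seasons:
        if pa >= min_pa:  # only seasons meeting the PA minimum count
            by_player[player].append(vol)
    leaders = [
        (player, len(vols), sum(vols) / len(vols))
        for player, vols in by_player.items()
        if len(vols) >= min_seasons
    ]
    return sorted(leaders, key=lambda row: row[2])


# Made-up seasons; min_seasons is lowered to 2 so the toy data qualifies.
toy = [
    ("Hitter A", 2010, 650, 0.40), ("Hitter A", 2011, 620, 0.42),
    ("Hitter B", 2010, 500, 0.55), ("Hitter B", 2011, 250, 0.30),
    ("Hitter C", 2010, 400, 0.50), ("Hitter C", 2011, 410, 0.48),
]
for row in career_vol_leaders(toy, min_seasons=2):
    print(row)
```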

Joey Votto logged a little less than 500 plate appearances, but posted a .409 VOL. That’s incredible when you think about the fact that he had a .448 wOBA for the season. Basically, he was just as consistent as Denard Span, but with a wOBA that was 35% higher than Span’s.

The leader board should illustrate the general point that more consistent doesn’t always mean better. For example, Michael Young was 17% more consistent than the league average last year, but he was abysmal at the plate overall. In his case, greater consistency meant that the Rangers didn’t benefit from as many “boom” type games as a less consistent hitter might have provided.

For completeness, here are the 25 most volatile hitters from 2012:

| Name | Plate Appearances | Yearly wOBA* | VOL | VOL- |
|---|---|---|---|---|
| Alexi Casilla | 326 | 0.277 | 0.595 | 119 |
| James Loney | 465 | 0.275 | 0.596 | 119 |
| Alex Presley | 370 | 0.285 | 0.604 | 120 |
| Elliot Johnson | 331 | 0.262 | 0.605 | 121 |
| Gerardo Parra | 430 | 0.302 | 0.608 | 121 |
| Casey McGehee | 352 | 0.288 | 0.611 | 122 |
| Ty Wigginton | 360 | 0.286 | 0.616 | 123 |
| Matt Carpenter | 340 | 0.340 | 0.617 | 123 |
| Mitch Moreland | 357 | 0.331 | 0.618 | 123 |
| Jarrod Saltalamacchia | 448 | 0.309 | 0.620 | 124 |
| Tyler Colvin | 452 | 0.345 | 0.627 | 125 |
| Juan Rivera | 339 | 0.287 | 0.629 | 125 |
| Tyler Greene | 330 | 0.281 | 0.637 | 127 |
| Carlos Gomez | 452 | 0.348 | 0.643 | 128 |
| Gaby Sanchez | 326 | 0.269 | 0.651 | 130 |
| Bryan LaHair | 380 | 0.328 | 0.654 | 130 |
| Greg Dobbs | 342 | 0.304 | 0.656 | 131 |
| Logan Morrison | 334 | 0.333 | 0.659 | 131 |
| Nyjer Morgan | 322 | 0.246 | 0.663 | 132 |
| Brian Bogusevic | 404 | 0.283 | 0.667 | 133 |
| Eric Chavez | 313 | 0.329 | 0.693 | 138 |
| Alexi Amarista | 300 | 0.287 | 0.713 | 142 |
| Jesus Guzman | 321 | 0.329 | 0.724 | 144 |
| Scott Hairston | 398 | 0.348 | 0.738 | 147 |
| Justin Maxwell | 352 | 0.347 | 0.787 | 157 |

The one bit of inferential analysis I’ve completed so far is a look at the year-to-year correlation of VOL. It turns out that this new formulation has a higher year-to-year correlation than my previous one (.39 vs. .23). Overall, that is still low (VOL is basically as reliable year to year as batting average), but there is a decent relationship, and we do see evidence in the data that, as with BABIP, players sort themselves into generally higher or lower VOL over the course of a career. For example, the correlation between a hitter’s average VOL in years one and two and his VOL in year three is .42. This is something that definitely needs to be examined further, which brings me to next steps.
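
For readers who want to try a year-to-year check on their own data, here is a minimal sketch: pair each hitter’s VOL in one season with his VOL in the next, then take the Pearson correlation across all pairs. The data layout and function names are hypothetical, and whether this matches my exact pairing rules (for example, how gaps between seasons are handled) is an assumption.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def year_to_year_pairs(vol_by_player):
    """vol_by_player: {player: {year: VOL}} -> [(VOL year N, VOL year N+1)].

    Assumption: only strictly consecutive seasons are paired.
    """
    pairs = []
    for seasons in vol_by_player.values():
        for year, vol in seasons.items():
            if year + 1 in seasons:
                pairs.append((vol, seasons[year + 1]))
    return pairs


# Made-up VOL values for illustration only.
toy_vol = {
    "Hitter A": {2010: 0.40, 2011: 0.43, 2012: 0.41},
    "Hitter B": {2010: 0.60, 2011: 0.52},
    "Hitter C": {2010: 0.50, 2012: 0.47},  # gap year, so no pair
}
pairs = year_to_year_pairs(toy_vol)
r = pearson([p[0] for p in pairs], [p[1] for p in pairs])
print(len(pairs), round(r, 3))
```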

Well, there is a lot to do.

First, I want to look at what traits might lead a hitter to be more or less volatile. From my earlier research, and observations from others, my initial guess is that high on-base, low-strikeout, solid-contact hitters will tend to have lower volatility. The initial leader boards suggest these might still be the most significant variables, but of course that needs to be verified empirically.

Second, there is the larger question of whether the volatility of hitters matters all that much. How does it factor into team construction? There is evidence that more consistent offenses tend to perform better over the course of the year (i.e., they beat their Pythagorean expectation in terms of wins), but the relationship between individual-level volatility and team-level volatility still needs to be addressed.
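
For anyone unfamiliar with the Pythagorean expectation mentioned above, here is a quick sketch using the classic Bill James form, in which expected winning percentage is runs scored squared over the sum of runs scored squared and runs allowed squared. The exponent of 2 is the traditional choice (other exponents are in common use), and the team totals below are made up for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def wins_above_expectation(wins, games, runs_scored, runs_allowed, exponent=2.0):
    """Actual wins minus Pythagorean expected wins; positive means the team beat it."""
    expected = games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)
    return wins - expected


# Hypothetical team: 85 wins in 162 games with 700 runs scored and allowed.
print(wins_above_expectation(85, 162, 700, 700))
```

A team that scores exactly as many runs as it allows has an expected winning percentage of .500, so any wins beyond 81 in a 162-game season count as beating the expectation.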

I’ll turn to these questions (and I’m sure a few more) in the coming months. Until then, comments and suggestions are welcome.

Oh, and here’s the complete VOL and VOL- leader board for 2012 (min >= 300 PAs). And, yes, you are welcome, Eno Sarris:


——————

*I used average wOBA constants from 2002-2012 in order to conduct some of the year-to-year correlational analysis. Also, as I mentioned earlier in the article, I included stolen bases and caught stealing in the wOBA calculation.





Bill leads Predictive Modeling and Data Science consulting at Gallup. In his free time, he writes for The Hardball Times, speaks about baseball research and analytics, has consulted for a Major League Baseball team, and has appeared on MLB Network's Clubhouse Confidential as well as several MLB-produced documentaries. He is also the creator of the baseballr package for the R programming language. Along with Jeff Zimmerman, he won the 2013 SABR Analytics Research Award for Contemporary Analysis. Follow him on Twitter @BillPetti.

Sylvan
11 years ago

It amuses me that Ruben Tejada and Prince Fielder are tied in this stat.