If you’ve spent any time observing some of the nerdier battles over baseball statistics in the last decade or two, you’re probably familiar with the arguments made for and against certain metrics. Beginning with the relatively simple matter of batting average versus on-base percentage, these debates tend generally to take the same shape. And generally, one recurring blind spot of such debates is that they tend to dwell on what certain statistics don’t do instead of best identifying what they do do.*
*Author’s note: /Nailed It
The last few years have seen the release, by MLB Advanced Media (MLBAM), of a flurry of new data and statistics, generally referred to as “Statcast data.” We’ve also seen advances in the measurement of catcher-framing by the people at Baseball Prospectus, who have also continued making improvements in the evaluation of pitchers in the form of Deserved Run Average (DRA). When new data and metrics emerge, there is inevitably a period of uncertainty that follows. What does this stat mean? What’s the best way to use this data set? Equally inevitable is the misapplication of new statistics. That aspect of potential statistical innovation is not really new.
Today, what is new is xwOBA. In part due to the wide proliferation of Statcast data by means of telecasts and MLB itself, more fans are finding and using stats like xwOBA than might have in previous generations. As with other new metrics, we are still attempting to identify how xwOBA might best be used.
One such study into the potential utility of xwOBA was recently published by Jonathan Judge at Baseball Prospectus. The study is a good one, with Judge focusing on xwOBA against pitchers. While not ultimately his point, Judge does, along the way, object to the “x” in xwOBA, as he feels that “expected” implies predictive power. While I have always interpreted the “expected” to mean “what might have been expected to happen given neutral park and defense” — that is, without assuming a predictive measure — it does appear that reasonable people can disagree on that interpretation.
As for the study, Judge examines the predictive capability, the descriptive capability, and the reliability of xwOBA, comparing it to other popular metrics like FIP, wOBA, and DRA. In terms of predictive capability, Judge finds no difference among xwOBA, FIP, and DRA in predicting next year’s wOBA. In terms of reliability, as measured by its consistency year over year, xwOBA sits between DRA (.51) and FIP (.40), substantially better than wOBA itself. As to descriptive capability, xwOBA has the highest correlation to wOBA, followed closely by FIP, with DRA a bit further behind.
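To make "reliability" concrete, here is a minimal sketch of a year-over-year consistency check. It assumes a plain Pearson correlation between the same pitchers' values in consecutive seasons, which may not be exactly the calculation Judge ran. The function name and the sample values are hypothetical and purely illustrative, not real player data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)

# Made-up xwOBA-against values for five hypothetical pitchers,
# listed in the same pitcher order for both seasons.
season_one = [0.290, 0.310, 0.335, 0.300, 0.320]
season_two = [0.295, 0.305, 0.330, 0.315, 0.310]

reliability = pearson(season_one, season_two)
print(f"year-over-year correlation: {reliability:.2f}")
```

The same function could be reused for the predictive test (this year's metric against next year's wOBA) and the descriptive test (a metric against the same year's wOBA); only the two input lists change.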
Judge concludes by questioning the utility of xwOBA for pitchers, as it is no better than FIP, a metric which has been around for quite some time. I agree with Judge’s conclusion that xwOBA for pitchers doesn’t tell us much more than FIP in terms of pitcher value or predictability, though I disagree that the statistic lacks utility. FIP and xwOBA are bound to be similar: both use strikeouts and walks as inputs, with FIP using homers as a proxy for batted balls and xwOBA using the launch angle and exit velocity of all batted balls. Further supporting Judge’s point is my own research, in which I found that, while xwOBA on contact was somewhat descriptive, it had little predictive value. Like Judge, I also found that FIP and xwOBA operated similarly in terms of predictive ability and reliability. Due to those findings, if I am looking at pitcher performance, I’m likely to rely on FIP, which is on the easy-to-understand ERA scale, as opposed to xwOBA.
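For readers who want to see why FIP lands on the ERA scale, the standard FIP formula can be sketched as follows. The weights (13, 3, and 2) are the conventional ones; the league constant changes every season, so the default of 3.10 used here is only an assumed placeholder, and the function name is my own.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching.

    hr, bb, hbp, k: home runs, walks, hit batters, and strikeouts allowed.
    ip: innings pitched as a true decimal (200 and one-third innings is
        200.333..., not the box-score shorthand 200.1).
    constant: league-specific value that puts FIP on the ERA scale;
        3.10 is an assumed placeholder, not the value for any given year.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant

# Hypothetical season line: 20 HR, 50 BB, 5 HBP, 200 K in 200 innings.
print(round(fip(hr=20, bb=50, hbp=5, k=200, ip=200.0), 3))  # 3.225
```

Note that batted balls enter only through home runs. That is the gap xwOBA fills by looking at the exit velocity and launch angle of every ball in play.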
Of course, one of the things I often do when looking at FIP is to compare it to ERA to gauge both the potential perception of how a pitcher is performing and the results in terms of runs the pitcher has given up. When they are different, I look at some potential “luck” factors, like BABIP (where hitters have significantly more control over the outcome) or sequencing via left-on-base percentage. Often, these factors will help explain the difference between the two.
We can use xwOBA similarly, by comparing it to wOBA. Sequencing is not a factor in either xwOBA or wOBA, so here we are comparing the quality of contact a pitcher has conceded to the on-field results. This is where the highly descriptive nature of xwOBA can be helpful. While xwOBA might not be more predictive than FIP, it can help explain how a pitcher has arrived at his runs-allowed total, providing greater detail than BABIP alone without the potential confusion of sequencing. Comparing xwOBA with wOBA helps explain whether a pitcher truly earned the results against him or whether there were significant factors mostly outside his control. While it might not be better than FIP, xwOBA rates well descriptively, so it tells us something that FIP might not.
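As a rough sketch of that comparison, the gap between a pitcher's wOBA allowed and his xwOBA allowed can be read as follows. The 20-point threshold and the function name are assumptions chosen for illustration; nothing in Judge's study or my own research establishes a particular cutoff.

```python
def describe_gap(woba_allowed, xwoba_allowed, threshold=0.020):
    """Compare a pitcher's actual wOBA allowed to his xwOBA allowed.

    threshold is a hypothetical cutoff (20 points of wOBA) for calling
    the gap meaningful; it is an illustrative assumption.
    """
    gap = woba_allowed - xwoba_allowed
    if gap > threshold:
        # Hitters did more damage than the contact quality implies.
        return "worse results than contact quality suggests"
    if gap < -threshold:
        # Hitters did less damage than the contact quality implies.
        return "better results than contact quality suggests"
    return "results roughly in line with contact quality"

# Hypothetical pitcher: .330 wOBA allowed on .300 xwOBA allowed.
print(describe_gap(0.330, 0.300))
```

A large positive gap points toward park, defense, or plain bad luck; a large negative gap suggests the pitcher's results may have flattered him.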
Descriptively and predictively, DRA and FIP have been found to be pretty similar, while DRA is a bit more reliable year-over-year in the studies Judge performed. DRA uses mixed models and incorporates much more granular pitch data, including location and type, to flesh out a pitcher’s skill and separate the role a defense might play in terms of potential outcomes. DRA underwent significant changes from 2016 to 2017 to make it as descriptive as FIP without losing reliability or predictive power. Those reasons provide support for using DRA as another good pitching statistic. Given the two metrics’ similarities and FIP’s relative simplicity (it is harder to track and identify changes with DRA), both are useful tools in the evaluation of pitchers, an endeavor in which xwOBA can play a role as well.
The bulk of this post has been devoted to xwOBA for pitchers. At some level, this is misleading, as xwOBA is considerably less useful for pitchers than it is for hitters. But there’s also less debate over the metric’s efficacy regarding hitters. For the latter group, there’s evidence to suggest that xwOBA is descriptive, predictive, and reliable, maybe even in small samples. While speed and the shift are likely factors in a player’s under- or overperformance of xwOBA, with the direction of the ball on the field likely playing an important role, the stat appears useful despite these drawbacks. We might need to keep in mind that fast hitters can outperform their xwOBA, but xwOBA is likely good enough for hitters both to identify unlucky batters and to offer some predictive value. We know about the fly-ball revolution and how it can help batters. With players making changes, we can see whether those changes are helping or not. More study is needed, but xwOBA for batters looks generally promising.
With any new statistic, there is going to be a period of transition during which the community discovers how best to employ it. As with most pursuits, increased knowledge and continued intellectual curiosity will help further understanding. Acknowledging strengths and flaws promotes productive discussion necessary to move everyone forward.
Craig Edwards can be found on twitter @craigjedwards.