Yes, Hitter xStats Are Useful

Some of the most frustrating arguments involving baseball statistics revolve around the use of expected stats. Perhaps the most frequently cited of these metrics are Statcast’s xStats, which use a hitter’s batted-ball data to estimate the batting average, on-base percentage, slugging percentage, and wOBA you’d “expect” that hitter to achieve. Investigating how predictive xStats are compared to their corresponding actual stats has been a common research exercise over the last few years. While it depends on the exact dataset used, xStats by themselves generally aren’t much better than the actual stats at predicting the next year’s actual stats. But that doesn’t mean we should simply discard expected stats when trying to evaluate players.
While I’m not going to spend too much time talking about how predictive xStats are versus the actual ones, I do want to briefly touch on some of the existing work on the subject. Jonathan Judge at Baseball Prospectus examined many of the expected metrics back in 2018. He also spoke with MLBAM’s Tom Tango about the nature of expected stats and their usage:
> Earlier this week, we reached out to BAM with our findings, asking if they had any comment.
>
> MLBAM Senior Database Architect of Stats Tom Tango promptly responded, asking that we ensure we had the most recent version of the data, due to some recent changes being made. We refreshed our data sets, found some small changes, and retested. The results were the same.
>
> Tango then stressed that the expected metrics were only ever intended to be descriptive, that they were not designed to be predictive, and that if they had been intended to be predictive, they could have been designed differently or other metrics could be used.
One of my colleagues, Jeff Zimmerman, wrote about xStats in the fantasy context in 2018. Justin Mason looked into the data in 2021 and found that xStats are less predictive than actual ones.
It’s always good to have the most up-to-date information, so let’s start there. I pulled every player with consecutive seasons of at least 200 plate appearances since 2015; there were just over 1,800 season-pairs. I then calculated the r-squared (the coefficient of determination) between each of xBA, xOBP, xSLG, and xwOBA in the first season and the corresponding actual stat in the second season, and did the same for the actual stats themselves:
| Relationship | R-Squared |
| --- | --- |
| xBA to Next Year BA | 0.173 |
| BA to Next Year BA | 0.163 |
| xOBP to Next Year OBP | 0.236 |
| OBP to Next Year OBP | 0.210 |
| xSLG to Next Year SLG | 0.226 |
| SLG to Next Year SLG | 0.189 |
| xwOBA to Next Year wOBA | 0.221 |
| wOBA to Next Year wOBA | 0.179 |
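If you want to follow along at home, here’s roughly what that calculation looks like in Python. This is a minimal sketch rather than my actual code; the input file and column names (player_id, pa, ba, xba, and so on) are placeholders for however you store your Statcast data.

```python
import pandas as pd

# Hypothetical input: one row per player-season, with columns
# player_id, season, pa, ba, xba, obp, xobp, slg, xslg, woba, xwoba.
seasons = pd.read_csv("hitter_seasons.csv")

# Keep 200+ PA seasons, then pair each season with the same player's next one.
qual = seasons[seasons["pa"] >= 200].copy()
nxt = qual.copy()
nxt["season"] -= 1  # shift so a join on (player_id, season) pairs year N with year N+1
pairs = qual.merge(nxt, on=["player_id", "season"], suffixes=("", "_next"))

# R-squared of each first-season stat (and its xStat) against next year's actual stat.
for stat in ["ba", "obp", "slg", "woba"]:
    for col in (f"x{stat}", stat):
        r2 = pairs[col].corr(pairs[f"{stat}_next"]) ** 2
        print(f"{col} -> next-year {stat}: r-squared = {r2:.3f}")
```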
One thing that’s worth noting with this data set, which comes right up to Monday morning, is that I’m getting slightly better correlations than others have gotten in the past. The cause of that is tricky to identify, though one possible explanation is that the change from Trackman to Hawk-Eye in 2020 has helped to improve these metrics.
Still, the expected stats predict the next season’s actual stats only slightly better than the actual stats predict themselves. That doesn’t mean, however, that the expected stats don’t matter when making evaluations.
When constructing a model, a developer will engage in a process called “dimensionality reduction.” There are many methods for doing this, but the basic idea is to take a dataset and reduce the number of features while preserving as much of the model’s accuracy as possible. One thing that all the methods share, however, is that they don’t simply throw out variables because they have similar or even weaker correlations with the dependent variable. Even a variable that performs worse than another can still make a model more accurate than it would be otherwise, and this is not an uncommon occurrence.
Imagine you’re trying to model someone’s life expectancy. Age is an extremely important variable. But factors such as whether the person is a smoker, their socioeconomic status, and their health history are also variables that, if known, serve to make the model more accurate than simply using age alone. The key is determining whether those lesser variables are capturing some useful information that age alone is not. Let’s use a baseball example to demonstrate this.
Since the divisional era started in 1969, the r-squared for OBP vs. runs per game is 0.73, meaning that OBP explains about 73% of the observed variance in runs per game. For SLG, that number is 0.82. Team OBP has a weaker relationship with runs per game than team SLG, but using both makes the model far better: OPS’s r-squared is 0.905, and OBP*SLG’s is 0.911. Now take the last part of the triple-slash, batting average. The r-squared for BA and runs per game is 0.53, but in this case, it’s not adding information that OBP and SLG aren’t already capturing; the r-squared for a model of runs scored per game only improves to 0.914 when incorporating BA. Indeed, when OBP and SLG are used, BA actually gets a very slightly negative coefficient, because OBP/SLG combinations slightly underrate walks.
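To make that concrete, here’s a sketch of the team-level comparison. The input file and column names are hypothetical placeholders; assume one row per team-season since 1969 with columns obp, slg, ba, and rpg (runs per game).

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical input: one row per team-season since 1969.
teams = pd.read_csv("team_seasons.csv")
y = teams["rpg"]  # runs per game

def r2(features):
    """In-sample r-squared of an OLS fit of runs per game on the given features."""
    X = teams[features]
    return LinearRegression().fit(X, y).score(X, y)

print("OBP alone:      ", round(r2(["obp"]), 3))               # ~0.73
print("SLG alone:      ", round(r2(["slg"]), 3))               # ~0.82
print("OBP + SLG:      ", round(r2(["obp", "slg"]), 3))        # ~0.91
print("OBP + SLG + BA: ", round(r2(["obp", "slg", "ba"]), 3))  # barely budges
```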
So the pertinent question isn’t whether xStats are better than actual stats at predicting future performance, but whether they improve our ability to predict future performance when used in conjunction with actual stats. Below are the RMSE (root mean squared error) values for the relevant models:
| Model | RMSE |
| --- | --- |
| BA to Next Year BA | 0.0347 |
| xBA to Next Year BA | 0.0317 |
| xBA and BA to Next Year BA | 0.0312 |
| OBP to Next Year OBP | 0.0370 |
| xOBP to Next Year OBP | 0.0350 |
| xOBP and OBP to Next Year OBP | 0.0345 |
| SLG to Next Year SLG | 0.0760 |
| xSLG to Next Year SLG | 0.0739 |
| xSLG and SLG to Next Year SLG | 0.0719 |
| wOBA to Next Year wOBA | 0.0403 |
| xwOBA to Next Year wOBA | 0.0385 |
| xwOBA and wOBA to Next Year wOBA | 0.0378 |
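The mechanics here are the same as before: fit a simple linear model and look at the size of its errors. A sketch, reusing the pairs table from the earlier snippet; I’m keeping it in-sample for simplicity, though you could hold out data for testing instead.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def rmse(features, target):
    """In-sample RMSE of a linear fit; `pairs` comes from the earlier snippet."""
    X, y = pairs[features], pairs[target]
    pred = LinearRegression().fit(X, y).predict(X)
    return np.sqrt(np.mean((y - pred) ** 2))

for stat in ["ba", "obp", "slg", "woba"]:
    target = f"{stat}_next"
    print(f"{stat} alone:       {rmse([stat], target):.4f}")
    print(f"x{stat} alone:      {rmse([f'x{stat}'], target):.4f}")
    print(f"x{stat} and {stat}: {rmse([f'x{stat}', stat], target):.4f}")
```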
Knowing the error ranges of these stats doesn’t directly tell a user how to treat this data, though. So rather than ask what the error of each stat is, I went back to the full dataset and instead asked what linear combination of the xStat and the actual stat has worked best:
| Projected Stat | xStat Weight | Actual Stat Weight |
| --- | --- | --- |
| Next Year BA | 73% | 27% |
| Next Year OBP | 70% | 30% |
| Next Year SLG | 59% | 41% |
| Next Year wOBA | 65% | 35% |
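If you want to approximate this yourself, a simple grid search over the mixing weight gets you close. This sketch uses the blend itself as the forecast rather than a full regression, which is a simplifying assumption on my part; the min_pa and max_pa arguments anticipate the playing-time splits discussed next.

```python
import numpy as np

def best_blend(pairs, stat, min_pa=200, max_pa=10_000):
    """Grid-search the weight w that minimizes the RMSE of
    w * xStat + (1 - w) * actual as a forecast of next year's stat."""
    sub = pairs[(pairs["pa"] >= min_pa) & (pairs["pa"] <= max_pa)]
    x, a, y = sub[f"x{stat}"], sub[stat], sub[f"{stat}_next"]
    weights = np.linspace(0, 1, 101)
    errors = [np.sqrt(np.mean((w * x + (1 - w) * a - y) ** 2)) for w in weights]
    return weights[int(np.argmin(errors))]

for stat in ["ba", "obp", "slg", "woba"]:
    print(stat, best_blend(pairs, stat))

# The playing-time splits discussed below:
print("600+ PA:   ", best_blend(pairs, "ba", min_pa=600))
print("200-300 PA:", best_blend(pairs, "ba", max_pa=300))
```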
As a simple rule of thumb, you won’t do too badly if you simply regress xStats a third of the way towards the actual ones. But as you might have guessed, that changes depending on the player’s number of plate appearances.
For BA, if you only look at the players with at least 600 plate appearances in the first season, the ideal mix is 37% BA, 63% xBA. When you only look at the players with between 200 and 300 plate appearances, that becomes 10% BA and 90% xBA, a drastically different split. Naturally, this reflects the fact that the longer a player sustains a gap between his actual stats and his xStats, the more of that gap you should expect to persist in the future. But let’s calculate that, too.
Adding multiple years of inputs into the mix, I calculated the stabilization point for each of these four stats. This is the number of plate appearances at which the xStat and the actual stat have equal predictive power:
| Stat | Stabilization Point (PA) |
| --- | --- |
| BA | 1,154 |
| OBP | 1,007 |
| SLG | 607 |
| wOBA | 766 |
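My actual calculation used multiple years of inputs, but you can get a rough back-of-the-envelope version of the same idea from the blend search above: find the best actual-stat weight in a few playing-time buckets, then back out the constant C from Actual % = PA / (PA + C), which rearranges to C = PA * xStat % / Actual %. Treat this as illustrative, not my exact method.

```python
import numpy as np

def stabilization_point(stat, buckets=((200, 350), (350, 500), (500, 650))):
    """Rough estimate: back out C = PA * xStat% / actual% in each PA bucket,
    then average. Reuses best_blend and pairs from the earlier snippets."""
    estimates = []
    for lo, hi in buckets:
        wx = best_blend(pairs, stat, min_pa=lo, max_pa=hi)  # weight on the xStat
        wa = 1 - wx                                         # weight on the actual stat
        mean_pa = pairs.loc[pairs["pa"].between(lo, hi), "pa"].mean()
        if 0 < wa < 1:
            estimates.append(mean_pa * wx / wa)
    return float(np.mean(estimates))

print(stabilization_point("ba"))
```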
Sticking with using only xStats and actual ones to predict the future, you can approximate how much of the actual stat to use with the formula Actual % = PA / (PA + Stabilization Point).
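To put numbers on that: a hitter with 600 plate appearances, using BA’s stabilization point of 1,154 PA, gets 600 / (600 + 1,154), or about 34%, weight on his actual batting average and 66% on his xBA, right in line with the 37/63 split found above for the 600 PA group.

```python
def actual_weight(pa, stabilization_pa):
    """Fraction of the blend that comes from the actual stat."""
    return pa / (pa + stabilization_pa)

print(round(actual_weight(600, 1154), 3))  # 0.342 -> roughly 34% BA, 66% xBA
```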
If you’re trying to renovate your home, you can’t use a screwdriver for every task. But if you throw away your screwdriver because there’s a lot it can’t do, you’ll regret it when you encounter a screw. xStats aren’t a predictive model by themselves, but they can be a crucial part of a predictive model. The zStats used in ZiPS look at things like spray tendencies and speed to improve accuracy, but xStats are still a useful tool.
I’ll look at the pitcher side of the equation in a future piece.