zStats for Pitchers, June Update

Among the panoply of stats created by Statcast and similar tracking tools in recent years is a whole class of stats sometimes called the “expected stats.” These types of numbers elicit decidedly mixed feelings among fans – especially when they suggest their favorite team’s best player is overachieving – but they serve the important purpose of linking Statcast data to the events that happen on the field. Events in baseball, whether a single or a homer or strikeout or whatever, happen for reasons, and this type of data allows us to peer a little better into baseball on an elemental level.
While a lucky home run or a seeing-eye single still counts on the scoreboard and in the box score, the expected stats assist us in projecting what comes next. Naturally, as the developer of the ZiPS projection tool for the last 20 (!) years, I have a great deal of interest in improving these prognostications. Statcast has its own methodology for estimating expected stats, which you’ll see all over the place with a little x preceding the stats (xBA, xSLG, xwOBA, etc.). While these data don’t have the status of magic, they do help us predict the future slightly less inaccurately, even if they weren’t explicitly designed to optimize predictive value. What ZiPS uses is designed to be as predictive as I can make it. I’ve talked a lot about this for both hitters and for pitchers. The expected stats that ZiPS uses are called zStats; I’ll let you guess what the “z” stands for!
It’s important to remember that these aren’t predictions in themselves. ZiPS certainly doesn’t just look at a pitcher’s zSO from the last year and go, “Cool, brah, we’ll just go with that.” But the data contextualize how events come to pass, and are more stable for individual players than the actual stats. That allows the model to shade the projections in one direction or the other. And sometimes it’s extremely important, such as in the case of homers allowed for pitchers. Of the fielding-neutral stats, homers are easily the most volatile, and home run estimators for pitchers are much more predictive of future homers than are actual homers allowed. Also, the longer a pitcher “underachieves” or “overachieves” in a specific stat, the more ZiPS believes the actual performance rather than the expected one.
One example of the last point is Tyler Anderson. He has a history of greatly underperforming what ZiPS expects, to the extent that ZiPS barely believes the zStats at this point (more on Anderson below). Expected stats give us useful information; they don’t conjure up magic.
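The idea that a persistent gap earns the actual numbers more trust can be illustrated with a toy weighting. To be clear, this is not the ZiPS formula, which isn't described here; the function name, the constant `k`, and the sample rates are all invented purely for illustration.

```python
def blend(actual_rate, expected_rate, seasons_of_gap, k=3.0):
    """Toy blend of an actual rate and an expected rate.

    The weight on the actual rate grows as the gap between actual and
    expected persists over more seasons. `k` is a made-up constant that
    controls how quickly that trust builds; it is NOT a ZiPS parameter.
    """
    w_actual = seasons_of_gap / (seasons_of_gap + k)
    return w_actual * actual_rate + (1.0 - w_actual) * expected_rate


# Hypothetical pitcher: allows 1.0 HR/9 while the expected stat says 2.0 HR/9.
for seasons in (0, 1, 3, 9):
    print(seasons, round(blend(1.0, 2.0, seasons), 3))
```

With no track record the toy blend simply returns the expected rate, and it drifts toward the actual rate the longer the gap holds, which is the qualitative behavior described for a pitcher like Anderson.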
What’s also interesting to me is that zHR is quite surprised by this year’s decline in homers. There have been 2,076 home runs hit in 2024 as I type this, yet before making the league-wide adjustment for environment, zHR thinks there “should have been” 2,375 home runs hit, a difference of 299. That’s a massive divergence; zHR has never been off by more than 150 home runs league-wide across a whole season, and it is aware that these home runs were mostly hit in April/May and the summer has yet to come. That does make me wonder about the sudden drop in offense this year. It’s not a methodology change either, as I re-ran 2023 with the current model (with any training data from 2023 removed) and there were 5,822 zHR last year compared to the actual total of 5,868 homers.
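To put those gaps on the same scale, here is a quick check of the arithmetic using only the four totals quoted above; the variable and function names are just for illustration.

```python
# League-wide totals quoted in the text
actual_hr_2024 = 2076
zhr_2024 = 2375
actual_hr_2023 = 5868
zhr_2023 = 5822


def gap(expected, actual):
    """Return (expected - actual, that difference as a fraction of actual)."""
    diff = expected - actual
    return diff, diff / actual


diff_2024, pct_2024 = gap(zhr_2024, actual_hr_2024)
diff_2023, pct_2023 = gap(zhr_2023, actual_hr_2023)

print(f"2024: zHR minus actual = {diff_2024} ({pct_2024:.1%} of actual)")
print(f"2023: zHR minus actual = {diff_2023} ({pct_2023:.1%} of actual)")
```

The 2024 gap works out to roughly 14% of the actual total over a partial season, while the 2023 re-run misses by less than 1% over a full one.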
Let’s start the pitchers off with the summary data.