Checking Out 2022 zStats for Pitchers After Two Months of Play

As anyone who does a lot of work with projections could likely tell you, one of the most annoying things about modeling future performance is that the results themselves come from small samples. Even a full 162-game season produces results that are not very predictive, such as a hitter or pitcher with a BABIP so low or so high that it is practically unsustainable. For example, if Luis Arraez finishes the season hitting .350, we don’t actually know that a median projection of .350 was the correct projection going into the season. There’s no divine baseball exchequer to swoop in and let you know if he was “actually” a .350 hitter who did what he was supposed to, a .320 hitter who got lucky, or a .380 hitter who suffered misfortune. If you flip heads on a coin eight times out of 10 and have no reason to believe you have a special coin-flipping ability, you’ll eventually see the split approach 50/50 given a sufficiently large number of flips. Convergence in probability is a fairly large academic area that we thankfully do not need to go into here. But for most things in baseball, you never actually get enough coin flips to see this happen. The boundaries of a season are quite strict.
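To put a rough number on how little a single season can settle, here is a minimal sketch that treats every at-bat as an independent weighted coin flip. The .320 true-talent level and the 550 at-bats are illustrative assumptions, not Arraez's actual projection or workload, and real at-bats are not perfectly independent, so this understates the messiness somewhat.

```python
import math
import random


def batting_average_sd(true_avg: float, at_bats: int) -> float:
    """Binomial standard deviation of an observed batting average around a true-talent level."""
    return math.sqrt(true_avg * (1 - true_avg) / at_bats)


def simulate_seasons(true_avg: float, at_bats: int, seasons: int, seed: int = 0) -> list[float]:
    """Simulate observed batting averages, treating each at-bat as a coin flip with P(hit) = true_avg."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < true_avg for _ in range(at_bats)) / at_bats
        for _ in range(seasons)
    ]


TRUE_AVG = 0.320  # hypothetical true-talent batting average
AT_BATS = 550     # assumed full-season number of at-bats

sd = batting_average_sd(TRUE_AVG, AT_BATS)
print(f"SD of observed average: {sd:.3f}")  # about 0.020
low, high = TRUE_AVG - 1.96 * sd, TRUE_AVG + 1.96 * sd
print(f"Rough 95% range: {low:.3f} to {high:.3f}")

# How often does a true .320 hitter finish at .350 or better purely by chance?
seasons = simulate_seasons(TRUE_AVG, AT_BATS, seasons=2_000)
share_350_plus = sum(avg >= 0.350 for avg in seasons) / len(seasons)
print(f"Share of simulated seasons at .350+: {share_350_plus:.1%}")
```

Under these assumptions the standard deviation is about 20 points of batting average, so a true .320 hitter landing anywhere from roughly .281 to .359 is unremarkable, and a .350 season does not, by itself, distinguish a .350 hitter from a lucky .320 one.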
What does this have to do with projections? This volatile data becomes the source of future predictions, so one goal in building projections is to find measures that are not only as predictive as the ordinary stats, but that become predictive after fewer plate appearances or batters faced. Imagine, for example, if body mass index were a wonderful predictor of isolated power. It would be a highly useful one, as a player’s BMI is bound to change very little over the course of a season. The underlying reasons for performance tend to be more stable than the results, which is why ERA is more volatile than strikeout rate, and why strikeout rate is more volatile than the plate discipline stats that drive it.
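One common way to formalize “predictive after fewer batters faced” is to weight an observed rate by its reliability, n / (n + k), and regress the rest toward the league mean, where k is the sample size at which signal and noise carry equal weight. The sketch below is only an illustration of that general idea, not a description of how zStats are computed, and the k values, league mean, and 200-batters-faced sample are hypothetical numbers chosen to contrast a stable, skill-driven stat with a noisy, results-driven one.

```python
def reliability(n: float, k: float) -> float:
    """Share of an observed rate treated as signal after n trials.

    k is a hypothetical constant: the sample size at which signal and noise
    are weighted equally (reliability = 0.5 when n == k).
    """
    return n / (n + k)


def shrunk_rate(observed: float, n: float, league_mean: float, k: float) -> float:
    """Regress an observed rate toward the league mean in proportion to its reliability."""
    r = reliability(n, k)
    return r * observed + (1 - r) * league_mean


BATTERS_FACED = 200  # assumed: roughly two months of work for a starter
LEAGUE_MEAN = 0.22   # hypothetical league-average rate
OBSERVED = 0.30      # hypothetical rate the pitcher has posted so far

K_STABLE = 60   # hypothetical constant for a stable, underlying-skill stat
K_NOISY = 800   # hypothetical constant for a volatile, results-based stat

for label, k in [("stable", K_STABLE), ("noisy", K_NOISY)]:
    r = reliability(BATTERS_FACED, k)
    est = shrunk_rate(OBSERVED, BATTERS_FACED, LEAGUE_MEAN, k)
    print(f"{label}: reliability {r:.2f}, estimate {est:.3f}")
```

With these made-up constants, the same .300 rate over 200 batters faced is trusted at about 77% (an estimate near .282) for the stable stat but only 20% (an estimate of .236) for the noisy one, which is the practical payoff of finding stats that settle down quickly.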