The ZiPS Midseason Standings Update

We’re now three months past the last ZiPS projected standings, which ran before the season, and as one should expect, reality has caused a whole lot of changes to the prognostications. Most of the time when I run ZiPS standings, I use data from the simpler in-season player projection model, because a full batch run of the 3,500 or so players projected, even split between my two most powerful computers, takes about 30 hours to finish. But I always do the whole shebang in the middle of every month, and baseball’s pause for All-Star Week gives me an opportunity to run projected standings with the best possible model I can come up with, without the results being a couple of days out of date.
So that’s what we’re going to do. With one exception, the methodology remains identical to the one described in the final preseason projections.
I’ve spent the last week working on and testing an addition to the ZiPS standings model to factor in the problem that preseason projections have with temporality. Basically, you can project teams based on who they have in the organization at the time of the projection, but you can’t easily do it for players not in the organization who will eventually be. If I knew at the start of 2022 that the Padres would have Juan Soto for most of the summer, it would have had an effect on the preseason projections! Like any model that people continually work on, ZiPS has no substantial bias in almost any category: there’s no systematic tendency to overrate or underrate any specific type of team (bias in exercises like this is easier to iron out than inaccuracy). But there’s an exception: ZiPS in the preseason slightly underrates teams that will eventually add value to the major league roster via trade and overrates those that do the opposite.
This is something I’ve long wanted to try to deal with in as effective a way as I could. So what I’ve done is gone back and re-projected every team as of June 15, July 1, and July 15 in every season since I started ZiPS. Then, along with the data on the players each team added at the major league level, I used the playoff projections at that date, the team’s payroll (it does have a factor), the weakness of the team’s worst positions, the time since its last playoff appearance, and the team’s farm system ranking (where possible) to make a probabilistic model of increases and decreases in roster strength due to the trade deadline. Overfitting is a concern, so I’ve cross-validated to do my best to ensure that isn’t an issue, and while the improvement amounts to less than half a win in final accuracy, any shaving off of error is a helpful thing. So these standings reflect some increased chance that teams like the Orioles and Rangers will have, from August 1 on, a slightly stronger roster than the one currently available, and that teams like the A’s and Tigers will have weaker ones. The changes in projections are small because this is a noisy, inaccurate thing, but in future years I’ll be tracking the standings both with and without this model to see how they fare.
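To make the idea concrete, here is a minimal Python sketch of how a probabilistic deadline adjustment could be folded into a season simulation. It is not the ZiPS implementation. The function name, the parameters, and the choice of a normal distribution for the deadline effect are all assumptions made for illustration, and `mean_delta` and `sd_delta` stand in for whatever a fitted deadline model would output for a given team.

```python
import random


def simulate_final_wins(current_wins, games_before_aug1, games_after_aug1,
                        win_prob, mean_delta, sd_delta,
                        n_sims=10_000, seed=0):
    """Simulate final win totals with a random deadline shift in roster strength.

    mean_delta and sd_delta are hypothetical outputs of a deadline model,
    expressed as a change in per-game win probability from August 1 on.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        # Draw this simulation's deadline effect (assumed normal here).
        delta = rng.gauss(mean_delta, sd_delta)
        # Keep the adjusted win probability inside [0, 1].
        post_prob = min(max(win_prob + delta, 0.0), 1.0)
        wins = current_wins
        # Games before August 1 use the roster as it stands today.
        wins += sum(rng.random() < win_prob for _ in range(games_before_aug1))
        # Games from August 1 on use the deadline-adjusted roster.
        wins += sum(rng.random() < post_prob for _ in range(games_after_aug1))
        totals.append(wins)
    return totals


def mean(values):
    return sum(values) / len(values)


# Made-up inputs: a 50-win team with 15 games before August 1 and 55 after.
without_model = simulate_final_wins(50, 15, 55, 0.55, 0.0, 0.0)
likely_buyer = simulate_final_wins(50, 15, 55, 0.55, 0.01, 0.02)
print(round(mean(likely_buyer) - mean(without_model), 2))
```

Because the shift is drawn fresh in every simulated season, a team gets a wider spread of outcomes as well as a slightly different average, which fits how noisy the deadline is. Running the simulation with and without the shift, as in the last lines, is one way to compare the two sets of standings side by side.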