The 2020 ZiPS projection season starts Friday, and before it does, I wanted to offer a brief refresher of what ZiPS is and is not.
ZiPS is a computer projection system, initially developed by me from 2002-2004, and “officially” released in 2004. As technology and data availability have improved over the last 15 years, ZiPS has continually evolved. The current edition of ZiPS can’t even run on the Pentium 4 3.0 processor I used to develop the original version starting in 2002 (I checked). There are a lot more bells and whistles, but at its core, ZiPS engages in two fundamental tasks when making a projection: establishing a baseline for a player, and estimating what their future looks like using that baseline.
ZiPS uses multi-year statistics, with more recent seasons weighted more heavily; in the beginning, all the statistics received the same yearly weighting, but eventually, this became more varied based on additional research. Research is a big part of ZiPS and every year, I run literally hundreds of studies on various aspects of the system to determine their predictive value and better calibrate the player baselines. What started with the data available in 2002 has expanded considerably; basic hit, velocity, and pitch data began playing a larger role starting in 2013, and data derived from StatCast has been included in recent years as I got a handle on the predictive value and impact of those numbers on existing models. I believe in cautious, conservative design, so data is only included once I have confidence in improved accuracy; there are always builds of ZiPS that are still a couple of years away. Additional internal ZiPS tools like zBABIP, zHR, zBB, and zSO are used to better establish baseline expectations for players. These stats work similarly to the various flavors of “x” stats, with the z standing for something I’d wager you’ve already figured out!
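To make the recency weighting concrete, here is a minimal sketch of a weighted multi-year rate. The 5/4/3 weights, the sample hitter, and the function name `weighted_rate` are assumptions chosen for illustration; the weights ZiPS actually uses vary by statistic and are not given in this article.

```python
def weighted_rate(seasons, weights):
    """Recency-weighted rate stat from (successes, opportunities) pairs.

    seasons: most recent season first, e.g. [(hits, at_bats), ...]
    weights: one weight per season, larger for more recent seasons.
    """
    if len(seasons) != len(weights):
        raise ValueError("need exactly one weight per season")
    num = sum(w * s for (s, _), w in zip(seasons, weights))
    den = sum(w * n for (_, n), w in zip(seasons, weights))
    if den == 0:
        raise ValueError("no weighted opportunities")
    return num / den


# Hypothetical hitter: (hits, at-bats), most recent season first.
seasons = [(160, 550), (150, 560), (120, 480)]

# Illustrative weights only (an assumption, not the ZiPS weights).
weights = [5, 4, 3]

baseline = weighted_rate(seasons, weights)
unweighted = weighted_rate(seasons, [1, 1, 1])
print(f"weighted: {baseline:.3f}  unweighted: {unweighted:.3f}")
```

Because this hypothetical hitter improved each year, the weighted baseline comes out above the plain three-year average. A real baseline would also need regression toward the mean and park, league, and age adjustments, none of which are shown here.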
When estimating a player’s future production, ZiPS compares their baseline performance, both in quality and shape, to the baseline of every player in its database at every point in their career. This database consists of every major leaguer since the Deadball era — the game was so different prior to then that I’ve found pre-Deadball comps make projections less accurate — and every minor league translation since what is now the late 1960s. Using cluster analysis techniques (Mahalanobis distance is one of my favorite tools), ZiPS assembles a cohort of fairly similar players across history for player comparisons, something you see in the most similar comps list. Non-statistical factors include age, position, handedness, and, to a lesser extent, height and weight compared to the average height and weight of the era (unfortunately, this data is not very good). ZiPS then generates a probable aging curve — both midpoint projections and range — on the fly for each player.
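The comparison step can be sketched as ranking a pool of players by Mahalanobis distance to a target, which scales each statistic by its spread and accounts for correlation between statistics. Everything specific below is hypothetical: the three features (walk rate, strikeout rate, isolated power), the six-player pool, and the helper names. The article does not say which inputs or clustering steps ZiPS actually uses, so treat this as a sketch of the distance measure only.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of the rows (divides by n - 1)."""
    n = len(rows)
    mu = [sum(col) / n for col in zip(*rows)]
    d = len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for row in rows:
        dev = [x - m for x, m in zip(row, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(u, v, cov):
    """sqrt((u - v)^T cov^-1 (u - v)), computed without forming cov^-1."""
    diff = [a - b for a, b in zip(u, v)]
    z = solve(cov, diff)
    q = sum(d * zi for d, zi in zip(diff, z))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative rounding


# Hypothetical baselines: [BB%, K%, ISO] for six made-up players.
pool = {
    "Player A": [0.12, 0.18, 0.250],
    "Player B": [0.06, 0.25, 0.150],
    "Player C": [0.10, 0.20, 0.220],
    "Player D": [0.08, 0.15, 0.120],
    "Player E": [0.14, 0.28, 0.280],
    "Player F": [0.05, 0.12, 0.090],
}
target = [0.11, 0.19, 0.240]

cov = covariance_matrix(list(pool.values()))
ranked = sorted(pool, key=lambda name: mahalanobis(target, pool[name], cov))
print("most similar first:", ranked)
```

A plain Euclidean distance would let whichever statistic has the largest numeric range dominate the comparison; dividing through by the covariance is what keeps the "shape" of a player's line from being swamped by one column.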
This method has been used by PECOTA and, back in the late 1980s, by the Elias Baseball Analyst, and I think it is the best approach. After all, there is little experimental data in baseball; the only way we know how plodding sluggers age is from observing how plodding sluggers age.
One of the tenets of projections I follow is that no matter what the projection says, that’s the ZiPS projection. Even if inserting my opinion would improve a specific projection, I’m philosophically opposed to doing so. ZiPS is most useful when people know that it’s purely data-based, not some unknown mix of data and my opinion. Over the years, I like to think I’ve taken a clever approach to turning more things into data — for example, ZiPS’ use of basic injury information — but some things just aren’t in the model. ZiPS doesn’t know if such-and-such a pitcher wasn’t allowed to throw his slider coming back from injury, or if a left fielder suffered a family tragedy in July. I consider these things outside a projection system’s purview, even though they can affect on-field performance.
It’s also important to remember that the bottom-line projection is, in layman’s terms, only a midpoint. You don’t expect every player to hit that midpoint; 10% of players are “supposed” to fail to meet their 10th percentile projection and 10% of players are supposed to pass the 90th percentile forecast. This point can create a surprising amount of confusion. ZiPS gave .300 BA projections to two players in 2019: Daniel Murphy (oops!) and José Altuve. But that doesn’t mean ZiPS thought there would only be two .300 hitters. ZiPS actually projected, on average, 21.3 qualified .300 hitters (there were 19). ZiPS didn’t think any given hitter was more likely to hit .325 than not, but it expected someone to.
|Vladimir Guerrero Jr.||7.5%|
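The gap between individual midpoints and league-wide counts is just arithmetic on probabilities. The sketch below uses a made-up league of 140 hitters in which nobody is better than a coin flip to hit .300; the probabilities are invented, and treating players' outcomes as independent is a simplifying assumption.

```python
import math


def expected_count(probs):
    """Expected number of players clearing a threshold (linearity of expectation)."""
    return sum(probs)


def prob_at_least_one(probs):
    """Chance that at least one player clears it, assuming independent outcomes."""
    return 1.0 - math.prod(1.0 - p for p in probs)


# Hypothetical chances of hitting .300 for 140 qualified hitters.
probs = [0.45] * 10 + [0.25] * 40 + [0.05] * 90

midpoint_300_hitters = sum(p > 0.5 for p in probs)
print("players projected at .300 or better:", midpoint_300_hitters)
print("expected number of .300 hitters:", round(expected_count(probs), 1))
print("chance at least one hits .300:", round(prob_at_least_one(probs), 4))
```

Here no player has a midpoint projection of .300, yet the expected number of .300 hitters is 19. The same logic explains how a system can project almost nobody to hit .300 while still expecting around twenty players to do it.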
Another crucial thing to bear in mind is that the basic ZiPS projections are not playing-time predictors. By design, ZiPS has no idea who will actually play in the majors in 2020. ZiPS is essentially projecting equivalent production; a batter with a .240 projection may “actually” have a .260 Triple-A projection or a .290 Double-A projection. How an Adley Rutschman would hit in the majors full-time in 2020 is a far more interesting use of a projection system than it telling me that he won’t play in the majors. For the depth charts that go live in every article, I use the FanGraphs Depth Charts to determine the playing time for individual players. Since we’re talking about team construction, I can’t leave ZiPS to its own devices for an application like this. It’s the same reason I use the depth charts for team projections in-season.
The to-do list never shrinks. One of the things still on the drawing board is better run/RBI projections. ZiPS wasn’t originally designed as a fantasy baseball tool (fantasy baseball analysts have been making fantasy-targeted projections for a long time), but given that ZiPS is frequently used by fantasy players, more sophisticated models are in the works. Saves, on the other hand, are a particularly difficult issue. As of now, the only thing I tell ZiPS about a player’s role is if it is going to change, which determines if ZiPS sees future Mike Moustakas as a second or third baseman. I’ve tried a lot of shortcuts, like trying to model the manager’s decision about who the closer would be, using both statistics and things like age, salary, and save history. While the model generally does a good job projecting who will be the closer, the misses are gigantic and render save projections ineffective; a managerial decision can turn a 35-save pitcher into a five-save pitcher. I’m still figuring out how to approach this problem.
Have any questions, suggestions, or concerns about ZiPS? I’ll try to reply to as many as I can reasonably address in the comments below.
Dan Szymborski is a senior writer for FanGraphs and the developer of the ZiPS projection system. He was a writer for ESPN.com from 2010-2018, a regular guest on a number of radio shows and podcasts, and a voting BBWAA member. He also maintains a terrible Twitter account at @DSzymborski.