The 2020 ZiPS projection season starts Friday, and before it does, I wanted to offer a brief refresher of what ZiPS is and is not.
ZiPS is a computer projection system that I initially developed from 2002 to 2004 and “officially” released in 2004. As technology and data availability have improved over the last 15 years, ZiPS has continually evolved. The current edition of ZiPS can’t even run on the Pentium 4 3.0 processor I used to develop the original version starting in 2002 (I checked). There are a lot more bells and whistles, but at its core, ZiPS engages in two fundamental tasks when making a projection: establishing a baseline for a player, and estimating what their future looks like using that baseline.
ZiPS uses multi-year statistics, with more recent seasons weighted more heavily. In the beginning, all the statistics received the same yearly weighting, but eventually, this became more varied based on additional research. Research is a big part of ZiPS, and every year I run literally hundreds of studies on various aspects of the system to determine their predictive value and better calibrate the player baselines. What started with the data available in 2002 has expanded considerably; basic hit, velocity, and pitch data began playing a larger role starting in 2013, and data derived from Statcast has been included in recent years as I’ve gotten a handle on the predictive value and impact of those numbers on existing models. I believe in cautious, conservative design, so data is only included once I’m confident that it improves accuracy; there are always builds of ZiPS that are still a couple of years away. Additional internal ZiPS tools like zBABIP, zHR, zBB, and zSO are used to better establish baseline expectations for players. These stats work similarly to the various flavors of “x” stats, with the z standing for something I’d wager you’ve already figured out!
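To make the recency-weighting idea concrete, here is a minimal sketch in Python. It is not the ZiPS code: the function name `weighted_rate`, the 5/4/3 weights, and the sample totals are all assumptions chosen for illustration, and the real system varies its weights by statistic.

```python
def weighted_rate(seasons, weights=(5, 4, 3)):
    """Return a recency-weighted rate for one statistic.

    seasons: (events, opportunities) pairs, most recent season first.
    weights: hypothetical per-season weights, most recent first.
    """
    # zip() stops at the shorter input, so extra older seasons are ignored.
    pairs = list(zip(weights, seasons))
    numerator = sum(w * events for w, (events, _) in pairs)
    denominator = sum(w * opps for w, (_, opps) in pairs)
    if denominator == 0:
        raise ValueError("no playing time to build a baseline from")
    return numerator / denominator


# Made-up home run and plate appearance totals, most recent season first.
hr_seasons = [(30, 600), (20, 500), (10, 400)]
recent_heavy = weighted_rate(hr_seasons)             # 5/4/3 weighting
flat = weighted_rate(hr_seasons, weights=(1, 1, 1))  # every season equal
```

Because this made-up player's most recent season was his best, the recency-weighted rate comes out higher than the flat one.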
When estimating a player’s future production, ZiPS compares their baseline performance, both in quality and shape, to the baseline of every player in its database at every point in their career. This database consists of every major leaguer since the Deadball era (the game was so different before then that I’ve found pre-Deadball comps make projections less accurate) and every minor league translation going back to what is now the late 1960s. Using cluster analysis techniques (Mahalanobis distance is one of my favorite tools), ZiPS assembles a cohort of fairly similar players across history for player comparisons, something you see in the most similar comps list. Non-statistical factors include age, position, handedness, and, to a lesser extent, height and weight compared to the average height and weight of the era (unfortunately, this data is not very good). ZiPS then generates a probable aging curve, both midpoint projections and range, on the fly for each player.
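For a rough sense of how a Mahalanobis-distance comparison picks out similar players, here is a small pure-Python sketch. Everything in it is hypothetical: the player names, the two-number "baselines," and the helper names (`covariance_matrix`, `solve`, `mahalanobis`, `most_similar`) are invented for illustration, and the actual ZiPS cohort selection uses far more inputs than this.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature rows."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    k = len(means)
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(k)] for i in range(k)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(u, v, cov):
    """sqrt((u - v)^T cov^-1 (u - v)), computed without forming the inverse."""
    diff = [a - b for a, b in zip(u, v)]
    scaled = solve(cov, diff)
    # max() guards against a tiny negative value from floating-point rounding.
    return math.sqrt(max(0.0, sum(d * s for d, s in zip(diff, scaled))))


def most_similar(target, pool, k):
    """Names of the k pool players closest to target, nearest first."""
    cov = covariance_matrix(list(pool.values()))
    return sorted(pool, key=lambda name: mahalanobis(target, pool[name], cov))[:k]


# Hypothetical (walk rate, strikeout rate) baselines for four invented players.
pool = {
    "Player A": [0.10, 0.20],
    "Player B": [0.12, 0.18],
    "Player C": [0.05, 0.30],
    "Player D": [0.15, 0.25],
}
comps = most_similar([0.12, 0.18], pool, k=2)
```

The appeal of Mahalanobis distance over plain Euclidean distance is that it rescales each stat by how much it varies and accounts for stats that move together, so one noisy or redundant column does not dominate the comparison.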