One of the things you have to get used to when you work with projections is being wrong. Like, All. Of. The. Time. While I’d like to believe that the projections are accurate and it’s just real life that mucked things up, that isn’t quite how they work. There are always events you didn’t see coming, assumptions you made erroneously, and just plain old irreducible error, all of which are going to thwart you.
On a basic level, you’re supposed to be wrong. Imagine a world in which you knew, for a fact, that every team was a coin flip to win every game. With this perfect knowledge, you’d still expect nearly a quarter of the league to win either 73 or fewer games, or 89 or more, through nothing but luck. For the math-inclined, this is a hypergeometric distribution, not a binomial one; the coin flips are not independent, because the win totals still add up to 2,430 and one team’s win is invariably another team’s loss. Here’s a quick table for some of the win totals, showing the probability of a team winning exactly X games and the share of teams you’d expect to win up to X games:
| Wins | Probability | 1-in-X Chance of Occurring | Cumulative |
| --- | --- | --- | --- |
As an example, you’d expect 3.4% of those coin flip teams to win exactly 74 games, with 15% of all teams winning up to 74 games.
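The thought experiment above can be checked with a quick simulation. The pairing scheme below (shuffle the 30 teams each "round" and flip a fair coin for every pairing) is my own simplification, not a real MLB schedule, but it keeps every team at 162 games and the league at exactly 2,430 total wins:

```python
import random
from math import comb

# Each team's marginal record is Binomial(162, 0.5), so the table's
# per-team numbers can be checked exactly:
p74 = comb(162, 74) / 2**162                           # P(exactly 74 wins)
cum74 = sum(comb(162, k) for k in range(75)) / 2**162  # P(74 wins or fewer)
print(f"exactly 74: {p74:.1%}, up to 74: {cum74:.1%}")

def simulate_coinflip_season(n_teams=30, n_rounds=162, rng=None):
    """One season where every game is a fair coin flip. Each round pairs
    all 30 teams at random, so wins always sum to 30 * 162 / 2 = 2,430."""
    rng = rng or random.Random()
    wins = [0] * n_teams
    teams = list(range(n_teams))
    for _ in range(n_rounds):
        rng.shuffle(teams)
        for i in range(0, n_teams, 2):
            winner = teams[i] if rng.random() < 0.5 else teams[i + 1]
            wins[winner] += 1
    return wins

rng = random.Random(42)
seasons = [simulate_coinflip_season(rng=rng) for _ in range(1000)]
extreme = sum(1 for s in seasons for w in s if w <= 73 or w >= 89)
frac = extreme / (1000 * 30)
print(f"teams outside 74-88 wins: {frac:.1%}")  # close to a quarter
```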
But we don’t have anywhere near perfect knowledge about how good a team will be. We’re not even in the same zip code as “near perfect”; we just hope to be on the right continent. As a result, our error bars are going to be significantly larger than even the rather erroneous results you still get with omniscient projections.
One of the questions I get a lot is whether ZiPS overrates or underrates a particular type of team. For example, does ZiPS miss more on younger teams? Older teams? Teams with more speed? Teams with good bullpens? Unfortunately, it’s nowhere near that simple. Attributes like that are calibration errors and, compared to other sources of error, they’re relatively easy to identify and iron out. I’d be giddy if there were that much delicious, low-hanging fruit to pluck.
That’s not to say there aren’t teams that ZiPS has missed on more often than others. Let’s start with a table of yearly team win misses across ZiPS’ history:
ZiPS has missed on teams, but that’s not the same thing as those misses being predictive. ZiPS has overrated the Cubs by 3.4 wins a year on average, but that’s not as exciting as it sounds: given the number of seasons for which ZiPS has made projections, you’d expect a miss of that size for about 2.3% of teams. ZiPS has overrated nine teams by at least a win per season (2005-2019), but you’d expect to miss by at least a win per season on roughly 30% of the league, which in a 30-team league works out to … nine teams.
Put a different way, if you simply look at the correlation between an organization’s projection error one year and the next, the year-to-year correlation is -0.004. In other words, year-to-year errors for the same organization are essentially random.
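That year-to-year check is just a Pearson correlation across consecutive-season error pairs. A minimal sketch, using random stand-in numbers rather than the real ZiPS errors:

```python
import random

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Stand-in data: 30 organizations, 15 seasons of projection errors drawn
# at random (NOT the real ZiPS misses, just noise for illustration).
rng = random.Random(0)
errors = {org: [rng.gauss(0, 8) for _ in range(15)] for org in range(30)}

# Pair each organization's error in year t with its error in year t+1
pairs = [(e[t], e[t + 1]) for e in errors.values() for t in range(14)]
r = pearson_r([a for a, _ in pairs], [b for _, b in pairs])
print(f"year-to-year error correlation: {r:.3f}")  # hovers near zero
```

If errors were predictive for certain organizations, this correlation would come out meaningfully positive instead of bouncing around zero.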
Errors in a team’s projection come primarily from two sources. The first is missing on the projections of the individual players. When your projections see a Marcus Semien as a three-win player instead of a seven-win player, you’ve already “denied” the A’s four wins. Get enough of those big misses in one direction or the other, and you’re going to miss badly on the team.
The other major source of error comes from players’ actual playing time. We know, of course, that Mike Trout would get a lot more playing time than Michael Hermosillo if teams played an infinite number of seasons, but to the chagrin of anyone who tries to model baseball, they only actually play one season per year. (And sometimes less…)
Injuries and trades can eviscerate playing time projections very quickly. Imagine if the Red Sox, instead of trading Mookie Betts during the winter, decided to send him to Los Angeles after Opening Day. Suddenly, you’ve carved out a five-win swing for two franchises relative to your original assumptions. A 200-inning workhorse can become a zero-inning injured-list resident after nothing but a twinge in the forearm and an awkward discussion with James Andrews or Neal ElAttrache.
So, how much did ZiPS miss on teams in 2019? Let’s start with the final preseason projections and the actual win totals. There was a game unplayed between the White Sox and Tigers, so I’m arbitrarily adding a half-win to both teams; we’re making hamburgers here, not performing heart surgery:
| Team | 2019 Wins | ZiPS Projected Wins | Miss |
| --- | --- | --- | --- |
The typical miss was 7.8 wins. That’s about an average result; some years ZiPS does better, some years ZiPS does worse. But the important question — the one I need to answer when doing a season’s post-mortem — is whether the error was due to getting the players wrong or getting the playing time assumptions wrong, as the latter error can sometimes be a Very Big Deal.
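The “typical miss” here is simply the mean absolute error of the win projections. A minimal sketch (the five win totals below are made up for illustration, not rows from the real 2019 table):

```python
def mean_absolute_miss(projected, actual):
    """Average absolute difference between projected and actual wins."""
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(projected)

# Hypothetical five-team league, not the actual 2019 results
projected = [98, 85, 72, 91, 66]
actual = [103, 78, 84, 89, 60]
print(mean_absolute_miss(projected, actual))  # → 6.4
```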
The largest team error in ZiPS history that was based purely on playing-time assumptions belongs to the 2014 Texas Rangers. Coming off their fourth straight 90-win season, a run that featured two World Series appearances, ZiPS projected the team to go 88-74 in 2014. An 88-win season would have been the team’s worst showing since 2009, but the bottom dropped out, and quickly; the Rangers finished at 67-95. The culprit? A stunning number of injuries. The Rangers combined for 2,116 days on the disabled list, which was, according to Jeff Zimmerman’s quite in-depth research at the time, the most days lost to injury for any team from 2002 to 2014, and nearly 100 more days lost than the next-worst team.
Knowing who actually got the playing time in Texas that season, and without changing a single projection for an individual player, ZiPS would have projected the Rangers to go 65-97, a 23-win difference. In other words, Texas underperformed expectations so badly because it lost so much time to injury, even though ZiPS actually underrated its individual players slightly.
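ZiPS’s real aggregation is far more involved than this, but a toy model of team wins as a replacement-level baseline plus playing-time-weighted WAR shows why reshuffling playing time alone can move a projection so much. Everything here is hypothetical: the ~48-win baseline is a common rule of thumb rather than a ZiPS figure, and both players are invented:

```python
def toy_team_wins(baseline, war_per_600, playing_time):
    """baseline: wins expected from an all-replacement roster (~48 here, a
    rule of thumb, not a ZiPS number). war_per_600: projected WAR per 600
    PA by player. playing_time: assumed plate appearances by player."""
    return baseline + sum(
        war_per_600[p] * pa / 600 for p, pa in playing_time.items()
    )

# Hypothetical roster slice: a star projected for a full season loses
# most of his plate appearances to a replacement-level fill-in.
rates = {"star": 7.0, "fill-in": 0.2}
projected_pt = {"star": 600, "fill-in": 50}
actual_pt = {"star": 150, "fill-in": 500}

before = toy_team_wins(48, rates, projected_pt)
after = toy_team_wins(48, rates, actual_pt)
print(round(before - after, 2))  # the playing-time swap alone costs ~5 wins
```

Neither player’s projected talent changed between the two runs; only the allocation of plate appearances did, which is exactly the reconfiguration exercise described for the 2014 Rangers.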
So, let’s reshuffle the 2019 projections, this time with 20/20 hindsight about who actually played:
| Team | 2019 Wins | ZiPS Projected Wins | Reconfigured Projected Wins | New Miss |
| --- | --- | --- | --- | --- |
Unsurprisingly, this knowledge improves the accuracy of the projections greatly. Knowing who actually played dropped the average error from 7.8 wins to 4.8 wins and improved the win projections for 25 of the 30 teams. The biggest outlier among the five teams whose projections didn’t improve was the Yankees, a team ZiPS projected for 98 wins and that actually won 103. But that projection looks more accurate than it truly was; knowing who actually played for the Yankees would have dropped the preseason projection to 90 wins.
On the flip side, ZiPS overrated the Tigers, Padres, and Red Sox by an average of 14 wins in 2019, an error that is cut nearly in half once you know who actually took the field. I under-projected the plate appearances of Fernando Tatis Jr. and Luis Urías by about 40%, failed to anticipate players like Gordon Beckham getting so much playing time, and greatly overestimated the health of Boston’s non-Eduardo Rodriguez pitchers and Dustin Pedroia.
More data means I can better model these playing time assumptions, but things like future injuries and eventual trades fall into that irreducible error category. This is part of the reason I’ve learned not to agonize over every missed projection or pull out my hair; I’d be even balder thanks to the unstoppable forces of math and fate.
Dan Szymborski is a senior writer for FanGraphs and the developer of the ZiPS projection system. He was a writer for ESPN.com from 2010-2018, a regular guest on a number of radio shows and podcasts, and a voting BBWAA member. He also maintains a terrible Twitter account at @DSzymborski.