WAR: It Works

We use Wins Above Replacement a lot around here, since one focus of the site is to accurately quantify the value each player produces, and WAR is the best tool we have for that. However, it faces a decent amount of skepticism from people who don’t trust various components for a variety of reasons: they don’t like the numbers that UZR spits out for defense, they don’t believe in replacement level, or they believe that pitchers do have control over their BABIP rates.

So, the question is, does WAR work? If it’s designed well, there should be a pretty strong correlation between a team’s total WAR and its actual record. Fans of WAR, rejoice: there is.

For 2009, the correlation between a team’s projected record based on its WAR total and its actual record was .83. This is a robust number, especially considering that WAR is almost completely context-independent and currently has some notable omissions: base running (besides SB/CS, which are included in wOBA) and catcher defense are both ignored in the calculations. We also don’t have an adjustment for differences in leagues, so we’re not accounting for the fact that the AL is better than the NL.
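For anyone who wants to run this kind of check themselves, here is a minimal sketch in Python. It assumes projected wins are simply a replacement-level baseline plus team WAR; the post doesn't state the baseline used, so `replacement_wins` is a parameter you have to supply, and both function names are made up for illustration.

```python
import math


def projected_wins(team_war, replacement_wins):
    """Projected wins = replacement-level baseline + team WAR.

    replacement_wins is an assumption supplied by the caller; the post
    does not say which baseline was used.
    """
    return replacement_wins + team_war


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and each sequence's deviation norm.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / (norm_x * norm_y)
```

Feed `pearson_r` one list of projected win totals and one list of actual win totals, one entry per team, and it returns the kind of figure quoted above.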

Despite these imperfections, WAR still performs extremely well. The standard deviation of the difference between WAR-projected wins and actual wins is 6.4, and every single team is within two standard deviations. Only four teams were more than 10 wins away from their WAR-projected total, with Tampa Bay ending up the furthest from our expectation (96.6 projected wins, 84 actual wins), and 18 of the 30 teams were within six wins of their projection.
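As a quick sanity check on the "within two standard deviations" claim, the Tampa Bay numbers can be plugged in directly. This is just arithmetic on the figures quoted above; the function name is made up for illustration.

```python
def gap_in_sd(projected, actual, sd):
    """Size of the miss between projected and actual wins, in standard deviations."""
    return abs(projected - actual) / sd


# Tampa Bay, 2009: 96.6 projected wins, 84 actual wins, SD of 6.4 wins.
tampa_bay = gap_in_sd(96.6, 84, 6.4)
print(round(tampa_bay, 2))
```

The gap is 12.6 wins, which works out to just under two standard deviations, so even the biggest miss of the season stays inside that range.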

For comparison, the correlation between Pythagorean expected record and actual record is .91, and Pythag includes some aspects of context (performance with men on base, for instance) that affect runs scored and allowed, so we would expect it to predict actual record somewhat better than a context-independent metric like WAR. The fact that WAR is even close to somewhat-context-included Pythag is impressive in its own right.
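For reference, here is a sketch of the Pythagorean expectation in its classic form. The exponent of 2 is an assumption (the post doesn't say which exponent was used, and other versions use values such as 1.83), and the function name is made up for illustration.

```python
def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation).

    exponent=2.0 is the classic form and an assumption here; the post does
    not specify the exponent behind its .91 figure.
    """
    if runs_scored < 0 or runs_allowed < 0:
        raise ValueError("run totals must be non-negative")
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    if scored + allowed == 0:
        raise ValueError("need at least one run scored or allowed")
    return games * scored / (scored + allowed)
```

Because the inputs are actual runs scored and allowed, any clutch hitting or timely pitching that turned into runs is already baked in, which is the context advantage Pythag has over WAR.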

WAR isn’t perfect. But given the known limitations and the variations in how contextual situations impact final record, it does an awfully impressive job of projecting wins and losses.

Dave is the Managing Editor of FanGraphs.

12 years ago

Try taking extra data (from 2002 on) and plot it out. Still, anything with > 80% correlation is usually good to go.