Nine years ago next month, we introduced a new stat to the pages of FanGraphs. We called it Win Values, and on the player pages and leaderboards, it went by the acronym WAR. We wouldn’t actually start calling it that, or use the words for which the acronym stood (Wins Above Replacement) for a little while, since we thought Win Values sounded cooler. And as the people who bring you WPA/LI and RE24, we’re clearly the experts on statistical naming coolness.
Over the last nine years, WAR has become something of a flagship metric, not just for us, but for the analytical community at large. Baseball-Reference introduced their own version, while Baseball Prospectus modernized their version of WARP — their version adds the word player to the name, thus the P — to provide something that scaled a bit more like what was presented here and at B-R. Because WAR is a framework for combining a number of different metrics into a single-value stat, there are also quite a few other versions of WAR out there, each with their own calculations.
But while everyone uses different inputs, and therefore arrives at slightly different results, almost all of the regularly updated WAR metrics are built on some version of linear weights, which assigns an average run value to each event in which a player is involved, regardless of what actually happened on the play. If you hit a single, you get credit for hitting a single. It’s worth some fraction of a run, regardless of whether you hit it with two outs and the bases empty in the first inning of an eventual blowout, or whether it was a walk-off two-run single to give your team the lead. In most versions of WAR, the value of a player’s contribution is calculated independent of the situation in which it occurred.
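To make the context-neutral idea concrete, here is a minimal sketch in Python. The run values are round placeholder numbers chosen for illustration, not the weights any site actually publishes, and the names `RUN_VALUES` and `context_neutral_runs` are hypothetical.

```python
# Placeholder run values per event (assumed for illustration only;
# real linear weights are derived from league data and vary by season).
RUN_VALUES = {
    "single": 0.45,
    "double": 0.75,
    "triple": 1.05,
    "home_run": 1.40,
    "walk": 0.30,
    "out": -0.25,
}


def context_neutral_runs(events):
    """Sum the average run value of each event.

    Only the event type is read; inning, outs, and baserunners
    are deliberately ignored, which is what "context-neutral" means here.
    """
    return sum(RUN_VALUES[event["type"]] for event in events)


# Two singles in very different situations (hypothetical plate appearances).
blowout_single = {"type": "single", "inning": 1, "outs": 2, "runners_on": 0}
walkoff_single = {"type": "single", "inning": 9, "outs": 2, "runners_on": 2}

# Both singles receive exactly the same credit.
same_credit = (
    context_neutral_runs([blowout_single])
    == context_neutral_runs([walkoff_single])
)
```

Under these assumptions, the first-inning single and the walk-off single are credited identically, because the situation never enters the calculation.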
Bill James is not a fan of that decision.
We come, then, to the present moment, at which some of my friends and colleagues wish to argue that Aaron Judge is basically even with Jose Altuve, and might reasonably have been the Most Valuable Player. It’s nonsense. Aaron Judge was nowhere near as valuable as Jose Altuve. Why? Because he didn’t do nearly as much to win games for his team as Altuve did. It is NOT close. The belief that it is close is fueled by bad statistical analysis—not as bad as the 1974 statistical analysis, I grant, but flawed nonetheless. It is based essentially on a misleading statistic, which is WAR. Baseball-Reference WAR shows the little guy at 8.3, and the big guy at 8.1. But in reality, they are nowhere near that close. I am not saying that WAR is a bad statistic or a useless statistic, but it is not a perfect statistic, and in this particular case it is just dead wrong. It is dead wrong because the creators of that statistic have severed the connection between performance statistics and wins, thus undermining their analysis.
James strongly believes that the metric falls apart by building up from runs rather than working backward from wins: because the metric is context-neutral, the WAR totals for a group of players won’t add up to the number of games their team actually won. In his mind, the decision to make WAR context-neutral isn’t a point on which reasonable people can disagree; it’s just a mistake.