On Christmas Eve of 2008, David Appelman gave the world a present: “win values” on the pages of FanGraphs. It wasn’t labeled WAR for a little while longer, though it was an implementation of the model Tom Tango laid out at The Book Blog a few months prior. Over these last four years, the model has become quite popular, and even those who are not fans of analytics know what WAR stands for. In 2010, Baseball-Reference added it to their collection of statistics. Because WAR is essentially a model of player value, some decisions about how it is constructed don’t have an obviously correct answer. Where we made one decision, Sean Forman (and Sean Smith, who assisted with their original implementation) sometimes made another, and the calculations differ in some significant ways.
We know that this is a source of frustration for some folks: two sites publicly display different calculations for a statistic of the same name. Often, the differences between the two have been used to discredit the entire model. For instance, Jim Caple wrote this on ESPN.com a few months back:
Actually, we know it isn’t always accurate because depending on your source — FanGraphs or Baseball-reference.com — you can get wildly different WAR scores… For example:
Does (Jack) Morris, in fact, belong in the Hall of Fame? No, he doesn’t, according to baseball-reference.com, which gives him a WAR score of 39.3, tied for 145th all time among pitchers. Maybe he does, according to FanGraphs, which gives him a 56.9 WAR, 75th all time.
When Caple wrote it, I wasn’t exactly sure why Morris’ value differed so much, but since we measure pitching in very different ways, I assumed that the 17.6-win gap was due to some difference between Morris’ FIP and his runs allowed. But then I looked it up, and Morris’ career ERA (3.90) was almost an exact match for his FIP (3.94). Adjusted for park, Morris’ career FIP- was 97, while his ratio of RA9 to league average on Baseball-Reference is 96. Even with very different inputs, both models came to the same conclusion about Morris: he was a slightly above average pitcher who had a very long career. So, why did we give him credit for an additional 17.6 wins?
The answer, quite simply, lies with replacement level. Our model used a lower baseline than Baseball-Reference did, so the same performance would result in a higher WAR in our model than in theirs. Over very long careers — like Morris’, for instance, or many of the old time pitchers who threw forever — this could really begin to add up, and give the appearance of large disagreements when the two systems didn’t actually see things all that differently. In the case of guys with substantial careers, many of the large discrepancies were simply driven by the fact that the two sites had a different definition of replacement level.
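To see how a baseline gap by itself can pile up over a long career, here is a rough sketch in Python. It is not either site's actual calculation. The function names, the 10 runs-per-win conversion, and the two replacement-level gaps (1.00 and 0.75 runs per nine innings below average) are hypothetical round numbers chosen only to make the arithmetic easy to follow.

```python
def pitcher_war(runs_above_average, innings, replacement_gap_per_9, runs_per_win=10.0):
    """Simplified wins above replacement for a pitcher.

    replacement_gap_per_9 is how many runs per nine innings worse than
    average a replacement-level pitcher is assumed to be (hypothetical).
    A larger gap means a lower baseline, and so more credit per inning.
    """
    # Runs an average pitcher would save over replacement in these innings.
    replacement_runs = innings / 9.0 * replacement_gap_per_9
    return (runs_above_average + replacement_runs) / runs_per_win


def baseline_disagreement(innings, gap_low_baseline, gap_high_baseline):
    """WAR difference caused only by the choice of baseline.

    Performance is held at exactly league average (0 runs above average),
    so whatever difference remains comes from replacement level alone.
    """
    return (pitcher_war(0.0, innings, gap_low_baseline)
            - pitcher_war(0.0, innings, gap_high_baseline))


# Illustrative only: the same quarter-run gap in baselines, two career lengths.
short_career = baseline_disagreement(900, 1.00, 0.75)    # 2.5 wins
long_career = baseline_disagreement(3600, 1.00, 0.75)    # 10.0 wins
print(short_career, long_career)
```

With these made-up inputs, the two baselines disagree by 2.5 wins over 900 innings and by 10 wins over 3,600 innings, even though both models rate the pitcher's performance identically. The disagreement scales with innings pitched, which is why it shows up most for pitchers with very long careers.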