Unifying Replacement Level
by Dave Cameron
March 28, 2013

On Christmas Eve of 2008, David Appelman gave the world a present: “win values” on the pages of FanGraphs. It wasn’t labeled WAR for a little while longer, though it was an implementation of the model Tom Tango laid out at The Book Blog a few months prior. Over these last four years, the model has become quite popular, and even those who are not fans of analytics know what WAR stands for. In 2010, Baseball-Reference added it to their collection of statistics.

Because WAR is essentially a model of player value, there are decisions that have to be made about the way it is constructed that don’t have an obviously correct answer. In places where we had made one decision, Sean Forman (and Sean Smith, who assisted with their original implementation) made some other decisions, and the calculations differ in some significant ways.

We know that this is a source of frustration for some folks, having two sites publicly display different calculations for a statistic of the same name. Often, the differences between the two have been used to discredit the entire model. For instance, Jim Caple wrote this on ESPN.com a few months back:

“Actually, we know it isn’t always accurate because depending on your source (FanGraphs or Baseball-reference.com) you can get wildly different WAR scores… For example: Does (Jack) Morris, in fact, belong in the Hall of Fame? No, he doesn’t, according to baseball-reference.com, which gives him a WAR score of 39.3, tied for 145th all time among pitchers. Maybe he does, according to FanGraphs, which gives him a 56.9 WAR, 75th all time.”

When Caple wrote it, I wasn’t exactly sure why Morris’ value differed so much, but since we measure pitching in very different ways, I assumed that the 17.6 win gap was due to some differences between Morris’ FIP and his runs allowed.
But then I looked it up, and Morris’ career ERA (3.90) was almost an exact match for his FIP (3.94). Adjusted for park, Morris’ career FIP- was 97, while his ratio of RA9 to league average on Baseball-Reference is 96. Even with very different inputs, both models came to the same conclusion about Morris: he was a slightly above average pitcher who had a very long career. So, why did we give him credit for an additional 17.6 wins?

The answer, quite simply, lies with replacement level. Our model used a lower baseline than Baseball-Reference did, so the same performance would result in a higher WAR in our model than in theirs. Over very long careers (like Morris’, for instance, or those of the many old-time pitchers who threw forever) this could really begin to add up, and give the appearance of large disagreements when the two systems didn’t actually see things all that differently. In the case of guys with substantial careers, many of the large discrepancies were simply driven by the fact that the two sites had a different definition of replacement level.

After reading Caple’s article, David Appelman and I began discussing the idea of reaching out to Sean Forman and seeing if he was interested in agreeing to a unified replacement level. Before we could even send that email, Sean reached out to us with the exact same idea. And so, today, we’re pleased to announce that Baseball-Reference and FanGraphs have adopted that unified replacement level, allowing our two models to now measure players on the same scale.

As David noted a few minutes ago, this new unified replacement level is now set at 1,000 WAR per 2,430 Major League games, 2,430 being the number of games (and therefore wins) available in a 162-game season played by 30 teams. Put another way, our new replacement level is equal to a .294 winning percentage, which works out to 47.7 wins over a full season.
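
The arithmetic behind those figures can be checked with a minimal sketch. It assumes only what the paragraph above states (30 teams, 162 games each, 1,000 WAR in the league-wide pool); the function and constant names are hypothetical and are not taken from either site's implementation.

```python
# Illustrative constants taken from the article's description of the league.
TEAMS = 30
GAMES_PER_TEAM = 162
WAR_POOL = 1000  # total WAR handed out across the league in one season


def replacement_level(war_pool, teams=TEAMS, games_per_team=GAMES_PER_TEAM):
    """Return (replacement wins per team, replacement winning percentage)."""
    # Each game involves two teams, so the league plays teams * games / 2 games,
    # and every game produces exactly one win.
    league_wins = teams * games_per_team // 2
    # Wins not attributed to WAR are what a league of replacement players wins.
    replacement_wins = league_wins - war_pool
    wins_per_team = replacement_wins / teams
    return wins_per_team, wins_per_team / games_per_team


wins, pct = replacement_level(WAR_POOL)
print(f"{wins:.1f} wins per team, {pct:.3f} winning percentage")
```

With a 1,000 WAR pool this gives roughly 47.7 wins and a .294 winning percentage. Passing 1,009 instead gives about .292, the figure this article attributes to Tango's original post.
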
Conveniently, this number is almost exactly halfway in between our previous replacement level (.265) and Baseball-Reference’s previous replacement level (.320), though the number wasn’t chosen solely as an equal compromise. In Tango’s original methodology post back in 2008, the model he laid out used a replacement level equal to 1,009 wins, or a .292 winning percentage, so this is in essence a return to WAR’s roots. In that post, Tango notes:

“Replacement is defined very specifically for my purposes: it’s the talent level for which you would pay the minimum salary on the open market, or for which you can obtain at minimal cost in a trade.”

There are a variety of ways you can measure what kind of expected performance you might get from a replacement level player. A few months ago, I looked at the performance of position players who were acquired via minor league contract or waiver claim this winter, and over the last two seasons, those 24 players had accumulated almost exactly zero WAR in over 10,000 plate appearances. So, that suggests that the baseline has always been in the right neighborhood, at least.

That’s not the only way to figure out where replacement level should be, however. We can also look at the worst performances of players who have long Major League careers, and see what minimum level of production teams have required in order to keep a player in the league for an extended number of years rather than simply swapping him out for someone else. Major League teams don’t always evaluate talent perfectly, but if they were continually employing players who were below our established replacement level for 10 to 15 years, it would be a pretty good sign that our replacement level was too high, and that they couldn’t simply replace these guys with someone better with minimal effort.

That’s not what we see, however.
If you use .294 as the replacement level, 627 of the 628 players with at least 6,000 Major League plate appearances (that is, the equivalent of 10 full seasons of regular playing time) have a career WAR north of 0.0. The only player who falls below replacement level with this baseline is Alfredo Griffin, coming in at -1.0 WAR in 7,331 plate appearances, which works out to -0.08 WAR per full season. For all intents and purposes, that’s zero.

You can calculate replacement level a number of different ways, but in the end, it always leads back to a number in this vicinity. Baseball-Reference arrived at a number a little higher than what Tango had used, while we came up with one a little lower. Because they were at opposite ends of the defensible spectrum, the different baselines gave a false sense of difference in the actual calculations. Now, with an agreed-upon replacement level, those differences that are solely due to scale will go away.

The net effect of this change is that players will get a little less WAR per season in our method (and a little more in B-R’s) than they used to. On an individual season level, you’re barely going to notice the shifts. For instance, Mike Trout’s career +10.8 WAR in 774 plate appearances under our old calculation will become +10.7 WAR with the new changes.

However, at the completed career level, you’re going to see some bigger drops. Luis Aparicio, with his 11,230 career plate appearances, drops 14.2 WAR, going from +63.5 down to +49.3. Likewise, Hank Aaron, Brooks Robinson, and Carl Yastrzemski all lose 14 WAR off their career totals. Long career players take the largest hit, as you would expect.

The higher baseline brings our scale down slightly, but we think that change is worth making, as a unified replacement level will allow for comparisons of our apples versus their apples, and will eliminate needless confusion in an area that never needed to cause any.
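
The per-season rate quoted for Alfredo Griffin earlier can be reproduced with a small sketch. It assumes a full season is 600 plate appearances, which is implied by treating 6,000 plate appearances as 10 full seasons; the function name is hypothetical.

```python
# Assumption: one full season of regular playing time is 600 plate
# appearances, implied by "6,000 PA = 10 full seasons" in the article.
PA_PER_FULL_SEASON = 600


def war_per_full_season(career_war, career_pa, pa_per_season=PA_PER_FULL_SEASON):
    """Scale a career WAR total to a rate per full season of playing time."""
    full_seasons = career_pa / pa_per_season
    return career_war / full_seasons


# Alfredo Griffin: -1.0 WAR in 7,331 plate appearances.
griffin_rate = war_per_full_season(-1.0, 7331)
print(f"{griffin_rate:.2f} WAR per full season")
```

Under that assumption the rate rounds to -0.08 WAR per full season, matching the figure in the text.
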
These changes weren’t made lightly, and we know that there is always some resistance to any sort of change, but we hope that you see the unification of replacement level between the two sites as a positive overall. While there will never be one single agreed-upon WAR calculation (I’d call that a feature and not a bug, but that’s another post), the common baseline will give us a better opportunity to explore where the real differences are, rather than being tricked into seeing big gaps where none actually exist.

So, that’s the short version of the story behind this change. We’ll have more on this going forward, including a post coming later this afternoon on why we need replacement level to begin with, but for now, we hope you guys see this as a step forward for WAR as a metric.