Jeff Passan is one of the most aggressive advocates for FanGraphs in the mainstream media, regularly citing data and concepts from our leaderboards and helping to educate the masses about different ways of viewing baseball. He’s certainly not an old-school guy who wants to be left alone with his pitcher wins and RBIs, and he’s more than happy to embrace new ideas supported by data. But he still has some problems with WAR, and specifically, the defensive component that can allow lesser hitters to be listed among the most valuable players in the game alongside some of baseball’s greatest sluggers. To get the full sense of his argument, read the whole piece, but here’s a selection that sums it up:
Defense does have its place in WAR. Just not in its present incarnation, not until we know more. Not until we can account for positioning on the field. Not until we can find out the exact speed a ball leaves a bat and how quickly the fielder gets a jump and the angle on the ball and the efficiency with which he reaches it. Not until we understand more about fielding, which will allow us to understand how to properly mete out value on a defensive play, which may take years, yes, but look how long it took us to get to this point, where we know more about hitting and pitching than anyone ever thought possible.
The hackneyed Luddites who bleat “WAR, what is it good for, absolutely nothing” should not see this as a sympathetic view. On the contrary, WAR is an incredible idea, an effort to democratize arguments over who was best. Bringing any form of objectivity to such singularly subjective statements is extremely challenging and worthwhile work.
Which is why this at very least warrants more of a conversation among those who are in charge of it. They’ve changed WAR formulas before. They’ll change them again. And when they do, hopefully the reach of defensive metrics will be minimized.
I don’t agree with everything Passan wrote in the piece, but his criticisms of the metric aren’t entirely off base. It is easier to evaluate run scoring than run prevention. WAR is a flawed and imperfect model. Some of the assumptions in the construction of the model may be entirely incorrect, and as we get more information, we may very well find that some of the conclusions that WAR suggested were incorrect, and maybe not by a small amount. Just as the statistical community is quick to highlight the problems with pitcher wins and RBIs, it is fair for Passan to highlight the problems with WAR, especially if the purpose of that discussion is to help improve the model.
So let’s talk about Passan’s suggestion to improve WAR. Primarily, he suggests lowering the value of defense in the calculation, perhaps by regressing a player’s calculated value by some degree. This isn’t the first time this has been suggested, and there are plenty of people whom I respect who hold a similar opinion. It’s not a crazy suggestion, and it might even be a better alternative. But let’s work through the implications of that change so that we can evaluate the two methods side by side.
Right now, we hand out 1,000 WAR per 2,430 games (30 teams each playing 162 games) and split it so that 570 of those 1,000 WAR go to position players, with the remaining 430 credited to the pitchers. This 57/43 split accounts for the fact that hitters are responsible for the entirety of the half of the game that is scoring runs, and also some portion of the half of the game that is preventing runs. The fact that we’re giving position players 57% of the pie implies that we think run prevention is 86% pitching and 14% defense, but those numbers weren’t handed down on stone tablets and reasonable people could argue for a different proportioning between pitchers and fielders.
If, for instance, the defensive component of WAR were simply halved, suggesting that pitchers are 93% responsible for run prevention, with defenders making up just 7% of the pie, we’d have to take the credit for those runs prevented and move it from the position players to the pitchers, so instead of a 57/43 split, we’d have a 53.5/46.5 overall split between position players and pitchers in WAR. Perhaps that’s preferable, but is there evidence for a smaller gap between position players and pitchers than what we are currently using in WAR?
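For anyone who wants to check the arithmetic behind those splits, here is a minimal sketch. It assumes only what is described above: offense and run prevention are each half of the game, position players get all of the offensive half, and they get some share of the run-prevention half. The function name `war_split` and its parameters are made up for illustration and are not part of any actual WAR calculation.

```python
def war_split(defense_share_of_prevention, total_war=1000.0):
    """Split a pool of WAR between position players and pitchers.

    Assumption (taken from the text): the game is half run scoring and
    half run prevention. Position players get the whole scoring half plus
    `defense_share_of_prevention` of the prevention half; pitchers get
    the rest of the prevention half.
    """
    scoring_half = 0.5
    prevention_half = 0.5
    position_share = scoring_half + prevention_half * defense_share_of_prevention
    pitcher_share = prevention_half * (1.0 - defense_share_of_prevention)
    return {
        "position_share": position_share,
        "pitcher_share": pitcher_share,
        "position_war": total_war * position_share,
        "pitcher_war": total_war * pitcher_share,
    }


# Current model: defense is 14% of run prevention.
current = war_split(0.14)
# Hypothetical alternative: the defensive component is halved to 7%.
halved = war_split(0.07)

for label, split in (("current", current), ("halved", halved)):
    print(
        f"{label}: position players {split['position_share']:.1%}, "
        f"pitchers {split['pitcher_share']:.1%}"
    )
```

Plugging in 14% gives back the 57/43 split, and 7% gives the 53.5/46.5 split. Any other share of run prevention you might prefer for defense can be passed in the same way.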
One way of testing this is to look at MLB’s actual payroll allocations. Back in February, Wendy Thurm helpfully broke down each team’s payroll, noting the totals and percentages that went to position players, starting pitchers, and relief pitchers. If we combine the totals for starters and relievers, and then combine line-up and bench, we can see what MLB teams have settled on as the position player/pitcher split, at least in terms of pay.
Overall, her numbers added up to just over $3.3 billion in total league expenditure on 2014 player salaries. Of that $3.3 billion, $1.9 billion was allocated to hitters and $1.4 billion was allocated to pitchers. The payroll split that teams have decided upon this year? 57/43, the same proportion we are currently using in WAR. While this is not any kind of definitive answer, we do not find evidence that the teams themselves are spending their money in a way that suggests that pitchers are closer in value to position players, which would be a necessary conclusion of constraining the defensive calculations.
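As a rough check on that payroll split, the sketch below uses only the rounded totals quoted above ($1.9 billion for hitters, $1.4 billion for pitchers). It assumes, purely for illustration, that each figure was rounded to the nearest $0.1 billion, and it reports the range of hitter shares consistent with that rounding. The function name `hitter_share_range` is hypothetical, and the exact team-by-team numbers are not reproduced here.

```python
def hitter_share_range(hitters, pitchers, rounding=0.1):
    """Return (low, point, high) estimates of the hitters' payroll share.

    Assumption: both inputs were rounded to the nearest `rounding`
    (in billions of dollars), so each true value may be off by up to
    half of that in either direction.
    """
    half = rounding / 2
    point = hitters / (hitters + pitchers)
    # Lowest share: hitters rounded up from less, pitchers rounded down from more.
    low = (hitters - half) / ((hitters - half) + (pitchers + half))
    # Highest share: the opposite case.
    high = (hitters + half) / ((hitters + half) + (pitchers - half))
    return low, point, high


low, point, high = hitter_share_range(1.9, 1.4)
print(f"hitter share: {point:.1%} (plausible range {low:.1%} to {high:.1%})")
```

The rounded totals alone only pin the hitters' share down to within a couple of percentage points, and 57% sits inside that range. The 57/43 figure itself presumably comes from the unrounded totals.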
And it’s not like the idea that position players are significantly more valuable than pitchers is a novel sabermetric concept. Beat writers have long argued that even the most elite pitchers are not strong MVP candidates because they don’t play every day, and thus aren’t as valuable as a position player who both hits and fields every day for six months. If we think that constraining defensive value would improve WAR, we have to simultaneously argue that pitchers have been dramatically underrated (and underpaid) for quite some time, as the historical batter/pitcher split in payroll has persisted over the years.
That may be the correct position, but it is worth understanding that diminishing the value of defense in WAR means that we have to explain why teams are overvaluing position players and undervaluing pitchers when it comes to spending. Maybe they are, but I think it’s worth considering that we don’t have much in the way of evidence that teams buy into a smaller position player/pitcher split than what is currently modeled. The fact that the division of WAR matches the division of payroll isn’t a smoking gun, but it is at least a point that should make us pause before we consider whether or not the imperfections of WAR could be improved by moving value from the position players’ side of the ledger to the pitchers’ side.
In fact, there appears to be as much evidence that the 57/43 split underweights defensive value as there is that it overweights it. Robert Arthur noted the following a few weeks ago when he tested the value of defensive metrics in a linear regression model of ERA:
Just as with BP’s FRAA, the UZR-based dWAR of FanGraphs contributes some accuracy to our model of ERA. And, as with BP’s defensive metric, if any error is being committed, it’s that we are not weighting defense enough. For optimal accuracy, we should be accentuating the differences between players’ defensive statistics, not regressing them.
These results shouldn’t be entirely surprising. Defensive WAR is not a truth revealed from on high; it was designed (by very capable sabermetricians) with full knowledge of the fact that it improved our understanding of runs allowed. The coefficients which translate defensive play into runs weren’t chosen arbitrarily from a hat or a random number generator, but rather calibrated with at least some attention given to the resulting models’ ability to fit things like ERA. For this reason, we shouldn’t be surprised to find that our defensive metrics are well-suited to predicting ERA. Indeed, I would bet that the small error observed in both models (FG and BP), in which defensive metrics are perhaps slightly underutilized, is by design.
Considering this experiment, I don’t think that there exists any particular issue with the weighting of defensive WAR as a whole, despite Passan’s argument. There might be a problem with Alex Gordon’s dWAR in particular (or Adeiny Hechavarria’s, or whoever’s). Yet, the overall weighting of dWAR is reasonably accurate, or it would have been discarded for something different.
It is entirely correct to state that we lack the confidence in our defensive estimates that we have in our offensive estimates. However, that statement remains true even if we regress defensive metrics, and the reality may very well be that constraining the defensive component of WAR would make the metric less correct, not more so. Any total value metric, like WAR, will have to make some assumption about the value of a position player’s defensive contribution. A smaller range of contributions might be more palatable, but may in fact be less correct.
Uncertainty goes both ways. There is not currently strong evidence that defensive metrics themselves are too aggressive in assigning 14% of the value of run prevention to position players. Just as the correct number might be 10% or 12%, it might also be 16% or 18%. I would suggest that there is more evidence in favor of something between 10-15% than there is for something between 5-10% (or 15-20%, for that matter), so we should at least be aware of the possibility that constraining defensive value would make WAR a worse model of player value, not a better one.
That said, this is not any kind of declaration that the 57/43 split is gospel and cannot be changed, or that we are not open to adjusting the defensive component of WAR in any way. We want the model to reflect the best possible use of the data we have, and of the public understanding of how baseball players should be valued. Certainly, there are flaws with the model that we would like to rectify, and we have had numerous discussions about how to implement changes that could address some of these issues.
Catcher framing, for instance, is something that will be a significant addition to WAR, and we have spent a lot of time thinking about the proper way to implement the very high values suggested by the framing metrics into WAR. We have not yet implemented it because we are not yet convinced that we have the best solution, so we have left the model incomplete and wildly incorrect for some players, while attempting to acknowledge that limitation along the way. The addition of framing — which probably isn’t too terribly far away at this point — will also have ramifications for how we think about pitcher WAR, as the acceptance of framing runs saved necessarily requires pitchers to receive altered credit for their contributions to walks and strikeouts at the least.
The reality is that disentangling run prevention is difficult and there is no magical solution that erases all of the model’s problems. Regressing the defensive component might be a worthwhile endeavor, and it’s something we have considered and will continue to consider. We understand that a lot of people prefer a runs allowed basis for pitching WAR, and have had many conversations about whether to alter the calculations that we use for pitchers. Passan’s critiques of the model might go too far, but he’s not wrong that the model has issues, and that there are areas to be improved upon.
We are not attempting to stand as gatekeepers of the current calculation, keeping better calculations out because this is what we have now. We’ve made a great number of changes to WAR over the years, from adding things like baserunning on non-stolen base plays for position players to crediting pitchers for their infield flies. Last year, we settled on a unified replacement level with Baseball Reference. The model is constantly being reviewed and updated, and our hope is that it continues to improve over time.
So I will put this question to our readers. Passan has suggested that WAR would be better if the defensive component were minimized, and that some of the credit for run prevention should be moved from position players to pitchers. Do you agree? There’s a simple yes/no poll below, but I’m interested in hearing more in-depth responses as well. If you agree that a 57/43 split puts too much emphasis on position players’ contributions to run prevention, what is the batter/pitcher split that you would prefer? If you would prefer that we use a regressed version of the defensive component in WAR, how much would you regress the number, and to what mean would you regress it?
Our hope is that WAR is always as good a model as it can reasonably be. How would you make it better, considering the ramifications of the suggested changes? And can you show that changing the model would indeed make it better, and not just more palatable to our current perception of player value? If improvements to the model can be shown to be reasonable, they will be made. We are not ignorant of WAR’s imperfections, nor do we want them to continue any longer than need be. Our goal is to push the model forward, and we are open to suggestions on how to do just that.
Dave is the Managing Editor of FanGraphs.