A Discussion About Improving WAR

by Dave Cameron

September 8, 2014

Jeff Passan is one of the most aggressive advocates for FanGraphs in the mainstream media, regularly citing data and concepts from our leaderboards and helping to educate the masses about different ways of viewing baseball. He’s certainly not an old-school guy who wants to be left alone with his pitcher wins and RBIs, and he’s more than happy to embrace new ideas supported by data. But he still has some problems with WAR, and specifically, the defensive component that can allow lesser hitters to be listed as among the most valuable players in the game alongside some of baseball’s greatest sluggers. To get a full sense of his argument, read the whole piece, but here’s a selection that sums it up:

Defense does have its place in WAR. Just not in its present incarnation, not until we know more. Not until we can account for positioning on the field. Not until we can find out the exact speed a ball leaves a bat and how quickly the fielder gets a jump and the angle on the ball and the efficiency with which he reaches it. Not until we understand more about fielding, which will allow us to understand how to properly mete out value on a defensive play, which may take years, yes, but look how long it took us to get to this point, where we know more about hitting and pitching than anyone ever thought possible.

The hackneyed Luddites who bleat “WAR, what is it good for, absolutely nothing” should not see this as a sympathetic view. On the contrary, WAR is an incredible idea, an effort to democratize arguments over who was best. Bringing any form of objectivity to such singularly subjective statements is extremely challenging and worthwhile work.

Which is why this at very least warrants more of a conversation among those who are in charge of it. They’ve changed WAR formulas before. They’ll change them again. And when they do, hopefully the reach of defensive metrics will be minimized.

I don’t agree with everything Passan wrote in the piece, but his criticisms of the metric aren’t entirely off base. It is easier to evaluate run scoring than run prevention. WAR is a flawed and imperfect model. Some of the assumptions in the construction of the model may be entirely incorrect, and as we get more information, we may very well find that some of the conclusions that WAR suggested were incorrect, and maybe not by a small amount. Just as the statistical community is quick to highlight the problems with pitcher wins and RBIs, it is fair for Passan to highlight the problems with WAR, especially if the purpose of that discussion is to help improve the model.

So let’s talk about Passan’s suggestion to improve WAR. Primarily, he suggests lowering the value of defense in the calculation, perhaps by regressing a player’s calculated value by some degree. This isn’t the first time this has been suggested, and there are plenty of people who I respect who hold a similar opinion. It’s not a crazy suggestion, and it might even be a better alternative. But let’s work through the implication of that change so that we can evaluate the two methods side by side.

Right now, we hand out 1,000 WAR per 2,430 games (30 teams each playing 162 games) and split it so that 570 of those 1,000 WAR go to position players, with the remaining 430 credited to the pitchers. This 57/43 split accounts for the fact that hitters are responsible for the entirety of the half of the game that is scoring runs, and also some portion of the half of the game that is preventing runs. The fact that we’re giving position players 57% of the pie implies that we think run prevention is 86% pitching and 14% defense, but those numbers weren’t handed down on stone tablets and reasonable people could argue for a different proportioning between pitchers and fielders.

If, for instance, the defensive component of WAR was simply halved — suggesting that pitchers are 93% responsible for run prevention, with defenders making up just 7% of the pie — we’d have to take the credit for the runs prevented and move them from the position players to the pitchers, so instead of a 57/43 split, we’d have a 53.5/46.5 overall split between position players and pitchers in WAR. Perhaps that’s preferable, but is there evidence for a smaller gap between position players and pitchers than what we are currently using in WAR?
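The arithmetic behind those two splits can be sketched in a few lines. This is only an illustration of the reasoning above, not the actual WAR calculation; the function name `war_split` and the assumption that scoring and prevention are each exactly half of the game are simplifications made for the example.

```python
def war_split(defense_share_of_prevention):
    """Return (position_player_share, pitcher_share) of total WAR.

    Assumption for this sketch: run scoring and run prevention are each
    half of the game, hitters get all of the scoring half, and fielders
    get the given fraction of the prevention half.
    """
    scoring_half = 0.5
    prevention_half = 0.5
    fielding = prevention_half * defense_share_of_prevention
    position_players = scoring_half + fielding
    pitchers = prevention_half - fielding
    return position_players, pitchers


# Current weighting (14% of run prevention) versus the halved version (7%).
for share in (0.14, 0.07):
    pos, pit = war_split(share)
    print(f"defense = {share:.0%} of run prevention: "
          f"{pos * 100:.1f}/{pit * 100:.1f} position players/pitchers")
```

Under these assumptions, the 14% figure gives the 57/43 split and the 7% figure gives 53.5/46.5, matching the numbers in the text.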

One way of testing this is to look at MLB’s actual payroll allocations. Back in February, Wendy Thurm helpfully broke down each team’s payroll, noting the totals and percentages that went to position players, starting pitchers, and relief pitchers. If we combine the totals for starters and relievers, and then combine line-up and bench, we can see what MLB teams have settled on as the position player/pitcher split, at least in terms of pay.

Overall, her numbers added up to just over $3.3 billion in total league expenditure on 2014 player salaries. Of that $3.3 billion, $1.9 billion was allocated to hitters and $1.4 billion was allocated to pitchers. The payroll split that teams have decided upon this year? 57/43, the same proportion we are currently using in WAR. While this is not any kind of definitive answer, we do not find evidence that the teams themselves are spending their money in a way that suggests that pitchers are closer in value to position players, which would be a necessary conclusion of constraining the defensive calculations.

And it’s not like the idea that position players are significantly more valuable than pitchers is a novel sabermetric concept. Beat writers have long argued that even the most elite pitchers are not strong MVP candidates because they don’t play every day, and thus aren’t as valuable as a position player who both hits and fields every day for six months. If we think that constraining defensive value would improve WAR, we have to simultaneously argue that pitchers have been dramatically underrated (and underpaid) for quite some time, as the historical batter/pitcher split in payroll has persisted over the years.

That may be the correct position, but it is worth understanding that diminishing the value of defense in WAR means that we have to explain why teams are overvaluing position players and undervaluing pitchers when it comes to spending. Maybe they are, but I think it’s worth considering that we don’t have much in the way of evidence that teams buy into a smaller position player/pitcher split than what is currently modeled. The fact that the division of WAR matches the division of payroll isn’t a smoking gun, but it is at least a point that should make us pause before we consider whether or not the imperfections of WAR could be improved by moving value from the position players’ side of the ledger to the pitchers’ side.

In fact, there appears to be as much evidence that the 57/43 split underweights defensive value as there is that it overweights it. Robert Arthur noted the following a few weeks ago when he modeled the value of defensive metrics in a linear regression attempting to encapsulate ERA.

Just as with BP’s FRAA, the UZR-based dWAR of FanGraphs contributes some accuracy to our model of ERA. And, as with BP’s defensive metric, if any error is being committed, it’s that we are not weighting defense enough. For optimal accuracy, we should be accentuating the differences between players’ defensive statistics, not regressing them.

These results shouldn’t be entirely surprising. Defensive WAR is not a truth revealed from on high; it was designed (by very capable sabermetricians) with full knowledge of the fact that it improved our understanding of runs allowed. The coefficients which translate defensive play into runs weren’t chosen arbitrarily from a hat or a random number generator, but rather calibrated with at least some attention given to the resulting models’ ability to fit things like ERA. For this reason, we shouldn’t be surprised to find that our defensive metrics are well-suited to predicting ERA. Indeed, I would bet that the small error observed in both models (FG and BP), in which defensive metrics are perhaps slightly underutilized, is by design.

Considering this experiment, I don’t think that there exists any particular issue with the weighting of defensive WAR as a whole, despite Passan’s argument. There might be a problem with Alex Gordon’s dWAR in particular (or Adeiny Hechavarria’s, or whoever’s). Yet, the overall weighting of dWAR is reasonably accurate, or it would have been discarded for something different.

It is entirely correct to state that we lack the confidence in our defensive estimates that we have in our offensive estimates. However, that statement remains true even if we regress defensive metrics, and the reality may be that constraining the defensive component of WAR would make the metric less correct, not more so. Any total value metric, like WAR, will have to make some assumption about the value of a position player’s defensive contribution. A smaller range of contributions might be more palatable, but may in fact be less correct.

Uncertainty goes both ways. There is not currently strong evidence that defensive metrics themselves are too aggressive in assigning 14% of the value of run prevention to position players. Just as the correct number might be 10% or 12%, it might also be 16% or 18%. I would suggest that there is more evidence in favor of something between 10-15% than there is for something between 5-10% (or 15-20%, for that matter), so we should at least be aware of the possibility that constraining defensive value would make WAR a worse model of player value, not a better one.

That said, this is not any kind of declaration that the 57/43 split is gospel and cannot be changed, or that we are not open to adjusting the defensive component of WAR in any way. We want the model to reflect the best possible use of the data we have, and of the public understanding of how baseball players should be valued. Certainly, there are flaws with the model that we would like to rectify, and we have had numerous discussions about how to implement changes that could address some of these issues.

Catcher framing, for instance, is something that will be a significant addition to WAR, and we have spent a lot of time thinking about the proper way to implement the very high values suggested by the framing metrics into WAR. We have not yet implemented it because we are not yet convinced that we have the best solution, so we have left the model incomplete and wildly incorrect for some players, while attempting to acknowledge that limitation along the way. The addition of framing — which probably isn’t too terribly far away at this point — will also have ramifications for how we think about pitcher WAR, as the acceptance of framing runs saved necessarily requires pitchers to receive altered credit for their contributions to walks and strikeouts at the least.

The reality is that disentangling run prevention is difficult and there is no magical solution that erases all of the model’s problems. Regressing the defensive component might be a worthwhile endeavor, and it’s something we have considered and will continue to consider. We understand that a lot of people prefer a runs allowed basis for pitching WAR, and have had many conversations about whether to alter the calculations that we use for pitchers. Passan’s critiques of the model might go too far, but he’s not wrong that the model has issues, and that there are areas to be improved upon.

We are not attempting to stand as gatekeepers of the current calculation, keeping better calculations out because this is what we have now. We’ve made a great number of changes to WAR over the years, from adding things like baserunning on non-stolen base plays for position players to crediting pitchers for their infield flies. Last year, we settled on a unified replacement level with Baseball Reference. The model is constantly being reviewed and updated, and our hope is that it continues to improve over time.

So I will put this question to our readers. Passan has suggested that WAR would be better if the defensive component was minimized, and that some of the credit for run prevention was moved from position players to pitchers. Do you agree? There’s a simple yes/no poll below, but I’m interested in hearing more in-depth responses as well. If you agree that a 57/43 split puts too much emphasis on position players’ contributions to run prevention, what is the batter/pitcher split that you would prefer? If you would prefer that we used a regressed version of a defensive component in WAR, how much would you regress the number, and to what mean would you regress it?

Our hope is that WAR is always as good a model as it can reasonably be. How would you make it better, considering the ramifications of the suggested changes? And can you show that changing the model would indeed make it better, and not just more palatable to our current perception of player value? If improvements to the model can be shown to be reasonable, they will be made. We are not ignorant of WAR’s imperfections, nor do we want them to continue any longer than need be. Our goal is to push the model forward, and we are open to suggestions on how to do just that.

Dave is the Managing Editor of FanGraphs.

197 Comments

Carl

10 years ago

I don’t understand why regressing the defensive component of WAR necessarily involves changing the hitter/pitcher WAR balance. Couldn’t you regress defensive performances to the average while leaving the total amount of WAR unchanged?

TKDC

10 years ago

Reply to Carl

This is exactly what I was going to say. However, I’d only be in favor of this if it were done due to the belief that the metric is likely wrong for outliers. How exactly should it be regressed? Dan Uggla graded out as an above average defensive player in 2013 despite three consecutive years as below average, and getting way past his prime. Is that 6.5 number regressed the same as Leonys Martin’s 6.9 number this year? Are we just minimizing the differences between the best and worst outcomes? Aside from aesthetics, is there a good reason to do so?

Tangotiger

10 years ago

Reply to Carl

If you regress ALL fielding performance to the average, then you are evaluating those players ONLY on their offense.

How can you then allocate the 57/43 split to nonpitcher/pitchers, if the nonpitchers absorb all the offense and the pitchers absorb all the defense?

You can decide on using only the positional adjustment (so all SS get the same +7.5 run credit). While that will move you away from a 50/50 split, it still won’t move you enough.

Alternatively, you can do a “team fielding”, so instead of 57/43, you do 50/7/43, and just not allocate any fielding (other than positional adjustment).

There’s a lot to think about here.

Roger

10 years ago

Reply to Tangotiger

Perhaps the positional adjustment should be a larger part of the 14% position players’ contribution to run prevention, and UZR less so. This would reduce the role of defensive measurement in WAR until it becomes more accurate, without reducing the overall value of defense.

Jianadaren

10 years ago

Reply to Roger

That’s exactly what regressing UZR to the mean would do. UZR would approach 0 and the positional adjustment would approach 100% of the 14%.

ReuschelCakes

10 years ago

Reply to Roger

One of Dave’s points that is lost is that there is no evidence that these defensive values should be regressed – it is only our *suspicion* of the data that makes us want to…

Said differently, no one questions an outlier offensive year like Chris Davis’ 52 batting runs / 6.8 WAR in 2013. Because we can observe his discrete outcomes, his 0.348 ISO, his 0.370 OBP, his 29.6% HR/FB, etc… we KNOW that this season is an outlier… but we also KNOW that he really did hit 53 HRs and 42 2Bs…

For dWAR we do not as explicitly/intuitively KNOW the latter and therefore ASSUME that the outliers are measurement-based rather than outcome-based…

munchtime

10 years ago

Reply to Tangotiger

If you take a defensive metric (UZR, for example) and regress it to the mean before incorporating it into WAR, you certainly do not “evaluate players only on their offense”. It would create a smaller range of values, but there would still be a range of values.

What you are describing is replacing defensive metrics with a constant value for everyone. I haven’t seen anyone suggest doing that.

Jianadaren

10 years ago

Reply to munchtime

What he meant is that as you regress a defensive metric to the mean (0 WAR) you approach “evaluat[ing] players only on their [oWAR]” – i.e. offense and positional adjustment only – because the defensive metric component will approach zero.

The 7% defense component would be made more and more out of pure positional adjustment and less and less out of the metric.

Cool Lester Smooth

10 years ago

Reply to Tangotiger

But what if you give +7.5 to a guy who gets a +15 UZR, and a -7.5 to a guy who gets a -15 UZR. Wouldn’t that eliminate the problem you’re describing?

Cool Lester Smooth

10 years ago

Reply to Cool Lester Smooth

(I’m not saying that’s the correct amount of regression, to be clear)

Catoblepas

10 years ago

Reply to Cool Lester Smooth

Nope! Consider a hypothetical four-player league, with two pitchers and two hitters. 100 WAR is earned, split 57/43, with 50 going to hitters for hitting, 43 going to pitchers, and 7 going to hitters for fielding. Hitter A gets 28 offensive wins and 0 defensive wins, while Hitter B gets 22 offensive wins and 7 defensive wins. Hitter B is just barely worth more than Hitter A, 29 to 28 overall.
We decide that that’s way too wide a range of defensive value, and it needs to be regressed. Both hitters are regressed toward the mean, so that the new defensive values are 2 and 5, respectively. Now, we didn’t change the amount of fielding value given out, but without any change in performance, we now see Hitter A as more valuable, at 30 wins, vs. Hitter B at 27. Offense has become more important, since the range of defensive values has narrowed, and with it defense’s impact on our evaluation of the players.
This is a super simplified example obviously, but as far as I understand it the same forces are at play in the league, so hopefully it helps.
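The four-player example above can be checked with a short sketch. The helper `regress_toward_mean` and the regression fraction of 4/7 are assumptions made for illustration; the fraction is simply the value that turns defensive wins of 0 and 7 into the 2 and 5 used in the comment.

```python
def regress_toward_mean(values, fraction):
    """Pull each value `fraction` of the way toward the group mean."""
    mean = sum(values) / len(values)
    return [v + fraction * (mean - v) for v in values]


# Order in each list: Hitter A, Hitter B.
offense = [28, 22]
defense = [0, 7]

# Hypothetical regression amount, chosen only to reproduce 2 and 5.
FRACTION = 4 / 7

regressed = regress_toward_mean(defense, FRACTION)
before = [o + d for o, d in zip(offense, defense)]
after = [o + d for o, d in zip(offense, regressed)]

print("defense before:", defense, "total fielding wins:", sum(defense))
print("defense after: ", [round(d, 1) for d in regressed],
      "total fielding wins:", round(sum(regressed), 1))
print("overall before:", before)
print("overall after: ", [round(t, 1) for t in after])
```

The total fielding value handed out stays at 7 wins, yet the ordering of the two hitters flips, which is the point the example is making.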

Yirmiyahu

10 years ago

Reply to Carl

Agreed. I don’t think the primary argument here is that WAR overvalues defense; it’s that UZR does a pretty clumsy job of measuring defense. It’s not unreasonable to halve the fielding runs that go into WAR, but that’s not because defense is less of a component; it’s because the numbers themselves are untrustworthy.

I think the more logical (and maybe more common) suggestion is to use multi-year and/or regressed fielding runs in the defense portion of things. I know Dave will counter with the argument that then we’re describing true talent rather than what actually *happened*, but I’ve never cared for that argument. And if you understand how UZR works, it’s not measuring what actually happened anyway (many plays are entirely thrown out of the data).

Cool Lester Smooth

10 years ago

Reply to Yirmiyahu

Yeah, I don’t have a problem with single-year WAR totals. The defensive numbers just have to be vocally taken with a massive grain of salt when we only have one year of data.

So, when talking Gordon and Trout (for instance) you say “WAR grades them out similarly, but seeing as how the defensive numbers for each player are massive outliers, we should probably err on the side of Trout,” rather than “well, this just shows that the mainstream isn’t valuing defense highly enough.”

We don’t necessarily do that right now, and it’s a problem.

Blue Wonder

10 years ago

Reply to Cool Lester Smooth

It’s Brian McCann Guy. Right on.

Cool Lester Smooth

10 years ago

Reply to Cool Lester Smooth

Huh, it would be nice if there were also losers with way too much time on their hands who obsessively cataloged every time I was right, you know?

Besides being wrong about McCann, I’ve also been yelled at for suggesting that Martin Prado and Randall Delgado was an absurd underpay for a player of Upton’s caliber. And I was a huge booster of Denard Span and Jonathan Lucroy heading into this season, and Matt Carpenter heading into last year, but no one likes to talk about that.

Dave Cameron

10 years ago

Reply to Carl

We know the run value of different events; the question is how to distribute the credit for the change in game state. Say, for instance, there’s a fly ball to the gap. If it falls, it’s very likely to be a double (~0.7 runs), but the value of an out is -0.3 runs, so the difference between the ball landing or being caught is a full run. Defensive metrics like UZR and DRS estimate the odds of that play being made — say, 10% — and give the remaining credit to the fielder, so if that play had a 10% probability of being made, the run value of the catch would be ~0.9. That would get credited to the fielder for making the catch.

But now we just cut all these run values in half in a regressed defensive component of WAR, so instead of ~0.9 runs, he gets ~0.45 runs. The other 0.45 runs saved have to go somewhere, and since we’re suggesting that they don’t go to the fielder, they have to go to the pitcher.

Essentially, regressing the defensive component would say that the pitcher is more involved in turning balls in play into outs than we currently believe, and we’d have to give him the remaining portion of the defensive value of the run saved that we’re taking away from the fielder.
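The fly-ball example in this comment can be written out as a small calculation. This is a rough sketch of the reasoning as described here, not how UZR or DRS are actually implemented; the function name `catch_credit` is hypothetical, and the run values (~0.7 for the double, -0.3 for the out) and the 10% catch probability are the approximate figures from the comment.

```python
def catch_credit(run_value_hit, run_value_out, catch_probability):
    """Runs credited to a fielder for making a play.

    Sketch of the logic in the comment: the swing between the ball
    falling and being caught, scaled by how unlikely the catch was.
    """
    swing = run_value_hit - run_value_out
    return swing * (1 - catch_probability)


# Approximate values quoted in the comment.
full_credit = catch_credit(run_value_hit=0.7, run_value_out=-0.3,
                           catch_probability=0.10)

# Halving the defensive component leaves the rest of the credit unassigned.
fielder_share = full_credit / 2
unassigned = full_credit - fielder_share

print(f"full credit to fielder:   {full_credit:.2f} runs")
print(f"halved credit to fielder: {fielder_share:.2f} runs")
print(f"left to reassign:         {unassigned:.2f} runs")
```

The last line is the portion that, on the argument made here, would have to be credited to the pitcher if it is taken away from the fielder.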

dkdc

10 years ago

Reply to Dave Cameron

It also assigns a negative value for plays not made, so by regressing both, you will not have any impact on the overall WAR split between pitchers and hitters.

The total UZR/DRS for the league is zero. I don’t think anyone is advocating changing that.

Dave Cameron

10 years ago

Reply to dkdc

The total UZR/DRS for the league is zero. I don’t think anyone is advocating changing that.

The average of all players’ defensive value relative to each other is zero, but the contribution of overall defense to run prevention is not zero. These are not the same thing.

Dkdc

10 years ago

Reply to dkdc

Dave, I think you are just mis-interpreting the criticism of the UZR/DRS component of WAR.

My criticism, and I think that of many others, is that many of the extreme +/- values of individual UZR/DRS are not credible, and those are skewing WARs for the affected players.

So regressing all UZR/DRS by some amount to average WOULD lessen the impact that an individual’s UZR/DRS has on their WAR, which is ultimately what many people want.

Eminor3rd

10 years ago

Reply to dkdc

But what is the evidence that the tails of the UZR/DRS values actually aren’t credible?

Bip

10 years ago

Reply to dkdc

But by keeping the average 0, you are not changing the overall contribution of defense to run prevention, which is the point.

As someone below says, if we take 5 runs away from Alex Gordon and give 5 runs to Matt Kemp, that is regressing the defensive model without changing the overall value assigned to defense.

Bip

10 years ago

Reply to dkdc

@Eminor3rd

For any model that involves measurement error, the results found at the extreme ends of a scale are more likely to have gotten there in part because the measurement error pushed them towards the extreme end.

For example, let’s say there are two fielders who are both the best in league, with 20-run true talent. One benefits from measurement error, and rates as 25 runs, and the other is hurt, and rates at 15 runs. We don’t automatically know that the 15-run guy was hurt, but we can guess that the 25-run guy benefitted from error, and be right most of the time.
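The selection effect described in this comment can be illustrated with a small simulation. Everything numeric here is an assumption chosen for the sketch (10,000 hypothetical fielders, true talent and measurement error both normally distributed with a standard deviation of 5 runs); none of it is estimated from real UZR or DRS data.

```python
import random

# Hypothetical parameters, chosen only for illustration.
N_PLAYERS = 10_000
TALENT_SD = 5.0   # spread of true fielding talent, in runs
ERROR_SD = 5.0    # spread of measurement error, in runs
TOP_N = 100       # how many of the highest observed ratings to inspect

rng = random.Random(0)  # fixed seed so the run is repeatable

players = []
for _ in range(N_PLAYERS):
    talent = rng.gauss(0.0, TALENT_SD)
    error = rng.gauss(0.0, ERROR_SD)
    # Store (observed rating, true talent, measurement error).
    players.append((talent + error, talent, error))

# Look only at the players with the most extreme positive observed ratings.
players.sort(reverse=True)
top = players[:TOP_N]

mean_error_top = sum(p[2] for p in top) / len(top)
share_helped = sum(1 for p in top if p[2] > 0) / len(top)

print(f"mean measurement error among top {TOP_N}: {mean_error_top:+.1f} runs")
print(f"share of top {TOP_N} helped by error: {share_helped:.0%}")
```

Across all simulated players the error averages out to roughly zero, but among the highest observed ratings it is mostly positive, which is the sense in which guessing that an extreme rating "benefitted from error" is right most of the time.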

Zach

10 years ago

Reply to Dave Cameron

But aren’t systems like UZR somewhat vulnerable to positioning/shifting variables (though I know it doesn’t count plays where an infield shift is on)? If in your given example the centerfielder is positioned five extra steps into the gap (because of scouting/analysis of the batter), that 10% chance is really much, much higher. It’s something the system can’t pick out (at least until we have real-time location tracking), but I kind of like Tango’s idea of assigning some of the credit to the team defense, and not to individual players.

a eskpert

10 years ago

Reply to Zach

But then you have the problem of assigning credit for non-shift positioning and positioning within a shift. Does that go to the team? I have difficulty believing that unless players on a team are more or less positioned uniformly well, that the team had anything to do with it.

Yirmiyahu

10 years ago

Reply to Dave Cameron

This is where I think a lot of smart people disagree with you, Dave. If the goal of WAR is to measure a player’s total value, and we understand that all of our measuring devices are imperfect to various degrees…. why do all the numbers need to zero out perfectly on the balance sheet? Just because we’re not giving 100% fielding credit to a position player doesn’t mean we need to give the credit to the pitcher. Likewise, what’s wrong with using a multi-year sample on the fielding component?

Bip

10 years ago

Reply to Yirmiyahu

I see what you’re saying, but I think you’re somewhat falling for the trap of conflating a measurement of value with a measurement of talent. Using a multi-year defensive rating makes sense if what you want to measure is talent, but if you just want to know what sort of value a player contributed in a year, and he just had an unusually bad year in the field, then you want to know that he was bad that year, not necessarily that he underperformed his talent.

You can make the same argument for hitting. Using multi-year samples will be better at approximating a player’s true talent level, but it doesn’t isolate that player’s single-season performance, and sometimes that is what you’re looking for.

Yirmiyahu

10 years ago

Reply to Yirmiyahu

What I’m saying is that UZR is so clumsy at measuring *value* in a single season sample that a regressed measure of *talent* is probably a more accurate measure of *value* than trying to use single season UZR.

I’d love for that to change with better data and a better metric, but the fact is that the data being put in is very rough, a lot of plays are thrown out, a lot of factors are ignored, and the required sample size is too large.

Bip

10 years ago

Reply to Yirmiyahu

That makes sense. I don’t think I agree, but you may be right.

AK7007

10 years ago

Reply to Yirmiyahu

“what’s wrong with using a multi-year sample on the fielding component?”

Value right now is accrued based on the yes/no answer to the question “was a ball converted into an out?” on an individual batted ball basis. How can you make this into a multi-year thing? You just haven’t given thought into how fielding components are calculated.

Pirates Hurdles

10 years ago

Reply to Yirmiyahu

What is the evidence for this statement? – “UZR is so clumsy at measuring *value* in a single season sample”

We will agree that it is poor at measuring talent in a single season sample, but why is it so hard to accept variation in defensive performance (value) from year to year? If Miggy can post seasons of wRC+ 129 and 192, an offensive run variance of 40, why can’t players’ defensive value vary by 20 runs in different seasons?

Luke

10 years ago

Reply to Yirmiyahu

“What I’m saying is that UZR is so clumsy at measuring *value* in a single season sample that a regressed measure of *talent* is probably a more accurate measure of *value* than trying to use single season UZR.”

This is what I don’t agree with. We KNOW that offensive metrics do not measure *talent,* but instead measure *value.* We all know that things like BABIP and HR/FB need to be regressed to find out true talent, and a hitter can have an enormously valuable season that is well above his true talent. However, it’s wRC+ that ends up feeding into WAR. Why should fielding metrics be any different?

UZR is not perfect, but I don’t think the high variance in a single season is a problem. It makes sense that this variance is mostly caused by inequality of opportunity and the inherent limited sample size for individual fielders. A fielder might get 5 chances all season to make a truly enormous, run-saving play, and if he happens to make all 5 of those plays, it’s going to skew his UZR high. This is not a problem with the metric; it is simply measuring *what actually happened.*

Cool Lester Smooth

10 years ago

Reply to Yirmiyahu

The issue is where the “high variance in a single season” in each category comes from.

We know the exact result of every offensive play. There is no ambiguity.

That is simply not true for our defensive data, and all the “Oh, look how much variance Miggy’s had this season” claims don’t address the very simple fact that the underlying data of UZR is inherently suspect. Input error is entirely possible in the UZR model, and it can completely skew results. The reliability of each metric is not comparable, and defending UZR variance by saying “If Miggy can post seasons of wRC+ 129 and 192, an offensive run variance of 40, why can’t players’ defensive value vary by 20 runs in different seasons?” suggests an equivalency between the data sets that we unequivocally know to be false.

bstar

10 years ago

Reply to Yirmiyahu

Multi-year regressions don’t really answer the question being asked, which is, “How many runs did this fielder save his team THIS YEAR?”

You say you hate the true-talent argument, but that’s exactly what a multi-year regression is giving us.

We don’t want a true-talent estimate. We want that fielder’s performance in that specific year. We want to know how many runs he saved from his great plays minus the runs lost from his misplays.

UZR and DRS are currently our best answers to that question.

Cool Lester Smooth

10 years ago

Reply to Yirmiyahu

Yirmiyahu’s point is that a single season of UZR data is so bad at telling us “how many runs did a fielder save his team THIS year?” that a regressed, true talent number is probably closer to answering that question in any single year.

Brian

10 years ago

Reply to Dave Cameron

This is a great comment – perhaps it should be reiterated in the footer of the article.

Carl

10 years ago

Reply to Dave Cameron

If you accept the hypothesis that year-to-year fluctuations in a player’s UZR are due largely to chance and imperfect metrics, then you might regress each player’s UZR to the mean rather than changing the run values of individual events. That way, you don’t have to redistribute WAR from fielders to pitchers; instead you redistribute WAR from players with positive UZR, who are likely to have benefited from stochastic variation, to players with negative UZR, who are likely to have had stochastic variation work against them.

cass

10 years ago

Reply to Carl

“If you accept the hypothesis that year-to-year fluctuations in a player’s UZR are due largely to chance and imperfect metrics”

Why would you? Why would fielding performance be assumed to be the same from year to year when pitching and hitting vary wildly? As someone who watches baseball and sees fielders have good years and bad years, I just don’t understand this idea.

Defense slumps. It just does. Can you really watch Mike Trout in 2012 and Mike Trout in 2014 and not see that he’s not providing the same defensive value this year? Or watch Ian Desmond in April and Ian Desmond the rest of the year and realize he had a pretty bad spell with the glove early in the year?

Bip

10 years ago

Reply to Carl

@cass

You’re absolutely right. The problem is that we know there is some measurement error attached to defense in addition to natural player variation. With defensive ratings, we observe a play and want to know if an average fielder would have made the play, but we don’t actually know for sure if an average fielder would have made that play. If a guy fails to make the play, it could be because he’s in a defensive slump, but we also have to question whether he actually did a poor job at all, or whether the play wasn’t really as makeable as we thought.

10 years ago

Reply to Carl

@cass, there’s also sample size: one season of UZR is like using a 200 PA sample to decide what kind of season a player had. For teams that shift, the sample is even smaller because shifted plays are removed.

Pirates Hurdles

10 years ago

Reply to Carl

But isn’t sample size irrelevant when trying to calculate a player’s value in a single season? We aren’t using UZR to assess true talent in this case, but rather to measure seasonal contribution to wins. WAR is a measure of what did happen and is not predictive.

Cool Lester Smooth

10 years ago

Reply to Carl

@cass, we would do that because we know UZR to be an imperfect and unreliable metric. It’s not a philosophical question. We know that the data we use for UZR is prone to error, and that the model itself is highly flawed (albeit definitely the best we currently have).

Recognizing and accounting for the flaws in a model is not the same thing as denying the existence of performance variance in fielding.

Matt P

10 years ago

Reply to Dave Cameron

“But now we just cut all these run values in half in a regressed defensive component of WAR, so instead of ~0.9 runs, he gets ~0.45 runs. The other 0.45 runs saved have to go somewhere, and since we’re suggesting that they don’t go to the fielder, they have to go to the pitcher.”

I don’t understand. Ten balls are hit into the gap. There’s a 10% chance of each being caught. Therefore, 1 is caught and 9 aren’t caught.

Currently, the ball that is caught is worth .9 runs and the nine balls that aren’t caught are worth -.9 runs (-.1 runs each).

If you regress the defensive component by half then the ball that is caught is worth .45 runs and the nine balls that aren’t caught are worth -.45 runs (-.05 runs each).

Either way it averages out to zero.
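The arithmetic in this comment can be checked with a short sketch. It assumes one catch and nine misses on ten identical balls, a 10% catch probability, and a 1.0-run swing between a catch and a double (the 0.7 + 0.3 figure used elsewhere in the thread). The function name and the `scale` parameter are made up for illustration and are not how UZR or DRS are actually implemented:

```python
def fielder_credit(caught, catch_prob, run_swing=1.0, scale=1.0):
    """Runs credited on one play: (actual outcome - expected outcome) * run swing."""
    actual = 1.0 if caught else 0.0
    return scale * (actual - catch_prob) * run_swing

# One catch and nine misses on ten balls that are each caught 10% of the time
outcomes = [True] + [False] * 9

full_total = sum(fielder_credit(c, 0.1) for c in outcomes)
half_total = sum(fielder_credit(c, 0.1, scale=0.5) for c in outcomes)

print(fielder_credit(True, 0.1), fielder_credit(False, 0.1))  # per-play credit, unregressed
print(round(full_total, 9), round(half_total, 9))             # both totals net to zero
```

For a fielder who makes the play exactly as often as expected, the credits cancel whether or not they are halved. The disagreement in the thread is about fielders who make more or fewer plays than expected, where the halved credits no longer match the full run value of the hits prevented.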

Andy

10 years ago

Reply to Matt P

If you give the fielder only .45 run credit for making that play, you are saying that a ball hit into the gap for a double only contributes 0.50 run, rather than 1.0 run (0.7 + 0.3 for not making an out). But you can’t do that. The run value of an offensive event is more or less fixed; it can vary slightly from season to season, determined by events, but it can’t be changed arbitrarily. We know that a double has a net run value of 1.0 or whatever because a large body of historical data shows us this.

If you did change run values in this way, things wouldn’t add up. The total runs created by players during a season would not equal the total number of runs actually scored.

Matt P

10 years ago

Reply to Matt P

I agree it wouldn’t be logically consistent if you graded offensive events according to one scale and defensive events according to a different scale.

Danny

10 years ago

Reply to Matt P

What I don’t understand is, we are fixated on making sure everything adds up (a ball caught that would have otherwise been a double being 1 run instead of .5 runs for example) but UZR doesn’t count plays where the defense is shifted. If we are not using those plays, then doesn’t that throw the balance off itself?

Justin

10 years ago

Reply to Dave Cameron

So why aren’t we using these same defensive run values to evaluate pitchers? What you’ve described says that a pitcher should be given credit for having given up .7*.9 – .3*.1 = .6 runs. Instead we’re almost completely ignoring the pitcher’s credit side of that double (or catch). What gives?

Also as others have said, your conclusion about regressing defensive numbers is completely wrong. The overall fielder/pitcher credit wouldn’t change at all and you’d still have the 57/43 split.

Bats Left, Throws Right

10 years ago

Reply to Justin

We ARE completely ignoring the pitcher’s credit for the ball in play. Pitcher fWAR doesn’t consider balls in play.

Cool Lester Smooth

10 years ago

Reply to Justin

rWAR tries to do that. fWAR subscribes to DIPS theory.

Miffleball

10 years ago

Reply to Dave Cameron

While I agree that we understand the value of a fly ball in the gap, what many have argued is that we don’t know what a fly ball in the gap actually is, and until we do, we need to consider our valuations.

Jianadaren

10 years ago

Reply to Dave Cameron

“Defensive metrics like UZR and DRS estimate the odds of that play being made — say, 10% — and give the remaining credit to the fielder, so if that play had a 10% probability of being made, the run value of the catch would be ~0.9. That would get credited to the fielder for making the catch.”

OK, I would follow you if the remaining 10% would automatically go to the pitcher, but my understanding is it doesn’t work like that. The pitcher doesn’t get 0.1 runs saved for that catch. The pitcher just gets .1 IP and the run value depends on the rest of the FIP function. If it were a regular 99% probability catch, the pitcher would get .1 IP just the same.

The remaining 0.1 runs from that catch just went into the ether, presumably to resurface in one of the global adjustments. And if you were to regress the defensive component such that there were an extra .45 in the ether, that would be fine because you know how to handle that.

10 years ago

Reply to Dave Cameron

In your example, the defender gets +0.9 runs saved for making the catch. The hitter, OTOH, gets -0.3, because he made an out. Mind you, the hitter didn’t make an out; the fielder made the out. The hitter put a BIP that had an expected run value of +0.6 (90% x +0.7 + 10% x -0.3). After he put it in play, the hitter did nothing else to affect the outcome.

Put another way, the outcome of an out is a -0.3 to the offensive team. That should be apportioned as (+0.6 hitter) + (-0.9 fielder). Otherwise we are giving credit to the fielder for taking away value we deny to the hitter.

There is a bias to these denials of value. On plays that go as expected – likely hits that become hits, likely outs that become outs – credit is fairly apportioned. On likely outs that become hits, the official scorer flags it as an error, and the hitter doesn’t get credit, which is deserved. On likely hits that become outs, the hitter gets no credit, which is totally predicated on the fielder succeeding beyond expectations. In this case, the hitter isn’t credited with the value of the BIP; rather he’s blamed for the negative outcome. The net result is that offensive value stats are suppressed.

IOW, yes, defensive value stats are too high relative to offensive value stats. But that doesn’t mean the problem is on defensive value stats. It just means they’re not on the same scale, because offensive value stats ignore hitter value that defensive stats consider. Regressing defensive stats is one way to get there, but reworking offensive stats is the more appropriate way IMO.
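The apportionment proposed in this comment can be worked through numerically. This is only a sketch using the thread's example figures (a double worth +0.7 runs, an out worth -0.3, and a ball that falls for a hit 90% of the time); the function and variable names are hypothetical:

```python
def expected_bip_value(hit_prob, hit_value=0.7, out_value=-0.3):
    """Expected run value of a ball in play before the fielder affects it."""
    return hit_prob * hit_value + (1 - hit_prob) * out_value

OUT_VALUE = -0.3

# Hitter is credited with the expected value of the ball he put in play
hitter_credit = expected_bip_value(hit_prob=0.9)

# Fielder's catch accounts for the gap between that expectation and the out
fielder_effect = OUT_VALUE - hitter_credit

print(round(hitter_credit, 2))                   # about +0.6 to the hitter
print(round(fielder_effect, 2))                  # about -0.9 to the offense, i.e. 0.9 runs saved
print(round(hitter_credit + fielder_effect, 2))  # sums to the -0.3 outcome
```

Under this accounting the two pieces still sum to the observed outcome, but the hitter keeps the value of the ball in play and the fielder's 0.9 runs saved is charged against that expectation instead of against the hitter.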

EthanB (Member since 2020)

10 years ago

Reply to Carl

Yeah, this was one of my thoughts. For every Alex Gordon who might lose 5 runs of value, wouldn’t there be a Matt Kemp who picks up 5, thus leaving the total position player WAR unchanged?

Juicy-Bones Phil

10 years ago

Reply to Carl

Instead of devaluing defense, couldn’t you increase the value of offensive actions? I’m sure there is an algorithm that would keep the values distinct but still close enough to not inflate pitching numbers.

Andy

10 years ago

Reply to Juicy-Bones Phil

It’s more likely to work the other way. If you increase the value of offensive events, you increase the value of defense. E.g., if that double is worth 1.5 net runs instead of 1.0, the player who prevents it gets credit for 1.35 runs instead of 0.9 runs.

But the larger problem is that you can’t arbitrarily change the value of offensive events. Their value is determined by their relationship to runs scored, the latter, of course, being a known, measurable quantity.
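A quick sketch of the scaling in this comment, assuming the fielder is credited with the net run value of the hit times the probability that the play would not normally be made. The 10% catch probability comes from the thread's example; the function name is hypothetical:

```python
def credit_for_catch(net_run_value, catch_prob=0.1):
    """Runs credited to a fielder for catching a ball that is usually a hit."""
    return (1 - catch_prob) * net_run_value

print(round(credit_for_catch(1.0), 2))  # double worth 1.0 net runs
print(round(credit_for_catch(1.5), 2))  # double inflated to 1.5 net runs
```

Because the fielder's credit is proportional to the value of the hit he takes away, inflating offensive run values inflates defensive credit by the same factor.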
