Last Year’s WAR with True-Talent Defense

I’ll begin by saying I’m not sure what value all of what’s to follow actually has. I know that’s about the least compelling way to begin a blog post, but I just want that to be very clear. TangoTiger’s recent Building a Better WAR Metric series on the site jumpstarted an idea I’d been kicking around in my head for a while. It’s an idea that mostly exists because I’ve seen people on the internet say they’d want to see something like it. At the very least, it could serve as a talking point for another constructive discussion about WAR, and any constructive discussion about WAR is a good thing: we all admit the metric is far from perfect, and constructive discussion ushers in progress.

People don’t really have beef with the offensive side of WAR, I don’t think. As far as sabermetric stats go, wRC+, and therefore wRAA, are about as infallible as they come. Tough to argue with the outcomes of history. I don’t see too many quibbles with the base-running numbers, either, partially because most people think they do a good job, but also because they don’t move the needle much either way, and there are bigger fish to fry. Some people aren’t fans of the positional adjustments, both the assigned weights and the entire concept of including them. I’m in the camp that firmly believes in the idea of the positional adjustment, but, like anything, the formula for the weights could always be revisited to see if it could be improved, and Jeff Zimmerman’s work on this topic last offseason was a great place to start.

But the bigger beef, beyond the positional adjustments, is of course defense. Anyone will admit this is the weak link of WAR. It’s probably the weakest link of sabermetrics as a whole, in 2016. Mostly, it boils down to this: we know that defensive metrics don’t stabilize until the sample spans roughly three years, or 3,000-ish innings. Meaning, the single-season data is subject to noise, and if we wanted to draw any conclusions from it, we’d have to regress it. Despite that, single-season WAR is powered by noisy, unregressed, single-season defensive metrics. That’s the crux of the beef with WAR.
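For illustration, here’s a minimal sketch of what that kind of regression looks like, assuming league average is zero and using the ~3,000-inning stabilization point from the rule of thumb above; the linear shrinkage and the constant are simplifying assumptions, not how UZR or any projection system actually works.

```python
def regress_defense(observed_runs: float, innings: float,
                    stabilization_innings: float = 3000.0) -> float:
    """Shrink a single-season defensive metric toward the league
    average of zero, in proportion to sample size. The 3,000-inning
    constant is the rule of thumb above, not an official figure."""
    weight = innings / (innings + stabilization_innings)
    return weight * observed_runs

# A +15 UZR over 1,200 innings regresses to about +4.3 runs
print(regress_defense(15.0, 1200.0))
```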

So, some folks have suggested that the defensive component of WAR ought to be regressed in some way, in an effort to strip out some of the noise that comes with single-year defensive data, or to better capture a defender’s true performance. I think there are a number of flaws in this general line of thinking, but there are a number of flaws with the way it’s being done now, too, so let’s humor one another.

Both ZiPS and Steamer use multiple years of data, giving more weight to the most recent seasons. Multiple years of data to weed out noise? Check. Both also incorporate some form of “scouting” information: Steamer regresses toward the results of the Fans Scouting Report, ZiPS searches for keywords in actual, physical scouting reports and uses those as a means for regression. Eye test? Check. Blend all that together and factor in some aging curves, and you’ve got yourself as good an idea of any player’s true-talent defensive ability as you’re going to find. Sort the Fld column here and I think you’ll agree that these numbers pass the eye test with flying colors.
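As a rough illustration of that kind of blend, here’s a sketch; the 5/4/3 recency weights and the weight on the scouting estimate are made-up placeholders (the real systems’ parameters differ, and aging adjustments are omitted entirely).

```python
def true_talent_defense(seasonal_runs, scouting_runs,
                        year_weights=(5, 4, 3), scouting_weight=2):
    """Weighted blend of recent seasons of defensive runs (most recent
    first) with a scouting-based estimate, in the spirit of how Steamer
    and ZiPS regress. All weights here are illustrative only."""
    weighted = sum(r * w for r, w in zip(seasonal_runs, year_weights))
    weight_sum = sum(year_weights[:len(seasonal_runs)]) + scouting_weight
    return (weighted + scouting_runs * scouting_weight) / weight_sum

# Three seasons of UZR (most recent first) plus a Fans Scouting Report figure:
print(true_talent_defense([12.0, 4.0, 8.0], scouting_runs=6.0))  # 8.0
```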

So let’s imagine a world where, last year, every player performed exactly to his true-talent defensive ability. Everyone hit just as they actually did, everyone ran the bases just as they actually did, everyone got the playing time they actually got, but defensively, we knew exactly what each player’s true-talent ability was worth, and no one deviated from it.
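Mechanically, that amounts to swapping each player’s observed defensive runs for his projected true-talent runs and converting the difference to wins. Here’s a sketch, assuming a ~9 runs-per-win conversion; the exact arithmetic behind the table isn’t shown, so treat this as an approximation of the construction, not the real thing.

```python
RUNS_PER_WIN = 9.0  # approximate 2015 conversion; an assumption

def war_with_true_talent_defense(actual_war: float,
                                 observed_def_runs: float,
                                 true_talent_def_runs: float) -> float:
    """Hold offense, base running, and playing time fixed; replace only
    the observed defensive runs with the true-talent estimate."""
    return actual_war + (true_talent_def_runs - observed_def_runs) / RUNS_PER_WIN

# Kiermaier: +5.5 actual WAR, +30 observed runs, ~+18 true-talent runs
print(war_with_true_talent_defense(5.5, 30.0, 18.0))  # ~4.2; the table says 4.1
```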

In that world, last year’s WAR leaderboard, in sortable, scrolling fashion, would look like this:

Last Year’s WAR with True-Talent Defense
Name TT_WAR 2015_WAR TT_RANK 2015_RANK RANK_DIF WAR_DIF DIF_stDEV
(TT_WAR: 2015 WAR recalculated with true-talent defense; WAR_DIF: TT_WAR minus 2015_WAR; DIF_stDEV: WAR_DIF expressed in standard deviations)
Bryce Harper 9.9 9.5 1 1 0 0.4 0.8
Mike Trout 9.1 9 2 2 0 0.1 0.1
Josh Donaldson 8.8 8.7 3 3 0 0.1 0.2
Paul Goldschmidt 7.6 7.4 4 4 0 0.2 0.4
Joey Votto 7.6 7.4 5 5 0 0.2 0.3
Manny Machado 7.5 6.8 6 6 0 0.7 1.3
A.J. Pollock 6.7 6.6 7 8 1 0.1 0.2
Kris Bryant 6.3 6.5 8 10 2 -0.2 -0.4
Lorenzo Cain 6.1 6.6 9 9 0 -0.5 -0.9
Anthony Rizzo 5.9 5.5 10 15 5 0.4 0.7
Buster Posey 5.8 5.7 11 13 2 0.1 0.2
Andrew McCutchen 5.7 5.8 12 12 0 -0.1 -0.2
Mookie Betts 5.6 4.8 13 22 9 0.8 1.4
Yoenis Cespedes 5.5 6.7 14 7 -7 -1.2 -2.1
Nelson Cruz 5.4 4.8 15 23 8 0.6 1.1
Chris Davis 5.2 5.6 16 14 -2 -0.4 -0.7
Jose Bautista 5.2 4.5 17 27 10 0.7 1.2
Matt Carpenter 5.2 5.2 18 17 -1 0.0 0.0
Nolan Arenado 5.1 4.5 19 28 9 0.6 1.1
Jason Heyward 4.7 6 20 11 -9 -1.3 -2.3
Matt Duffy 4.7 4.9 21 21 0 -0.2 -0.3
Curtis Granderson 4.5 5.1 22 19 -3 -0.6 -1.0
Miguel Cabrera 4.4 4.3 23 31 8 0.1 0.1
Jason Kipnis 4.3 5.2 24 18 -6 -0.9 -1.5
Edwin Encarnacion 4.3 4.5 25 29 4 -0.2 -0.3
Adam Eaton 4.3 3.6 26 46 20 0.7 1.2
Ian Kinsler 4.1 4.2 27 36 9 -0.1 -0.1
J.D. Martinez 4.1 5 28 20 -8 -0.9 -1.5
Adrian Beltre 4.1 4.6 29 25 -4 -0.5 -0.9
Dee Gordon 4.1 4.6 30 26 -4 -0.5 -0.9
Kevin Kiermaier 4.1 5.5 31 16 -15 -1.4 -2.5
Brandon Crawford 4.1 4.7 32 24 -8 -0.6 -1.1
Brandon Belt 4.0 4.3 33 32 -1 -0.3 -0.5
Todd Frazier 4.0 4.4 34 30 -4 -0.4 -0.7
Michael Brantley 4.0 3.8 35 41 6 0.2 0.3
Evan Longoria 4.0 4.2 36 37 1 -0.2 -0.4
Xander Bogaerts 3.9 4.3 37 33 -4 -0.4 -0.6
Mike Moustakas 3.9 3.8 38 42 4 0.1 0.2
Francisco Cervelli 3.9 3.8 39 43 4 0.1 0.1
Kyle Seager 3.8 3.9 40 39 -1 -0.1 -0.1
Logan Forsythe 3.8 4.1 41 38 -3 -0.3 -0.5
David Peralta 3.8 3.7 42 45 3 0.1 0.1
Josh Reddick 3.7 3 43 61 18 0.7 1.3
Eric Hosmer 3.7 3.5 44 50 6 0.2 0.3
Russell Martin 3.7 3.5 45 51 6 0.2 0.3
Starling Marte 3.7 3.6 46 47 1 0.1 0.1
Jose Altuve 3.6 4.3 47 34 -13 -0.7 -1.3
Kevin Pillar 3.4 4.3 48 35 -13 -0.9 -1.6
Adrian Gonzalez 3.4 3 49 62 13 0.4 0.7
Brett Gardner 3.3 2.6 50 70 20 0.7 1.3
Justin Upton 3.2 3.6 51 48 -3 -0.4 -0.6
Joc Pederson 3.2 2.8 52 66 14 0.4 0.8
Brian Dozier 3.2 3.4 53 53 0 -0.2 -0.4
Christian Yelich 3.2 2.3 54 79 25 0.9 1.5
Lucas Duda 3.2 3.1 55 58 3 0.1 0.1
Ryan Braun 3.2 2.8 56 67 11 0.4 0.6
Shin-Soo Choo 3.2 3.5 57 52 -5 -0.3 -0.6
Odubel Herrera 3.1 3.9 58 40 -18 -0.8 -1.4
Jose Abreu 3.1 3 59 63 4 0.1 0.1
Kole Calhoun 3.0 3.8 60 44 -16 -0.8 -1.4
Andrelton Simmons 3.0 3.2 61 56 -5 -0.2 -0.4
Ben Zobrist 3.0 2.1 62 88 26 0.9 1.6
Brian McCann 2.9 2.9 63 64 1 0.0 0.0
Adam Jones 2.9 3.6 64 49 -15 -0.7 -1.3
Brandon Phillips 2.9 2.6 65 71 6 0.3 0.5
David Ortiz 2.8 2.8 66 68 2 0.0 0.0
Billy Burns 2.8 2.3 67 80 13 0.5 0.9
Robinson Cano 2.8 2.1 68 89 21 0.7 1.2
Alex Rodriguez 2.7 2.7 69 69 0 0.0 0.0
Gerardo Parra 2.7 0.4 70 121 51 2.3 4.0
Yunel Escobar 2.6 2.1 71 90 19 0.5 0.9
Kolten Wong 2.6 2.3 72 81 9 0.3 0.5
Neil Walker 2.6 2.4 73 74 1 0.2 0.3
Didi Gregorius 2.5 3.1 74 59 -15 -0.6 -1.0
Stephen Vogt 2.5 2.3 75 82 7 0.2 0.3
Carlos Gonzalez 2.4 2.4 76 75 -1 0.0 0.1
Derek Norris 2.4 2.4 77 76 -1 0.0 0.0
Martin Prado 2.3 3.1 78 60 -18 -0.8 -1.4
Jhonny Peralta 2.3 1.7 79 99 20 0.6 1.1
Ian Desmond 2.3 1.7 80 100 20 0.6 1.1
Gregory Polanco 2.3 2.3 81 83 2 0.0 0.0
Chris Coghlan 2.3 3.3 82 54 -28 -1.0 -1.8
Troy Tulowitzki 2.3 2.3 83 84 1 0.0 0.0
Ender Inciarte 2.3 3.3 84 55 -29 -1.0 -1.8
Yadier Molina 2.3 1.3 85 109 24 1.0 1.7
DJ LeMahieu 2.3 1.9 86 95 9 0.4 0.6
Dexter Fowler 2.2 3.2 87 57 -30 -1.0 -1.8
Charlie Blackmon 2.1 2.1 88 91 3 0.0 0.1
Kendrys Morales 2.1 2.1 89 92 3 0.0 0.0
Chase Headley 2.1 1.5 90 107 17 0.6 1.0
Mitch Moreland 2.1 2.1 91 93 2 0.0 -0.1
Prince Fielder 2.0 1.6 92 102 10 0.4 0.7
Salvador Perez 2.0 1.6 93 103 10 0.4 0.6
Addison Russell 2.0 2.9 94 65 -29 -0.9 -1.7
Ben Revere 1.9 1.9 95 96 1 0.0 0.0
Carlos Santana 1.9 2.4 96 77 -19 -0.5 -0.9
Trevor Plouffe 1.8 2.5 97 72 -25 -0.7 -1.2
Brett Lawrie 1.8 0.6 98 118 20 1.2 2.2
Asdrubal Cabrera 1.8 2.2 99 86 -13 -0.4 -0.7
Carlos Beltran 1.8 1.9 100 97 -3 -0.1 -0.2
Brock Holt 1.8 2.4 101 78 -23 -0.6 -1.1
Wilmer Flores 1.8 1.9 102 98 -4 -0.1 -0.3
Adam Lind 1.7 2.2 103 87 -16 -0.5 -0.8
Marcus Semien 1.7 1.7 104 101 -3 0.0 0.1
Albert Pujols 1.7 2 105 94 -11 -0.3 -0.5
Daniel Murphy 1.7 2.5 106 73 -33 -0.8 -1.5
Yangervis Solarte 1.6 1.6 107 104 -3 0.0 0.0
Elvis Andrus 1.6 1.6 108 105 -3 0.0 -0.1
Nick Markakis 1.6 1.6 109 106 -3 0.0 -0.1
Austin Jackson 1.4 2.3 110 85 -25 -0.9 -1.5
Erick Aybar 1.4 1 111 113 2 0.4 0.7
Cameron Maybin 1.3 1 112 114 2 0.3 0.5
Freddy Galvis 1.3 1.3 113 110 -3 0.0 -0.1
Anthony Gose 1.2 0.4 114 122 8 0.8 1.4
Marlon Byrd 1.2 1 115 115 0 0.2 0.3
Matt Kemp 1.1 0.4 116 123 7 0.7 1.2
Mark Trumbo 1.1 1.1 117 111 -6 0.0 0.0
Alcides Escobar 1.1 1.5 118 108 -10 -0.4 -0.8
Jace Peterson 0.9 1.1 119 112 -7 -0.2 -0.4
Starlin Castro 0.8 0.8 120 117 -3 0.0 0.0
Joe Mauer 0.7 0.3 121 125 4 0.4 0.6
Jimmy Rollins 0.6 0.2 122 127 5 0.4 0.8
Wilson Ramos 0.5 0.4 123 124 1 0.1 0.2
Jay Bruce 0.5 0.1 124 128 4 0.4 0.7
Jose Reyes 0.5 0.5 125 120 -5 0.0 -0.1
Brandon Moss 0.4 0.6 126 119 -7 -0.2 -0.3
Angel Pagan 0.4 -0.5 127 134 7 0.9 1.5
Jean Segura 0.3 0.3 128 126 -2 0.0 0.0
Evan Gattis 0.1 0 129 129 0 0.1 0.2
Logan Morrison 0.0 -0.2 130 131 1 0.2 0.3
Michael Taylor 0.0 1 131 116 -15 -1.0 -1.9
Alexei Ramirez -0.1 -0.5 132 135 3 0.4 0.7
Melky Cabrera -0.1 -0.3 133 132 -1 0.2 0.3
Nick Castellanos -0.2 -0.1 134 130 -4 -0.1 -0.1
Ryan Howard -0.4 -0.4 135 133 -2 0.0 -0.1
Chris Owings -0.6 -1.4 136 138 2 0.8 1.5
Pablo Sandoval -0.6 -2 137 139 2 1.4 2.4
Billy Butler -0.8 -0.7 138 136 -2 -0.1 -0.2
Avisail Garcia -0.9 -1.1 139 137 -2 0.2 0.4

What can we take away from all this? Let’s get through some of the nerdy stuff first. The mean of the changes evens out to 0.0, and the standard deviation is 0.6 WAR. The changes are normally distributed, and a little more than two-thirds of all players fall within 0.5 WAR of their actual 2015 WAR. We know that WAR comes with error bars, and that, for example, 2.5 WAR and 3.0 WAR aren’t always all that different. The lack of certainty in defensive metrics is most of the reason why, and this reinforces that the error bar for any player is likely within one win. Only six players out of 139 see their WAR change by more than two standard deviations, or more than 1.0 WAR.
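Those figures are simple to reproduce from the WAR_DIF column; a sketch using Python’s standard library, where war_diffs would hold the 139 values from the table above:

```python
import statistics

def summarize(war_diffs):
    """Mean, standard deviation, and share of players who land within
    0.5 WAR of their actual 2015 mark."""
    mean = statistics.mean(war_diffs)
    stdev = statistics.stdev(war_diffs)
    within_half = sum(abs(d) <= 0.5 for d in war_diffs) / len(war_diffs)
    return mean, stdev, within_half

# Fed the 139 WAR_DIF values above, this should return roughly
# (0.0, 0.6, a bit over two-thirds), matching the figures in the text.
```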

Gerardo Parra is our only true outlier. His 2015 value, with estimated true-talent defense, would have been +2.7 WAR. His actual 2015 value, according to WAR with UZR, was only +0.4 WAR. The 2.3-WAR difference is four standard deviations above the mean, which is crazy. Clearly, something is up here. For context, UZR had Parra pegged at -18, while DRS had him at -10. Baseball Prospectus’ FRAA had him at just -0.5. Perhaps UZR was overly harsh on Parra last year. Perhaps the projections are being too generous with his current true-talent ability. For what it’s worth, I sought clarity on Parra’s defensive numbers back in November, and found some evidence suggesting Parra’s true-talent level has declined some, though I can’t objectively say whether that means I’d take the over or the under on the projections’ estimate.

The next-largest gap between 2015 value and expected true-talent value belongs to Kevin Kiermaier, who comes in at +4.1 WAR with true-talent defense, and +5.5 WAR with 2015’s figures. UZR had Kiermaier pegged as a +30-run defender last year. And while that’s obviously not his true-talent level (we’ve got that estimated at +18 runs, which is still remarkable), I think this discrepancy highlights the issue with the line of thinking this post is humoring. UZR had Kiermaier at +30, FRAA agreed at +31, and DRS thought those were both light, putting him at +42. Every system available to us suggested Kiermaier’s defense was worth north of three wins last year, yet we know that’s not his true-talent ability. Just like how Bryce Harper posted a 1.109 OPS, yet we know that’s not his true-talent ability.

Kiermaier robbed something like half a dozen home runs or more last season. Part of that is because he’s incredible, but he also probably got somewhat lucky in having more balls than usual hit to the exact sliver of space in which a fielder has the opportunity to rob a home run. We’d never expect him to rob half a dozen or more home runs in a year moving forward, but he did, and he ought to get credit for that. Just like how a batter’s BABIP can be inflated by a bunch of balls being hit to just the right spot, a fielder’s defensive metrics can be inflated by a bunch of balls being hit to just the right spot to allow him to make a plus play.

On the flip side of that coin, consider Juan Lagares, who doesn’t appear in the table because he wasn’t qualified, but whose fielding was valued at +3.5 runs last year, and whose true-talent fielding ability would’ve been worth +7.9 runs. We know Lagares is an excellent fielder, but we also know he was dealing with elbow problems that limited his greatest defensive asset: his arm. In other words, it would be silly to use anything like Lagares’ true-talent defensive numbers for last season, given we have reason to believe he was performing well below his true-talent ability, just as we have reason to believe Victor Martinez was performing well below his true-talent ability at the plate, for similar reasons.

I don’t think it’s all bad, though. Aside from the outliers, this could help clean up plenty of guys’ defensive numbers. The scouting and eye-test data we have suggest Mookie Betts may be an even better defender than he was given credit for last year. Plenty of people seemed to think Yoenis Cespedes’ value was boosted by overly optimistic numbers on his defense, and his estimated true talent agrees, docking him about a win.

This methodology has its flaws. Our current methodology has its flaws. I’m certainly not suggesting this is how things should be done — I think that would be a step backward, in fact. But more information can never hurt.





August used to cover the Indians for MLB and ohio.com, but now he's here and thinks writing these in the third person is weird. So you can reach me on Twitter @AugustFG_ or e-mail at august.fagerstrom@fangraphs.com.

31 Comments
Ernie Camacho
8 years ago

I’m afraid I really don’t understand the point of this. It seems to conflate predictive reliability with accuracy. When assessing past performance, I don’t really care whether stats are reliable for prediction, I just want the most accurate estimate of the run value of what the player actually did.

It might be that this defensive true talent estimate also ends up being a more accurate estimate of what actually happened, on average, but I think you need to make that case first in order for this exercise to make much sense.

In fairness, you sort of hint at the conceptual weirdness of this in your opening sentence, so I’m guessing I’m not saying anything you haven’t already thought of.

bbdawgrex
8 years ago
Reply to  Ernie Camacho

A lot of people, even on this site, misunderstand when they’re using descriptive vs. predictive stats, so this article is an always-useful reminder of the differences.

The Kiermaier paragraph (“Kiermaier robbed something like half a dozen…”) is also an underused train of logic in defending the defensive component of WAR. WAR often gets discredited for the accuracy issues in the defense calcs, which I know exist, but this discussion of opportunities doesn’t happen enough. A player can put up worse defensive metrics simply because he had fewer tough-play opportunities, the ones that impact the metrics most. He’s still just as talented, but his descriptive stats do, and should, suffer, just like grounders squeaking through holes impact a player’s average.

Point being, I love this article just in that it reiterates a couple of great points that, based on plenty of comments on articles, the community still needs to hear more often.

Ernie Camacho
8 years ago
Reply to  bbdawgrex

That’s a good point about the opportunities issue. I don’t think it’s much of a critique of WAR (which is a counting stat, after all, so isn’t meant to be a talent estimator), but it does highlight the potential value of defensive rate stats based on the number of relevant opportunities, not just innings. Maybe something more like Inside Edge.

soddingjunkmail
8 years ago
Reply to  bbdawgrex

Co-sign.

The opportunities discussion is criminally underserved.

Walter
8 years ago
Reply to  bbdawgrex

Like Elias mentions below, though, we have very coarse binning going on with fielding, often combined with very few non-routine opportunities from which players can actually derive value. So, unlike BABIP for hitters, a few plays that are borderline difficult, or straight-up misclassified as difficult, have the potential to swing the values a tremendous amount for fielders. Essentially, the problem is that UZR isn’t particularly good at actually measuring “what happened on the field” over the course of just one season.

For example, what if all those HRs Kiermaier robbed were actually exceedingly easy plays, and just about every OFer would have made them? How much credit should he actually get then? Obviously taking a HR off the board is huge, but what if the league-average player makes those same plays 80% of the time?
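To put rough numbers on that, here’s a sketch; the ~1.7-run swing between a homer allowed and an out recorded is a ballpark linear-weights figure, and the catch probabilities are invented:

```python
# Approximate run swing between "HR allowed" and "out recorded";
# ~1.7 runs is a ballpark linear-weights figure, not an official constant.
HR_ROBBERY_RUN_SWING = 1.7

def catch_credit(league_catch_prob: float,
                 run_swing: float = HR_ROBBERY_RUN_SWING) -> float:
    """Runs credited for converting a play, relative to how often the
    league-average fielder converts the same opportunity."""
    return (1.0 - league_catch_prob) * run_swing

print(catch_credit(0.05))  # a near-impossible robbery: ~1.6 runs of credit
print(catch_credit(0.80))  # the 80% play posited above: only ~0.3 runs
```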

Only glove, no love
8 years ago
Reply to  Walter

Point taken, and in support, see Inciarte’s numbers for the last two seasons.

But you clearly didn’t look up Kiermaier’s catches, heh.

Elias
8 years ago
Reply to  Ernie Camacho

Agree that WAR isn’t about prediction, so we want the components to measure “what actually happened.”

However, whereas something like the number of HRs a player hit can be measured with NO error, fielding metrics almost certainly contain some mismeasurement and therefore do not perfectly measure what actually happened.

So there is an argument for adjusting the fielding component for measurement error that has nothing to do with making WAR more predictive (even if that might be a side benefit).

Ernie Camacho
8 years ago
Reply to  Elias

I totally agree with this. But adjusting for measurement error isn’t the same thing as using estimated true talent (unless they accidentally overlap). If regressing toward league average, as MGL has supported in the past, seems a more accurate depiction of what happened, that’s the way to go.

TheGrandslamwich
8 years ago
Reply to  Elias

“Whereas the number of HRs a player hit can be measured with NO error.”

Angel Hernandez, even with replay available, has already proven this statement false.

FunFella13
8 years ago
Reply to  Ernie Camacho

But I think you need to make that the most accurate estimate of what actually happened, on average, but I think you need to make that the most accurate estimate of what actually care whether stats are reliable for predictive reliability with accuracy. When assessing past performance, I don’t understand the run value of what the conceptually did.

It might be that this exercise to make much sense.

In fairness, you sort of hint at the conceptually don’t really don’t really don’t already thought of.edictive reliable for predictive reliability with accuracy. When assessing I’m afraid I really happened, on average, but I this exercise to make that case first in order for this.

Sn0wman
8 years ago
Reply to  FunFella13

I’m hearing this post in the voice of Matt Frewer as Max Headroom…

Walter
8 years ago
Reply to  Ernie Camacho

Ernie,

“It might be that this defensive true talent estimate also ends up being a more accurate estimate of what actually happened, on average, but I think you need to make that case first in order for this exercise to make much sense.”

But how are you ever going to do that? And it’s exactly because we can’t do that that many people suggest using regressed fielding components in WAR, and why this article was written. Basically, what it gives us is some sort of guess at which side of the error a player’s WAR likely lands on. No need to overthink this.

Ernie Camacho
8 years ago
Reply to  Walter

At the end of the day, you might be right, and it’s just a matter of framing. But as we inch closer to better defensive info from Statcast, and a more precise sense of how player skills relate to plays made and not made, it seems like a weird time to lose sight of the real goal here.

philosofool
8 years ago
Reply to  Ernie Camacho

It seems to me like Statcast will do just the opposite. What actually happens on the field is, in large part, a matter of luck in where the defender lined up and where the hitter hit the ball. Rather than trying to quantify what a player does, we will look at component skills (jump, speed, etc.) as measures of his skill, and we won’t measure what he did at all, because really he was just a small part of the outcome; “what he did” misplaces responsibility on his shoulders. Our model of defensive skill will be component-based and will measure his context-neutral expected runs saved over/under average.

Ernie Camacho
8 years ago
Reply to  Ernie Camacho

But his component actions are what he did. I certainly agree that Statcast’s greatest promise is assessing skills (i.e., estimating true talent), but it can also be used to more accurately classify actual past events. The two concepts will still be separate, but the same new tools can help with both.