I’ll begin by saying I’m not sure what value all of what’s to follow actually has. I know that’s about the least-compelling way to begin a blog post, but I just want that to be very clear. TangoTiger’s recent Building a Better WAR Metric series on the site jump-started an idea I’d been kicking around in my head for a while. It’s an idea that mostly exists because I’ve seen people on the internet say they’d want to see something like it. At the very least, it could serve as a talking point for another constructive discussion about WAR, and any constructive discussion about WAR is a good thing, because we all admit it’s far from perfect and constructive discussions usher in progress.
People don’t really have beef with the offensive side of WAR, I don’t think. As far as sabermetric stats go, wRC+, and therefore wRAA, are about as infallible as they come. Tough to argue with the outcomes of history. I don’t see too many quibbles with the base-running numbers, partially because I think most people think they do a good job, but also because they don’t move the needle much either way, and there are bigger fish to fry. Some people aren’t fans of the positional adjustments: both the assigned weights and the entire concept of including them. I’m in the camp that firmly believes in the idea of the positional adjustment, but, like anything, the formula for the weights could always be looked at to see if it could be improved, and Jeff Zimmerman’s work on this topic last offseason was a great place to start.
But the bigger beef, beyond the positional adjustments, is of course defense. Anyone will admit this is the weak link of WAR. It’s probably the weakest link of sabermetrics, as a whole, in 2016. And mostly, what it boils down to is, we know that defensive metrics don’t stabilize until the sample spans roughly three years, or 3,000-ish innings. Meaning, the single-season data is subject to noise, and if we wanted to draw any conclusions from it, we’d have to regress it. Despite that, single-season WAR is powered by noisy, unregressed, single-season defensive metrics. That’s the crux of the beef with WAR.
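To make "we’d have to regress it" concrete, here’s a minimal sketch of one common way to regress a single-season number toward the mean. The function name, the choice to treat 3,000 innings as the regression constant, and the league mean of zero runs are all my assumptions for illustration; this isn’t how any published WAR handles defense.

```python
def regress_to_mean(observed_runs, innings, stabilization_innings=3000.0, league_mean=0.0):
    """Shrink an observed defensive run value toward the league mean.

    Assumption: the sample gets weight n / (n + k), where n is innings played
    and k is the hypothetical stabilization point (3,000 innings here).
    """
    if innings < 0 or stabilization_innings <= 0:
        raise ValueError("innings must be >= 0 and stabilization_innings > 0")
    weight = innings / (innings + stabilization_innings)
    return weight * observed_runs + (1.0 - weight) * league_mean


# A made-up fielder credited with +15 runs in 1,200 innings
print(round(regress_to_mean(15.0, 1200.0), 2))  # 4.29
```

The fewer innings in the sample, the harder the number gets pulled toward average, which is the whole point: a single season simply doesn’t tell us that much.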
So, some folks have suggested that the defensive component of WAR ought to be regressed in some way, in an effort to strip out some of the noise that comes with single-year defensive data, or to better capture a defender’s true performance. I think there are a number of flaws in this general line of thinking, but there are a number of flaws with the way it’s being done now, too, so let’s humor one another.
Both ZiPS and Steamer use multiple years of data, giving more weight to the most recent seasons. Multiple years of data to weed out noise? Check. Both also incorporate some form of “scouting” information: Steamer regresses toward the results of the Fans Scouting Report, while ZiPS searches for keywords in actual, physical scouting reports and uses those as a means for regression. Eye test? Check. Blend all that together and factor in some aging curves, and you’ve got yourself as good an idea of any player’s true-talent defensive ability as you’re going to find. Sort the Fld column here and I think you’ll agree that these numbers pass the eye test with flying colors.
So let’s imagine a world where, last year, every player performed exactly to their true-talent defensive ability. Everyone hit the same, everyone ran the same, everyone had the same amount of playing time, but defensively, we knew exactly what everyone’s true-talent ability was worth, and no one varied from it.
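In rough arithmetic terms, that thought experiment amounts to swapping a player’s measured fielding runs for his projected ones and leaving everything else alone. Here’s a small sketch of that swap. The function name is hypothetical, and the flat 10 runs per win is a round-number assumption, since the actual conversion varies by season.

```python
def war_with_true_talent_defense(war, actual_fld, true_talent_fld, runs_per_win=10.0):
    """Swap measured fielding runs for projected true-talent fielding runs.

    Assumptions: both fielding values are in runs over the same playing time,
    and runs_per_win is a flat 10, which is only approximately right.
    Hitting, base running and the positional adjustment are left untouched.
    """
    if runs_per_win <= 0:
        raise ValueError("runs_per_win must be positive")
    return war + (true_talent_fld - actual_fld) / runs_per_win


# A made-up 5.0-win player whose +15 Fld is projected at +5
print(war_with_true_talent_defense(5.0, 15.0, 5.0))  # 4.0
```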