What Mean Do You Regress Defensive Metrics To?

Jerry Crasnick has an excellent article on defensive metrics as they relate to valuing free agents, especially diving into how they affect Matt Holliday and Jason Bay. It’s no secret that, as the hosts of UZR, we’re big proponents of its usefulness. However, I still agree with essentially everything in Crasnick’s article.

There are aspects of defense that zone-based metrics won’t capture. There are results from UZR that make you scratch your head and say “really?” There is value in having the experienced eyes of a scout watch a player and offer an opinion on the abilities that he saw. We agree with all of that.

The value of metrics like UZR is most contentious when the results diverge significantly from the perceived scouting wisdom about a player. Often, the reaction to counterintuitive data is to dismiss it entirely, offering up the example as evidence that the metric is flawed beyond use. Or, on the other side, the reaction is to offer up the player’s numbers as proof that scouts just don’t get it and that subjective opinions are worthless. Simply go back and re-read the threads about Mark Teixeira’s defense over the summer to see this effect in full force on both sides.

In reality, though, both positions are wrong. Re-quoting the assistant GM from Crasnick’s piece:

“If there’s some kind of discrepancy, you need to use your best judgment,” the assistant says. “If a scout says, ‘This guy stinks,’ but the numbers say he’s excellent, the truth probably lies somewhere in between.”

This is essentially a paraphrase of the concept of regression to different means. If we have two players with identical UZRs, but scouts love one and abhor the other, our projection for their relative UZRs going forward should favor the one preferred by scouts. The fact that observational information is available gives us a useful data point to add to the calculation, pushing forward analysis that leads to “best judgment”.

I said last week that I think Teixeira is probably a bit better defensively than his recent UZR scores have indicated, and the foundation of that belief lies in the value of scouting information. Teixeira is revered by almost every scout in the game as an exceptional defensive first baseman. That matters when we’re projecting future defensive performance. There is no reason to ignore those opinions simply because they don’t line up with what UZR has measured. We account for those opinions by regressing Teixeira’s UZR projections to a different mean than we would use for a player whom scouts are less enamored of.
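To make the idea of regressing to different means concrete, here is a minimal sketch in Python. It is not how UZR projections are actually built; it assumes a simple weighted-average shrinkage in which the observed number and the chosen mean are blended according to how much data we have. The function name, the 300-game sample, the prior weight, and the scout-informed means of +5 and -5 runs are all hypothetical values picked for illustration.

```python
def regress_to_mean(observed, n_obs, prior_mean, prior_weight):
    """Blend an observed value with a prior mean (simple shrinkage).

    observed     -- observed UZR per 150 games
    n_obs        -- amount of observed data (here, games)
    prior_mean   -- the mean we regress toward
    prior_weight -- amount of data at which observation and prior count
                    equally (a hypothetical tuning constant)
    """
    return (observed * n_obs + prior_mean * prior_weight) / (n_obs + prior_weight)


# Two players with identical observed UZR, but different scouting reports.
observed_uzr = 0.0   # hypothetical: both measured as exactly average
games = 300          # hypothetical sample size
prior_weight = 300   # hypothetical: the prior counts as much as 300 games

# Hypothetical scout-informed means: scouts love one player, abhor the other.
scouts_love = regress_to_mean(observed_uzr, games, prior_mean=5.0, prior_weight=prior_weight)
scouts_abhor = regress_to_mean(observed_uzr, games, prior_mean=-5.0, prior_weight=prior_weight)

print(scouts_love, scouts_abhor)  # 2.5 -2.5
```

With the same observed UZR, the projection for the player scouts prefer comes out higher, and the gap shrinks as the observed sample grows relative to the prior weight.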

UZR is a tool. Scouts are a tool. They can be used together to produce better information than either can on its own. It is not an either/or proposition. Use both.





Dave is the Managing Editor of FanGraphs.

52 Comments
joser
14 years ago

But is there an “asterisk” column in the fielding section on Fangraphs so we know which UZR scores we can accept at face value and which ones we need to adjust? Or will those adjustments be taken into account in some future refinement of UZR?

Because right now if I’m trying to evaluate (say) one player vs another, I can compare their wOBA numbers and be fairly confident that has captured most of their offensive value and, therefore, I know which one is better at the plate. But — without doing further research — I don’t have as much confidence when comparing their UZRs, and since the value calculation here uses UZR as well, I can’t really trust that either. And yeah, we should all do more research and not do two-bit 30-second analysis, but that’s about all the time I have… and even if I have more time, I’m not even sure where I go to get the definitive “scout’s consensus” on any given player, let alone how to translate that into “regressing to a different mean.”

So help a brother out: what should we be doing?

Sam
14 years ago
Reply to  joser

“I can compare their wOBA numbers and be fairly confident that has captured most of their offensive value and, therefore, I know which one is better at the plate.”

How confident?

One thing I would like FanGraphs to add to its impressive list of statistics is error bars, or 95 percent confidence intervals. The only writer who consistently speaks of the statistical uncertainty surrounding metrics and incorporates it in his analysis is Dave Allen. I would like to see more authors follow suit.

Xeifrank
14 years ago
Reply to  Sam

I think the error bars should only be given with projections. I don’t want to see error bars on accumulated stats. I just want to see what their production was. I believe ZiPS projections have the error bars if you go straight to the source. I don’t think it’s necessary for FanGraphs to carry them when the reader can find them himself. FanGraphs already does enough of the hard legwork.
vr, Xei

Sam
14 years ago
Reply to  Sam

“I just want to see what their production was.”

Are these “productions” (like wOBA) measured with certainty, like runs scored? If not, then error bars are absolutely critical. And they matter even more when it comes to defense.

The production figures you talk about are measured using the average run value of each event, via linear weights. These average run values are random variables, and therefore linear combinations of them are themselves random variables. Therefore, there should be error bars attached to them.
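The argument above can be sketched numerically. The following is a rough illustration, not a published method: it treats a player’s event counts as fixed, treats each linear weight as an estimate with its own standard error, and assumes those estimates are independent, so the variance of the total is the sum of the squared (count times standard error) terms. The event names, run values, and standard errors are invented for the example.

```python
import math


def linear_weights_total(counts, weights, weight_se):
    """Return (total runs, standard error of the total).

    Assumes the counts are fixed and the weight estimates are independent,
    so Var(sum of c_i * w_i) = sum of (c_i * se_i) ** 2.
    """
    total = sum(counts[event] * weights[event] for event in counts)
    variance = sum((counts[event] * weight_se[event]) ** 2 for event in counts)
    return total, math.sqrt(variance)


# All numbers below are made up for illustration.
counts = {"single": 100, "home_run": 20}
weights = {"single": 0.5, "home_run": 1.4}      # hypothetical run values
weight_se = {"single": 0.03, "home_run": 0.04}  # hypothetical standard errors

total, se = linear_weights_total(counts, weights, weight_se)
print(f"{total:.1f} runs, give or take {1.96 * se:.1f} (95% interval)")
```

If the weight estimates were correlated, covariance terms would have to be added, so the independence assumption here likely understates or overstates the real uncertainty.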

WY
14 years ago
Reply to  Sam

I agree with Sam. I think we’d all just like to see what the player’s production was, but that doesn’t change the fact that many of these stats (such as runs saved on defense, to cite one example) are ultimately estimates. That doesn’t make them wrong, but it does mean that there is going to be some uncertainty surrounding them. That is unavoidable.

I agree with Sam that it would be interesting to get a sense of how large or small the error might be for some of these stats. I also think it would help rein in some of the people who get a little too casual in tossing around WAR values to the nearest decimal point as if those numbers were set in stone. These stats are useful, but they are not monolithic or perfect, nor should we expect them to be.

Sam
14 years ago
Reply to  Sam

“Linear weights are presented as a context neutral statistic. Therefore, there is no ‘estimate’ – we literally are saying we don’t care how many runs the hit actually produced. It’s an average because we’re intentionally removing the context of the play.”

Respectfully disagree. Removing context is not equivalent to removing statistical uncertainty, which may result from factors that are unmeasurable, or factors the statistician cannot control for. Because it is an average, an expected run value from a specific event if you will, it should have error bars or a 95 (or 99) percent confidence interval surrounding it. It may turn out that the standard error will be extremely low due to the huge sample sizes that are used to calculate the linear weights, but there still is uncertainty.
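The point about sample size can be checked with a small sketch. It assumes the usual normal-approximation interval for a mean (mean plus or minus 1.96 standard errors, where the standard error is the sample standard deviation divided by the square root of the sample size). The run values are a made-up repeating pattern, not real play-by-play data, and `mean_with_interval` is a hypothetical helper.

```python
import math
import statistics


def mean_with_interval(run_values, z=1.96):
    """Return (mean, low, high) using a normal-approximation interval."""
    n = len(run_values)
    mean = statistics.fmean(run_values)
    standard_error = statistics.stdev(run_values) / math.sqrt(n)
    return mean, mean - z * standard_error, mean + z * standard_error


# Made-up runs scored after one type of event, repeated to vary sample size.
pattern = [0.0, 0.0, 1.0, 2.0]
small_sample = pattern * 25     # 100 events
large_sample = pattern * 2500   # 10,000 events

for label, sample in (("small", small_sample), ("large", large_sample)):
    mean, low, high = mean_with_interval(sample)
    print(f"{label}: mean {mean:.3f}, 95% interval {low:.3f} to {high:.3f}")
```

Both samples have the same mean, but the interval around the larger one is far narrower. The uncertainty shrinks with sample size without ever reaching zero.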

vivaelpujols
14 years ago
Reply to  Sam

Actually, Dave, UZR is probably more accurate (in terms of identifying true talent level) than wOBA. wOBA splits everything up into a few buckets. If a guy hits a rope that’s caught at second base, wOBA gives him a 0 for that play. UZR has a lot more buckets than wOBA, so it will be a better indicator of a player’s true ability than wOBA. That’s why we see UZR have a year-to-year correlation similar to wOBA’s, despite a much smaller sample size for defense.

Fresh Hops
14 years ago
Reply to  Dave Cameron

Exactly. This isn’t actually a difficult thing to reason about. The question isn’t “Can I trust this?” but “How much can I trust this?” If you have a decent sample of UZR, the answer is +/- 5 runs. Taking a player’s three-year average (if you have that much data), weighted slightly toward recent performance, is a good idea. Finally, you should always expect players to be a little more average than they have been in the past (or, if you have a very small sample, a lot more average); that’s just regression to the mean.

Once you’ve done all that, you should use additional information you have about the player–recent injuries, recoveries, learning, aging, can all be factored in to suggest that a player will be a little different from the number we arrived at after reflecting on brute UZR data. Oh, and of course to the point of this article: there’s scouting information as well.
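The recipe in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 5/4/3 season weights, the 40 percent regression fraction, and the sample UZR values are hypothetical, and `project_uzr` is not an existing function. The `mean` argument is where scouting information could enter, by regressing toward something other than league average.

```python
def project_uzr(seasons, season_weights=(5, 4, 3), regression=0.4, mean=0.0):
    """Project UZR from up to three seasons, most recent first.

    seasons        -- UZR values, most recent season first
    season_weights -- hypothetical weights favoring recent seasons
    regression     -- hypothetical fraction of the way to pull toward `mean`
    mean           -- the mean to regress to (0.0 = league average)
    """
    used_weights = season_weights[:len(seasons)]
    weighted_avg = sum(u * w for u, w in zip(seasons, used_weights)) / sum(used_weights)
    return weighted_avg * (1 - regression) + mean * regression


# Hypothetical player: +10, +4, and -2 runs over the last three seasons.
recent_uzr = [10.0, 4.0, -2.0]

baseline = project_uzr(recent_uzr)                # regress to league average
scout_adjusted = project_uzr(recent_uzr, mean=5)  # hypothetical scout-informed mean

print(round(baseline, 2), round(scout_adjusted, 2))
```

Adjustments for injuries, aging, and the like would be applied on top of this number. They are left out here because the comment gives no specific values for them.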