Valuing Defense

by Dave Cameron

December 4, 2008

If there’s been an underlying theme to my writing here over the last year, it’s been that defense is significantly undervalued in MLB. Well, it seems like people are catching on. From Peter Gammons’ latest article:

“The other thing is that teams are moving away from the base offensive statistics,” says another GM. “They are pouring through defensive studies and seeing that below-average defenders like Ramirez and Burrell in the field depreciate their offensive numbers because of what they give up.”

This isn’t just talk – last week, the Yankees, D’Backs, and Phillies decided that they had no interest in risking a one-year arbitration offer to Bobby Abreu, Adam Dunn, or Pat Burrell, all good hitters who are miserable defensive outfielders. These guys made between $13 and $16 million in 2008, but their employers figured out that their overall production (with defense included in the calculations) just wasn’t worth those kinds of paydays again.

So, with this shift in thinking apparently creeping into MLB, I figure it’s time that we spend a bit of time talking about defensive statistics and how they should be viewed. Over the next couple of days, we’ll talk about the theory behind them, and how they should be interpreted.

There are, essentially, two kinds of defensive statistics available right now. The first would best be described as estimators – these include things like Zone Rating (and its THT derivative, Revised Zone Rating) and BP’s FRAA/FRAR. These measures have been around for a while, and because they don’t require a huge amount of precise data to calculate, they offer only rough ideas of a player’s defensive value. More recently, several advanced defensive metrics have been created based on more precise play-by-play data – these include Ultimate Zone Rating, Plus/Minus, and Probabilistic Model of Range.

The latter are the types that have pushed the new wave of defensive valuation forward, and these are the kinds of “defensive studies” that Gammons refers to. The extra data required helps make them more precise, giving us a better view of how much defense actually matters and how good each player is relative to his peers. For understanding defense, they’re a huge step forward from where we were 5 to 10 years ago.

However, there’s a pretty significant difference between the modern defensive statistics and the numbers that we use to value offense, and that lies in the variability of human error. When we talk about something like on-base percentage, it is a statistic based on indisputable factual results – Player X reached base Y times in Z plate appearances. There’s no gray area – it happened, it was recorded, and no one disagrees. You could call OBP, and other stats like it, descriptive statistics – they describe a series of irrefutable events that definitely occurred.

Things like UZR, +/-, and PMR, however, are not simply recording incontrovertible facts. In order to get increased data precision, a human judgment is required to determine where on the field the ball landed and how hard the ball was hit, which both go into the calculation of how likely it was that an average fielder would have caught that ball. Humans, of course, aren’t perfect, so their required inclusion makes the data less reliable. We have to factor in human error when we’re looking at the results of these statistics, because we cannot assume 100% accuracy when there’s a subjective call in the equation.
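To make that concrete, here is a minimal sketch of how a plus/minus-style rating might credit a fielder, and how one subjective call can move the result. This is not the actual UZR, +/-, or PMR calculation. The zone names, the soft/hard buckets, the catch probabilities, and the function names are all made up for illustration. The only idea carried over from above is that each ball in play is compared against how often an average fielder would have caught a ball recorded in that spot at that speed.

```python
# Hypothetical league-average catch probabilities, keyed by the
# (zone, hit speed) bucket that a human observer records for each ball.
# These values are invented for illustration, not taken from any real system.
CATCH_PROB = {
    ("shallow_left", "soft"): 0.90,
    ("shallow_left", "hard"): 0.60,
    ("deep_left", "soft"): 0.70,
    ("deep_left", "hard"): 0.30,
}


def play_credit(zone, speed, caught):
    """Credit for one ball in play: actual outcome minus the average
    fielder's catch probability for the recorded bucket."""
    expected = CATCH_PROB[(zone, speed)]
    return (1.0 if caught else 0.0) - expected


def plays_above_average(plays):
    """Sum the per-play credits over a list of (zone, speed, caught) tuples."""
    return sum(play_credit(zone, speed, caught) for zone, speed, caught in plays)


# The same three plays, as recorded by two hypothetical observers.
# They disagree only on how hard the first ball was hit.
observer_a = [
    ("deep_left", "hard", True),
    ("shallow_left", "soft", True),
    ("deep_left", "soft", False),
]
observer_b = [
    ("deep_left", "soft", True),
    ("shallow_left", "soft", True),
    ("deep_left", "soft", False),
]

rating_a = plays_above_average(observer_a)
rating_b = plays_above_average(observer_b)

print(f"Observer A: {rating_a:+.2f} plays above average")
print(f"Observer B: {rating_b:+.2f} plays above average")
```

With these made-up numbers, the fielder comes out at about +0.1 plays under one observer and about -0.3 under the other, even though nothing about what happened on the field changed. An OBP calculation has no equivalent input that two scorers could record differently.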

Because of the variability of human error, these defensive metrics are simply not descriptive statistics. Instead, they are what I would consider inferential statistics, which are still every bit as valid, but require a different viewpoint for understanding the data. In the 5 pm post, we’ll look at what I mean by an inferential statistic and how such statistics should be analyzed.
