WAR: Imperfect but Useful Even in Small Samples
This morning, Jon Heyman noted an odd thing on Twitter:
i am not a hater of WAR stat, but if someone can explain to me how starling marte & bryce harper are both 1.7, please do
— Jon Heyman (@JonHeymanCBS) April 29, 2013
He was quoting Baseball-Reference’s WAR calculation, and the two are indeed tied at +1.7 WAR on B-R. Here, we have Bryce Harper (+1.5 WAR) ahead of Starling Marte (+1.2 WAR), but the point still basically stands: WAR thinks Harper (1.200 OPS) and Marte (.835 OPS) have both been pretty great this year, with just a small (or no) difference between them. WAR believes that Marte has mostly made up for Harper’s advantage with the bat through his legs, in baserunning (+3 run advantage) and defense (+3 run advantage), as well as through a slight bump from getting 12 extra plate appearances.
There’s no question that Harper has been a better offensive player, but there are questions about the defensive valuations, because defensive metrics aren’t as refined at this point as offensive metrics are. It is much easier to prove that Harper has been 10 runs better with the bat this year than it is to prove that Marte has been 3 runs better defensively by UZR, or 7 runs better defensively by DRS. There are more sources of error in the defensive metrics, and Heyman’s tweet led to a discussion on Twitter about the usefulness of including small-sample defensive metrics in WAR.
I’ve written before about the strong correlation between team WAR and team winning percentage, and others have followed up with similar analysis more recently. However, all those articles have focused on full-season or multi-season data samples. Since the question was raised and I hadn’t yet seen it answered, I became curious about whether WAR would actually correlate better with winning percentage at this point in the year if we just assumed every player in baseball was an average defender.
Essentially, if we just removed defensive metrics from the equation and evaluated teams solely on their hitting and pitching, how would our WAR calculation compare to team winning percentage? And how well does WAR correlate with team winning percentage based on just April 2013 data, when we’re dealing with much smaller sample sizes?
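For readers who want to run this kind of comparison themselves, here is a minimal Python sketch of the idea: compute the Pearson correlation between team WAR and winning percentage twice, once with the fielding component included and once with every team treated as average on defense. The team rows below are made-up placeholder values chosen only to show the mechanics, not actual April 2013 numbers, and the function and column names are my own assumptions rather than anything from a real data source.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 items)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER rows, not real data:
# (team, batting + pitching WAR, fielding wins, winning percentage)
teams = [
    ("Team A", 5.0, 0.5, 0.640),
    ("Team B", 4.2, -0.3, 0.560),
    ("Team C", 3.1, 0.8, 0.520),
    ("Team D", 2.0, -0.6, 0.440),
    ("Team E", 1.4, 0.2, 0.400),
]

win_pct = [row[3] for row in teams]
war_with_defense = [row[1] + row[2] for row in teams]
war_without_defense = [row[1] for row in teams]  # every fielder assumed average

r_with = pearson(war_with_defense, win_pct)
r_without = pearson(war_without_defense, win_pct)

print(f"r, defense included: {r_with:.3f}")
print(f"r, defense removed:  {r_without:.3f}")
```

With real team totals swapped in for the placeholder rows, comparing the two printed values shows whether the defensive component is adding signal or just noise in a sample this small.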