Build a Better WAR Metric: Neutralizing Players
Larry Walker is a great hitter.
He’s a great hitter at Olympic Stadium. He’s a great hitter at Coors. He’s a great hitter at Busch. He’s a great hitter at any ballpark named after a beer.
Whereas the average hitter might create 120 runs per 162 games at Coors, Walker would create 190. That’s roughly +60%.
Whereas the average hitter might create 85 runs per 162 games in every non-Coors park, Larry Walker would create 110. That’s +30%.
When you evaluate Larry Walker, you have two choices:
1. Neutralize Larry Walker by giving him 268 plate appearances in each of the home ballparks of the 30 MLB teams. His 2501 PA at Coors? Now we only count about 11% of that. His 32 PA in Oakland? We have to figure out how he’d have done if he got 268 PA. And so on.
2. Take a league average hitter, and put him in the same playing conditions as Larry Walker. Walker had 2501 PA at Coors? Great, let’s count it all. But let’s compare him to a league average hitter who also got to bat 2501 times at Coors. He came to bat 32 times in Oakland? Then the league average hitter also came to bat 32 times.
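The two accounting schemes above can be sketched out. A minimal sketch using the stylized rates from this example (the run-creation rates, the 700-PA season assumption, and the non-Coors PA total are all illustrative assumptions, not Walker’s actual stats):

```python
# Stylized numbers from the example above (illustrative, not Walker's
# actual stats). Rates are runs created per 162 games; assume ~700 PA
# per 162-game season to convert to a per-PA basis.
PA_PER_SEASON = 700

rates = {  # park type: (Walker's rate, league-average rate)
    "Coors": (190, 120),
    "Other": (110, 85),
}
n_parks = {"Coors": 1, "Other": 29}          # 30 MLB home parks in all
walker_pa = {"Coors": 2501, "Other": 5500}   # hypothetical career PA split

def option2_runs_above_avg(pa_by_park):
    """Option 2: keep the player's actual PA distribution, and compare him
    to a league-average hitter handed the identical distribution."""
    return sum(
        (rates[park][0] - rates[park][1]) / PA_PER_SEASON * pa
        for park, pa in pa_by_park.items()
    )

def option1_runs_above_avg(total_pa):
    """Option 1: neutralize by spreading the same total PA evenly across
    all 30 parks (268 PA per park gives about 8040 PA in total)."""
    per_pa_edge = sum(
        n_parks[park] * (rates[park][0] - rates[park][1]) / PA_PER_SEASON
        for park in rates
    ) / 30
    return per_pa_edge * total_pa

opt2 = option2_runs_above_avg(walker_pa)                # ≈ 447 runs
opt1 = option1_runs_above_avg(sum(walker_pa.values()))  # ≈ 303 runs
```

Because the hypothetical PA are concentrated at Coors, where the player’s edge over the average hitter is largest, Option 2 credits him with noticeably more runs above average than the neutralized Option 1 does.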
So, what are the strengths of these two options? In Option 1, we don’t allow Larry Walker to take “unfair” advantage of a park he might be ideally suited for. Whereas most hitters would increase their runs created by about 40% at Coors relative to a non-Coors park, Larry Walker increased his by about 70%. Given that he got 2501 PA at Coors, Walker ends up shining more than he would otherwise. It’s like letting Mariano Rivera come into 1- and 2-run games, while letting Trevor Hoffman and Billy Wagner enter only blowouts. They are all suited to close games, but if only Mo gets to leverage that, he’ll be the one getting all the saves. Is that fair? I dunno.
In Option 2, we deal with what actually happened. We don’t have to play a game of what-if. We simply accept what the player did, and that he was able to leverage (or not leverage) his unique playing conditions. All we do is set the comparison baseline: the thousands of players who also played in those exact same conditions, at the same frequency as our player. Walker at Coors is compared to the average player at Coors, Mo in high-leverage situations is compared to the average reliever in high-leverage situations, and so on. Everyone gets to keep what they did.
So, how do you want to see it?
“How do I want to compare players” for what purpose? Option 1 seems obviously better for predictive, forward-looking talent evaluation and projection; Option 2 seems obviously better for retrospective comparison. Do we need to pick one option for both? If so, why?
Some people have only one purpose, or prefer one purpose.
Though I wouldn’t be so quick to say that Option 1 is better for projections. You’re severely underweighting about half of the sample while expanding the relative weight of the rest. That could be a huge issue for players with just a small number of plate appearances at any given ballpark: if Walker had only a .113 wOBA for his 32 PA in Oakland, that’s a meaningful distortion.
You wouldn’t necessarily just pro-rate; you’d regress.
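The regression point can be made concrete. A minimal sketch, assuming a league wOBA of about .320 and a regression ballast of roughly 200 league-average PA (both numbers are illustrative assumptions, not established constants):

```python
def regressed_woba(observed, pa, league=0.320, ballast_pa=200):
    """Shrink a small-sample wOBA toward the league mean by mixing in
    ballast_pa worth of league-average performance. Both league and
    ballast_pa are illustrative assumptions, not established constants."""
    return (observed * pa + league * ballast_pa) / (pa + ballast_pa)

# The hypothetical .113 wOBA over 32 PA in Oakland barely budges the
# estimate off the league mean once regressed:
est = regressed_woba(0.113, 32)   # ≈ .291
```

With so few PA, the regressed estimate stays near league average rather than the raw .113, which is why naive pro-rating of tiny park samples distorts a neutralized (Option 1) valuation.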