Build a Better WAR Metric, Results and Commentary

by Tangotiger

February 27, 2016

Here are the results of the nine polls. For each one, you will see a link to the Instagraphs post, the poll results, and my commentary.

***

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-4/
One team defense allows 8 runs, and the other a shutout. But both gave up 10 hits and 3 walks.

60% I don’t care about the timing. These defenses should count the same.
40% In the end, all that matters is runs allowed. The shutout defense should count for much better.

Commentary: This was the closest of the polls. Readers were nearly split as to whether to consider sequencing at the team-defense level or not. They prefer not to by a small margin. So given the choice between a metric based on hits, walks and outs like Baseball Prospectus, or one based on runs like Baseball Reference, readers lean toward the BPro model. I’ll get to Fangraphs in a bit.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-7/
Solo HR or a bases-clearing double?

62% I don’t care about the context. I want the HR to count for more than the double.
38% Even if I prefer context-neutral stats, that’s true only to a point. Bases-clearing double for the win.

Commentary: The change in run expectancy is two runs for the double that clears the loaded bases, and one run for the HR with no runners on. The readers felt that even that wasn’t enough context to prefer the double over the HR. Basically, they don’t want to reward the batter for being in a position to leverage a situation that he didn’t have a hand in creating, even though he exploited it almost perfectly.
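To make the run expectancy comparison concrete, here is a minimal sketch of the arithmetic. The run expectancy values are rough, illustrative placeholders for a no-out situation (an assumption, since the poll did not specify the outs), and the `run_value` helper is a hypothetical name, not part of any published model.

```python
# Illustrative run-expectancy (RE) values with no outs. These are rough
# placeholders chosen for the example, not an official table.
RUN_EXPECTANCY = {
    "empty": 0.5,
    "runner_on_2nd": 1.1,
    "loaded": 2.3,
}


def run_value(before: str, after: str, runs_scored: int) -> float:
    """Change in run expectancy plus the runs that scored on the play."""
    return RUN_EXPECTANCY[after] - RUN_EXPECTANCY[before] + runs_scored


# Solo HR: bases empty before and after, one run scores.
solo_hr = run_value("empty", "empty", 1)

# Bases-clearing double: loaded before, batter on second after, three runs score.
clearing_double = run_value("loaded", "runner_on_2nd", 3)

print(f"solo HR: {solo_hr:+.1f} runs")
print(f"bases-clearing double: {clearing_double:+.1f} runs")
```

With these placeholder values the HR comes out at about one run and the double at a little under two, which is the gap the poll asked readers to weigh.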

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-2/
It’s the bottom of the 9th of a tie game, bases loaded, and it’s a walk or a HR

66% Totally different. I want the HR to count for a lot more than a BB.
34% Same impact. I care about the preservation of wins.

Commentary: Once again, even in a scenario in which it’s do-or-die, HOW you do it matters to the reader, even if it doesn’t matter for the situation at hand.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-6/
Ace relievers enter the 9th and allow 1 run, with a 2- or 1-run lead:

66% Both Billy and Trevor did an equally poor job, regardless of their lead
34% Trevor was a net negative. Billy was at least neutral, perhaps even net positive

Commentary: This is similar to the above scenario, where a player is thrust into a situation not of his making. And by an almost 2:1 margin, the readers want to evaluate the performance without respect to the situation.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-5/
Ace reliever enters the 9th with a 3-run lead, allows 2 to end the game.

77% Two runs in the 9th is an abysmal performance.
23% Two runs when given 3 runs to work with is barely passable, but still a net positive.

Commentary: There’s a limit to how far Fangraphs readers will take context into account. Being given a three-run lead is too much buffer to consider, and they don’t want to reward an 18.00 ERA in one inning as a net positive. But still, 23% did.
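The 18.00 figure is just the usual per-nine-innings scaling applied to one inning. A minimal sketch, assuming both runs were earned (the poll did not say) and using a hypothetical helper name:

```python
def runs_per_nine(runs: float, innings: float) -> float:
    """Scale runs allowed to a nine-inning rate, the same way ERA is scaled."""
    return 9.0 * runs / innings


# Two runs allowed in the one inning pitched.
rate = runs_per_nine(runs=2, innings=1)
print(f"{rate:.2f}")
```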

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-9-and-last/
Who pitched the better game, Strasburg or Wainwright?

78% Strasburg. His K/BB performance is so overwhelming. Maybe he didn’t get fielding support.
22% Wainwright. He didn’t dominate the hitters, but he was more effective. In the end, the runs tell the story.

Commentary: This was a referendum on FIP (heart of the Fangraphs model) v ERA (heart of the BR.com model) or Component-based ERA (heart of the BPro model). Obviously, having this poll on Fangraphs will bias the results towards those readers. But I tried to make the choice as tough as possible. Indeed, I carefully selected the numbers so that the two pitchers would score the same under my version of the Bill James Game Score model.
http://tangotiger.com/index.php/site/comments/game-scores-for-2015

Both pitchers were set to have a score of 81 (50 is average, and around 100 is perfect). The 9 extra strikeouts for Strasburg were in balance against the 3 fewer singles and 1 fewer run for Wainwright. My version of Game Score is pretty much an even balance of the three models. I was really hoping that the readers, not realizing what I was doing, would end up going 50/50. They didn’t.

Given the results of the two polls, where the FIP model is strongly preferred to the other two, while the BPro model is slightly preferred to BR.com, this is what it looks like the Fangraphs readers prefer:
60% Fangraphs (FIP)
25% Baseball Prospectus (Component ERA)
15% Baseball Reference (ERA, actually RA/9)

http://www.fangraphs.com/blogs/instagraphs/building-a-better-war-metric/
Comparing a bases loaded walk with a solo HR:

81% Totally different. I want the HR to count for a lot more.
19% Same impact. I care about the preservation of runs.

Commentary: Once again, fans don’t care about the runs, and what the batter can leverage. They care about the events. If the batter didn’t create the situation, the fans don’t want to let him leverage it.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-8/
Runner on third, 1 out, and the result is a K or SF:

83% This situation is so different, so obvious, that the value gap between a K and SF is huge.
17% I’m sticking with everything being context-neutral: an out is an out.

Commentary: In this case, an out is not an out. One is an out that moves the runner over, and the other is an out that doesn’t. This would be as different as a double v single. Fans are clear that if the situation is the same, but the outcome is different, then they’ll reward the outcome.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-3/
Comparing the stranded leadoff triple to the one where he scores:

95% Same thing. A triple is a triple.
5% Totally different. One puts runs on the board, the other was useless.

Commentary: The fans are saying the sequencing of the play ends as soon as the next batter comes to the plate. The batter did his job, and he ended up at third. Whatever happens after that, they won’t hold the runner accountable, nor reward him (unless he actively did something himself).

***

Thank you to everyone who participated in the voting, which drew 8000 total votes, and for the 180 comments in the comments section. I’m not sure where I go from here yet, but I’ll think of something soon.
