If Josh Collmenter pitches a good game in a blowout, did it really happen?

Collmenter pitched in relief in 32 games. Among the 210 relievers with at least 30 innings, he had the 6th-lowest Leverage Index, a tiny 0.39. He also had the 23rd-best performance as measured by change in Run Expectancy.

In only 4 of those 32 games did he enter when it really mattered. In another 3, it sort of mattered. The other 25 ranged from complete blowouts to games that mostly didn’t matter.

When we’re handing out wins, especially for games whose outcome he had virtually no hand in because of when he entered, should Collmenter be like the tree falling in the forest with no one around?

Or do we give him credit for his performance even though it had no impact on the outcome?


Build a Better WAR Metric: Timing Buckets

On September 1, 2015, the Nationals and Cardinals played a game in which the Nationals took a big lead, only to give most of it back almost immediately. The Nationals kept trying to hold on until the end, when the Cardinals won the game on a 3-run HR.


Source: FanGraphs

Let’s look at that ninth inning. First up was Jason Heyward, who grounded out. The context-neutral run value of making an out is -0.25 runs (or -.027 wins). Making an out to start the inning with the bases empty is worth only -0.225 runs (or -.024 wins). Therefore, the base-out timing value of the out is +.025 runs (or +.003 wins). It looks like this:

-.027 wins: Heyward’s out
+.003 wins: low impact timing of out with bases empty

But we have more information. It was a 5-5 game to start the bottom of the 9th, a higher-leverage situation than a random one. Heyward’s out actually reduced the chance of winning by .050 wins, not .024 wins. That is, the impact is felt about twice as much as in a random leadoff situation. So, there’s yet another .026 wins to account for. This is what it looks like:

-.027 wins: Heyward’s out
+.003 wins: low impact timing of out with bases empty
-.026 wins: high impact timing of out in 9th inning of tied game
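The bookkeeping above can be written as a small function. This is a minimal sketch, assuming each event comes with three measured values in wins: its context-neutral value, its value given only the base-out state, and its actual win probability added. The names `decompose` and `TimingBuckets` are hypothetical, chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TimingBuckets:
    event: float         # context-neutral value of the event itself
    base_out: float      # extra value from the base-out state
    inning_score: float  # extra value from the inning and score


def decompose(neutral: float, base_out_value: float, actual_wpa: float) -> TimingBuckets:
    """Split an event's actual win value into three additive buckets.

    neutral: value of the event in a random situation
    base_out_value: value of the event given only the base-out state
    actual_wpa: actual change in win probability from the event
    """
    return TimingBuckets(
        event=neutral,
        base_out=base_out_value - neutral,
        inning_score=actual_wpa - base_out_value,
    )


# Heyward's leadoff groundout, bottom of the 9th, tied 5-5 (values in wins)
heyward = decompose(neutral=-0.027, base_out_value=-0.024, actual_wpa=-0.050)
print(f"{heyward.event:+.3f} {heyward.base_out:+.3f} {heyward.inning_score:+.3f}")
# -0.027 +0.003 -0.026
```

Because each bucket is defined as a difference from the previous level of context, the three always add back up to the actual win value.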

The question to ask yourself (not me, but yourself) is how much you want to credit Heyward for making an out in this situation. Do you want to credit him with a random out, because he was simply plucked into this situation? Or do you want to credit him with an out whose timing was low impact (bases empty), or even high impact (9th inning of a tied game)? Is an out an out, or does the out depend on the situation?

Let’s continue. Yadier Molina also made an out. Going through the above machinations gives us this:

-.027 wins: Molina’s out
+.010 wins: low impact timing of out with bases empty
-.019 wins: high impact timing of out in 9th inning of tied game

Now the fun begins. Cody Stanley doubled.

+.081 wins: Stanley’s double
-.056 wins: low impact timing of double with two outs
+.043 wins: high impact timing of double in 9th inning of tied game

So, compared to a random situation, a double with two outs and the bases empty is not that valuable. It’s less valuable than a random walk. That’s why we have a huge -.056 wins to account for its low impact. But at the same time, this double puts the winning run on base in the bottom of the 9th, which is enormously high impact. How you approach valuation will decide how you credit Stanley and his double.

Tommy Pham walked with first base open and the winning run already on base.

+.032 wins: Pham’s walk
-.020 wins: low impact timing of walk with 1st base open
-.009 wins: low impact timing of walk (run is useless)

Let’s pause here. The double put the winning run on base and left 1B open, so the walk is practically useless: the win value changed by only +.003 wins, which is pretty close to zero. The batter and pitcher know this, which is why we see a NEGATIVE impact of the walk in the 9th inning of a tied game, even though we are in a high-leverage situation. This is unlike the double, which had a huge POSITIVE impact. The entire sequencing of the situation matters. Given that the batter and pitcher are aware of the situation as it develops, all of the timing values noted above make perfect sense.

Finally, the HR by Brandon Moss.

+.150 wins: Moss’s HR
+.137 wins: high impact timing of HR with 2 runners on
+.114 wins: high impact timing of HR to win the game

In the end, the Cardinals went from a 61.4% chance of winning to 100%, adding +0.386 wins. Adding up the above, we get:

+.209 wins: all the events in a random situation
+.074 wins: high/low impact timing for base-out situations
+.103 wins: high impact timing of inning/score (except walk)
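As a check on the arithmetic, here is a short sketch that takes the three buckets listed for each of the five plays, chains each play’s total through the Cardinals’ win probability, and sums each bucket across the inning. The event labels and variable names are illustrative; the numbers are the ones given above.

```python
# Timing buckets for each ninth-inning event, in wins, as listed above:
# (context-neutral event, base-out timing, inning/score timing)
events = {
    "Heyward out":    (-0.027, +0.003, -0.026),
    "Molina out":     (-0.027, +0.010, -0.019),
    "Stanley double": (+0.081, -0.056, +0.043),
    "Pham walk":      (+0.032, -0.020, -0.009),
    "Moss HR":        (+0.150, +0.137, +0.114),
}

start_wp = 0.614  # Cardinals' win probability entering the bottom of the 9th

wp = start_wp
for name, buckets in events.items():
    wpa = sum(buckets)  # a play's actual win value is the sum of its buckets
    wp += wpa
    print(f"{name:15s} WPA {wpa:+.3f}  win prob now {wp:.3f}")

# One total per bucket, taken column by column across the five plays
totals = [round(sum(col), 3) for col in zip(*events.values())]
print("bucket totals:", totals)  # [0.209, 0.074, 0.103]
print("inning total:", round(sum(totals), 3))  # 0.386
```

Each play’s buckets net out to its actual win value (Pham’s walk comes to +.003), the chain ends at a 100% chance of winning, and the three bucket totals add back up to the +0.386 the Cardinals gained.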

So, how do you, the reader, want to evaluate each of these plays? How much do you want to assign to the batter (and pitcher), and how much do you want to put into general “timing” buckets not linked to any particular player?


Short, Unnecessary Film: Thus Sprach Corey Ray

At the end of this past week, prospect analyst Jesse Burkhart published a post at FanGraphs enumerating the virtues of draft-eligible Louisville outfielder Corey Ray. Less than an hour after that, the present author published a collection of regressed statistical leaderboards for select college conferences — atop one of which appeared draft-eligible Louisville outfielder Corey Ray, for his performance among ACC batters during the first week of the collegiate season.

For both reasons, the author observed with some interest this weekend’s series between Louisville and Ole Miss, played at the latter’s home park. While the Cardinals lost two of three games there, Ray played well, striking out just once over 14 plate appearances and recording an extra-base hit in all three contests.

The last of those extra-base hits was a home run — which is to say, basically the best kind of extra-base hit there is. With a view both to (a) celebrating that home run and also (b) attempting to reach the very narrowest possible demographic of this site’s wide readership, what the author has done is not only to capture the video footage of Corey Ray’s homer, but also to render it into slow motion and set it to an excerpt from Also Sprach Zarathustra by dead German composer Richard Strauss.

The results of this unnecessary moment in the history of film appear below.


Build a Better WAR Metric, Results and Commentary

Here are the results of the nine polls. For each one, you will see a link to the Instagraph, the results, and my commentary.

***

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-4/
One team defense allows 8 runs, and the other a shutout. But both gave up 10 hits and 3 walks.

60% I don’t care about the timing. These defenses should count the same.
40% In the end, all that matters is runs allowed. The shutout defense should count for much better.

Commentary: This was the closest of the polls. Readers were nearly split on whether to consider sequencing at the team-defense level, preferring not to by a small margin. So, given the choice between a metric based on hits, walks and outs, like Baseball Prospectus’s, or one based on runs, like Baseball Reference’s, readers lean toward the BPro model. I’ll get to Fangraphs in a bit.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-7/
Solo HR or a bases-clearing double?

62% I don’t care about the context. I want the HR to count for more than the double.
38% Even if I prefer context-neutral stats, that’s true only to a point. Bases-clearing double for the win.

Commentary: The change in run expectancy is two runs for the double that clears the loaded bases, and one run for the HR with no runners on. The readers felt that even that wasn’t enough context to prefer the double over the HR. Basically, they don’t want to reward the batter for being in a position to leverage a situation that he didn’t have a hand in creating, even though he exploited it almost perfectly.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-2/
It’s the bottom of the 9th of a tie game, bases loaded, and it’s a walk or a HR

66% Totally different. I want the HR to count for alot more than a BB
34% Same impact. I care about the preservation of wins.

Commentary: Once again, even in a scenario in which it’s do-or-die, HOW you do it matters to the reader, even if it doesn’t matter for the situation at hand.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-6/
Ace reliever enters 9th, allows 1 run, with 2- or 1-run lead:

66% Both Billy and Trevor did an equally poor job, regardless of their lead
34% Trevor was a net negative. Billy was at least neutral, perhaps even net positive

Commentary: This is similar to the above scenario, where a player is thrust into a situation not of his making. And by an almost 2:1 margin, the readers want to evaluate the performance without respect to the situation.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-5/
Ace reliever enters 9th with a 3 run lead, allows 2 to end the game.

77% Two runs in the 9th is an abysmal performance.
23% Two runs when given 3 runs to work with is barely passable, but still a net positive.

Commentary: There’s a limit to how much Fangraphs readers will take context into account. A three-run lead is too much buffer to consider, and they don’t want to reward an 18.00 ERA in one inning as a net positive. But still, 23% did.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-9-and-last/
Who pitched the better game, Strasburg or Wainwright?

78% Strasburg. His K/BB performance is so overwhelming. Maybe he didn’t get fielding support.
22% Wainwright. He didn’t dominate the hitters, but he was more effective. In the end, the runs tell the story.

Commentary: This was a referendum on FIP (heart of the Fangraphs model) v ERA (heart of the BR.com model) or Component-based ERA (heart of the BPro model). Obviously, having this poll on Fangraphs will bias the results toward the Fangraphs model. But I tried to make the choice as tough as possible. Indeed, I carefully selected the numbers so that both pitchers would score the same in my version of the Bill James Game Score model:
http://tangotiger.com/index.php/site/comments/game-scores-for-2015

The two pitchers were set to have a score of “81” in both cases (50 is average, and around 100 is perfect). Strasburg’s 9 extra strikeouts were balanced against Wainwright’s 3 fewer singles and 1 fewer run. My version of Game Score is pretty much an even balance of the three models. I was really hoping that the readers, not realizing what I was doing, would end up going 50/50. They didn’t.
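To see how the two lines come out even, here is a sketch of a Game Score calculation. The weights are an assumption on our part: they are the ones commonly cited for Tango’s revised Game Score (start at 40, +2 per out, +1 per strikeout, -2 per walk, -2 per hit, -3 per run, -6 per home run), not copied from the linked page. With them, both complete games come to 81, since Strasburg’s 9 extra strikeouts (9 points) exactly offset Wainwright’s 3 fewer singles (6 points) and 1 fewer run (3 points).

```python
def game_score(outs: int, k: int, bb: int, h: int, r: int, hr: int) -> int:
    """Game Score using ASSUMED weights (the commonly cited revised version).

    Starts at 40, rewards outs and strikeouts, and penalizes walks,
    hits, runs, and home runs.
    """
    return 40 + 2 * outs + k - 2 * bb - 2 * h - 3 * r - 6 * hr


# Both pitched complete games (27 outs) with no walks and no home runs.
strasburg = game_score(outs=27, k=13, bb=0, h=10, r=2, hr=0)
wainwright = game_score(outs=27, k=4, bb=0, h=7, r=1, hr=0)
print(strasburg, wainwright)  # 81 81
```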

Given the results of the two polls, where the FIP model is strongly preferred to the other two while the BPro model is slightly preferred to BR.com, this is what Fangraphs readers appear to prefer:
60% Fangraphs (FIP)
25% Baseball Prospectus (Component ERA)
15% Baseball Reference (ERA, actually RA/9)

http://www.fangraphs.com/blogs/instagraphs/building-a-better-war-metric/
Comparing a bases loaded walk with a solo HR:

81% Totally different. I want the HR to count for alot more.
19% Same impact. I care about the preservation of runs.

Commentary: Once again, fans don’t care about the runs and what the batter can leverage. They care about the events. If the batter didn’t create the situation, the fans don’t want to let him leverage it.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-8/
Runner on third, 1 out, and the result is a K or SF:

83% This situation is so different, so obvious, that the value gap between a K and SF is huge.
17% I’m sticking with everything being context-neutral: an out is an out.

Commentary: In this case, an out is not an out. One is an out that brings the runner home, and the other is an out that doesn’t. The difference is as stark as a double v a single. Fans are clear that if the situation is the same but the outcome is different, they’ll reward the outcome.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-3/
Comparing the stranded leadoff triple to the one where he scores:

95% Same thing. A triple is a triple.
5% Totally different. One puts runs on the board, the other was useless.

Commentary: The fans are saying the sequencing of the play ends as soon as the next batter comes to the plate. The batter did his job, and he ended up at third. Whatever happens after that, they won’t hold the runner accountable, nor reward him (unless he actively did something himself).

***

Thank you to everyone who voted (8000 total votes) and to everyone who left the 180 comments. I’m not sure where I go from here yet, but I’ll think of something soon.


Build a Better WAR Metric, Part 9 and Last

This will be the 9th and final question I will ask. And I think it’s the toughest one. But you guys keep surprising me, so let’s get into it.

The question relates to how you see pitchers and the impact of their fielders. We have two pitchers, let’s call them Stephen Strasburg and Adam Wainwright. And they are pitching in the same game.

Strasburg pitches a complete game, striking out 13 without walking anyone or allowing any extra-base hits. But he does allow 10 singles, or at least, he and his fielders allow 10 singles, and that leads to 2 runs.

Wainwright also pitches a complete game, and he also doesn’t walk anyone, but he strikes out only 4. He allows only 7 singles, or at least, he and his fielders allow 7 singles, and that leads to only 1 run.

The only thing you know is what I’ve told you. If you wish to infer more, like perhaps the Cardinals’ fielders helped Adam more than the Nationals’ fielders helped Stephen, go ahead. If you wish to infer that Wainwright allowed softer-hit balls than Strasburg, you can do that if you want. You decide how to interpret the information I’ve given you.

The question:


Build a Better WAR Metric, Part 8

The worst time to strike out is with a runner on third base and fewer than 2 outs. More precisely, it’s with a runner on 3B and exactly 1 out. The pitcher knows it, the batter knows it, the fielders know it, the runner knows it. The fans know it. Everyone is extremely aware that getting the second out changes the entire dynamic of the situation, since now only a positive event can score that runner on third base.

We can even quantify this situation rather precisely. With a runner on third and 1 out, we expect around 0.94 runs to score through the end of the inning. But with 2 outs, it’s all the way down to 0.37 runs. A strikeout in that situation is worth an astounding -0.57 runs.

On the other hand, a sacrifice fly clears the bases and adds an out (run expectancy down to 0.09 runs), but it puts a run on the scoreboard, so the total run value is 1.09 runs, a net gain of +.15 runs. The sac fly is one of the very few times that giving up an out to advance a runner is a net positive.

We have a potential swing of 0.72 runs between a “bad” out and a “good” out. With everyone in the ballpark well aware of the situation, and with outs so commonplace, it’s clear the pitcher is going to do his best to pitch for strikeouts, perhaps at the cost of more low-impact walks with first base open. And the batter is going to do his best to counteract that and avoid striking out, perhaps at the cost of a tiny bit of power. Everyone is changing their strategy.
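The arithmetic here is a difference between two run-expectancy values plus any runs that score on the play. A minimal sketch, assuming a lookup that holds only the three states quoted in the text (a full table would cover all 24 base-out states); the state labels and `run_value` are hypothetical names.

```python
# Run expectancy through the end of the inning, for the three states
# quoted in the text only. Keys are (runners on base, outs).
RE = {
    ("3B", 1): 0.94,
    ("3B", 2): 0.37,
    ("empty", 2): 0.09,
}


def run_value(before: tuple, after: tuple, runs_scored: int = 0) -> float:
    """Change in run expectancy plus any runs that scored on the play."""
    return RE[after] - RE[before] + runs_scored


strikeout = run_value(("3B", 1), ("3B", 2))
sac_fly = run_value(("3B", 1), ("empty", 2), runs_scored=1)
print(f"K: {strikeout:+.2f}  SF: {sac_fly:+.2f}  swing: {sac_fly - strikeout:.2f}")
# K: -0.57  SF: +0.15  swing: 0.72
```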

But remove the specific context, and an out is an out.

So, how do you want to account for this specific play?


Build a Better WAR Metric, Part 7

So far, we’ve looked at cases where the overall impact is the same but comes about for different reasons, like the bases-loaded walk versus the solo HR. You were given a choice of either “same” or “this one is better”. It’s been designed to see whether you prefer a DIRECTION. In the above case, the direction is based on whether the event matters.

Now, we’ll have two situations that have an overall different impact. We have our trusty solo HR. We all love the HR. It gives us a guaranteed run. It tells us about the hitter, and if given better circumstances, we can dream of even more runs.

If I asked about a double with a runner on second, I’d run into the same question as with the bases-loaded walk: the impact is exactly one run, and we’re left in the same state we started in. And 80% of Fangraphs readers will prefer the solo HR to the double with the runner on 2nd.

But how about a bases-clearing double? Any way you want to measure it, the impact is going to be alot(*) more than the 1 run from the solo HR. Run expectancy tables tell us it’s 1.7 to 2.2 runs depending on the number of outs.

(*) “Get over it.” — Scalia

So, how high a hurdle are Fangraphs readers willing to clear to keep their allegiance to the event in a context-neutral setting, ignoring the context even when the event is somewhat inferior but comes in a far more leverageable setting?


Build a Better WAR Metric, Part 6

In the previous post about the two-run reliever with the three-run lead, the Fangraphs readers leaned 80-20 toward “a run is a run”, compared to “he did his job”.

Let’s make this one similar, but tougher. We have our ace reliever entering the bottom of the 9th inning. This time, our ace reliever gives up a leadoff HR, then strikes out the side.

We have two different relievers:
(a) Billy Wagner enters the game with a 2-run lead, and so manages to squeak out of it to end the game. If you are interested, his team had a 90% chance of winning before he showed up. And the Astros won.

(b) Trevor Hoffman faces the same scenario with a 1-run lead. He leaves the game tied, sending the Padres into extra innings and turning an 80% chance of winning into a 50% chance.
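The win-probability side of these two outings fits in a few lines, assuming win probability added is simply the chance of winning when the reliever leaves minus the chance when he entered, using the 90%, 80%, and 50% figures above. The `wpa` helper is a hypothetical name.

```python
def wpa(wp_before: float, wp_after: float) -> float:
    """Win probability added: change in the team's chance of winning."""
    return wp_after - wp_before


# Identical pitching lines (leadoff HR, then three strikeouts);
# only the lead each reliever was handed differs.
wagner = wpa(0.90, 1.00)   # 2-run lead held, game over
hoffman = wpa(0.80, 0.50)  # 1-run lead blown, extra innings
print(f"Wagner {wagner:+.2f}, Hoffman {hoffman:+.2f}")
# Wagner +0.10, Hoffman -0.30
```

Same line, yet one reliever comes out at +.10 wins and the other at -.30, a gap driven entirely by the situation each was handed.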


Build a Better WAR Metric, Part 5

When the home team enters the top of the 9th with a 3-run lead, they will win that game 98% of the time. That happens mostly because they get to pick and choose the reliever they want. If they chose a random reliever, they’d win 97% of the time. If they chose a poor reliever, they’d win 96% of the time. It’s pretty tough to mess up a 3-run lead, especially when the home team gets one more crack at it in the bottom of the 9th.

So, we have a SP who went 8 innings and hands this 3-run lead to the reliever. The ace reliever comes in; let’s call him Armando Benitez. He walks the first batter, allows a HR to the second, then strikes out the side. The game ends, and his team wins. Armando even gets a “save”, whatever that is supposed to imply.

Since he was given a 3-run rope and used only 2 runs of it, he was able to turn a 96% or 98% chance of winning into 100%, all without the help of his fielders. Incredibly, things could have gotten worse, which does happen 2 to 4 percent of the time. In this case, he pitched just bad enough to win.


Build a Better WAR Metric, Part 4

First, thanks so much for the tremendous response, both in the comments and in the polls. There have been 700 to 1200 votes in each poll. Just an overwhelming response.

For now, let’s start the second inning by leaving aside the hitters and talking about defense. When I refer to defense, I mean pitching+fielding. Remember, defense is the whole team: the pitchers and the fielders. We’ll worry about how to separate fielders from pitchers in a soon-to-be-asked question. Just not now. Cool?

Let’s say that one team defense allows 10 hits and 3 walks. But they are all scattered, and so the team actually ends up with a shutout. Another team defense allows the exact same number of hits and walks. They even allow them in the exact same way. The only difference is the timing: they are bunched up, resulting in eight runs allowed. From a team defense perspective, how do you see them in terms of assigning value?