Short, Unnecessary Film: Thus Sprach Corey Ray

At the end of this past week, prospect analyst Jesse Burkhart published a post at FanGraphs enumerating the virtues of draft-eligible Louisville outfielder Corey Ray. Less than an hour after that, the present author published a collection of regressed statistical leaderboards for select college conferences — atop one of which appeared draft-eligible Louisville outfielder Corey Ray, for his performance among ACC batters during the first week of the collegiate season.

For both reasons, the author observed with some interest this weekend’s series between Louisville and Ole Miss, played at the latter’s home park. While the Cardinals lost two of three games there, Ray played well, striking out just once over 14 plate appearances and recording an extra-base hit in all three contests.

The last of those extra-base hits was a home run — which is to say, basically the best kind of extra-base hit there is. With a view both to (a) celebrating that home run and also (b) attempting to reach the very narrowest possible demographic of this site’s wide readership, what the author has done is not only to capture the video footage of Corey Ray’s homer, but also to render it into slow motion and set it to an excerpt from Also Sprach Zarathustra by dead German composer Richard Strauss.

The results of this unnecessary moment in the history of film appear below.


Build a Better WAR Metric, Results and Commentary

Here are the results of the nine polls. For each one, you will see a link to the Instagraph, the results, and my commentary.

***

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-4/
One team defense allows 8 runs, and the other a shutout. But both gave up 10 hits and 3 walks.

60% I don’t care about the timing. These defenses should count the same.
40% In the end, all that matters is runs allowed. The shutout defense should count for much better.

Commentary: This was the closest of the polls. Readers were nearly split as to whether to consider sequencing at the team-defense level or not. They prefer not to by a small margin. So given the choice between a metric based on hits, walks and outs like Baseball Prospectus, or one based on runs like Baseball Reference, readers lean toward the BPro model. I’ll get to Fangraphs in a bit.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-7/
Solo HR or a bases-clearing double?

62% I don’t care about the context. I want the HR to count for more than the double.
38% Even if I prefer context-neutral stats, that’s true only to a point. Bases-clearing double for the win.

Commentary: The change in run expectancy is two runs for the double that clears the loaded bases, and one run for the HR with no runners on. The readers felt that even that wasn’t enough context to prefer the double over the HR. Basically, they don’t want to reward the batter for being in a position to leverage a situation that he didn’t have a hand in creating, even though he exploited it almost perfectly.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-2/
It’s the bottom of the 9th of a tie game, bases loaded, and it’s a walk or a HR

66% Totally different. I want the HR to count for alot more than a BB
34% Same impact. I care about the preservation of wins.

Commentary: Once again, even in a scenario in which it’s do-or-die, HOW you do it matters to the reader, even if it doesn’t matter for the situation at hand.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-6/
Ace relievers enter 9th, allow 1 run, with 2- or 1-run lead:

66% Both Billy and Trevor did an equally poor job, regardless of their lead
34% Trevor was a net negative. Billy was at least neutral, perhaps even net positive

Commentary: This is similar to the above scenario, where a player is thrust into a situation not of his making. And by an almost 2:1 margin, the readers want to evaluate the performance without respect to the situation.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-5/
Ace reliever enters 9th with a 3 run lead, allows 2 to end the game.

77% Two runs in the 9th is an abysmal performance.
23% Two runs when given 3 runs to work with is barely passable, but still a net positive.

Commentary: There’s a limit to how much context Fangraphs readers will take into account. Being given a three-run lead is too much buffer to consider, and they don’t want to reward an 18.00 ERA in one inning as a net positive. But still, 23% did.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-9-and-last/
Who pitched the better game, Strasburg or Wainwright?

78% Strasburg. His K/BB performance is so overwhelming. Maybe he didn’t get fielding support.
22% Wainwright. He didn’t dominate the hitters, but he was more effective. In the end, the runs tell the story.

Commentary: This was a referendum on FIP (heart of the Fangraphs model) v ERA (heart of the BR.com model) or Component-based ERA (heart of the BPro model). Obviously, having this poll on Fangraphs will bias the results towards those readers. But I tried to make the choice as tough as possible. Indeed, I carefully selected the numbers so they would match my version of the Bill James Game Score model.
http://tangotiger.com/index.php/site/comments/game-scores-for-2015

The two pitchers were set to have a score of “81” in both cases (50 is average, and around 100 is perfect). The 9 extra strikeouts for Strasburg were in balance against the 3 fewer singles and 1 fewer run for Wainwright. My version of Game Score is pretty much an even balance of the three models. I was really hoping that the readers, not realizing what I was doing, would end up going 50/50. They didn’t.

Given the results of the two polls, where the FIP model is strongly preferred over the other two, while the BPro model is slightly preferred over BR.com, this is what it looks like the Fangraphs readers prefer:
60% Fangraphs (FIP)
25% Baseball Prospectus (Component ERA)
15% Baseball Reference (ERA, actually RA/9)

http://www.fangraphs.com/blogs/instagraphs/building-a-better-war-metric/
Comparing a bases loaded walk with a solo HR:

81% Totally different. I want the HR to count for alot more.
19% Same impact. I care about the preservation of runs.

Commentary: Once again, fans don’t care about the runs, and what the batter can leverage. They care about the events. If the batter didn’t create the situation, the fans don’t want to let them leverage it.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-8/
Runner on third, 1 out, and the result is a K or SF:

83% This situation is so different, so obvious, that the value gap between a K and SF is huge.
17% I’m sticking with everything being context-neutral: an out is an out.

Commentary: In this case, an out is not an out. One is an out that moves the runner over, and the other is an out that doesn’t. This would be as different as a double v single. Fans are clear that if the situation is the same, but the outcome is different, then they’ll reward the outcome.

http://www.fangraphs.com/blogs/instagraphs/build-a-better-war-metric-part-3/
Comparing the stranded leadoff triple to the one where he scores:

95% Same thing. A triple is a triple.
5% Totally different. One puts runs on the board, the other was useless.

Commentary: The fans are saying the sequencing of the play ends as soon as the next batter comes to the plate. The batter did his job, and he ended up at third. Whatever happens after that, they won’t hold the runner accountable, nor reward him (unless he actively did something himself).

***

Thank you to everyone who participated in the voting with 8000 total votes, and for the 180 comments in the comments section. I’m not sure where I go from here yet, but I’ll think of something soon.


Build a Better WAR Metric, Part 9 and Last

This will be the 9th and final question I will ask. And I think it’s the toughest one. But, you guys keep surprising me, so, let’s get into it.

The question relates to how you see pitchers and the impact of their fielders. We have two pitchers, let’s call them Stephen Strasburg and Adam Wainwright. And they are pitching in the same game.

Strasburg pitches a complete game, striking out 13, without walking anyone, or allowing any extra base hits. But he does allow 10 singles, or at least, he and his fielders allow 10 singles, and that leads to 2 runs.

Wainwright also pitches a complete game, he also doesn’t walk anyone, but he only strikes out 4. He only allows 7 singles, or at least, he and his fielders allow 7 singles, and that leads to only 1 run.

The only thing you know is what I’ve told you. If you wish to infer more, like perhaps the Cardinals fielders helped Adam more than the Nationals fielders helped Stephen, go ahead. If you wish to infer that Wainwright allowed softer hit balls than Strasburg, you can do that if you want. You decide how to interpret the information I’ve given you.

The question:


Build a Better WAR Metric, Part 8

The worst time to strike out is with a runner on third base and less than 2 outs. In fact, it’s runner on 3B and exactly 1 out. The pitcher knows it, the batter knows it, the fielder knows it, the runner knows it. The fans know it. Everyone is extremely aware that getting the second out changes the entire dynamic of the situation, since now only a positive event can score that runner on third base.

We can even quantify this situation rather precisely. With a runner on third and 1 out, we expect around 0.94 runs to score to the end of the inning. But with 2 outs, it’s all the way down to 0.37 runs. A strikeout in that situation is worth an astounding -0.57 runs.

On the other hand, a sacrifice fly clears the bases and costs an out (run expectancy down to 0.09 runs), but adds a run to the scoreboard. The total is 1.09 runs against the 0.94 we started with, for a net gain of +0.15 runs. The sac fly is one of the very few times that trading a base for an out is a net positive.

We have a potential swing of 0.72 runs between a “bad” out and a “good” out. With everyone in the ballpark well aware of the situation, and with outs so commonplace, it’s clear the pitcher is going to do his best to pitch in a way that increases strikeouts, perhaps at the cost of more low-impact walks with first base open. And the batter is going to do his best to counteract that, trying not to strike out, perhaps at the cost of a tiny bit of power. Everyone is changing their strategy.
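The arithmetic above can be sketched in a few lines of Python. The run-expectancy figures are the ones quoted in this post; the function `run_value` and the constant names are illustrative assumptions, not part of any existing library.

```python
# Run expectancy (expected runs to the end of the inning) for the three
# base-out states quoted in the post. Names are illustrative only.
RE_THIRD_1_OUT = 0.94   # runner on third, 1 out
RE_THIRD_2_OUT = 0.37   # runner on third, 2 outs
RE_EMPTY_2_OUT = 0.09   # bases empty, 2 outs


def run_value(re_before, re_after, runs_scored=0):
    """Runs scored on the play plus the change in run expectancy."""
    return round(re_after + runs_scored - re_before, 2)


strikeout = run_value(RE_THIRD_1_OUT, RE_THIRD_2_OUT)
sac_fly = run_value(RE_THIRD_1_OUT, RE_EMPTY_2_OUT, runs_scored=1)
swing = round(sac_fly - strikeout, 2)

print(strikeout, sac_fly, swing)
```

A context-neutral metric would instead charge both outs the same generic value, which is exactly the choice the poll asks about.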

But remove the specific context, and an out is an out.

So, how do you want to account for this specific play?


Build a Better WAR Metric, Part 7

So far, we’ve looked at seeing the overall impact be the same, but for different reasons, like the bases loaded walk versus the solo HR. And you are either given a choice of “same” or “this one is better”. It’s been designed to see if you prefer a DIRECTION. In the above case, the direction is based on whether the event matters.

Now, we’ll have two situations that have an overall different impact. We have our trusty solo HR. We all love the HR. It gives us a guaranteed run. It tells us about the hitter, and if given better circumstances, we can dream of even more runs.

If I ask about a double with a runner on second, I’d get into the same question as with the bases loaded walk: the impact is exactly one run, and we’re left with the same state that we entered. And 80% of Fangraphs readers will prefer the solo HR to the double with the runner on 2nd.

But how about a bases-clearing double? Any way you want to measure it, the impact is going to be alot(*) more than the 1 run from the solo HR. Run expectancy tables tell us it’s 1.7 to 2.2 runs depending on the number of outs.

(*) “Get over it.” — Scalia

So, how high a hurdle are Fangraphs readers willing to climb to keep their allegiance to the event in a context-neutral setting, and ignore the context of a somewhat inferior event in a far more leverageable setting?


Build a Better WAR Metric, Part 6

In the previous post about the two-run reliever with the three-run lead, the Fangraphs readers leaned 80-20 toward “a run is a run”, compared to “he did his job”.

Let’s make this one similar, but tougher. We have our ace reliever entering the bottom of the 9th inning. This time, our ace reliever gives up a leadoff HR, followed by striking out the side.

We have two different relievers:
(a) we have Billy Wagner entering the game with a 2-run lead, so he managed to squeak out of it to end the game. If you are interested, his team had a 90% chance of winning before he showed up. And the Astros won.

(b) we also have Trevor Hoffman facing the same scenario with a 1-run lead. He leaves the game tied, making the Padres go into extra innings and turning an 80% chance of winning into a 50% chance.
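To make the contrast concrete, here is a minimal sketch comparing the two outings on a context-neutral basis (runs allowed) and on a win-probability basis. The percentages are the ones given above; the function name `win_prob_change` and the variable names are hypothetical.

```python
def win_prob_change(wp_before, wp_after):
    """Change in the team's win probability over the outing."""
    return round(wp_after - wp_before, 2)


# Context-neutral view: identical lines (a leadoff HR, then three strikeouts).
billy_runs_allowed = 1
trevor_runs_allowed = 1

# Context-specific view: the same line lands very differently.
billy_wp = win_prob_change(0.90, 1.00)    # 2-run lead, game won
trevor_wp = win_prob_change(0.80, 0.50)   # 1-run lead, game now tied

print(billy_runs_allowed == trevor_runs_allowed, billy_wp, trevor_wp)
```

On runs allowed the two are indistinguishable; on win probability, Billy comes out slightly positive and Trevor clearly negative.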


Build a Better WAR Metric, Part 5

When the home team enters the top of the 9th with a 3-run lead, they will win that game 98% of the time. That happens mostly because they get to pick and choose the reliever they want. If they chose a random reliever, they’d win 97% of the time. If they chose a poor reliever, they’d win 96% of the time. It’s pretty tough to mess up a 3-run lead, especially when the home team gets one more crack at it in the bottom of the 9th.

So, we have a SP that went 8, and he hands off to the reliever this 3-run lead. The ace reliever comes in. Let’s call him Armando Benitez. He walks the first batter, allows a HR to the second, then strikes out the side. The game ends, and his team wins. Armando even gets a “save”, whatever that is supposed to imply.

Since he was given a 3-run rope, and he only used 2 runs, he was able to turn a 96% or 98% chance of winning into 100%, all without the help of his fielders. Incredibly, things could have gotten worse, which does happen 2 to 4 percent of the time. In this case, he pitched just bad enough to win.
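The tension here can be put in numbers. The sketch below computes the single-inning ERA for this outing alongside the win-probability gain under each baseline mentioned above; the helper names `era` and `wp_gain` are illustrative assumptions.

```python
def era(runs_allowed, innings):
    """Runs allowed per nine innings (treating all runs as earned)."""
    return 9 * runs_allowed / innings


def wp_gain(wp_entering, wp_final=1.0):
    """Win probability added by closing out the win."""
    return round(wp_final - wp_entering, 2)


benitez_era = era(runs_allowed=2, innings=1)

# Baselines quoted in the post: chosen reliever, random reliever, poor reliever.
gains = {wp: wp_gain(wp) for wp in (0.98, 0.97, 0.96)}

print(benitez_era, gains)
```

An 18.00 ERA for the inning sits next to a small but positive win-probability gain of 0.02 to 0.04, which is the choice the poll puts to readers.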


Build a Better WAR Metric, Part 4

First, thanks so much for the tremendous responses, both in the comments and just the participation in the polls. There have been 700 to 1200 votes in each poll. Just overwhelming responses.

For now, let’s start the second inning by leaving aside the hitters and talk about defense. Now, when I refer to defense, I mean pitching+fielding. Remember, defense is the whole team, the pitchers and the fielders. We’ll worry about how to separate fielders from pitchers in a soon-to-be-asked question. Just not now. Cool?

Let’s say that one team defense allows 10 hits with 3 walks. But they are all scattered, and so actually end up with a shutout. Another team defense allows the exact same number of hits and walks. They even allow them in the exact same way. The only difference is the timing. They allowed them bunched up, and so resulted in eight runs allowed. From a team defense perspective, how do you see them in terms of assigning value?


Build a Better WAR Metric, Checkpoint

In trying to summarize the responses to the three questions, so far, what we have in terms of preference is:
– the event, regardless of the context
– the event, within the context of the whole game state (inning, score, base, out)
– the event, within the context of the base-out state
– and far down the list, the event as it ultimately affects the inning

What the responders therefore are gravitating toward is a purely context-neutral metric. But, to the extent that we do want to measure the context-specific impact, that should be kept separate, and perhaps not even tied to the player at all. Just a general “timing” bucket.

If we take the case of the triple in the previous thread, in either case, Hamilton and Dyson will get +1 run, because that’s the context-neutral value of the triple, according to Linear Weights.

We immediately add a -0.1 runs because a triple with the bases empty and 0 outs is worth +0.9 runs. So, they don’t want to penalize either guy for getting the triple when they did, and so, to make things add up, we need “-0.1” runs for timing.

Then the three outs each get -0.25 runs, the standard weight.

So far, we have this:
+1.0 Hamilton
-0.1 timing: limited impact triple
-0.25 batter1
-0.25 batter2
-0.25 batter3

That’s a total of +0.15 runs. But since the inning started at +0.5 runs of expectancy, and we get 0 runs scored, the total has to be -0.5 runs. So, we add another item:
-0.65 bad timing: leaving runner on base

As for the other scenario:
+1.0 Dyson
-0.1 timing: limited impact triple
-0.25 batter1
-0.25 batter2
-0.25 batter3

But, since we actually scored a run, that should come in at +0.5 runs. We need another:
+0.35 good timing: scoring the runner
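The two ledgers above can be reproduced with a short sketch. The weights are the ones quoted in this post; the function `inning_ledger` and the constant names are hypothetical, and the code only checks that the books balance.

```python
# Values quoted in the post (Linear Weights and run expectancy).
TRIPLE = 1.0            # context-neutral value of a triple
TRIPLE_TIMING = -0.1    # a bases-empty, 0-out triple is worth only +0.9
OUT = -0.25             # standard weight for a generic out
START_RE = 0.5          # run expectancy at the start of the inning


def inning_ledger(runs_scored):
    """Return (subtotal of the itemized entries, residual timing entry)."""
    subtotal = TRIPLE + TRIPLE_TIMING + 3 * OUT
    actual = runs_scored - START_RE    # what the inning really produced
    residual = actual - subtotal       # the extra "timing" line item
    return round(subtotal, 2), round(residual, 2)


hamilton = inning_ledger(runs_scored=0)   # stranded at third
dyson = inning_ledger(runs_scored=1)      # scores on the fly ball

print(hamilton, dyson)
```

The subtotal is the same +0.15 in both innings; only the residual timing entry differs, and that entry is what makes each ledger add up to the actual result.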

For a minority, a vocal minority, those “timing” impact runs should be given to the players involved. Looking at the Hamilton one, whereas a generic out is worth -0.25 runs, an out with a runner on third is more costly. So, for readers in that vocal minority, the -0.65 runs has to be distributed among the three out-makers. For the readers in the majority, those runs are an afterthought. Maybe they should be considered, so the thing adds up. But they shouldn’t fall on the shoulders of the players involved. Just a general team bucket to capture the various plays affected by timing.

So, that’s how you build your WAR:

For each player, figure his context-neutral impact as one value, and his “timing” as another value.

Then, the reader can choose whether to include the timing value or not.

Now, on to the pitchers and fielders!


Build a Better WAR Metric, Part 3

Before we talk about baseball, let’s talk about the other three major sports. You’re on your own 20 yard line, you march down field on a series of plays, but ultimately, you punt. Or, you march down field and move far enough for a tough field goal that gets made. All those running and passing plays aren’t considered any differently based on the results. The RB got 25 yards on 4 running plays, and no one matches it up to the end result.

In hockey and basketball, a great pass that doesn’t ultimately lead to a goal or basket goes away like a fart in the wind. No one tracks it, and if they do, it’s not considered anything close to the impact of an assist that led to a score.

Why the difference? I think it’s because of the stop-start nature of football: the “sequence” ends after each play, and the whole drive is 2-5 football minutes or 5-15 human minutes. In hockey and basketball, turnovers happen often enough that each drive lasts 10-30 seconds, whether you count in sport time or human time. I think that’s the reason.

So, let’s talk about the leadoff triple. Billy Hamilton gets on third base, and the next three batters strike out. He’s stranded there, no runs score. A fart in the wind triple? Or something much more tangible? Jarrod Dyson gets on third, the next batter hits a medium fly ball out, far enough to let Dyson score. The next two batters strike out. One run scores. That triple is obviously tangible.

How do you see these two triples?