Build a Better WAR Metric, Part 8

The worst time to strike out is with a runner on third base and fewer than 2 outs. In fact, it’s with a runner on 3B and exactly 1 out. The pitcher knows it, the batter knows it, the fielder knows it, the runner knows it. The fans know it. Everyone is extremely aware that getting the second out changes the entire dynamic of the situation, since now only a positive event can score that runner on third base.

We can even quantify this situation rather precisely. With a runner on third and 1 out, we expect around 0.94 runs to score to the end of the inning. But with 2 outs, it’s all the way down to 0.37 runs. A strikeout in that situation is worth an astounding -0.57 runs.

On the other hand, a sacrifice fly clears the bases and costs an out (run expectancy down to 0.09 runs), but adds a run to the scoreboard, so the total run value is 1.09 runs, or a net gain of +.15 runs. The sac fly is one of the very few times that trading a base for an out is a net positive.

We have a potential swing of 0.72 runs between a “bad” out and a “good” out. With everyone in the ballpark well aware of the situation, and with outs so commonplace, it’s clear the pitcher is going to do his best to pitch in a way to increase strikeouts at the cost of perhaps an increase in low impact walks with first base open. And the batter is going to do his best to counteract that, and try not to strike out at the cost of perhaps a tiny bit of power. Everyone is changing their strategy.
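
For anyone who wants to check the arithmetic, here is a minimal sketch. It assumes the usual convention that a play's run value is the runs scored on the play plus the change in run expectancy. The function and constant names are mine, purely for illustration; the run expectancies are the ones quoted above.

```python
def run_value(re_before, re_after, runs_scored=0):
    """Run value of a play: runs scored plus the change in run expectancy."""
    return runs_scored + re_after - re_before

# Run expectancies quoted in the text (illustrative constant names).
RE_THIRD_1_OUT = 0.94  # runner on third, 1 out
RE_THIRD_2_OUT = 0.37  # runner on third, 2 outs
RE_EMPTY_2_OUT = 0.09  # bases empty, 2 outs

# Strikeout: the runner stays put, one more out, no run scores.
strikeout = run_value(RE_THIRD_1_OUT, RE_THIRD_2_OUT)

# Sacrifice fly: the runner scores, bases are empty, one more out.
sac_fly = run_value(RE_THIRD_1_OUT, RE_EMPTY_2_OUT, runs_scored=1)

# Gap between the "good" out and the "bad" out.
swing = sac_fly - strikeout

print(f"strikeout: {strikeout:+.2f}")
print(f"sac fly:   {sac_fly:+.2f}")
print(f"swing:     {swing:.2f}")
```

The same function works for any play, as long as you can look up the run expectancy before and after it.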

But remove the specific context, and an out is an out.

So, how do you want to account for this specific play?


Build a Better WAR Metric, Part 7

So far, we’ve looked at seeing the overall impact be the same, but for different reasons, like the bases loaded walk versus the solo HR. And you are either given a choice of “same” or “this one is better”. It’s been designed to see if you prefer a DIRECTION. In the above case, the direction is based on whether the event matters.

Now, we’ll have two situations that have an overall different impact. We have our trusty solo HR. We all love the HR. It gives us a guaranteed run. It tells us about the hitter, and if given better circumstances, we can dream of even more runs.

If I ask about a double with a runner on second, I’d get into the same question as with the bases loaded walk: the impact is exactly one run, and we’re left with the same state that we entered. And 80% of Fangraphs readers will prefer the solo HR to the double with the runner on 2nd.

But how about a bases-clearing double? Any way you want to measure it, the impact is going to be alot(*) more than the 1 run from the solo HR. Run expectancy tables tell us it’s 1.7 to 2.2 runs depending on the number of outs.

(*) “Get over it.” — Scalia

So, how much of a hurdle are Fangraphs readers willing to climb to keep their allegiance to the event in a context-neutral setting, and ignore the context of a somewhat inferior event, but in a highly more leverageable setting?


Build a Better WAR Metric, Part 6

In the previous post about the two-run reliever with the three-run lead, the Fangraphs readers leaned 80-20 toward “a run is a run”, compared to “he did his job”.

Let’s make this one similar, but tougher. We have our ace reliever entering the bottom of the 9th inning. This time, our ace reliever gives up a leadoff HR, followed by striking out the side.

We have two different relievers:
(a) we have Billy Wagner entering the game with a 2-run lead, so he managed to squeak out of it to end the game. If you are interested, his team had a 90% chance of winning before he showed up. And the Astros won.

(b) we also had Trevor Hoffman face the same scenario with a 1-run lead. He leaves the game tied making the Padres go into extra innings, turning an 80% chance of winning into a 50% chance.
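
To put the two outings side by side, here is a rough sketch of the change in win probability, often called win probability added. The function name is my own shorthand, and the percentages are the ones given above. It assumes a win counts as 100%.

```python
def win_probability_added(wp_before, wp_after):
    """Change in the team's chance of winning while the reliever was in."""
    return wp_after - wp_before

# Same pitching line for both: leadoff HR, then three strikeouts.
wagner = win_probability_added(0.90, 1.00)   # 2-run lead, game won
hoffman = win_probability_added(0.80, 0.50)  # 1-run lead, game tied

print(f"Wagner:  {wagner:+.2f}")
print(f"Hoffman: {hoffman:+.2f}")
```

Identical events, but one reliever comes out ahead and the other well behind once the game state is counted.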


Build a Better WAR Metric, Part 5

When the home team enters the top of the 9th with a 3-run lead, they will win that game 98% of the time. That happens mostly because they get to pick and choose the reliever they want. If they chose a random reliever, they’d win 97% of the time. If they chose a poor reliever, they’d win 96% of the time. It’s pretty tough to mess up a 3-run lead, especially when the home team gets one more crack at it in the bottom of the 9th.

So, we have a SP that went 8, and he hands off to the reliever this 3-run lead. The ace reliever comes in. Let’s call him Armando Benitez. He walks the first batter, allows a HR to the second, then strikes out the side. The game ends, and his team wins. Armando even gets a “save”, whatever that is supposed to imply.

Since he was given a 3-run rope, and he only used 2 runs, he was able to turn a 96% or 98% chance of winning into 100%, all without the help of his fielders. Incredibly, things could have gotten worse, which does happen 2 to 4 percent of the time. In this case, he pitched just bad enough to win.


Build a Better WAR Metric, Part 4

First, thanks so much for the tremendous responses, both in the comments and just the participation in the polls. There have been 700 to 1200 votes in each poll. Just overwhelming responses.

For now, let’s start the second inning by leaving aside the hitters and talk about defense. Now, when I refer to defense, I mean pitching+fielding. Remember, defense is the whole team, the pitchers and the fielders. We’ll worry about how to separate fielders from pitchers in a soon-to-be-asked question. Just not now. Cool?

Let’s say that one team defense allows 10 hits with 3 walks. But they are all scattered, and so actually end up with a shutout. Another team defense allows the exact same number of hits and walks. They even allow them in the exact same way. The only difference is the timing. They allowed them bunched up, and so resulted in eight runs allowed. From a team defense perspective, how do you see them in terms of assigning value?


Build a Better WAR Metric, Checkpoint

In trying to summarize the responses to the three questions, so far, what we have in terms of preference is:
– the event, regardless of the context
– the event, within the context of the whole game state (inning, score, base, out)
– the event, within the context of the base-out state
– and far down the list, the event as it ultimately affects the inning

What the responders therefore are gravitating toward is a purely context-neutral metric. But, to the extent that we do want to measure the context-specific impact, that should be kept separate, and perhaps not even tied to the player at all. Just a general “timing” bucket.

If we take the case of the triple in the previous thread, in either case, Hamilton and Dyson will get +1 run, because that’s the context-neutral value of the triple, according to Linear Weights.

We immediately add a -0.1 runs because a triple with the bases empty and 0 outs is worth +0.9 runs. So, they don’t want to penalize either guy for getting the triple when they did, and so, to make things add up, we need “-0.1” runs for timing.

Then the three outs, they each get -0.25 runs, as is the standard weight.

So far, we have this:
+1.0 Hamilton
-0.1 timing: limited impact triple
-0.25 batter1
-0.25 batter2
-0.25 batter3

That’s a total of +0.15 runs. But since the inning started at +0.5 runs of expectancy, and we get 0 runs scored, the total has to be -0.5 runs. So, we add another item:
-0.65 bad timing: leaving runner on base

As for the other scenario:
+1.0 Dyson
-0.1 timing: limited impact triple
-0.25 batter1
-0.25 batter2
-0.25 batter3

But, since we actually scored a run, that should come in at +0.5 runs. We need another:
+0.35 good timing: scoring the runner
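
Both ledgers follow the same bookkeeping rule, which can be sketched as below. This is only an illustration under the assumption that every inning's entries must sum to runs scored minus the starting run expectancy. The function and variable names are hypothetical; the weights are the ones listed above.

```python
def timing_residual(start_re, runs_scored, credits):
    """Return the 'timing' entry needed so the ledger sums to
    runs scored minus the run expectancy at the start of the inning."""
    target = runs_scored - start_re
    return target - sum(credits)

START_RE = 0.5  # run expectancy at the start of an inning, per the text

# Entries shared by both innings: the triple, its limited-impact timing
# adjustment, and three generic outs.
credits = [1.0, -0.1, -0.25, -0.25, -0.25]

hamilton_timing = timing_residual(START_RE, runs_scored=0, credits=credits)
dyson_timing = timing_residual(START_RE, runs_scored=1, credits=credits)

print(f"Hamilton inning: {hamilton_timing:+.2f}")
print(f"Dyson inning:    {dyson_timing:+.2f}")
```

The context-neutral entries are identical in the two innings, so the whole difference between them lands in the timing line.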

For a minority, a vocal minority, those “timing” impact runs should be given to the players involved. Looking at the Hamilton one, whereas a generic out is worth -0.25 runs, an out with a runner on third is more costly. So, that -0.65 runs has to be distributed to the three out-makers, for those readers part of the vocal minority. For the readers in the majority, those runs are an after-thought. Maybe they should be considered, so the thing adds up. But, it shouldn’t fall on the shoulders of the players involved. Just a general team bucket to capture the various plays affected by timing.

So, that’s how you build your WAR:

For each player, figure his context-neutral impact as one value, and his “timing” as another value.

Then, the reader can choose whether to include the timing value or not.

Now, on to the pitchers and fielders!


Build a Better WAR Metric, Part 3

Before we talk about baseball, let’s talk about the other three major sports. You’re on your own 20 yard line, you march down field on a series of plays, but ultimately, you punt. Or, you march down field and move far enough for a tough field goal that gets made. All those running and passing plays aren’t considered any differently based on the results. The RB got 25 yards on 4 running plays, and no one matches it up to the end result.

In hockey and basketball, a great pass that doesn’t ultimately lead to a goal or basket goes away like a fart in the wind. No one tracks it, and if they do, it’s not considered anything close to the impact of an assist that led to a score.

Why the difference? I think it’s because of the stop-start nature of football, that the “sequence” ends after each play, and the whole drive is 2-5 football minutes or 5-15 human minutes. In hockey and basketball, turnovers happen often enough and each possession lasts 10-30 seconds, whether in sport time or human time. I think that’s the reason.

So, let’s talk about the leadoff triple. Billy Hamilton gets on third base, and the next three batters strike out. He’s stranded there, no runs score. A fart in the wind triple? Or something much more tangible? Jarrod Dyson gets on third, the next batter hits a medium fly ball out, far enough to let Dyson score. The next two batters strike out. One run scores. That triple is obviously tangible.

How do you see these two triples?


Build a Better WAR Metric, Part 2

Ok, you guys have spoken, and you don’t want a bases loaded walk to count the same as a solo HR. That even though the base-out state before the event and after the event remain unchanged, and that the number of runs now in the bank are the same, the WAY it happened matters to most of you. Therefore, we are NOT trying to preserve the runs, we are not trying to make sure the runs add up. You have been clear on that.

Now, let’s talk about “preservation of wins”. It’s a 0-0 game, the bottom of the 9th, the bases are loaded with two outs. Historically, at this point in the game, the batting team would end up winning 68% of the time. It’s a high stakes situation, a Leverage Index of 6.4. And the batter walks. The batting team wins, game is over. Ooops, I meant the batter hit a single. No, wait, it was a Grand Slam. No, wait it should have been a Grand Slam, but Robin Ventura decided to abandon the bases after he reached first base. Regardless, the game is over, and the batting team won as soon as the batter touched first base.

Your question:


eBay’s Five Most Marvelous and Currently Available Ballcaps

It’s become a practice of the present author in recent years to begin in February a painstaking search for the new ballcap that will express his entire being. It’s also become a practice in recent years to parlay that search into web content so that the author might “remain” “employed.”

Two years ago, this pursuit yielded a Winston-Salem Spirits cap from 1994 with a weird red sun and melancholy eagle on it. Last year, I had the fortune of procuring a handsome Diablos Rojos cap from the actual team store at Parque Fray Nano in Mexico City. In each case, I have documented the relevant search for the benefit of posterity — even if posterity has failed to show any real interest in my work.

Last week, the author began this year’s edition of the search. What follows is the second installment of the newest volume.

To wit:

A Town

Atlanta Braves A-Town Corduroy 5-Panel Velcro Snapback (Link)
Style: Snapback
Time Left: 23 days, 5 hours
Cost: US $17.99 (Buy It Now)

While it’s difficult to conceive of a scenario in which one would voluntarily acknowledge any sort of affiliation with the city of Atlanta, this cap is almost certainly the best means by which to do it. The low crown preserves one from the awkward fit offered by other new caps. As for the corduroy, it’s the most expedient way to announce publicly that, regardless of what these so-called “credit reports” suggest, one has plans to acquire a sweet conversion van in the near future.



Building a Better WAR Metric

Wins Above Replacement (WAR) has as its genesis Bill James, even if Bill might not necessarily take the credit (or blame based on some readers) for it. But make no mistake, Bill provided the plumbing for it. For those interested, you can read Brandon Heipp’s account on that backstory.

When you put all the plumbing together, you can create a framework. And that’s what WAR is, a framework to provide an estimate. Wins Above Replacement is an estimate of… something. What that something is is different for every person. While the currency is wins, it’s not clear what those wins represent. There are reasonable choices you can make along the way. And for every fork in the road you take, you may diverge yourself from the next guy. This is why WAR can never be one thing.

As a framework, WAR leaves little room for discussion. Whether it’s what you see at Baseball Reference or at FanGraphs or openWAR or (to some extent) at Baseball Prospectus, they have as their framework the WAR that was championed on my old blog, which culminated with this article. But a framework is not the same as an implementation. 95% of the cars on the road all follow the same core design. That’s the framework. But a Chevy is different from a Lexus. Those are implementations. And there are as many implementations of WAR as there are baseball fans. Whereas Baseball Reference and FanGraphs and the others provide a consistent, systematic implementation, most fans have their own personal mish-mash of arbitrary, biased, and capricious combinations of stats, which can change as their mood fits.

This series of articles, of which there may be a dozen(*), is an effort to try to come up with a WAR metric that will satisfy the Straight Arrow readers.

(*) I have no idea. This is the first one I’ve written.

***

I’ll ask you a series of questions, starting now. The openWAR guys talk about “preservation of runs”. That is a good starting point, and a great way to describe the concept. So, the question centers around whether we want to make sure that everything adds up at the play level. If you get a bases-loaded walk, do we want to make sure that exactly one run is accounted for or not?

If you care about “talent”, you just want to account for around +.30 runs for offense (and -.30 runs for defense), because you don’t want to be concerned with the specific base-out state. (We’ll talk about “preservation of wins” in a later question.) Similarly, is a bases-empty walk and bases-empty single the same thing or not? And if you want to preserve runs, are you ready to accept a bases-loaded walk and a solo HR as being the exact same thing?

So, have a discussion, and then answer this poll question:

There are plenty of other discussion points that go into building an implementation of WAR, and we’ll get to those in the future. For this post, I’m interested to hear what you guys think about this issue specifically.