On Context, or Evaluating Hitters and Pitchers Differently

Here at FanGraphs, our pitching WAR is built around Fielding Independent Pitching, which focuses solely on a pitcher’s walks, strikeouts, and home runs allowed. Because it ignores the results of balls in play and the order in which results occur, there are occasionally big differences between a pitcher’s FIP and his ERA. This divide often leads to some consternation when a pitcher with a high ERA posts a decent WAR, or in reverse, when our WAR doesn’t grade out a pitcher with a very low ERA that highly.

A significant number of people, including a good chunk of our own readers and noted sabermetric evangelists like Brian Kenny, prefer to evaluate pitchers by runs allowed because, as I’ve heard repeatedly over the last few years, that measures “what actually happened”. And that’s one of the reasons we have RA9-WAR here on the site, as we know that a sizable number of people prefer to evaluate pitchers in that way.

I believe there are valid points on both sides, and I see the argument for using both a FIP-based WAR and a RA9-based WAR when evaluating a pitcher’s past performance. However, I find it interesting that this debate has not carried over to position players, where there seems to be broad consensus* that context-neutral is the way to go.

*Allowing for the fact that there was definitely some positive response to my article on Context Batting Runs last week, my feeling is that there’s still not much of a push towards this kind of evaluation for hitters.

It’s not even just those of us who subscribe to a linear weights based WAR, like we use here on FanGraphs. Whether you look at a player’s standard batting line, use BA/OBP/SLG with some adjustments for playing time, or use OPS+, these are all offensive evaluations that consider only the number of events that happened, not the situation in which they occurred. And there is essentially broad agreement that these are the best types of measures to use when evaluating how well a hitter contributed to his team’s performance.

The only context-specific statistic that has any real traction is RBIs, and the sabermetric community — myself included, so I’m not pointing fingers here — has spent years explaining why using RBIs to evaluate a player’s contributions is a misuse of statistics. RBIs are something of a pariah in the analytic community, shunned because they are a team statistic masquerading as an individual measure. For valid reasons, they’ve been marginalized by sites like this one, almost completely removed from the discussion of player value among the “new school” crowd.

These decisions on how to value player performance are incongruous. If you use something like wOBA to evaluate a hitter, you are making a conscious decision to ignore the order of events that “actually happened”. However, when evaluating a pitcher by runs allowed, you’re making the opposite decision: to include sequencing factors and hold the pitcher entirely responsible for the order in which events occurred.

For instance, here are two hypothetical innings, with only the order of events changed.

Scenario A: Single, single, homer, fly out, fly out, fly out.

Scenario B: Homer, single, single, fly out, fly out, fly out.

By runs allowed, the pitcher in Scenario A is charged with giving up three runs, while in Scenario B, he’s charged with either one or two, depending on whether or not the guy who hit the first single was fast enough to tag up and reach third on the first fly out, and then whether the second fly out was hit deep enough to score him. His FIP for the inning would be 16.04 in either case, but his ERA could be 9.00, 18.00, or 27.00, depending on the order of events and the speed of the inning’s second hitter.
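The arithmetic behind those numbers is easy to check. The sketch below is a minimal illustration, not our production code: it uses the standard public FIP formula and assumes a league constant of 3.04, which is simply the value implied by the 16.04 quoted above (the real constant changes every season). It also treats every run as earned, as this example does.

```python
def fip(hr, bb, hbp, k, ip, constant=3.04):
    """FIP = (13*HR + 3*(BB+HBP) - 2*K) / IP + constant.

    The default constant is an assumption chosen to reproduce the
    16.04 figure in the text; it is not tied to a specific season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def era(runs, ip):
    """Runs allowed per nine innings (all runs treated as earned here)."""
    return 9 * runs / ip


# Both scenarios contain the same FIP inputs: one homer, no walks,
# no hit batters, no strikeouts, one inning pitched.
inning_fip = fip(hr=1, bb=0, hbp=0, k=0, ip=1)

# ERA depends on how many runs actually scored: 3 in Scenario A,
# 1 or 2 in Scenario B.
era_by_runs = {runs: era(runs, ip=1) for runs in (1, 2, 3)}

print(f"FIP: {inning_fip:.2f}")
for runs, value in era_by_runs.items():
    print(f"{runs} run(s) allowed -> ERA {value:.2f}")
```

The FIP inputs never change between the two scenarios, so the only thing that moves is the run total fed into the ERA calculation.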

By wOBA, these two innings are exactly the same, as it simply sees six individual events, giving out credit for each one without regard for what came before or after. The third batter in Scenario A will get the same credit as the first batter in Scenario B, because they both hit home runs, and because the measure is context-neutral (by design), it will simply give them the average credit for a home run based on the expected distribution of when home runs occur. wOBA, like FIP, sees these two innings as equal, both from the perspective of the hitter and pitcher.
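To make that order-blindness concrete, here is a small sketch of a wOBA-style calculation. The weights are placeholder values picked for illustration, not our published weights for any season, and the function name is hypothetical. The calculation only counts how often each event occurred, so it has no way to tell the two scenarios apart.

```python
from collections import Counter

# Illustrative linear weights (assumed values, not official ones).
WEIGHTS = {"single": 0.89, "homer": 2.10, "fly out": 0.0}


def woba_style(events):
    """Average linear-weight credit per plate appearance.

    Only the count of each event type is used; the order in which
    the events occurred never enters the calculation.
    """
    counts = Counter(events)
    total = sum(WEIGHTS[event] * n for event, n in sorted(counts.items()))
    return total / len(events)


scenario_a = ["single", "single", "homer", "fly out", "fly out", "fly out"]
scenario_b = ["homer", "single", "single", "fly out", "fly out", "fly out"]

# Same six events, different order, identical result.
print(woba_style(scenario_a) == woba_style(scenario_b))
```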

ERA and RA9 would see these innings quite differently, assigning three runs to the pitcher in Scenario A — and either one or two in Scenario B — because that is how many runs were scored while he was pitching. The sequence of events is a significant factor in how the pitcher is valued.
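A toy inning simulator shows where the run totals come from. This is a sketch under simplified base-running assumptions that are mine, not part of any official scoring model: a single moves every runner up exactly one base, a homer clears the bases, and a single hypothetical flag, tag_up_on_fly_out, stands in for both the runner's speed and the depth of the fly balls.

```python
def runs_scored(events, tag_up_on_fly_out=False):
    """Count runs in one inning under simplified base-running rules.

    Assumptions (for illustration only): a single moves every runner up
    exactly one base, a homer clears the bases, and on a fly out the lead
    runner advances one base only if tag_up_on_fly_out is True.
    """
    bases = [False, False, False]  # first, second, third
    runs = 0
    outs = 0
    for event in events:
        if event == "homer":
            runs += 1 + sum(bases)
            bases = [False, False, False]
        elif event == "single":
            runs += bases[2]  # a runner on third scores
            bases = [True, bases[0], bases[1]]
        elif event == "fly out":
            outs += 1
            if outs == 3:
                break  # no run can score on the third out
            if tag_up_on_fly_out and any(bases):
                lead = max(i for i, occupied in enumerate(bases) if occupied)
                bases[lead] = False
                if lead == 2:
                    runs += 1
                else:
                    bases[lead + 1] = True
        else:
            raise ValueError(f"unknown event: {event}")
    return runs


scenario_a = ["single", "single", "homer", "fly out", "fly out", "fly out"]
scenario_b = ["homer", "single", "single", "fly out", "fly out", "fly out"]

print(runs_scored(scenario_a))                          # 3
print(runs_scored(scenario_b))                          # 1
print(runs_scored(scenario_b, tag_up_on_fly_out=True))  # 2
```

The events are identical in both lists; only their order, plus the tag-up assumption, changes the number of runs charged to the pitcher.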

This isn’t to say that one or the other is definitively right or wrong. There is some evidence that pitcher sequencing is at least partially a skill. Guys like Jim Palmer and Tom Glavine accumulated an extra +13 wins from sequencing during their careers, while Nolan Ryan was an amazing 34 wins in the negative based on the order of events. If a pitcher struggles to pitch out of the stretch — or conversely pitches in such a way that he strands more runners than you might expect from his overall numbers — then there is an argument that the pitcher ought to be credited or blamed for those results.

Of course, the answers aren’t always that clear cut, especially when we’re not looking at a player’s entire career. In the two scenarios above, we really don’t yet know how much credit or blame the pitcher should receive for the two singles that occurred. They could have been scorching line drives that no fielder had a chance to make a play on, or they could have been routine ground balls that rolled into the outfield because the defenders behind the pitcher have the range of a potted plant.

Both RA9 and FIP respond to this uncertainty by taking polar extremes, with FIP giving the pitcher no responsibility for the hit and RA9 giving the pitcher all the responsibility. Both are clearly wrong. There are ways to attempt to adjust for defensive performance, as Baseball-Reference does with their version of pitcher WAR, but they require some huge assumptions about the consistency of team defensive performance that are also clearly wrong, and they get away from that “what actually happened” point of origin.

But I’m getting a little off course here. The goal of this article isn’t to argue that a FIP based WAR or a RA9 based WAR is superior. I simply think it’s worth pointing out that using a linear weights based WAR for position players (which pretty much every popular WAR implementation uses, including ours) is inconsistent with using an ERA/RA9 based WAR for pitchers. If you use that combination of metrics, you are giving hitters no credit or blame for the contexts in which their performances occurred, but you are giving the pitcher full credit or blame for those same situations.

And, based on what we know about how to distribute credit for how balls in play become hits, this is probably backwards. We are pretty sure that, when a hitter gets a hit, he is the only offensive player who deserves credit for that outcome. However, when a pitcher gives up a hit, we often do not know whether it was his fault or whether it was a failure of his defense. And yet, if our scenarios included a bases clearing double instead of a home run, we would assign the pitcher (through ERA/RA9) the full blame for the two runs that scored on that double, while only giving the hitter credit for the average run value of all doubles, ignoring the fact that he drove in two runs in the process.

Again, I see the argument for using both context neutral and context dependent statistics to evaluate player performance, especially when we are looking backwards and asking questions of past value. There is a difference between trying to isolate skills and trying to measure the value of events that have already occurred. I just think that maybe we, as a community, should consider evaluating position players and pitchers the same way.

In this way, wOBA and FIP are similar, which is one of the reasons why we use FIP as our basis for pitching WAR. With a linear weights model for both hitters and pitchers, we are attempting to evaluate pitchers based on the number of events we can credit or blame them for, and not measuring the sequence in which those events occurred. If one’s preference is to use RA9-WAR, then I’d suggest that perhaps it would be more fair to also evaluate hitters based on situational performance, which would lead to relying on something like RE24 for offensive performance.
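For readers unfamiliar with RE24, the idea is that each plate appearance is credited with the change in run expectancy it produced, plus any runs that scored on the play. The sketch below applies that to the two home runs from the scenarios above. The run expectancy figures are rough placeholder values assumed for illustration, not an official table, and the function name is hypothetical.

```python
# Assumed run expectancy with nobody out, keyed by occupied bases.
# Placeholder values for illustration; real tables vary by season.
RUN_EXPECTANCY_0_OUT = {
    "empty": 0.48,
    "first and second": 1.44,
}


def re24_for_play(re_before, re_after, runs_on_play):
    """Run expectancy added by a single play."""
    return re_after - re_before + runs_on_play


# Scenario A: the homer comes with runners on first and second, nobody out.
homer_a = re24_for_play(
    re_before=RUN_EXPECTANCY_0_OUT["first and second"],
    re_after=RUN_EXPECTANCY_0_OUT["empty"],
    runs_on_play=3,
)

# Scenario B: the homer leads off the inning with the bases empty.
homer_b = re24_for_play(
    re_before=RUN_EXPECTANCY_0_OUT["empty"],
    re_after=RUN_EXPECTANCY_0_OUT["empty"],
    runs_on_play=1,
)

print(f"Scenario A homer: {homer_a:.2f} runs")
print(f"Scenario B homer: {homer_b:.2f} runs")
```

Under these assumed values, the Scenario A homer is worth roughly twice as much as the Scenario B homer, whereas a linear weights measure gives both the same average credit.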

It is worth noting that RE24 here on FanGraphs isn’t a perfect replacement for Batting Runs in the WAR calculation. Because RE24 also includes SB/CS, it’s more like RE24 replaces both Batting Runs and the wSB part of our Baserunning measure. However, depending on future interest in this kind of calculation, it is possible to build a version of RE24 that doesn’t include any baserunning, and that could simply be subbed in for context-neutral batting runs if there were a desire to build a version of WAR for position players that modeled the way RA9 treats situational events for pitchers.

But then again, there’s also a school of thought that there are already too many versions of WAR going around as it is. Most people I talk to want fewer WARs, not more. The problem is that we’re not always asking the same question, and at the end of the day, answering questions is the entire reason we have analytical data to begin with. While I’m not necessarily advocating for one position or the other, I do think it’s worth pointing out that the currently popular versions of WAR for hitters do not answer the same question that a runs allowed based WAR for pitchers seeks to answer.

If you prefer RA9-WAR for pitchers, you’re essentially asking a different question than you are when you use WAR for position players. It’s worth considering whether that’s a problem we’re okay with, or whether that is an argument for either using a FIP-based WAR for pitchers (the conclusion we came to when we built WAR here on FanGraphs) or creating a new version of WAR for position players that gives them credit or blame for their situational hitting.

Otherwise, combining a linear weights based position player WAR with a runs allowed based pitcher WAR creates a bit of a paradox. Maybe we’re okay with that, but we should probably at least be aware that it’s what many people who use that combination of WARs are doing.

We hoped you liked reading On Context, or Evaluating Hitters and Pitchers Differently by Dave Cameron!

Dave is the Managing Editor of FanGraphs.

Will

On a very simple level, it makes sense to evaluate pitchers and hitters differently because they have different ways of impacting the game. Hitters get a small number of non-consecutive at-bats each game, while pitchers face a continuous lineup. Context neutral makes more sense for hitters because they don’t contribute to the context. Pitchers, however, often make the bed they sleep in, so there’s value in judging them on the bigger picture.

tz

Best argument I’ve heard on why they should be evaluated differently.

channelclemente

A nuance that seems very important.

tz

This is kind of analogous to the reason that pitchers’ BABIP is assumed to have a “universal” mean, while batters’ BABIPs vary based upon each hitter’s skill level. Any pitcher’s sample of batters faced grows fairly quickly and is close to uniformly distributed between hitters batting in each of the 9 spots in the batting order.

tangotiger

The issue with your statement is that you are leaving fielders out of the sequencing issue. If a team gives up 4 straight singles, a run-based metric would assign all of that (and its results, like runs allowed) to the pitcher, while a FIP-based metric would assign none of that to the pitcher.

And the truth is somewhere (unknown) in the middle. Which is Dave’s point.

matt w

Then it’s not a point that’s well served by comparing Scenario A and Scenario B. The difference between Scenario A and B is not, or need not be, in the fielders’ contribution; they could consist of the exact same balls hit to the exact same spots on the field and fielded in the exact same way. The difference is in the sequencing of a series of events, to all of which the pitcher has at least some contribution. Whereas the context a hitter faces will almost always be determined by events to which he made no contribution.

There are at least three factors I can see at work here, over and above the players’ skill.
1. The contribution of a player’s teammates to an event.
2. The contribution of the player’s opponent (and luck) to an event.
3. The context in which the event took place.

For hitters, number 1 is usually negligible, and number 3 is usually beyond the hitters’ control. So we want to use “context-neutral” metrics which discount 3, and the only question is how we account for 2 (whether we do something like normalize BABIP in order to take out the contribution of the opponents’ defense and batted-ball luck). I suppose park factors might be a fourth element that we’d usually want to account for.

For pitchers, 1 is quite large because of the contribution of the defense on batted balls. And 3 is at least in part dependent on what the pitcher did. While we’d like to be able to correct for 1, it’s not clear how much we ought to correct for 3, given the pitcher’s responsibility for 3. FIP corrects for 1 and 3 by eliminating defense and sequencing. rWAR corrects for 1 by explicitly normalizing against team defense but doesn’t correct for 3. But if you’ve come up with a way of allocating responsibility for defensive events between the pitcher and defense, it’s not clear that you’re obliged to ignore sequencing once you’ve done that.

In the post Dave seems to be arguing that it’s inconsistent to use linear weights for batters and RA9-based stats for pitchers, because it gives credit to the pitchers for the context in which they pitch and no credit to the batters. But the batters in fact deserve no credit — none — for the context in which they hit, and the pitchers deserve some credit (but not all the credit) for the contexts in which they pitch. It makes sense to draw a distinction there. We want some way to separate the pitchers’ responsibility from the fielders’ responsibility, but we might not need to go all the way to a linear-weight-based metric to do that.

db

But why do we give batters the complete credit for a single, double or triple which are influenced as much by the fielders when he hits the ball as when the pitcher pitches it? It’s that reason that I think FIP for pitcher WAR misses the point.

Stathead

If I worked for a team, I would look at contact quality, bat speed, batted ball speed, angle off the bat, chase percentage, etc. to evaluate a player and probably less off results. That being said, it probably balances out more in the long term in the batter’s case because they’re hitting to different defenses from game to game, so over the course of a season it’s about league average, whereas a pitcher is pitching to the same defense every game (just about). It’s the same reason that HR/FB ratio and BABIP balance out for pitchers.

Well-Beered Englishman

The only example I can think of, of a hitter affecting sequencing in an inning, is baserunning. Whether it’s making a pitcher nervous, drawing 10 pickoff attempts, stealing, being tasked with a hit-and-run, or the old bogeyman of stealing signs, I can imagine many examples of batters affecting other batters’ contexts but all of them involve reaching base first.

Maybe if you have a 15-plus-pitch at-bat and simply wear the pitcher out, some benefit will come to the following hitter. This could be a database enquiry for a bored soul.

B N

There are actually some studies, if I recall, that show there is value in taking more pitches for that reason. I’m also relatively sure the Red Sox used that lineup design with some effectiveness a few years ago, which was useful even against better pitchers (e.g., if you can’t beat ’em, at least make them throw pitches and hope they get pulled for a reliever). I would assume that it increases WAR by some degree, though I’m not sure how you’d quantify it.

DJG

Right. If the same batter hit in all nine spots in the lineup (ghost runners!), then runs scored would probably be the only metric people would use to evaluate a hitter.

Whether you think this would be a good measure or a bad measure, it does, I believe, explain the asymmetry Dave discusses in the article.

B N

This. Exactly the same thought that I had: a batter doesn’t get to hit, then go to bat again. As a batter, you don’t get to say, “Hey, I’ll just try to settle for a single, because I’m sure I can drive myself in on my next at bat.” As a pitcher, you can basically do exactly the converse of that by pitching around one guy to get to another guy that you think you can get a double-play off of.