Pitcher Win Values Explained: Part Two

As we announced yesterday, win values for pitchers are now available on the site. As before, we’re going to go through the process of explaining the calculations that lead to the values you see here on FanGraphs and lay the foundation for understanding what these win values represent.

To start with, let’s take a look at the main input that goes into the win value calculation – a pitcher’s FIP, or Fielding Independent Pitching, which estimates a pitcher’s responsibility for the runs he allows based on his walks, strikeouts, and home runs allowed. The FIP formula is (HR*13+(BB+HBP-IBB)*3-K*2)/IP, plus a league-specific factor that scales FIP to match the league-average ERA for a given season and league. For win value purposes, we modified the league-specific factor to scale FIP to RA instead of ERA.
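As a rough illustration of the formula above, here is a minimal Python sketch. The function names (`fip_core`, `league_constant`, `fip`) and every number in the usage example are made up for illustration; this is not the code that runs on the site. It assumes innings are passed as true decimals (200 1/3 innings is 200.333, not the box-score "200.1") and that the league-specific factor is simply the league's RA per nine innings minus the league-wide value of the unscaled formula.

```python
def fip_core(hr, bb, hbp, ibb, k, ip):
    """Unscaled part of FIP: (HR*13 + (BB+HBP-IBB)*3 - K*2) / IP."""
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp - ibb) - 2 * k) / ip


def league_constant(lg_target, lg_hr, lg_bb, lg_hbp, lg_ibb, lg_k, lg_ip):
    """Additive factor that makes league-wide FIP equal lg_target.

    lg_target is the league ERA for standard FIP, or the league RA per
    nine innings for the RA-scaled version described in the article.
    (Assumed construction: target minus the league's unscaled FIP.)
    """
    return lg_target - fip_core(lg_hr, lg_bb, lg_hbp, lg_ibb, lg_k, lg_ip)


def fip(hr, bb, hbp, ibb, k, ip, constant):
    """FIP for one pitcher, given a precomputed league constant."""
    return fip_core(hr, bb, hbp, ibb, k, ip) + constant


# Hypothetical league totals and pitcher line, for illustration only.
constant = league_constant(
    lg_target=4.80,  # assumed league RA per nine innings
    lg_hr=2000, lg_bb=6000, lg_hbp=500, lg_ibb=500, lg_k=12000, lg_ip=20000,
)
pitcher_fip = fip(hr=20, bb=50, hbp=5, ibb=5, k=180, ip=200, constant=constant)
print(round(constant, 2), round(pitcher_fip, 2))
```

Swapping `lg_target` between league ERA and league RA is the only change needed to move between the standard FIP scale and the RA scale used for win values in this sketch.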

Why did we use FIP? I know this is a popular question, and it’s something I wrestled with myself. However, what I couldn’t get away from is that we wanted the context sensitivity for the position player and pitcher win values to be as close as possible. wRAA, the offensive input into Win Values for position players, is context-neutral – a hitter does not get credit for his situational performance, such as hitting well with runners in scoring position. Since we aren’t giving hitters credit for situational performance, we can’t give it to pitchers either, in order to maintain the same situation-neutral scale.

This is going to lead to some questions – we’re aware of that. Claiming that Javier Vazquez was a +5.2 win pitcher in 2006, when traditional metrics will tell you that he went 11-12 with a 4.84 ERA, is going to be a tough sell. We know.

However, the tangled web of responsibility for run prevention is not accurately unraveled by simply giving pitchers credit and blame for all earned runs and fielders credit and blame for all unearned runs. As most of you know, there are so many extra variables that go into a pitcher’s ERA that the pitcher himself simply doesn’t have control over. We have to try to extract the pitcher’s responsibility from his team’s run prevention while he’s on the mound. Using ERA or RA simply adds too many non-pitcher factors into the equation to the point that we’re no longer just evaluating the pitcher.

FIP removes defense from the equation by only looking at three factors that a pitcher has demonstrable control over – walks, strikeouts, and home runs allowed. By using FIP, we’re isolating the pitcher’s core abilities and evaluating him based on those skills. Now, we’re not claiming that FIP captures everything a pitcher is responsible for. It is not the perfect context-neutral pitcher run modeler – we know that. But when confronted with a choice of including way too many non-pitcher inputs or leaving out a few minor actual pitcher inputs, the latter was the better choice. You will get more accurate win values for a pitcher using FIP than you will ERA or RA.

Getting back to Vazquez for a second – his 2006 FIP was a full run lower than his ERA. The driving forces behind his struggles were a .321 BABIP and a 65.8% LOB%. Most everyone would agree that we don’t want to penalize him for poor defense played behind him, but how do we untangle the responsibility for the lack of stranded runners? Vazquez was horrible with men on base in ’06, but most of that was BABIP related – a .343 BABIP with men on versus a .284 BABIP with the bases empty. If we’re going to say that he’s not responsible for his high batting average on balls in play, and the batting average on balls in play was responsible for the lack of runners stranded, then how do we remove the former but not the latter? This is what I mean by a tangled web of responsibility in terms of run prevention.
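For readers who want to see what goes into the two numbers driving that gap, here is a small Python sketch of BABIP and LOB%. The formulas are the commonly published definitions rather than anything spelled out in this post, so treat them as an assumption; the function names are made up, and the inputs in the usage lines are invented, not Vazquez’s actual 2006 totals.

```python
def babip(h, hr, ab, k, sf):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = ab - k - hr + sf
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (h - hr) / balls_in_play


def lob_pct(h, bb, hbp, r, hr):
    """Left-on-base rate: (H + BB + HBP - R) / (H + BB + HBP - 1.4*HR).

    The 1.4 multiplier on home runs follows the commonly published
    definition (assumed here, not taken from this article).
    """
    denominator = h + bb + hbp - 1.4 * hr
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return (h + bb + hbp - r) / denominator


# Invented season line, for illustration only.
example_babip = babip(h=200, hr=20, ab=780, k=180, sf=20)
example_lob = lob_pct(h=200, bb=50, hbp=5, r=100, hr=25)
print(round(example_babip, 3), round(example_lob, 3))
```

Both formulas share the hits term, which is one way to see the entanglement described above: a run of bad luck on balls in play with men on base pushes BABIP up and LOB% down at the same time.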

If you wanted to make the argument that the context-sensitive stuff, such as how often a pitcher leaves runners on base, should be included, then you also need to be prepared to fight for WPA/LI as the offensive metric of choice for hitter win values. And honestly, I won’t put up much of an argument – there’s a case to be made for context-sensitive win values as a useful metric, and I’d imagine there will be a day that those are publicly available too. But there’s a more compelling argument for context-neutral win values, which is what we’ve decided to present here. What most of us are interested in knowing is how well a player performed in helping his team win, regardless of the performance of his teammates. To answer that, we have to strip out as much context as we can.

Think of FIP as the pitcher version of wRAA. wRAA doesn’t include non-SB/CS baserunning or situational hitting. FIP doesn’t include batted ball data or situational pitching. Neither is perfect, but both give us the vast majority of the context-neutral picture.

That doesn’t mean that we’re set in our ways and that these win values will never be improved upon. If and when a new metric like tRA is proven to be significantly more effective in valuing pitchers (and I’m hopeful that it will be, given more data exploration on the topic), we won’t be standing here as guardians of the infallibility of FIP. We want to get to the truth, and do so as quickly and as accurately as possible. I will encourage you (especially those of you in the “tRA is awesome/FIP sucks” camp), though, to not let minor differences cause you to miss the fact that FIP and tRA lead to very similar results.

This afternoon, we’ll talk about replacement level for pitchers, how it differs for each league and role, and how we tackled the issue.





Dave is the Managing Editor of FanGraphs.

59 Comments
snowshoe
15 years ago

Pitcher win value is another nice addition to the site and you guys should be commended for your efforts. The rate of innovation here at Fangraphs is very impressive.

That said, I do have concerns over the validity of the pitcher win value. FIP was a nice advancement at its time, but it is a fundamentally flawed statistic that doesn’t represent the underlying construct it is intended to. It is just not conceptually sound and no statistic can make up for a lack of conceptual grounding. Again, not a knock on Tom Tango – it was a huge step forward. But it’s not a high quality statistic and in that regard it is not like wRAA at all.

It simply leaves too many important variables unaccounted for.

Right now tRA is simply a much better statistic. It’s much better conceptually rooted and allows for a more valid appraisal of a pitcher’s intrinsic skills. Don’t know if Fangraphs has had any conversations with StatCorner but replacing FIP with tRA here and incorporating it into pitcher win value would be an enormous step forward in the quantitative analysis of pitching.

snowshoe
15 years ago
Reply to  Dave Cameron

I honestly have absolutely no stake in any “war” and am not trying to create one. I also have no past experience with any “war” and have no reference for it. I’m not a sabermetrician so perhaps this is the kind of insider issue that comes up in any field. But I’m not aware of them.

I’m simply stating my opinion. Not sure what the cause of the antagonism is.

It’s my opinion that FIP is not conceptually well grounded, as it’s missing important sources of variance. It’s not an adequate model of the underlying reality it is trying to represent. That’s a conceptual issue more than it is a statistical one. Based on my understanding of the game that’s my take on FIP as a model. I hardly think I’m the only one who feels it’s inadequate to try to describe something as complex as a pitcher’s intrinsic talent/performance through 5-6 variables, among which line drive rate, GB rate, etc. aren’t included.

You may disagree but I feel that the distribution and kind of contact a pitcher allows is rather important in analyzing his effectiveness. That’s not captured by FIP so I find that to be a serious conceptual limitation to the statistic.

Again you may disagree. But that has nothing to do with me trying to stoke some “war” I have no knowledge of. Rather, it’s just a point of concern.

tRA is also limited in certain ways – but less so than FIP, as its face validity is higher. That’s my opinion.

If you want people to simply agree with you and say what you’re doing here is beyond reproach that’s fine.

But most vigorous analytic communities aren’t like that. Ones that are stultified and narrowing as they become dogmatic often are.

snowshoe
15 years ago
Reply to  Dave Cameron

Also, I should add nowhere did I say FIP was “useless.” Those were your words. Not mine. I said the statistic is flawed and limited. Big difference.

David Appelman
15 years ago
Reply to  Dave Cameron

The way I see it, FIP gets you most of the way there, and is more or less linear weights on HR, BB, SO, and HBP. If you think these are the things a pitcher can control, great; if you think there are more (like GB and FB, or whatever else), that’s fine too.

It’s my understanding that tRA adds the GB/FB and LD (regressed) and does some other stuff, but it’s more or less linear weights too.

This is why in the other thread I considered FIP a middle ground (though it might not be), because HRs are a pretty important part of a pitcher’s real world performance and FIP considers them, while tRA I don’t believe does.

Samg
15 years ago
Reply to  Dave Cameron

Why not use wOBA against? That way hitters and pitchers are exactly equal.