Evaluating Pitchers as a Concept: Average, Replacement Level, or Just Totals?

Don’t read further until you’ve already participated in this thought-exercise Cy Young poll.

***

The intent of this poll was to determine how the Fangraphs readers evaluated pitchers with different amounts of innings pitched. Basically, how does a reader balance better rate stats with lesser playing time stats?

The Pitcher X that all other pitchers were being compared to had a 13-4 W/L record, 2.00 ERA, and 153 IP. The league average pitcher has a 4.00 ERA. (You can presume that all the other missing stats would be consistent with that kind of W/L and ERA record.)

The reader was presented with a sample list of pitchers in the league, and was asked to rank Pitcher X in this continuum:
Pitcher A: 20-4, 1.50
Pitcher B: 19-5, 1.80
Pitcher C: 18-6, 2.10
Pitcher D: 17-7, 2.40
Pitcher E: 16-8, 2.70
Pitcher F: 15-9, 3.00
Pitcher G: 14-10, 3.30
Pitcher H: 13-11, 3.65

Each of these pitchers had 216 IP. (You can presume here as well that all the other missing stats would be consistent with that kind of W/L and ERA record.)

The list was chosen so that a person could place Pitcher X behind any pitcher, and be able to justify it, to some extent or other.

Do you believe only in ERA? Then you can place the pitcher as high as between Pitcher B and Pitcher C. Do you believe only in IP? Then you can place the pitcher below Pitcher H. Do you believe only in W? Then you can place him below Pitcher G, maybe Pitcher H. Do you believe only in L? Then you can place him below Pitcher A.

The more you combine the various stats (wins, losses, ERA, and innings pitched), the more decisions you have to make.

Suppose for example that you simply believe in comparing to the league average. In that case, you can rely just on the won-loss differential. Since Pitcher X was a +9, then you’d slot him between Pitcher D (+10) and Pitcher E (+8). Twenty-nine percent of readers chose won-loss as their answer, and therefore implied that their baseline is the AVERAGE pitcher.
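As a rough illustration of the league-average baseline, the sketch below ranks the nine poll pitchers by won-loss differential. Only the records come from the poll; the dictionary and function names are made up for this example.

```python
# Won-loss records from the poll, as (wins, losses).
# Pitcher X is the 153-IP pitcher; everyone else threw 216 IP.
RECORDS = {
    "A": (20, 4), "B": (19, 5), "C": (18, 6), "D": (17, 7),
    "E": (16, 8), "F": (15, 9), "G": (14, 10), "H": (13, 11),
    "X": (13, 4),
}


def wins_above_average(wins, losses):
    """Won-loss differential: a .500 pitcher scores exactly zero."""
    return wins - losses


# Sort from best to worst differential.
ranking = sorted(
    RECORDS,
    key=lambda name: wins_above_average(*RECORDS[name]),
    reverse=True,
)

for name in ranking:
    wins, losses = RECORDS[name]
    print(f"Pitcher {name}: {wins}-{losses}, {wins_above_average(wins, losses):+d}")
```

Under this baseline Pitcher X's +9 lands between Pitcher D's +10 and Pitcher E's +8.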

Suppose that you set your baseline by awarding 2 points for a win and subtracting 1 point for a loss (that is, your zero point is a pitcher with a 1-2 or 8-16 record). In that case, Pitcher X (13-4, or 22 “points”) would fit snugly between Pitcher E (16-8, or 24 “points”) and Pitcher F (15-9, or 21 “points”). Thirty percent of readers chose that slot as their answer, and therefore implied that their baseline was around a .300-.400 pitcher. That is, they treated that as their personal replacement level.
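The points scheme can be sketched the same way. This is only an illustration of the arithmetic in the paragraph above; the names `POLL_RECORDS` and `points` are hypothetical.

```python
# Won-loss records from the poll, as (wins, losses).
POLL_RECORDS = {
    "A": (20, 4), "B": (19, 5), "C": (18, 6), "D": (17, 7),
    "E": (16, 8), "F": (15, 9), "G": (14, 10), "H": (13, 11),
    "X": (13, 4),
}


def points(wins, losses):
    """2 points per win, minus 1 per loss; a 1-2 (.333) pitcher scores zero."""
    return 2 * wins - losses


# Sort from most points to fewest.
points_ranking = sorted(
    POLL_RECORDS,
    key=lambda name: points(*POLL_RECORDS[name]),
    reverse=True,
)

for name in points_ranking:
    print(f"Pitcher {name}: {points(*POLL_RECORDS[name])} points")
```

Moving the zero point from a .500 pitcher down to a .333 pitcher is what drops Pitcher X one slot, from just ahead of Pitcher E to just behind him.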

Another way to say much the same thing is to credit the pitcher with 0.350 wins for each missing game. Pitcher X has 17 decisions against 24 for each of the 216-IP pitchers, so he is 7 games short and would get around 2.5 wins and 4.5 losses. That bumps his record from 13-4 to an EQUIVALENT 15.5-8.5. As you can see, that puts him snugly between the 16-8 pitcher and the 15-9 pitcher.

But what if you think that giving a pitcher 0.350 wins for each missing game is still too much? What if you instead want to give a pitcher 0.200 wins for each missing game? In that case, your Pitcher X would get an extra 1.5 wins and 5.5 losses, to bump his 13-4 record to an EQUIVALENT 14.5-9.5 record. That places him right between Pitcher F (15-9) and Pitcher G (14-10). Fifteen percent of readers chose 0.200 for a missed game as their option.
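Here is a minimal sketch of the “equivalent record” idea, assuming the missing games are counted as the gap in decisions (24 for the 216-IP pitchers versus 17 for Pitcher X). The function name and the `full_decisions` parameter are illustrative, not part of the poll.

```python
FULL_DECISIONS = 24  # every 216-IP pitcher in the poll has 24 decisions


def equivalent_record(wins, losses, fill_win_pct, full_decisions=FULL_DECISIONS):
    """Pad a short season with fill-in games won at `fill_win_pct`."""
    missing = full_decisions - (wins + losses)
    extra_wins = missing * fill_win_pct
    extra_losses = missing * (1 - fill_win_pct)
    return wins + extra_wins, losses + extra_losses


for rate in (0.350, 0.200):
    eq_wins, eq_losses = equivalent_record(13, 4, rate)
    print(f"fill rate {rate:.3f}: {eq_wins:.2f}-{eq_losses:.2f}")
```

The unrounded results are close to 15.45-8.55 and 14.40-9.60, which the text rounds to 15.5-8.5 and 14.5-9.5.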

We’ve now captured the implied decisions of roughly 75% of the readers.

But what about the rest? Well, 10% of the readers chose to place Pitcher X between Pitcher C (18-6) and Pitcher D (17-7). This would mean that his 13-4 record is equivalent to 17.5-6.5: Pitcher X was given an extra 4.5 wins and 2.5 losses, or about a 0.643 win% for the missing games. These readers are more interested in his rate stats, his 2.00 ERA and its equivalent 0.765 win%. Since the Cy Young is an award for Most Outstanding Pitcher, these readers reasoned (or at least their answers imply) that only a pretty good pitcher would post a 13-4, 2.00 record, and so they set the equivalency for the extra games somewhere between what he posted and league average.

Then we had 3% of the readers that placed Pitcher X (2.00 ERA) between Pitcher B (1.80 ERA) and Pitcher C (2.10 ERA). Those readers simply ignored the IP, and went completely on the rate stats.

On the other end, we had 5% of the readers that placed him between Pitcher G (14-10) and Pitcher H (13-11). In essence, these readers relied almost entirely on the wins, and ignored the losses almost completely.

The readers are broken down into four fairly distinct groups:
1. they like their rate stats mostly
2. they like to compare to league average
3. they like to compare to replacement level
4. they like their IP (counting) stats mostly

This is why we get so many arguments: each group has decided to use its own perspective. If one person preaches at the altar of rate stats, and another at the altar of innings pitched, then they won’t be able to compromise. Even those that are closer in concept, choosing between league average and replacement level, have pretty much drawn a line in the sand.

For what it’s worth, the overall average of the respondents was to make Pitcher X the equivalent of 15.8 wins. This means that his missing 7 games were given 2.8 wins, which is a 0.400 win% record. And this is extremely close to the replacement level that I choose for starting pitchers.
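Running the arithmetic in reverse gives the fill-in win% that each answer implies. The sketch below assumes a slot between two pitchers is read as the midpoint win total, which matches the 17.5, 15.5, and 14.5 figures used above; the 16.5 for the league-average slot is inferred from the +9 differential. All names are hypothetical.

```python
ACTUAL_WINS = 13
MISSING_GAMES = 7  # 24 decisions for the 216-IP pitchers minus Pitcher X's 17


def implied_fill_win_pct(equivalent_wins):
    """Win% credited to the missing games for a given equivalent win total."""
    return (equivalent_wins - ACTUAL_WINS) / MISSING_GAMES


# Equivalent win totals for the slots discussed in the article,
# plus the average across all respondents.
EQUIVALENT_WINS = {
    "between C and D": 17.5,
    "between D and E": 16.5,
    "between E and F": 15.5,
    "between F and G": 14.5,
    "average of all respondents": 15.8,
}

for label, wins in EQUIVALENT_WINS.items():
    print(f"{label}: {implied_fill_win_pct(wins):.3f}")
```

The league-average slot comes out at exactly 0.500, and the overall average of 15.8 wins comes out at 0.400.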

While I’d like to say we have a consensus, the answers range too widely to support that conclusion. Only a third of the readers actually evaluated the pitcher in terms of this replacement level.

Anyway, consider this poll my way of introducing readers to the concept of replacement level, without actually talking about replacement level directly.





21 Comments
Telo
12 years ago

I like this series. To me, the material difference between hitters and pitchers in this exercise is the effect of a high IP total on the bullpen and other pitchers on the team. I have a feeling that I am overestimating this, and that my personal anecdotal evidence of emptying your bullpen to finish a game isn’t quite as impactful as it feels, but it’s definitely a non-zero factor. Whereas for the hitter in the first exercise, to me it was merely a matter of “how many runs over replacement did this player contribute while he played” – that was simply his value to the team – in this second exercise there is some additional inherent value to innings pitched per game in this scenario. That’s how I approached it, at least. 80% of the pitcher’s value derived from runs saved over replacement, 20% for the “Roy Halladay Factor”.

Telo
12 years ago
Reply to  Telo

Which brings up an interesting point in that we don’t know how exactly the 150 IP are distributed versus the 216 IP. They could both go an average of 6 1/3 innings into games, which by my definition would have an equal effect on the bullpens and erase the Roy Halladay Factor from the equation (though you could argue that the replacement level pitcher replacing X would NOT go 6 1/3 deep on average). I just assumed the 216 was going deeper into games, and credited him for it, which looking back on it is a pretty faulty assumption. In any case, not the overriding factor, I voted on replacement level and that probably would’ve kept my answer the same, regardless of how I handled IP.