Why Replacement Level?

This morning, we announced that we’d come to an agreement with Sean Forman of Baseball-Reference on a unified replacement level, allowing both WAR calculations to measure players on the same scale. In that post, though, I didn’t spend much time talking about why replacement level is the baseline to begin with, so I thought it was worth taking a little bit more time to talk about why replacement level is a necessary concept.

After all, we’re all used to metrics that use average as a baseline, and average is easily defined and measured. Why not just measure everyone against league average, and use WAA instead of WAR?

There are two answers to that question, really. Let’s tackle them in order of importance, beginning with the playing time issue.

If the baseline is set to average, then the results of the metric will actually favor inferior talents, whose lack of skill convinces their manager to keep them chained to the bench, over better players who actually take the field on a semi-regular basis. The 25th guy on the roster will end the season with a WAA very close to zero, but the guy who is good enough to start ahead of him but not quite produce at a league average rate will end up with a negative WAA, as he accumulated enough plate appearances to drive his value down.

Did the below average but still useful starter have a worse year than the inferior player who wasn’t good enough to get into the line-up? That’s what an average baseline metric will tell you. If you need an example, we have metrics here on FanGraphs that are baselined to league average, like wRAA, which measures batting runs above average. Last year, Zack Cozart posted a .298 wOBA over 600 plate appearances, which translated into a -8.1 wRAA. Meanwhile, Tigers utility infielder Danny Worth posted a .275 wOBA in 90 plate appearances, so his wRAA was only -2.9.

I don’t think any of us believe that Danny Worth was a more valuable offensive player than Zack Cozart last year, nor do we think the Reds offense would have been improved had the Reds traded for Worth and used him to replace Cozart. Using an average baseline punishes below average regulars in a way that it does not punish even further below average scrubs who aren’t good enough to crack the line-up. That’s a real problem.
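For readers who want to check the arithmetic, the two wRAA figures can be approximately reproduced from the standard formula, wRAA = (wOBA - league wOBA) / wOBA scale × PA. The sketch below assumes a 2012 league wOBA of .315 and a wOBA scale of 1.245. Neither constant appears in this post, so treat both as assumptions, and expect small differences because the wOBA inputs are rounded to three decimals.

```python
# Assumed 2012 constants (not stated in the article); both are rounded values.
LEAGUE_WOBA = 0.315
WOBA_SCALE = 1.245


def wraa(woba: float, pa: int,
         league_woba: float = LEAGUE_WOBA,
         scale: float = WOBA_SCALE) -> float:
    """Batting runs above average: per-PA gap to the league, times playing time."""
    return (woba - league_woba) / scale * pa


# wOBA and PA figures as quoted in the article.
cozart = wraa(0.298, 600)
worth = wraa(0.275, 90)

print(f"Cozart: {cozart:+.1f}")
print(f"Worth:  {worth:+.1f}")
```

With these inputs the sketch lands within a tenth of a run of the figures quoted above. Worth comes out "ahead" only because his worse per-PA rate is multiplied by far fewer plate appearances.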

So, the question becomes, at what point do you want to start taking away value from a regular, on the belief that he’s actually hurting his team by convincing them to put him into the line-up? A baseline as high as average creates a system that suggests that bench players are more valuable than below average starters, which is not true in most cases. So, for a fairer comparison of all players, the baseline has to be lower than average in order to accurately account for differences in playing time. But how much lower?
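One way to get a feel for "how much lower" is to ask where the Cozart and Worth lines from the example above cross. The sketch below solves for the baseline wOBA at which the two players come out equal. The function name is hypothetical, and the result applies only to this pair of batting lines, not to replacement level in general.

```python
def break_even_baseline(woba_a: float, pa_a: int,
                        woba_b: float, pa_b: int) -> float:
    """Baseline wOBA at which (woba - baseline) * PA is equal for both players.

    Solves (woba_a - b) * pa_a == (woba_b - b) * pa_b for b.
    Any constant run-conversion factor divides both sides equally and cancels.
    """
    if pa_a == pa_b:
        raise ValueError("equal playing time: there is no single crossing point")
    return (woba_a * pa_a - woba_b * pa_b) / (pa_a - pa_b)


# Cozart: .298 wOBA in 600 PA; Worth: .275 wOBA in 90 PA (figures from the article).
baseline = break_even_baseline(0.298, 600, 0.275, 90)
print(f"{baseline:.3f}")
```

For this pair the crossing point is about .302. Any baseline below that ranks Cozart ahead of Worth, and any baseline above it ranks Worth ahead, which is what happened with wRAA.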

That’s where the other side of the replacement level coin comes in. Since we only want to start assigning negative value to players when they’re actually performing worse than the alternative option, we have to know what the alternative might actually do if given the chance, but we also have to factor in the cost to acquire that alternative. And this is why a replacement level around a .294 winning percentage works, because the data suggests that this is close to the level of performance a team could get without having to pay a price in order to acquire that performance. In other words, with minimal effort and without surrendering any assets of real value, a team could get an upgrade over a player performing below that level.
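To put the .294 figure in more familiar terms, the sketch below assumes it is a team winning percentage and converts it into wins over a 162-game schedule, then into the total wins above replacement available across a 30-team league. The schedule length and league size are assumptions added for illustration.

```python
REPLACEMENT_WIN_PCT = 0.294  # from the article, read here as a team winning percentage
GAMES_PER_SEASON = 162       # assumption: standard MLB schedule
TEAMS = 30                   # assumption: league size

replacement_wins = REPLACEMENT_WIN_PCT * GAMES_PER_SEASON
average_wins = GAMES_PER_SEASON / 2  # a league-average team finishes at .500

# Wins above replacement for an average team, and summed over the league.
war_per_average_team = average_wins - replacement_wins
league_war = war_per_average_team * TEAMS

print(f"Replacement-level team: {replacement_wins:.1f} wins")
print(f"Average team: {war_per_average_team:.1f} wins above replacement")
print(f"League total: {league_war:.0f} WAR")
```

Under those assumptions, a roster made up entirely of replacement level players wins about 48 games, and the league as a whole has roughly 1,000 wins above replacement to distribute.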

We’re actually at a pretty good time to see exactly what replacement level players look like, as the end of spring training brings a host of veterans who were in camp on minor league deals but weren’t able to hook on with those teams, and are now either reporting to Triple-A or shopping themselves around the league looking for bench jobs. These types of players include Yuniesky Betancourt, Miguel Olivo, Chris Young, Lyle Overbay, Aaron Cook, Daisuke Matsuzaka, Endy Chavez, Xavier Nady, Freddy Garcia, and Chris Snyder. A few of those guys landed bench jobs not long after they opted out of their contracts, while others took the $100,000 assignment bonus and are going to go to Triple-A while they wait for a big league job to open up.

Realistically, this is the kind of player that a Major League team can get for something close to the league minimum, without surrendering any talent in trade to acquire him. And all of these guys project at around replacement level for 2013. When we’re talking about a minimum level of performance for positive value, these are the kinds of players that a Major Leaguer needs to beat in order to accrue credit for playing. If he can’t be better than any of those guys, then he deserves to have it noted that he is actively hurting the team by continuing to play.

The replacement level baseline also becomes useful in calculating worth in terms of dollars. If you begin with a performance baseline that equates to a cost above the league minimum, you are essentially just creating an extra step for yourself. For instance, if you use Wins Above Average, turning that into a dollar valuation requires you to then calculate the value of an average player, since they’re clearly not freely available and require real assets in order to sign or trade for. And, in order to find the value of that +0 WAA player, you need a new baseline to establish their value, which just leads you back to a number that approximates replacement level. Even if you don’t want to call it replacement level, measuring the performance expected from a league minimum salary is necessary for any kind of financial valuation tool.
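The extra step can be made concrete with a small sketch. All three constants below are assumptions for illustration and do not come from this post: an approximate league minimum salary, a hypothetical market price per win, and a rule-of-thumb gap of about two wins per 600 plate appearances between an average player and a replacement level one.

```python
LEAGUE_MINIMUM = 490_000     # assumption: approximate minimum salary
DOLLARS_PER_WIN = 5_000_000  # hypothetical market price of one win
AVG_WAR_PER_600_PA = 2.0     # assumption: average player's edge over replacement


def value_from_war(war: float) -> float:
    """Replacement level costs the league minimum, so pricing is one step."""
    return LEAGUE_MINIMUM + war * DOLLARS_PER_WIN


def value_from_waa(waa: float, pa: int) -> float:
    """An average baseline needs an extra step: price the average player first.

    Pricing the +0 WAA player means measuring him against a cheaper baseline,
    which is replacement level under another name.
    """
    average_player_war = AVG_WAR_PER_600_PA * pa / 600
    return value_from_war(average_player_war + waa)


print(f"${value_from_war(3.0):,.0f}")
print(f"${value_from_waa(1.0, 600):,.0f}")
```

Both calls describe the same full-time player and return the same dollar figure. The WAA route simply rebuilds the replacement level baseline along the way.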

Of course, not everyone cares about financial valuations of players, so I’d consider that a secondary reason for why replacement level is a necessary baseline. The playing time issue is a larger one, as everyone should want a single value metric to account for the fact that, in general, better players receive more playing time, and we shouldn’t want to punish those players for playing more than the guys who sit on the bench and watch. This doesn’t mean that playing time is efficiently distributed, but it does account for the fact that a decent role player who produces +1 WAR in 600 plate appearances did more to help his team win than the guy who produces +0 WAR in 100 plate appearances.

With a higher baseline, you’d end up with the bench player being more valuable than the starter, and that’s not the reality in most cases. We want our models to reflect Major League value as closely as possible, and having a replacement level baseline gives a more accurate reflection of a player’s overall contributions than an average baseline.
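The flip can be sketched by re-expressing both players against a higher baseline. The example takes the +1 WAR in 600 PA starter and the +0 WAR in 100 PA bench player described above and assumes, as a rule of thumb not stated in this post, that an average player is worth about two wins above replacement per 600 plate appearances. The `rebase` helper is hypothetical.

```python
AVG_WAR_PER_600_PA = 2.0  # assumption: average player's edge over replacement


def rebase(war: float, pa: int, baseline_war_per_600: float) -> float:
    """Value relative to a baseline that sits above replacement level.

    The higher baseline is charged against every plate appearance, so
    players with more playing time lose more.
    """
    return war - baseline_war_per_600 * pa / 600


starter = rebase(1.0, 600, AVG_WAR_PER_600_PA)  # +1 WAR in 600 PA
bench = rebase(0.0, 100, AVG_WAR_PER_600_PA)    # +0 WAR in 100 PA

print(f"Starter vs. average: {starter:+.2f}")
print(f"Bench vs. average:   {bench:+.2f}")
```

Against the assumed average baseline the starter drops to about a win below zero while the bench player loses only a third of a win, so the ordering reverses even though nothing about either player's performance changed.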

It doesn’t mean that average baselines are always useless. Sometimes, you simply want to know how a player did with the opportunity he was given, especially if you’re arguing that he deserves a larger opportunity in the future. A productive part-time player shouldn’t be continually limited to part-time work simply because he never received a full-time opportunity. For some questions, an average baseline makes sense.

But for a model that is attempting to answer how much value a player produced for his team in the past, average doesn’t work as the floor. It has to be lower than that in order to model the reality of playing time distributions, and replacement level provides a good barometer for the point at which a player should be penalized for continuing to take the field.

Dave is the Managing Editor of FanGraphs.

9 years ago

Why not: look at positions over some period of time by win totals. Measure the runs created at each position by win, e.g., the first baseman position of 70 win teams produced 50 wRC, of 80 win teams 70 wRC, of 90 win teams 100 wRC, and so on, and use that data to extrapolate to a 50 win team and use that as the replacement level at each position (weighted by AB)? So if a first baseman produces 40 wRC/650 PA, we label him turning in a ’55 win equivalent’ season. Doing something like this, weighted over a couple year period to smooth out variance, would allow for replacement level to vary with the supply of players at that position. If in 2000 there were 50 3B capable of producing 30 wRC over a season or more, but in 2010 there were only 20, it would be good to capture that in replacement level.

9 years ago
Reply to  Kinanik

Because they think their way of calculating replacement level is better than yours, and they take care of the position rarity through adjustments in the WAR calculation. Your way might be better (I don’t think it is), but you’ll need to actually argue its merits. To keep this vague, there’s a fundamental principle underlying a set of RotoGraphs articles that I think is lunacy, and I do it differently in my own fantasy preparation. So I think it’s OK to disagree with FanGraphs, but you have to actually argue the merits of why your way is better (or keep it to yourself and profit).

9 years ago
Reply to  Kinanik

This is awfully complicated, has no obvious merit, and results in a different and non-translatable WAR by position, unless I’m missing something.