Before You Vote, Some Other Things to Consider

With less than a week remaining in the regular season, a number of end-of-year player awards appear to lack a decisive winner. With so few games left, one is unlikely to emerge.

In the American League MVP race, for example, you couldn’t have a greater contrast between the top two candidates. Jose Altuve is the smallest position player in the majors, Aaron Judge the largest. They possess different offensive skills and different defensive homes. Yet these two very different players had produced exactly 7.3 WAR entering play Monday. Lurking behind Altuve and Judge is the game’s best position player, Mike Trout. After losing time to injury, Trout isn’t the favorite. He’s been excellent when healthy, however.

The American League Cy Young race might be even more fascinating. After Chris Sale seemed to have run away with the award by the end of July, Corey Kluber has made it very much a contested race thanks to a remarkable series of performances since he returned from the disabled list in June. How one chooses between the two might depend on which version of pitcher WAR one consults: either the FIP-based version (denoted at the site just as WAR) or the one calculated by runs allowed (RA9-WAR).

Sale holds a sizable lead by the former measure (8.2 to 7.1), Kluber by the latter (8.3 to 7.5). Baseball Prospectus, meanwhile, has Kluber (8.09 WARP) and Sale (7.88 WARP) pretty close. BP’s metric employs its Deserved Run Average (DRA) as its WAR baseline for pitchers.

Sale has hit a rare milestone: 300 strikeouts. Kluber has had as good a stretch of play as any pitcher in the 21st century. Both seasons rank within the top-five all-time for strikeout- and walk-rate differential (K-BB%), in the company of Pedro Martinez and Randy Johnson.

The National League MVP field is muddled. Charlie Blackmon, Kris Bryant, Paul Goldschmidt, Anthony Rendon, Giancarlo Stanton, and Joey Votto are all within one win above replacement of each other.

Good luck with those votes, fellow BBWAA scribes.

Part of the problem, particularly with the MVP award, is the voting criteria. The official definition can be read here and hasn’t changed since the first ballots were cast in 1931. The vague language has been an issue for nearly 90 years of voting.

Dear Voter:

There is no clear-cut definition of what Most Valuable means. It is up to the individual voter to decide who was the Most Valuable Player in each league to his team. The MVP need not come from a division winner or other playoff qualifier.

1. Actual value of a player to his team, that is, strength of offense and defense.

2. Number of games played.

3. General character, disposition, loyalty and effort.

4. Former winners are eligible.

5. Members of the committee may vote for more than one member of a team.

You are also urged to give serious consideration to all your selections, from 1 to 10. A 10th-place vote can influence the outcome of an election. You must fill in all 10 places on your ballot. Only regular-season performances are to be taken into consideration.

Keep in mind that all players are eligible for MVP, including pitchers and designated hitters.

There is, of course, quite a bit of room for subjectivity there — and debates have revolved for decades around the implications of the word “value.”

But I think there’s a consideration that has been addressed less often in this age of better data and better measurements of performance: with our votes, we are awarding (and evaluating) history, considering what has happened. We’re not trying to be predictive, not attempting to determine the most talented player at the present moment, or the best player on the best team. The runs scored and runs allowed and context matter because they help determine, ultimately, the real wins and losses in the standings.

For example, while you probably wouldn’t base any sort of predictive analysis on either Win Probability Added or Clutch, both metrics are meaningful representations of what has happened, of who’s produced in crucial situations. Not all raw value (WAR) is created equal. Consider what FanGraphs’ “Clutch” metric is measuring:

“…how much better or worse a player does in high leverage situations than he would have done in a context neutral environment. It also compares a player against himself, so a player who hits .300 in high leverage situations when he’s an overall .300 hitter is not considered clutch. Clutch does a good job of describing the past, but it does very little towards predicting the future.”
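To make that definition concrete, here is a minimal sketch. It assumes the commonly cited form of the calculation, Clutch = WPA / pLI - WPA/LI, where pLI is the player's average leverage index; that formula is not spelled out in the quote above, and the function name and the sample inputs are hypothetical, chosen only for illustration.

```python
def clutch(wpa, pli, wpa_li):
    """Sketch of Clutch under the assumed formula WPA / pLI - WPA/LI.

    wpa    -- Win Probability Added (context-dependent wins)
    pli    -- the player's average leverage index (1.0 is average)
    wpa_li -- context-neutral wins (WPA/LI)
    """
    if pli <= 0:
        raise ValueError("average leverage index must be positive")
    # Scale WPA back to average leverage, then compare it with the
    # player's own context-neutral production.
    return wpa / pli - wpa_li


if __name__ == "__main__":
    # Hypothetical hitter: strong context-neutral numbers, less impact
    # once leverage is taken into account.
    print(round(clutch(wpa=2.0, pli=1.0, wpa_li=2.5), 2))
```

Under this assumption, a negative result means the player contributed fewer wins in context than his own context-neutral line would imply. That is the sense in which the metric compares a player against himself.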

Much of our analysis at FanGraphs and elsewhere is forward-looking; award voting is the opposite, though. It represents an attempt to more accurately understand history and distribute credit.

WAR is more predictive than WPA and Clutch, sure, but WPA tells us about performance in context and Clutch about how the player performs, relative to his skill, in high-leverage situations.

This was the problem with Bryant’s MVP candidacy last year, which Jeff documented last season.

What this gets at is that his hitting this year has been less helpful than the overall numbers would suggest. That’s not spin; I don’t have anything against Kris Bryant. It’s just how things have happened. And just as with Saunders, this is an easy thing to break down. In low-leverage situations, Bryant’s wRC+ ranks in the 99th percentile. In medium-leverage situations, it ranks in the 85th percentile. In high-leverage situations, it ranks in the eighth percentile. Bryant has done the most damage when the results have mattered the least. That’s all this says.

It’s a problem with Bryant’s candidacy again this year, a matter that Jeff recently revisited.

You can be the most valuable player in a league and the least clutch. Clutch and WPA/LI are just offensive measures. But in close races, voters ought to drill deeper.

Notably, Bryant is the third-least clutch player this year. Judge is the least clutch. Now, WAR also accounts for defense and baserunning value (which aren’t included in the win-probability numbers), but the idea here is to show that not all production is distributed equally. Some players have produced in more meaningful spots than others. Judge hit his 50th home run on Monday. He’s a great player who has enjoyed a remarkable season, but that home run also came with a significant lead late in the game. Altuve has produced similar overall value this year, but more of it has come in critical situations.

Consider the following scatter chart:

Considered in this light, one might arrive at a different conclusion about the identities of the MVP favorites.

We can also approach this from a Win Probability Added standpoint, which is dependent on context. Judge ranks 56th in WPA (+1.74) while Altuve ranks eighth (+3.68). Trout, for the record, ranks first (+5.45).

Another way to evaluate this is simply by considering leverage splits for wRC+:

Altuve
Low Leverage: 157
Medium Leverage: 174
High Leverage: 133

Judge
Low Leverage: 190
Medium Leverage: 150
High Leverage: 95

Trout
Low Leverage: 168
Medium Leverage: 198
High Leverage: 158
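One rough way to summarize those splits is to subtract each player's low-leverage wRC+ from his high-leverage wRC+. The sketch below does only that, using the figures listed above. The helper names are hypothetical, and the gap is an illustrative convenience, not an established metric.

```python
# wRC+ by leverage, copied from the splits listed above.
LEVERAGE_WRC_PLUS = {
    "Altuve": {"low": 157, "medium": 174, "high": 133},
    "Judge": {"low": 190, "medium": 150, "high": 95},
    "Trout": {"low": 168, "medium": 198, "high": 158},
}


def high_minus_low(splits):
    """Return high-leverage wRC+ minus low-leverage wRC+ (hypothetical helper)."""
    return splits["high"] - splits["low"]


def leverage_gaps(table):
    """Map each player to his high-minus-low wRC+ gap."""
    return {player: high_minus_low(splits) for player, splits in table.items()}


if __name__ == "__main__":
    for player, gap in leverage_gaps(LEVERAGE_WRC_PLUS).items():
        print(f"{player}: {gap:+d}")
```

By this crude measure, Judge's production falls by 95 points of wRC+ from low to high leverage, Altuve's by 24, and Trout's by 10.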

In the Cy Young race, Kluber (-0.21) and Sale (-0.35) have very similar ratings in Clutch. As noted before, however, the differences in WAR have their own implications. FIP-based WAR, for example, possesses more predictive value because it neutralizes the influence of balls in play and sequencing. RA9-WAR, however, might be more useful for voting, taking into account the runs that were actually allowed.

Again, we’re concerned with history. What actually happened. That’s this author’s stance, at least.

WAR has become an important tool, perhaps the most important tool for many voters. It marks a step forward from batting average and RBIs. And it’s designed to give a sense (through different recipes at FanGraphs, Baseball Prospectus, and Baseball Reference) of which players produced the most total value overall. It’s a great tool. It’s trying to boil a lot down into one number, and that makes life easier for voters. But WAR doesn’t account for context or impact on individual games. In a close race, we need more context.

Award voting is a matter of history, judging recent history. It’s not a predictive exercise or an argument over who’s the top talent at the present moment or the best player on the best team. And to complete that historical picture, to look back as accurately as possible, voters ought to drill deeper.





A Cleveland native, FanGraphs writer Travis Sawchik is the author of the New York Times bestselling book, Big Data Baseball. He also contributes to The Athletic Cleveland, and has written for the Pittsburgh Tribune-Review, among other outlets. Follow him on Twitter @Travis_Sawchik.

44 Comments
WARrior
6 years ago

I’m a huge Trout fan, and would like to think that his WPA makes him a serious candidate. But I just can’t. Blame it on you folks at FG, but WAR to me is still the best measure of value. I completely get what you’re saying about predictive value vs. what actually happened, but everyone knows that what actually happened has a lot of random chance in it. So does WAR, but not as much.

Here’s another way of looking at it. A player with the highest WAR gives his team the best chance of getting value when it’s needed. Unless you really believe in clutch–that some players choke when production matters most, while others may find an extra gear in the same situation–the context dependent stats are driven mostly by noise. WAR is the best attempt we have at filtering out some of the noise. Even if it doesn’t account for team wins as well as a context-dependent stat, it does show which player is “helping” his team the most, i.e., the player you most want at the plate when the game is on the line–not looking back over the season, maybe, but right now.

HappyFunBall
6 years ago
Reply to  WARrior

So if you want “right now” instead of a total season contribution, maybe you mean WAR/PA? That leaderboard is:

Trout 0.013
Judge 0.012
Altuve 0.011
Rendon 0.011
Pham 0.011

WARrior
6 years ago
Reply to  HappyFunBall

Sure, except that Trout has had only about 500 “right now” situations over the course of the season, whereas Judge and Altuve are pushing 700.

HappyFunBall
6 years ago
Reply to  WARrior

WARrior – just to play Devil’s Advocate:

If we’re talking about “who provided the most value this season?”, that’s a descriptive stat. WPA, for example, or some other that measures what actually happened. Context included, with all the noise of luck, time lost to injury, and everything else.

Within that framework, Judge WAS more valuable than Trout in 2017

If we’re talking about “who can we expect to provide the most value in the most immediate future (e.g. right now)?” then a predictive or context-neutral stat like WAR is appropriate. But it’s also appropriate to normalize the PAs to a degree. We don’t want to overvalue part-time players who may excel in very limited opportunities, but we also don’t want to penalize a guy who bats 4th against a #1 hitter who gets 100 more PAs over the course of a season.

Within that framework, Trout IS more valuable at this exact moment in time.

stever20
6 years ago
Reply to  HappyFunBall

But it goes to the age old question with awards. is the award for the best player, or is it for the player who had the best season. The 2 are not mutually exclusive always by any stretch of the imagination. And yes, the player who had the best season has a lot of luck involved. But that’s part of the game.

WARrior
6 years ago
Reply to  HappyFunBall

Maybe I wasn’t clear enough, but when I said “right now”, I meant that as a description of every single PA throughout the season, not literally only right this minute. My argument is probably confusing because I’m taking what happened over an entire season, as reflected in current WAR, then projecting that back over every PA during the season. In my OP, I said not over the entire season, but then I was referring to what actually happened, not to what we would expect would happen based on his season WAR, or WAR/PA if you like.

From the information we have now, Trout was the best bet at every PA during the season, regardless of what actually happened in that PA. If you could know at the beginning of the season what his WAR/PA at the end would be, you would prefer him over anyone else for any PA. That to me is the mark of the MVP, and the only reason I can’t see him as that is because he missed so many PA.

stever20
6 years ago
Reply to  WARrior

your post though makes my point. Should the award be for the best player(which I don’t think anyone would dispute is Trout) or the guy who had the best season(which I think a lot of folks would say Judge or Altuve has). I think a lot of the voters vote for who had the best season. I think most of them do quite frankly… So I think it’s going to be a tough go for Trout quite frankly.

WARrior
6 years ago
Reply to  stever20

Don’t disagree. Trout did not have the best season, my point is just that the reason he didn’t was because of all the games he missed.

stever20
6 years ago
Reply to  WARrior

yeah that’s very true. But it is what it is. A lot of the voters vote on who had the best season. Not who was the best player. 2 completely different animals.

Llewdor
6 years ago
Reply to  WARrior

If WAR is the best measure of value, shouldn’t the MVP go to Kluber?

sadtrombone
6 years ago
Reply to  Llewdor

You misspelled “Sale”

dl80
6 years ago
Reply to  sadtrombone

Only if you are looking at what they are likely to do next season (FWAR) instead of what they actually did (BWAR).

sadtrombone
6 years ago
Reply to  dl80

No no no no. bWAR does not measure what they actually did any more than fWAR does, it just does a worse job of isolating pitcher performance among team performance.

I said this below, but the idea that the predictive validity of FIP-based WAR is independent of its construct validity is simply wrong. bWAR does a lousy job of predicting future performance specifically because it doesn’t measure individual pitcher performance well.

Dane Roberns
6 years ago
Reply to  sadtrombone

I think we’ve had this conversation before but I have to disagree. fWAR uses FIP to gauge a pitcher’s success at run prevention. Basically the “unit” of FIP is theoretical runs allowed per nine innings. It’s estimated run prevention of pitcher for fWAR vs. actual run prevention of pitcher + defense for bWAR (though bWAR does make something of an attempt to adjust for defense).

While I love (*love*) FIP, I think it can get especially problematic the further from the center of the spectrum you get, and this is important when considering end-of-season awards. For example, it goes without saying that having a lower BABIP allowed will lower your ERA, but what is less obvious is that if you are as awesome as Chris Sale, having a higher BABIP allowed will actually lower your FIP because it decreases the denominator (which is IP before you add the constant). In other words, a higher BABIP against just gives a good pitcher more chances for Ks. Look at Kluber and Sale’s Ks and BBs over TBF instead of IP. Crazy how similar they are.

isavage
6 years ago
Reply to  sadtrombone

You have no idea if bWAR does a worse job of isolating pitching from team performance than fWAR. fWAR doesn’t take into account quality of contact, and it doesn’t take into account context (LOB% and clutchness may not be predictive, but I think it’s pretty hard to argue they’re not relevant when you’re talking about a performance award). Kluber leads Sale in WPA by a substantial margin. bWAR does do something to normalize based on team defensive performance. To me the context avoidance of pitching fWAR makes it relatively useless for something like Cy Young voting. On top of that, you have no idea what level of control the pitcher actually had on the quality of contact that fWAR is ignoring.

YKnotDisco
6 years ago
Reply to  dl80

You’re confusing xFIP with FIP. xFIP projects going forward (TTO). FIP captures what occurred (TTO).