I Think Win Probability Added Is a Neat Statistic

Thomas Shea-USA TODAY Sports

We’re in a tiny lull in the baseball season, and honestly, I’m happy about it. July is jam-packed with draft and trade talk, and September and October are for the stretch run and the postseason, but the middle of August is when everyone catches their breath. There’s no divisional race poised on a razor’s edge, no nightly drama that everyone in baseball tunes in for; it’s just a good few weeks to get your energy back and relax.

For me, that means getting a head start on some things I won’t have time to do in September, and there’s one article in particular that I always want to write but never get around to. I’m not a BBWAA member, and I’ll probably never vote for MVP awards, but I spend a lot of time thinking about them every year nonetheless. When I’m looking at who would get my vote, I take Win Probability Added into account. Every time I mention it, however, there’s an issue to tackle. Plenty of readers and analysts think of WPA as “just a storytelling statistic” and don’t like using it as a measure of player value. So today, I’m going to explain why I think it has merit.

First, a quick refresher: Win Probability Added is a straightforward statistic. After every plate appearance, WPA looks at the change in a team’s chances of winning the game. We use our win expectancy measure, which takes historical data to see how often teams win from a given position, to assign each team a chance of winning after every discrete event. Then the pitcher and hitter involved in that plate appearance get credited (or debited, depending) for the change in their team’s chances of winning the game. Since every game starts with each team 50% likely to win and ends with one team winning, the credit for each win (and blame for each loss) gets apportioned out as the game unfolds. The winning team will always produce an aggregate of 0.5 WPA, and the losing team will always produce -0.5, spread out among all of their players.
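The bookkeeping described above fits in a few lines of Python. This is a minimal sketch, not FanGraphs’ actual implementation: the four-event “game” and its win-expectancy values are invented for illustration, and `credit_wpa` is a hypothetical helper. Each event’s change in the home team’s win expectancy goes to the batter, and the mirror image goes to the pitcher. Because the changes telescope from 0.5 to either 1.0 or 0.0, each team’s credits always sum to +0.5 or -0.5.

```python
from collections import defaultdict

# Hypothetical events: (batter, batter's team, pitcher, home win expectancy after the event).
# The home team starts at 0.5; these values are invented purely for illustration.
EVENTS = [
    ("Away Hitter A", "away", "Home Pitcher", 0.46),
    ("Home Hitter B", "home", "Away Pitcher", 0.55),
    ("Away Hitter C", "away", "Home Pitcher", 0.38),
    ("Home Hitter D", "home", "Away Pitcher", 1.00),  # walk-off: home team wins
]

def credit_wpa(events, start_we=0.5):
    """Return per-player and per-team WPA totals for a sequence of events."""
    player_wpa = defaultdict(float)
    team_wpa = defaultdict(float)
    before = start_we
    for batter, batter_team, pitcher, after in events:
        home_delta = after - before  # change in the home team's win chances
        # Restate the change from the batting team's point of view.
        batter_delta = home_delta if batter_team == "home" else -home_delta
        pitcher_team = "away" if batter_team == "home" else "home"
        player_wpa[batter] += batter_delta
        player_wpa[pitcher] -= batter_delta  # the pitcher is debited what the batter gains
        team_wpa[batter_team] += batter_delta
        team_wpa[pitcher_team] -= batter_delta
        before = after
    return dict(player_wpa), dict(team_wpa)

players, teams = credit_wpa(EVENTS)
for name, wpa in players.items():
    print(f"{name}: {wpa:+.2f}")
print(teams)  # home totals +0.5, away -0.5, up to floating-point rounding
```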

That’s just tremendously neat. As the glossary entry for WPA puts it, “We know intuitively that a home run in the third inning of a blowout is less important to that win than a home run in the bottom of the ninth inning of a close game.” We do know that! But the argument against WPA is that thinking about the game that way doesn’t match what actually matters.

Here’s an example. Let’s say that the Giants are up 5-0 in the third inning when Joc Pederson hits a two-run home run. Win Probability Added: minimal. Then, the Giants cough up the lead and trail 8-9 headed into the ninth inning. Joc blasts another two-run bomb, this one decisive. Win Probability Added: massive. But since the final score of the game was 10-9, taking either two-run home run off the board would result in the team losing 9-8. Why are they treated differently?

That’s a compelling argument. If you imagine that the base/out state was the same both times, it gets even better. Both home runs were necessary for the Giants to win the game. One gets treated as nearly worthless by WPA, though, while the other is worth its weight in gold. You can make it feel even more unjust if different players hit the two homers. Joc the Irrelevant, Yaz the Hero? What, by virtue of when they did an equally important thing? Sure seems arbitrary when you put it that way.

Given how many baseball statistics there are in the world, you could account for that if you wanted. There’s WPA/LI, which adjusts every outcome for the leverage going into the plate appearance, so that how you perform relative to what was expected in each situation is what matters, not how important the situation was. RE24 uses run expectancy rather than win expectancy, so every run is on the same scale regardless of the score or the inning.
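Here is a rough sketch of how those two alternatives re-weight the same events. It assumes WPA/LI is computed by dividing each plate appearance’s WPA by the leverage index entering it and summing, and that RE24 sums the change in run expectancy plus runs scored on each play. The `PlateAppearance` fields and every number below are hypothetical: two solo homers with nobody on and nobody out, one in a low-leverage blowout and one in a high-leverage spot.

```python
from dataclasses import dataclass

@dataclass
class PlateAppearance:
    wpa: float        # change in win expectancy credited to the batter
    li: float         # leverage index entering the plate appearance
    re_before: float  # run expectancy of the base/out state before the PA
    re_after: float   # run expectancy of the base/out state after the PA
    runs_scored: int  # runs that scored on the play

def raw_wpa(pas):
    return sum(pa.wpa for pa in pas)

def wpa_over_li(pas):
    # Divide each event's WPA by the leverage of its own situation, then sum.
    return sum(pa.wpa / pa.li for pa in pas)

def re24(pas):
    # Change in run expectancy plus the runs that actually scored, summed over PAs.
    return sum(pa.re_after - pa.re_before + pa.runs_scored for pa in pas)

# Invented values: same base/out state before and after, one run scores each time.
blowout_homer = PlateAppearance(wpa=0.03, li=0.3, re_before=0.48, re_after=0.48, runs_scored=1)
clutch_homer = PlateAppearance(wpa=0.20, li=2.0, re_before=0.48, re_after=0.48, runs_scored=1)

for label, pa in [("blowout", blowout_homer), ("clutch", clutch_homer)]:
    print(label, f"WPA={raw_wpa([pa]):.2f}", f"WPA/LI={wpa_over_li([pa]):.2f}", f"RE24={re24([pa]):.2f}")
```

Under these made-up inputs, raw WPA separates the two homers sharply, while WPA/LI and RE24 score them identically, which is exactly the flattening the argument above is about.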

I don’t find those arguments compelling, however, because I think they misunderstand the contingent nature of a baseball game. Runs aren’t created equal. Timing matters. The game unfolds differently based on what has already happened; a team might put in their mop-up guy or go to their closer based on the game state. To reduce the argument to absurdity, consider last weekend’s Mets/Braves tilt. The Mets were down 13-3 heading into the ninth inning and thus sent a position player to the mound. The Braves promptly scored eight runs. Were those runs as important as the first eight of the game? I can’t imagine making that argument in good faith.

Here’s another way of looking at it. Imagine, if you will, that the Giants were on the road in our initial example. Further, imagine that they gave up a two-run bomb of their own in the bottom of the ninth to lose 11-10. Were Pederson’s two home runs each worthless to the outcome? Did they go from hugely valuable to of no import because of that subsequent event? That doesn’t feel right either.

The future is always unknowable. In my opinion, that means that evaluating one plate appearance based on how the game unfolded afterwards misses the point. Every time a hitter comes to the plate, all they can do to best help out their team is maximize that plate appearance. WPA handles that quite well, because it explicitly doesn’t care about what happens afterwards.

From a predictive standpoint, none of this matters much. A home run is a home run is a home run; you’re not going to get anywhere by treating different ones differently if you’re trying to figure out how good a player will be in the future. Decades of research have hammered that point home. That’s also true if we’re trying to measure a player’s underlying talent; there’s no evidence that hitters control when they hit their home runs. We all pretty much know that; there’s a reason that the single-season home run record is so famous while no one cares about “number of runners driven in via home run.”

If you want to know who the best player was in a given year, I think WAR answers that question pretty well. Is that what the MVP award is for? I don’t read it that way. Per an FAQ on the BBWAA website, the award considers “(a)ctual value of a player to his team, that is, strength of offense and defense” in addition to other clauses about games played, character, loyalty, and effort. To me, “actual value” carries a connotation that the particular circumstances of each event matter. What’s the “actual value” of a hit or a strikeout? The context in which it occurs surely has to matter at least somewhat.

How does that relate to WPA? I think it’s almost a direct translation. Players can’t control the situations they find themselves in; that’s one of the neat things about baseball. The batters in front of a player determine the base/out situation they face. All a player can do – all that’s in their control each time they step to the plate or face a new batter – is increase their team’s chances of winning that game by as much as possible.

If that sounds a lot like WPA to you, then you’re thinking about this the same way that I am. WPA doesn’t care about how you got there. It doesn’t care about what happens afterwards. It bores in on the individual situation and nothing else. How much actual value did a batter provide? I can’t think of a better way to encapsulate that than by starting with how the game looked before they batted and finishing with how it looked afterwards. Whether the team came back later, whether some future event cheapened or heightened their earlier contribution – that isn’t what we’re talking about here. How much did a player help his team? For me, that’s closely tied to how much win probability he added.

I don’t mean to say that I won’t consider anything else when looking at who deserves to take home hardware at the end of the year. Sure, WPA sounds a lot like the MVP criteria to me, but it’s not a perfect match. “Actual value” is purposefully nebulous. A ton of home runs is a ton of actual value, even if a team squandered that value by not having baserunners on to capitalize. The same is true for someone who reaches base a bunch; if they end up disproportionately doing it in unfavorable spots because their team doesn’t cooperate, it’s hard to blame the player for it.

Quite frankly, a big disagreement between WAR and WPA doesn’t come up very frequently. This year’s WPA leaders in each league? Shohei Ohtani and Ronald Acuña Jr., the two MVP favorites. Last year’s? That’d be Aaron Judge and Paul Goldschmidt, who both took home the trophy. In 2021, Ohtani and Bryce Harper led their respective leagues, and both won MVP. For the most part, this statistic is telling us what we already know.

I hesitate to mention it, but WPA comes close to making me understand why RBI are still considered a key statistic by a lot of baseball fans. We know, again thanks to decades of research, that RBI aren’t a particularly skill-intensive statistic. They depend a lot on context; who’s on base when someone steps to the plate matters a lot more than the skill of the batter. But if you’re wondering who contributed to a game’s outcome, they’re undeniably important. You can’t win without scoring runs, and RBI inarguably produce runs. That’s why people still love them even though they aren’t predictive of future production or even descriptive of current talent.

In some senses, WPA is just a sharper way of measuring what RBI were attempting to capture. Driving that run in from third base with a sacrifice fly really does have value; it truly isn’t the same as a strikeout, at least in terms of winning the game at hand. But that’s less impressive than a solo homer in a different situation, or even a single to drive home the runner, and WPA can handle that range of outcomes much better than a single binary statistic (did the runner score or didn’t he?). WPA also considers driving a run home from second more valuable than driving one home from third, and doing so in a close game more valuable than doing it in a laugher. It’s what RBI fans think their statistic does. WPA also captures the other side of the coin, setting the table for future hitters, which is an equally important part of winning, and it handles it much better than a raw count of runs scored would.
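To make the RBI comparison concrete, here is a minimal Python sketch with invented numbers. It assumes a home-team hitter batting late in a tie game with a runner on third and one out; the win-expectancy values are made up for illustration rather than taken from a real win-expectancy table, and `compare` is a hypothetical helper. The sacrifice fly and the single each earn one RBI, while WPA grades them differently and charges the hitter for the strikeout.

```python
# Hypothetical situation: home-team hitter, late in a tie game, runner on third,
# one out. All win-expectancy values are invented for illustration only.
WE_BEFORE = 0.70

# outcome -> (home win expectancy after the play, RBI credited to the batter)
OUTCOMES = {
    "strikeout": (0.58, 0),
    "sacrifice fly": (0.78, 1),
    "RBI single": (0.85, 1),
}

def compare(outcomes, we_before):
    """Return (outcome, RBI, WPA) rows so the two measures can be read side by side."""
    return [(name, rbi, we_after - we_before) for name, (we_after, rbi) in outcomes.items()]

for name, rbi, wpa in compare(OUTCOMES, WE_BEFORE):
    print(f"{name:>14}: RBI={rbi}  WPA={wpa:+.2f}")
```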

So if you’re reading this and you’re an MVP voter, here’s my plea: take a look at win probability numbers when you’re compiling your ballot. It probably won’t change your vote, because as I’ve already mentioned, it mostly mirrors how MVP voting already goes. The best players tend to add the most win probability because, well, they’re the best players. But in corner cases and down-ballot tiebreakers, looking at who actually did the best with the opportunities they were given deserves a spot in the conversation. Don’t forget to sprinkle in a little bit of accounting for defense, as WPA only gives credit and blame to the pitcher rather than the fielders behind him, but how much defense matters in MVP voting has always been in the eyes of the beholder anyway.

If you’re reading this and you aren’t an MVP voter, I’d basically ask you the same thing. When you’re thinking about who helped their team out the most this year, spare a thought for WPA. It isn’t always the best at telling you who will be good next year. It isn’t always the best at telling you who was the most talented this year, even. But when you’re wondering who helped their team out the most – who came to the plate down and left ahead, who chiseled into deficits and slammed the door on leads – WPA does a great job of explaining exactly that.





Ben is a writer at FanGraphs. He can be found on Twitter @_Ben_Clemens.

149 Comments
grandbranyan (Member since 2017)
1 year ago

Devin Williams’s +11.42 WPA is the highest of any pitcher, starter or reliever, since 2020.

Jordan Romano (+10.28) and Zack Wheeler (+9.05) are the only other pitchers over plus nine.

ascheff (Member since 2017)
1 year ago
Reply to  grandbranyan

That highlights the exact problem with WPA though. Josh Hader has been objectively better than Devin Williams in the innings he has pitched. It isn’t Hader’s fault that his team hasn’t had leads in close games to bring him in to defend. The Padres are a much better team than the Brewers, but because they are, their average margin of victory is way higher, meaning Hader doesn’t get high leverage opportunities. Williams uniquely benefits from his team being almost exactly league average – they are not good enough to have a bunch of huge leads, but still good enough to have some leads.

DD (Member since 2020)
1 year ago
Reply to  ascheff

But that’s the point of the stat, no? Williams has been in tougher situations so has helped his team win closer games, while Hader is not adding as much value by pitching in lopsided games or getting saves with 3 run leads. Therefore, Williams gets more credit via WPA, even if Hader pitched better via a WAR/context-neutral view.

ascheff (Member since 2017)
1 year ago
Reply to  DD

Yes, and that point is why the stat is inherently flawed to the point of being almost meaningless for player-level comparisons. Obviously neither Hader nor Williams is going to win the MVP Award, since I think we have all realized that the best relievers are nowhere near as valuable as even a good but not great position player or starter. But using WPA in any kind of comparison of skill is weird, since the context that fuels WPA has nothing to do with the player.

Josh Hader has been a better reliever than Devin Williams this season. If you have a one run lead with the season on the line, Hader is the guy you want up there, not Williams. The fact that Devin Williams has such a high WPA is because of his team, not because of him. If the Brewers were better OR worse OR just less lucky from a run sequencing perspective, Williams would have a much smaller WPA. He has benefitted from a perfect storm of circumstances outside his control – a team that is ok but not good and has had well above average luck with the sequencing of runs, creating a plethora of save opportunities with fairly small leads.

DD (Member since 2020)
1 year ago
Reply to  ascheff

“But using WPA in any kind of comparison of skill is weird, since the context that fuels WPA has nothing to do with the player.” I’m not arguing that WPA is a measure of skill. It’s a measure of results/performance, which is precisely what people consider for MVPs. WPA also indeed has something to do with the player, since his performance in each event is part of the result. I agree with others in this comment section that the apportionment of the WPA to the right player is likely due for refinement, but there is some truth to the direction it gives.

I’m not disagreeing that Hader is the better pitcher based on pure skill or context-neutral performance. But all WPA is measuring is what the guy did with the situations presented to him. It’s not the end-all-be-all.

ascheff (Member since 2017)
1 year ago
Reply to  DD

WPA is not a measure of individual results/performance though. Josh Hader has performed better than Devin Williams and has posted better results than Devin Williams. Individual performance and results are by definition context neutral – Shohei Ohtani is still the best player ever despite his team losing more than it wins.

WPA is a curiosity, not a serious attempt at measuring player performance.

Kinanik (Member since 2016)
1 year ago
Reply to  ascheff

Josh Hader has pitched better than Devin Williams. What WPA tells us is that, if each player’s appearance had been replaced by an average one in the circumstances the player entered, Devin Williams’s teams would have lost more games than Josh Hader’s would have. It’s not crazy to say that Williams has been more important or valuable to his team than Hader has been to his.

Would Devin Williams’s teams have won even more games had Devin Williams been Josh Hader? Likely yes. And vice versa. Hader’s teams would not have won as many games had he been Williams.

There’s a parallel for championships/playoffs.

The 1927 Yankees won the AL by 19 games. Both Ruth and Gehrig had about 12 WAR a piece. The 1927 Pirates won the NL by 1.5 games. Paul Waner had about 7 WAR.

You replace Ruth with a replacement level player and the Yankees still win handily. You replace Waner with one and the Pirates finish 3rd.

To the question: who put up the better performance? There’s no question: Ruth and Gehrig. Whose absence would have made their team more likely to miss the World Series? There’s also no question: Paul Waner.
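Kinanik’s back-of-the-envelope reasoning can be written out as arithmetic. This sketch uses only the figures quoted in the comment (pennant margins of 19 and 1.5 games, roughly 12 and 7 WAR) plus a simplifying assumption: each win above replacement removed turns into a loss, which costs a full game in the standings against the runner-up. It ignores that the rest of the league would have played different games too, and `margin_without_player` is a hypothetical helper.

```python
def margin_without_player(pennant_margin_games, player_war):
    """Pennant margin after swapping a player for a replacement-level one.

    Assumes every WAR removed converts one win into a loss, which moves the team
    a full game relative to the runner-up (half a game for the lost win, half for
    the added loss).
    """
    return pennant_margin_games - player_war

# Figures quoted in the comment above.
yankees_without_ruth = margin_without_player(19, 12)   # still comfortably in first
pirates_without_waner = margin_without_player(1.5, 7)  # negative: no longer in first

print(yankees_without_ruth, pirates_without_waner)
```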

As long as the award is the “Most Valuable Player” there’s ambiguity around value to whom, value for what? WPA is helpful in answering one perspective on that question.

(I’m glossing over the issues with WPA wrt fielding and so on–it’s useful, but not perfect.)

gavinrodsports
1 year ago
Reply to  ascheff

Your observation is that Devin Williams has pitched in significantly higher leverage than Josh Hader, which is true, but also not particularly meaningful in the context of whether or not WPA is a “serious attempt at measuring player performance”. Hader has been generally babied by Bob Melvin, and his lack of well-planned usage in high leverage is probably one of the main reasons why the Padres are underachieving so much. Additionally, this phenomenon really only exists for relievers, as managers will tend to bring in better relievers in higher leverage. The deviation in leverage index amongst hitters is absolutely minimal and, while it should be considered, makes little impact and is not something to constantly fixate on whenever someone brings up WPA.

Regardless of the reasons as to why Hader has been put in lower leverage than his reputation calls for, which would be interesting to dive into further, none of that reflects on the validity of WPA as a measurement or a concept. It very much is as advertised: it measures the amount of win probability added in plays that player X was directly involved in. The player has complete control over the result (in the same way he controls his more biased, context-neutral stats); it’s just a higher-variance but therefore more reflective statistic.

grandbranyan (Member since 2017)
1 year ago
Reply to  ascheff

“The Padres are a much better team than the Brewers”

Since the trade deadline in 2022
MIL: 94-86
SDP: 87-90

Since 2020 (time frame quoted in original comment)
MIL: 275-230
SDP: 263-242

Since 2017 (Hader’s debut)
MIL: 546-446
SDP: 470-521

The Padres are a much better team than the Brewers at spending money and grabbing headlines.

Smiling Politely (Member since 2018)
1 year ago
Reply to  grandbranyan

There might not be a greater difference between people’s perception of success and actual success than the San Diego Padres (I love the FG team/thought process/community, but woof, the blind spot for SDP is egregious at this point)

(edit: Manny Machado has a negative WPA this season)

ascheff (Member since 2017)
1 year ago
Reply to  grandbranyan

The Padres’ Base Runs record is 9 wins better than the Brewers. Their team wRC+ is 18 points higher. Their FIP is .15 runs lower. The only thing they don’t do better than the Brewers is field, but that is more than made up for by the offensive difference, which is massive.

grandbranyan (Member since 2017)
1 year ago
Reply to  ascheff

Well, fielding and closing out games.

2023 Brewers bullpen WPA: +7.16 (1st)
2023 Padres bullpen WPA: -2.80 (27th)

There’s the gap in BaseRuns.

Smiling Politely (Member since 2018)
1 year ago
Reply to  ascheff

They also don’t win as many games as the Brewers, which is a key data point that needs to be addressed in all those things the Padres are apparently great at doing

sbf21
1 year ago

I really enjoy this site but posters like ascheff exemplify an attitude that I find maddening. There is nothing – nothing – more important than the results on the field. I guarantee that no player, manager, front office, or owner finds consolation after a real life loss by telling themselves “we were better statistically.”

The 1960 NY Yankees destroyed the Pirates in the World Series that year. They outscored them more than two to one: 55-27. The Yankees’ OPS was .911! Pittsburgh’s was .656. The Yanks out-homered them 10-4.

Mickey Mantle’s batting line was .400/.545/.800/1.345 OPS with 3 HR. Do you think that helped him feel better when the Yankees lost Game 7? “Who cares that they beat us? We were clearly the better team!”

You know what they say – statistics are for losers.

sandwiches4ever (Member since 2019)
1 year ago
Reply to  sbf21

This argument boils down to what is more predictive vs. what is more descriptive. It is often about using the right tool for the right purpose.

I guarantee that no player, manager, front office, or owner finds consolation after a real life loss by telling themselves “we were better statistically.”

“Consolation” means a lot of things to different people at different times. I imagine (almost) all players and managers are going to feel better about a competitive loss where they put up better stats versus one in which they were blown out and uncompetitive.

That having been said, there are times when you have to simply realize the end results aren’t there and act accordingly rather than continuing to wish-cast about the “back of a guy’s baseball card” (see: 2023 New York Yankees).

I’ve heard lots of derogatory sayings about statistics, but I don’t think I’ve ever heard “stats are for losers”. That one might be even worse than “count the rings”.

sbf21
1 year ago

I believe that quote comes from Scotty Bowman, the legendary coach of the Montreal Canadiens.

“Statistics are for losers. New York and Boston each has six players among the top fifty in the league. We only have three. See what I mean?”

That year the Rangers and Bruins were the favorites in the NHL. Of course, the Canadiens took home the Stanley Cup.

sbf21
1 year ago
Reply to  ascheff

Seems like the main thing they don’t do better than the Brewers is win. Damn. Isn’t it so inconvenient that they have to actually play the game on the field? That’s where it matters and that’s where they have been found wanting.

gblasius
1 year ago
Reply to  grandbranyan

Amen

gblasius
1 year ago
Reply to  ascheff

“The Padres are a much better team than the Brewers…”

….much? better?

Hmmm

Ivan_Grushenko (Member since 2016)
1 year ago
Reply to  grandbranyan

It’s weird to me that reliever WAR has a context component

sadtrombone (Member since 2020)
1 year ago
Reply to  Ivan_Grushenko

It’s weird to me too, but I sort of get it.

darren
1 year ago
Reply to  Ivan_Grushenko

It sort of shouldn’t.