Lost Seasons Mean Lost Milestones

Baseball is a statistics-heavy game, and that’s true even for those who don’t think of themselves as being part of the saber set. Because the game’s rules have had a relatively high degree of consistency across eras, the sport’s career milestones have also enjoyed a certain constancy throughout its history. That doesn’t mean that 600 homers from a player whose prime came in the 1960s are exactly the same as the 600 homers a player in the Wild Card era hit, but when you’re talking 600 homers, you’re always talking about someone who was really, really good at hitting home runs.

And while we would like to think that Hall of Fame voting is based on deep analysis and not round numbers, the fact remains that milestones still play a large part in who ends up in Cooperstown. Whether a player hits 470 homers or 520 homers still means something.

For precisely how much missed time has mattered in Hall of Fame voting, you should read my colleague Jay Jaffe’s three-part series on missed time and the Hall of Fame. In those three parts, Jay tackled how missed years due to wars and strikes were handled, and how today’s hitters and pitchers might be treated in Cooperstown terms. So go read those first. I’ll wait.

[…]
[…]

Meg will probably now inform me that I don’t need to actually insert punctuation that represents foot-tapping, so let’s get to some data! That’s what a projection system is for, after all.

Given the world that we’re in, one of my many research projects this spring has been trying to better gauge how missed seasons ought to be treated. Forecasting those seasons is difficult in the best of times; the missed time is typically due to injury, suspension, or war. Now, everyone is hanging out at home trying not to catch the current super-virus or crippling ennui.

And I wasn’t entirely sure whether the long layoff would affect all types of players to the same degree. Re-projecting stars for 1982 and 1995 using ZiPS (I didn’t have ZiPS in 1982, though I assume you’ll excuse me for not having a projection system when I was four), I tried to gauge whether missed time affected players of different qualities in different ways. Together with other data (suspensions, premature retirements, and war), I found that my normal missed-time algorithm slightly overrated stars’ “return” projections. Apparently, the elite do have more to lose with lost time.

With those results in mind, and to get an idea of how the projections would change in a missed season, I projected the probability that some of the active players with the best chances at major milestones actually reach them. These projections reflect both the lost season and the slightly decreased projection relative to the rest of baseball upon return. They also contain an algorithm that makes it more likely that a player nearing a milestone will return for an additional season. Further, I told ZiPS that veterans will finish their contracts:

ZiPS Milestone Probabilities – Homers
Player 700 HR No 2020 600 HR No 2020 500 HR No 2020
Albert Pujols 22% 5% 100% 100% 100% 100%
Mike Trout 17% 9% 58% 47% 80% 72%
Gleyber Torres 9% 5% 26% 19% 55% 47%
Ronald Acuña Jr. 8% 2% 25% 17% 53% 45%
Cody Bellinger 6% 2% 25% 18% 51% 41%
Miguel Cabrera 0% 0% 1% 0% 50% 20%
Edwin Encarnación 0% 0% 6% 0% 47% 26%
Bryce Harper 8% 2% 25% 18% 54% 44%
Nelson Cruz 0% 0% 0% 0% 34% 10%
Giancarlo Stanton 4% 1% 20% 12% 48% 40%
Rafael Devers 1% 1% 12% 9% 42% 36%
Manny Machado 2% 1% 18% 10% 40% 32%
Francisco Lindor 0% 0% 15% 11% 37% 28%
Pete Alonso 2% 1% 21% 16% 41% 35%
Nolan Arenado 0% 0% 14% 9% 32% 25%
Juan Soto 1% 1% 24% 20% 40% 34%

With a full 2020 season, there was approximately a 57% projected chance that at least one of these 16 players would finish his career with at least 700 home runs. A single missed year cuts that by more than half, to 26%. A lot of that is Albert Pujols. Forty-four homers isn’t a lot, but he’s only signed through 2021, and let’s be honest: if he weren’t a future Hall of Famer with a big contract, he’d have spent his summers playing golf the last three or four years. Losing 2020 washes out most of his probability of hitting 700.

But even for the future immortals with more time remaining, it’s a pretty big deal. To hit 700 homers, a lot has to go right; otherwise, we’d have more than three players in history beyond that threshold. Losing a year is a significant loss, even for a younger player. Lopping 40 homers off the career totals of Cody Bellinger or Ronald Acuña Jr. presents a significant handicap.

For the lighter milestones, it’s less of a kneecapping since you don’t need to be quite as fortunate to hit 500 homers. The exception is Miguel Cabrera. ZiPS was already looking at him askance given that his offensive profile has confusingly become Really Slow Craig Counsell, and sees hitting 500 as being difficult if 2020 is lost. That would have been a surprise a few years ago!

ZiPS Milestone Probabilities – Hits
Player 3000 Hits No 2020 2500 Hits No 2020 2000 Hits No 2020
Albert Pujols 100% 100% 100% 100% 100% 100%
Miguel Cabrera 85% 77% 100% 100% 100% 100%
Robinson Canó 30% 15% 100% 100% 100% 100%
Jose Altuve 40% 28% 81% 70% 99% 99%
Mike Trout 36% 25% 70% 61% 97% 96%
Nick Markakis 32% 17% 90% 84% 100% 100%
Francisco Lindor 28% 20% 61% 51% 85% 78%
Freddie Freeman 35% 29% 70% 60% 94% 91%
Mookie Betts 30% 24% 60% 53% 92% 90%
Ozzie Albies 25% 21% 56% 50% 87% 82%
Xander Bogaerts 27% 22% 52% 46% 90% 85%
Starlin Castro 30% 26% 50% 43% 99% 98%
Rafael Devers 22% 20% 48% 46% 80% 76%
Manny Machado 31% 22% 47% 44% 90% 87%
Christian Yelich 25% 19% 46% 41% 92% 89%
Elvis Andrus 20% 16% 42% 36% 98% 97%
Gleyber Torres 11% 10% 32% 30% 86% 84%
Nolan Arenado 25% 19% 37% 32% 94% 91%
Ronald Acuña Jr. 18% 17% 35% 34% 79% 77%
Joey Votto 1% 0% 20% 12% 95% 94%
Yadier Molina 0% 0% 18% 9% 99% 99%

For the 21 hitters listed here, ZiPS projects that, on average, more than one player who would have reached 3,000 hits will fall short as a result of a lost 2020 season. For players near 2,000 hits like Joey Votto (1,866) and Yadi Molina (1,963), that’s unlikely to matter; Molina might retire in that case, but ZiPS isn’t capable of modeling that decision. The bigger hits are taken by mid-career players like José Altuve and Manny Machado, who are in their primes but old enough that the calendar is a concern.

It would be especially bad news for Nick Markakis in his quest to be the worst 3,000-hit player in major league history. Johnny Damon fell short in his attempt to sneak up on 3,000 hits, and losing 2020 could dive-bomb Markakis’s chase for the last 600-ish hits (ZiPS drops his chances from 32% to 17%).

ZiPS Milestone Probabilities – Pitcher Wins
Player 300 Wins No 2020 250 Wins No 2020
Justin Verlander 32% 14% 90% 85%
Zack Greinke 24% 8% 72% 54%
Clayton Kershaw 30% 22% 60% 55%
Jon Lester 6% 1% 46% 30%
Max Scherzer 9% 3% 40% 35%
Gerrit Cole 12% 9% 36% 30%
Stephen Strasburg 10% 8% 25% 20%
Rick Porcello 6% 3% 22% 16%
Cole Hamels 0% 0% 2% 0%
Chris Sale 2% 1% 15% 12%
David Price 5% 3% 12% 10%

Despite the introduction of the five-man rotation, we’ve been blessed with a surprisingly large number of 300-game winners in our lifetime, most recently the impressive Hall of Fame crew of Greg Maddux, Tom Glavine, and Randy Johnson, with Roger Clemens on the outside for non-baseball reasons. We’re now in an era, however, when the careers of our elite pitchers did not brush up against the end of the pre-sabermetric era, and starting pitchers get fewer decisions than ever before.

Right now there are only nine active pitchers with 150 wins, and only two, Justin Verlander and Zack Greinke, who have passed 200. Still, ZiPS thought that there was a 79% chance that at least one of the 11 pitchers in the table would win 300 games, whether because of Verlander or Greinke being durable, Clayton Kershaw getting that last 5% back, Gerrit Cole establishing 2019-2020 as his new baseline, or maybe even Rick Porcello working his way to 300 wins by virtue of eating innings and having amassed 149 wins through age 30 thanks to an early start.

With a lost season, that becomes a coin flip (roughly 54%). The two best candidates, Verlander and Greinke, would see the calendar turn against them. It could still happen, but a lot more good fortune would be needed.
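As a rough check on these figures, and assuming each pitcher’s chase is independent of the others (an assumption; ZiPS may not combine them this way), the chance that at least one of the 11 pitchers gets to 300 is one minus the chance that all of them fall short. A minimal Python sketch using the rounded percentages from the table:

```python
from math import prod

# 300-win probabilities from the ZiPS table above: (full 2020, no 2020)
WINS_300 = {
    "Justin Verlander": (0.32, 0.14),
    "Zack Greinke": (0.24, 0.08),
    "Clayton Kershaw": (0.30, 0.22),
    "Jon Lester": (0.06, 0.01),
    "Max Scherzer": (0.09, 0.03),
    "Gerrit Cole": (0.12, 0.09),
    "Stephen Strasburg": (0.10, 0.08),
    "Rick Porcello": (0.06, 0.03),
    "Cole Hamels": (0.00, 0.00),
    "Chris Sale": (0.02, 0.01),
    "David Price": (0.05, 0.03),
}

def chance_none(probs):
    """Chance that every pitcher falls short, ASSUMING the chases are independent."""
    return prod(1 - p for p in probs)

full_none = chance_none(full for full, _ in WINS_300.values())
lost_none = chance_none(lost for _, lost in WINS_300.values())

print(f"Full 2020: at least one {1 - full_none:.0%}, none {full_none:.0%}")
print(f"No 2020:   at least one {1 - lost_none:.0%}, none {lost_none:.0%}")
```

With the rounded table values, this gives roughly 79% for at least one 300-game winner with a full 2020 and about 54% without it, the remaining 46% being the chance that nobody on the list gets there.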

The one thing the modern era is good for is strikeout records, because there are a lot of punchouts. What does a lost year do for those chases?

ZiPS Milestone Probabilities – Pitcher Strikeouts
Player 4000 K No 2020 3000 K No 2020
Justin Verlander 38% 30% 100% 100%
Max Scherzer 45% 38% 98% 98%
Clayton Kershaw 38% 32% 97% 96%
Zack Greinke 7% 2% 94% 92%
Chris Sale 24% 18% 88% 84%
Gerrit Cole 40% 37% 82% 76%
Cole Hamels 0% 0% 67% 58%
Jack Flaherty 15% 12% 62% 56%
Stephen Strasburg 12% 6% 60% 55%
Trevor Bauer 16% 11% 49% 44%
Jon Lester 0% 0% 47% 28%
Aaron Nola 10% 8% 45% 37%
Madison Bumgarner 1% 0% 42% 35%
Robbie Ray 5% 3% 41% 36%
Shane Bieber 4% 3% 35% 32%

Nobody has established even a 1% chance of catching Nolan Ryan’s 5,714, and nobody will, given the almost 5,400 innings Ryan needed to reach that mark. The odds of reaching the 3,000- and 4,000-strikeout marks, still relatively exclusive thresholds, aren’t as affected by a lost season as some of the other milestones.





Dan Szymborski is a senior writer for FanGraphs and the developer of the ZiPS projection system. He was a writer for ESPN.com from 2010-2018, a regular guest on a number of radio shows and podcasts, and a voting BBWAA member. He also maintains a terrible Twitter account at @DSzymborski.

emh1969

“With a full 2020 season, there was approximately a 57% projected chance that one of these 16 players would finish their careers with at least 700 home runs.”

The individual percentages from the chart add up to 80%. Am I missing something?

CM52

You’re missing high school math, apparently.

The Stranger

No reason to be a jerk about it. Probability is often counterintuitive. And I’m pretty sure I didn’t learn that one until somewhere in college. It’s obvious if you’ve learned how this works, but it’s not the kind of thing that’s easy to figure out on your own.

The short answer to the actual question is that adding the probabilities of the individual players isn’t how you figure out the probability of at least one of them hitting the mark. Unfortunately, I don’t think I can explain how to do it and why it works that way within the space of this comment, in part because I’m not an expert either.

rosen380

[edit] I should have kept reading the comments… see I was “sniped” several hours ago 🙁 [/edit]

“Unfortunately, I don’t think I can explain how to do it”

The short version is that the odds that at least one of them gets to 700 HR are more easily described as 100% minus the odds that none of them get to 700 HR.

That is pretty easy math: take each of the percentages, subtract them from 100%, and then multiply them all together. I get 42% and 74% with 2020 and without 2020, respectively. Now just subtract those from 100% again and you have the odds of at least one hitting 700 HR (58% and 26%; I assume the discrepancy is because I’m working with figures rounded to the nearest percent)
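For anyone who wants to run the numbers themselves, here is a minimal Python version of the method described above, using the rounded 700 HR percentages from the article’s table and assuming the players’ outcomes are independent of one another:

```python
from math import prod

# 700 HR probabilities from the article's table: (name, full 2020, no 2020)
HR_700 = [
    ("Albert Pujols", 0.22, 0.05),
    ("Mike Trout", 0.17, 0.09),
    ("Gleyber Torres", 0.09, 0.05),
    ("Ronald Acuña Jr.", 0.08, 0.02),
    ("Cody Bellinger", 0.06, 0.02),
    ("Miguel Cabrera", 0.00, 0.00),
    ("Edwin Encarnación", 0.00, 0.00),
    ("Bryce Harper", 0.08, 0.02),
    ("Nelson Cruz", 0.00, 0.00),
    ("Giancarlo Stanton", 0.04, 0.01),
    ("Rafael Devers", 0.01, 0.01),
    ("Manny Machado", 0.02, 0.01),
    ("Francisco Lindor", 0.00, 0.00),
    ("Pete Alonso", 0.02, 0.01),
    ("Nolan Arenado", 0.00, 0.00),
    ("Juan Soto", 0.01, 0.01),
]

def at_least_one(probs):
    """1 minus the chance that every player falls short (independence assumed)."""
    return 1 - prod(1 - p for p in probs)

with_2020 = at_least_one(full for _, full, _ in HR_700)
no_2020 = at_least_one(lost for _, _, lost in HR_700)
print(f"with 2020: {with_2020:.1%}, no 2020: {no_2020:.1%}")
```

Run as written, this comes out to about 57.6% with 2020 and 26.5% without it, in line with the article’s 57% and 26% once rounding is accounted for.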

rosen380

Let’s double-check ourselves with some Excel-based simulations… each pass simulates the 16 players 16,380 times:
#1 58%
#2 58%
#3 57%
#4 57%
#5 58%
#6 57%
#7 57%
#8 58%
#9 57%
#10 57%

… probably right 🙂

[edit] Did a couple of runs for the “no 2020” 700HR percents: 27%, 26%, 26%, 27%, 26%…
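The spreadsheet itself isn’t shown, so the following is only a guess at an equivalent setup: a minimal Python Monte Carlo sketch in which each trial gives every player an independent random draw against his 700 HR probability and records whether anyone got there. The function name `simulate` and the fixed seeds are illustrative choices, not anything from the original comment.

```python
import random

# Full-2020 700 HR probabilities, in table order (Pujols through Soto)
P_700 = [0.22, 0.17, 0.09, 0.08, 0.06, 0.00, 0.00, 0.08,
         0.00, 0.04, 0.01, 0.02, 0.00, 0.02, 0.00, 0.01]

def simulate(probs, trials, seed=None):
    """Fraction of trials in which at least one player reaches 700 HR.

    Each player independently "makes it" when a uniform draw in [0, 1)
    falls below his probability; this mirrors the independence assumption
    behind the 1 - product formula rather than modeling real careers.
    """
    rng = random.Random(seed)
    successes = sum(
        any(rng.random() < p for p in probs)
        for _ in range(trials)
    )
    return successes / trials

# Ten passes of 16,380 trials each, like the runs listed above
for run in range(1, 11):
    print(f"#{run} {simulate(P_700, 16_380, seed=run):.0%}")
```

Each pass should hover around the 58% the formula gives, wobbling by up to about a point from run to run because 16,380 trials is a finite sample.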

gottaIch116

Hi rosen380,

What kind of simulations did you run? What’s the underlying math?

WARrior

An easy way to see it is to consider the case where there are only two players, and each has a 50% chance of reaching 700 HR. I think anyone can understand that doesn’t mean that one of them definitely will.

It’s just like flipping two coins, and asking, what’s the probability of (at least) one head? It’s not 100%, obviously. The two coins are flipped independently, and that’s pretty much the same with the two hitters.

eghunter

funny to read comment 🙂

dukewinslow

given my students are juniors in college, they either weren’t taught probability or remember none of it from high school. Shit, something like 80% of participants get at least one question on the CRT wrong. Math is hard, probability is counterintuitive, and even if you more or less memorized Bayes’ theorem and its extensions (how to get from choose to Bayes and vice versa), you could always use the wrong one.

Cave Dameron

That’s not how percentages work. Say there were 4 players who each had a 25% chance of getting 700 home runs. You don’t add them up and say there’s a 100% chance that one of them gets to 700.

The Stranger

For the benefit of the original commenter, the way I learned to do it is to multiply the chances that each of them fails to get to 700 to get the chance that they all fall short. So .75^4 equals a roughly 32% chance that none of them hit 700, or a 68% chance that at least one of them does.

The key here is that to find the chances of multiple things happening you need to multiply, not add. Somebody smarter than me will have to explain why, or if there’s a more elegant way to do it.

dukewinslow

this is the good way. Most of the time people fail to invert the problem and just take 1 minus the product of all the probs, $1 - \prod_{i=1}^{n} P_i$, where $P_i$ is player $i$’s chance of getting to 700.
You want to know the probability of at least one person getting to 700, which is:
1 minus the probability of no one getting to 700,
and the probability of no one getting there is the product of 1 MINUS each player’s probability, $\prod_{i=1}^{n} (1 - P_i)$.
So the probability of at least one getting to 700 is:
$1 - \prod_{i=1}^{n} (1 - P_i)$

(Pi notation sucks, apologies)

Six Ten

The simple version of the why is that *at least one* incident occurring is another way of saying *not none*. Negative occurrences of a binary event are impossible; it either happens or doesn’t, it can’t un-happen. So to find at least one, you need to know the odds of not none.

If you assume the individual calculated odds in the table are correct, then we know two things already:

A: the odds of each individual incident happening (each entry in the table)
B: the odds of each individual incident not happening (B = 1-A, for each entry)

For any set of independent outcomes (like who will/won’t hit 700 HRs), you can get the odds of ALL happening by multiplying all instances of A above. Or you can get the odds of NONE happening by multiplying all individual B above.

Since *at least one* is equivalent to *not none*, the way to find it is:
1. Multiply all B together
2. Subtract that number from 1

dukewinslow

just to note that I find people tend to fail to compute the B number, and just take 1- the product of the A’s.

Dknapp26

Why do you multiply instead of add?

Flip two coins, what are the chances of…

2 heads:
Let’s say we flip the first coin and get tails. Now there’s no reason to flip the second coin; we already know we won’t get 2 heads. So let’s say we flip a first coin 100 times, and it comes up heads 50 times (because it’s a 50% chance). Now we take just those 50 times it came up heads and flip a second coin for each of them. Half of those flips will be heads, meaning we will have 25 sets of 2 flips that are both heads.

Let’s repeat in matheyer terms…
50% = .5
100% = 1

100 * .5 = 50
50 * .5 = 25
so
100 * .5 * .5 = 25
so
(.5* .5)=(25/100)=.25=25% chance of getting 2 heads on 2 coin flips

1st flip heads, 2nd flip tails:
We do this the same way as 2 heads. The only difference is that now we’re looking for tails on the second flip

50% chance our first flip is heads, OF THAT 50%, there is a 50% chance our second flip is tails

.5*.5=.25=25% chance of 1st flip heads and second flip tails

No heads:
We also do this the exact same way. In fact, no heads is the same as saying “2 tails.” Since heads and tails have the same 50% chance, there is no difference for us here. If we were dealing with any other split, though, our multiplication math would still work…

P1 = Probability of heads on flip 1
P2= Probability of heads on flip 2
P3= Probability of 2 heads flips

P1*P2=P3

We can use this for any discrete combination of flips. But what if we don’t care about the order?

At least 1 head:
This is where things get hard.

First, let’s realize that getting at least one head means any time you are not getting tails on every flip

So, we already know the chances that we DON’T get at least one head… they’re the same as the chances we get 2 tails, which is .5*.5=.25=25% chance

And we know the chance we get… something/anything… it’s 100%, this is the full universe of outcomes we can possibly have while flipping two coins.

If we remove the chance of getting all tails (25%) from our full universe of possibility, we see that there is a 75% chance of getting at least one head.

Matheyer…
.5*.5=.25
1-.25=.75=75% chance of at least one head

Now to apply this to our question here…

“What are the chances of at least 1 700 homer player” = 1-“what are the chances of all 16 of these players failing to reach 700 homers”

Chance of failing = 1-chance of success
e.g. Pujols’s chance of failure = 1-.22=.78

“Chances of all 16 failing” = P1*P2*P3…P16 where P is the chance of each individual player failing to reach 700 homers

1-(.78*.83*.91 … .99)=.58
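The two-coin walkthrough above can also be checked by brute force: list every equally likely outcome of two fair flips and count the ones that fit each event. A small Python sketch using only the standard library (the helper name `chance` is just for illustration):

```python
from fractions import Fraction
from itertools import product

# Every equally likely outcome of flipping two fair coins:
# ('H','H'), ('H','T'), ('T','H'), ('T','T')
outcomes = list(product("HT", repeat=2))

def chance(event):
    """Fraction of the equally likely outcomes for which `event` is true."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

two_heads = chance(lambda o: o == ("H", "H"))
heads_then_tails = chance(lambda o: o == ("H", "T"))
no_heads = chance(lambda o: "H" not in o)
at_least_one_head = chance(lambda o: "H" in o)

print(two_heads, heads_then_tails, no_heads, at_least_one_head)  # 1/4 1/4 1/4 3/4
```

The `no_heads` and `at_least_one_head` results add up to 1, which is the same complement trick applied to the 16 hitters.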

Captain Tenneal

Adding them is valid, too, but it answers the question “How many 700 home run hitters do we expect?”, rather than “What are the chances that at least 1 of these players hits 700 home runs?”

For example, adding the probabilities for 500 HR gets you 804%, which translates to about 8 expected positive results. If you look back at that list 20 years from now, you would expect that roughly 8 of the players actually made it to 500 home runs.

So for 700 HR, there is a 57% chance that someone makes it there, but we expect 0.8 700 HR hitters from this list. This is because of the small chance that multiple players reach 700.
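The distinction here is linearity of expectation: summing probabilities gives the expected number of successes whether or not the players’ outcomes are related, while the at-least-one figure depends on how they relate (independence is assumed below). A minimal Python sketch with the full-2020 700 HR column:

```python
from math import prod

# Full-2020 700 HR probabilities from the article's table
P_700 = [0.22, 0.17, 0.09, 0.08, 0.06, 0.00, 0.00, 0.08,
         0.00, 0.04, 0.01, 0.02, 0.00, 0.02, 0.00, 0.01]

# Linearity of expectation: the expected number of 700 HR hitters is the plain
# sum of the probabilities, with no independence assumption needed.
expected_count = sum(P_700)

# The at-least-one figure, by contrast, ASSUMES independence here.
p_at_least_one = 1 - prod(1 - p for p in P_700)

print(f"expected 700 HR hitters: {expected_count:.2f}")
print(f"chance of at least one:  {p_at_least_one:.1%}")
```

This prints an expected count of 0.80 against a 57.6% chance of at least one. The gap is the weight of outcomes where two or more players get there, since each of those counts more than once toward the sum but only once toward the at-least-one probability.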

The Stranger

That’s a cool bit of math that I had to think about for a few minutes. Thanks for sharing.

Jeremy

If you add percentages together you need to also subtract the odds that more than one person gets 700, to avoid double-counting those scenarios.
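That correction is the start of the inclusion-exclusion principle: subtracting every pair overcorrects, so triples get added back, quadruples subtracted, and so on. Assuming the players are independent (so each k-way overlap is just the product of k probabilities), a minimal Python sketch shows the running totals converging on the same answer as the 1 minus product shortcut:

```python
from itertools import combinations
from math import prod

# Nonzero full-2020 700 HR probabilities from the table (zeros contribute nothing)
P_700 = [0.22, 0.17, 0.09, 0.08, 0.06, 0.08, 0.04, 0.01, 0.02, 0.02, 0.01]

def inclusion_exclusion_steps(probs):
    """Running inclusion-exclusion totals for P(at least one event).

    Step k adds (odd k) or subtracts (even k) the sum, over every group of k
    players, of the chance they ALL reach the mark; under the independence
    assumption that chance is the product of their probabilities.
    """
    steps, total = [], 0.0
    for k in range(1, len(probs) + 1):
        sign = 1 if k % 2 == 1 else -1
        total += sign * sum(prod(group) for group in combinations(probs, k))
        steps.append(total)
    return steps

steps = inclusion_exclusion_steps(P_700)
shortcut = 1 - prod(1 - p for p in P_700)

print(f"singles only:    {steps[0]:.4f}")
print(f"minus pairs:     {steps[1]:.4f}")
print(f"all terms:       {steps[-1]:.4f}")
print(f"1 - product:     {shortcut:.4f}")
```

Adding the probabilities alone gives 0.80, subtracting every pair drops that to about 0.53, and the remaining terms nudge it back to the same 0.576 the complement method produces. With all 16 players that would be 65,535 groups to sum over, which is why the complement route is the practical one.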