Will a Player Hit .400 This Season?
The 2020 season, assuming it happens and is completed, is sure to have some quirky statistics that will be tough to wrap our heads around. The home run leader might not even get to 20 dingers this year. A three-win season might lead all of baseball. And while batting average has fallen out of favor as the be-all, end-all of a hitter’s talent at the plate because walks matter and getting a double is better than getting a single, hits are an undoubtedly good pursuit for batters. As such, the aura of batting average still maintains some glow when contemplating the history of baseball. The pursuit of a .400 batting average in a shortened season due to a pandemic will not and should not be viewed with the same historical significance as Ted Williams’ run in 1941, or even George Brett’s 1980 campaign or Tony Gwynn’s strike-shortened 1994 season, but it would make this season a little more fun.
Ty Cobb, George Sisler, and Rogers Hornsby all put up batting averages above .400 nearly 100 years ago, while Ted Williams was the last player to hit that mark nearly 80 years ago. The list of players who have even hit .375 since then is a short one: Stan Musial’s .376 (1948), Williams’ .388 (1957), Rod Carew’s .388 (1977), George Brett’s .390 (1980), Tony Gwynn’s .394 (1994), and Larry Walker’s .379 (1999). The last player to hit above .350 was Josh Hamilton, who hit .359 in 2010. History has shown that if a very high batting average is your goal, the odds are very much stacked against you in a full season. Shrink the season down to just 60 games, though, and we might get a fighting chance.
For recent examples of players who have gotten close, Eno Sarris pointed to Cody Bellinger, who hit over .400 for quite a while to start last season. Jayson Stark mentioned José Altuve’s 60-game run at .388 back in 2017, and this C. Jackson Cowart piece put together a list of 60-game leaders going back to 2002, with Chipper Jones’ .409 in 2008 the best of all the starts. But what about this season? To keep things as simple as possible, I looked at the projections in our Depth Charts and considered every player projected to receive at least 150 plate appearances. I took each player’s projected batting average as his current talent level and ran it through a binomial distribution, treating every at-bat as an independent trial with that chance of a hit. To provide some context, here are the top 20 players as well as their chances of hitting .400 if given 540 at-bats in a season.
| Name | Projected AVG | Probability of .400 (162) | Odds (1 in:) | 
|---|---|---|---|
| Luis Arraez | .311 | 0.000007443 | 134359 | 
| Christian Yelich | .303 | 0.000001052 | 950777 | 
| Jose Altuve | .301 | 0.000000626 | 1596448 | 
| Howie Kendrick | .299 | 0.000000369 | 2712592 | 
| Nolan Arenado | .297 | 0.000000214 | 4664672 | 
| J.D. Martinez | .297 | 0.000000214 | 4664672 | 
| Mike Trout | .296 | 0.000000163 | 6144816 | 
| Freddie Freeman | .296 | 0.000000163 | 6144816 | 
| Rafael Devers | .296 | 0.000000163 | 6144816 | 
| Charlie Blackmon | .296 | 0.000000163 | 6144816 | 
| Alex Verdugo | .296 | 0.000000163 | 6144816 | 
| Ketel Marte | .295 | 0.000000123 | 8119312 | 
| Juan Soto | .294 | 0.000000093 | 10761145 | 
| Daniel Murphy | .292 | 0.000000052 | 19078906 | 
| Jeff McNeil | .291 | 0.000000039 | 25522544 | 
| Michael Brantley | .291 | 0.000000039 | 25522544 | 
| Ozzie Albies | .290 | 0.000000029 | 34249244 | 
| Xander Bogaerts | .290 | 0.000000029 | 34249244 | 
| Cody Bellinger | .290 | 0.000000029 | 34249244 | 
| Vladimir Guerrero Jr. | .290 | 0.000000029 | 34249244 | 
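The binomial calculation behind a table like the one above can be sketched in a few lines of Python. This is a minimal sketch, not the exact code used for the article: the function name `prob_hit_400` is hypothetical, and it assumes "hitting .400" means collecting at least 216 hits in 540 at-bats, with every at-bat an independent trial at the projected average.

```python
from math import ceil, comb

def prob_hit_400(true_avg: float, at_bats: int, target: float = 0.400) -> float:
    """P(hits / at_bats >= target) if each at-bat is an independent trial."""
    # Fewest hits that reach the target (216 of 540). The small epsilon
    # guards against floating-point error in target * at_bats.
    needed = ceil(target * at_bats - 1e-9)
    # Upper tail of the binomial distribution: sum P(X = k) for k >= needed.
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(needed, at_bats + 1)
    )

# Assumption: .311 is the projected average listed for Luis Arraez above.
p = prob_hit_400(0.311, 540)
print(f"probability: {p:.9f}, about 1 in {1 / p:,.0f}")
```

Passing a different `at_bats` value gives the same calculation for a season of any length.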
Those odds are not very good. Even a hitter who gets a hit in 31% of his at-bats (as projected for Luis Arraez above) has only about a 1-in-134,000 chance of hitting .400 over an entire season. The odds of seeing a perfect game in any particular game are about eight times greater than the odds of a talented player hitting .400. That’s part of the reason it doesn’t happen over the course of a normal season. But what happens in a 60-game slate? Here’s the same list as above, except with just 200 at-bats.
| Name | Projected AVG | Probability of .400 (60) | Odds (1 in:) | 
|---|---|---|---|
| Luis Arraez | .311 | 0.47% | 211 | 
| Christian Yelich | .303 | 0.22% | 452 | 
| Jose Altuve | .301 | 0.18% | 552 | 
| Howie Kendrick | .299 | 0.15% | 677 | 
| Nolan Arenado | .297 | 0.12% | 835 | 
| J.D. Martinez | .297 | 0.12% | 835 | 
| Mike Trout | .296 | 0.11% | 928 | 
| Freddie Freeman | .296 | 0.11% | 928 | 
| Rafael Devers | .296 | 0.11% | 928 | 
| Charlie Blackmon | .296 | 0.11% | 928 | 
| Alex Verdugo | .296 | 0.11% | 928 | 
| Ketel Marte | .295 | 0.10% | 1033 | 
| Juan Soto | .294 | 0.09% | 1151 | 
| Daniel Murphy | .292 | 0.07% | 1435 | 
| Jeff McNeil | .291 | 0.06% | 1605 | 
| Michael Brantley | .291 | 0.06% | 1605 | 
| Ozzie Albies | .290 | 0.06% | 1796 | 
| Xander Bogaerts | .290 | 0.06% | 1796 | 
| Cody Bellinger | .290 | 0.06% | 1796 | 
| Vladimir Guerrero Jr. | .290 | 0.06% | 1796 | 
Those odds are much better. If you take those 20 hitters, the odds of one of them hitting .400 this year are around 1-in-50, and if we use all hitters, we end up with around a 3% chance that somebody hits .400 in 2020. That’s a really good chance. I’ll also point out that in 61 Double-A games in 2018, Vladimir Guerrero Jr. hit .402, with his average on the season across three levels ending up at .382. While the cumulative probability of some player hitting .400 is 3.1% using a binomial distribution, the odds are actually higher than that, as pointed out by Dan Szymborski here:
You see this in ZiPS in shorter seasons. In a 200 AB season, ZiPS sees Altuve having a 1-in-130 chance of hitting .400 compared to the 1-in-589 you'd get for a .305 hitter binomial would tell you.
— Dan Szymborski (@DSzymborski) June 28, 2020
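As a rough check on the combined binomial figure for the 20 hitters listed above, the per-player probabilities can be folded into a single chance that at least one of them hits .400. The sketch below assumes the players' seasons are independent of one another and that each gets exactly 200 at-bats; the helper names are hypothetical, and the result will differ a little from the rounded figures quoted in the text.

```python
from math import comb

AT_BATS = 200
NEEDED = 80  # 80 hits in 200 at-bats is exactly .400

# Projected averages for the 20 hitters in the tables above.
PROJECTED = [
    .311, .303, .301, .299, .297, .297, .296, .296, .296, .296,
    .296, .295, .294, .292, .291, .291, .290, .290, .290, .290,
]

def prob_400(true_avg: float) -> float:
    """Binomial chance of at least NEEDED hits in AT_BATS at-bats."""
    return sum(
        comb(AT_BATS, k) * true_avg**k * (1 - true_avg) ** (AT_BATS - k)
        for k in range(NEEDED, AT_BATS + 1)
    )

def prob_any(averages) -> float:
    """Chance that at least one hitter reaches .400 (independence assumed)."""
    nobody = 1.0
    for avg in averages:
        nobody *= 1.0 - prob_400(avg)  # this hitter falls short
    return 1.0 - nobody

combined = prob_any(PROJECTED)
print(f"at least one .400 hitter: {combined:.2%} (about 1 in {1 / combined:.0f})")
```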
Projections are in part an estimate of a player’s true talent. After Arraez above, the next 19 players are within 13 points of each other in terms of batting average. It’s a tightly bunched group, but we can use reliability tests to show why those players are not actually as tightly grouped as we might think; those single estimates are more accurately a range of estimates.
Because we are talking about hitting .400, let’s start there. If we observed a player hitting .400 over 540 at-bats in a season, we could rightly estimate their true talent as somewhere between .302 and .350, with some volatility and randomness pushing their batting average up to .400 during that time. We might then say we would estimate their talent level/projection at .326 (averaging .302 and .350), but we are really just estimating based on a large range. If we do the same exercise with only 200 at-bats, we could fairly estimate the player’s true talent as somewhere between .266 and .332, with an average right at .299, which is where we see Howie Kendrick above.
In a normal year, a player would need a very high talent level to even think a .400 season was possible. In the last 40 years, Wade Boggs, Ichiro Suzuki, Todd Helton, Albert Pujols, Tony Gwynn, Mike Piazza, Vladimir Guerrero, and Joe Mauer were the types of players who might have had a shot at .400. Over just 200 at-bats, though, we open the possibilities to a whole other tier of hitter. The possibility may still be unlikely, but it is certainly more realistic.
As Dan noted in the next tweet in his thread, the chances of hitting .400 go up considerably if you move up a player’s talent level, but don’t fall that much if you drop the level down due to the already slim chance. For example, if you took the top 20 hitters and moved half of them up by 15 points and half of them down by 15 points and repeated the binomial distribution, the odds of a .400 season go to about 1-in-14 even though the total averages remain the same. If we think of the projections more like a spread than a spot-on determination, we can better simulate the odds of a .400 season for the league, even if we aren’t going to get too much closer on an individual basis.
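The asymmetry described above can be illustrated with a short sketch. The article does not say which half of the group moves in which direction, so this example assumes the ten highest projections go up 15 points and the ten lowest go down 15 points; that choice, the helper names, and the independence of players are all assumptions, and a different split would give a different number.

```python
from math import comb

AT_BATS = 200
NEEDED = 80  # 80 hits in 200 at-bats is exactly .400

# Projected averages for the 20 hitters in the tables above.
PROJECTED = [
    .311, .303, .301, .299, .297, .297, .296, .296, .296, .296,
    .296, .295, .294, .292, .291, .291, .290, .290, .290, .290,
]

def prob_400(true_avg: float) -> float:
    """Binomial chance of at least NEEDED hits in AT_BATS at-bats."""
    return sum(
        comb(AT_BATS, k) * true_avg**k * (1 - true_avg) ** (AT_BATS - k)
        for k in range(NEEDED, AT_BATS + 1)
    )

def prob_any(averages) -> float:
    """Chance that at least one hitter reaches .400 (independence assumed)."""
    nobody = 1.0
    for avg in averages:
        nobody *= 1.0 - prob_400(avg)
    return 1.0 - nobody

# Assumed split: top ten up 15 points, bottom ten down 15 points.
# The group's mean projection is unchanged by construction.
shifted = [a + 0.015 for a in PROJECTED[:10]] + [a - 0.015 for a in PROJECTED[10:]]

print(f"original projections: {prob_any(PROJECTED):.2%}")
print(f"shifted projections:  {prob_any(shifted):.2%}")
```

Because the chance of hitting .400 rises much faster than linearly with talent, the gains from the hitters moved up outweigh the losses from the hitters moved down.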
Doing thousands of simulations of a 60-game season or having Dan run ZiPS probabilities for .400 hitters might make for a more exacting approach, but simply taking the top 100 hitters and randomly spreading out the projections 20 points or less in either direction gives a 5.3% chance of seeing a hitter put together a .400 season. Roughly one in 19 might not seem outrageously high, but it’s definitely better than we’ve ever seen and could provide an added bit of intrigue to the summer.
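A hedged sketch of that spreading exercise follows. It is not the calculation behind the 5.3% figure: the top-100 projections are not listed in this piece, so the example uses only the 20 hitters from the tables, draws each adjustment uniformly within 20 points, and averages over many random draws. The function names, the uniform draw, the seed, and the number of draws are all assumptions.

```python
import random
from math import comb

AT_BATS = 200
NEEDED = 80  # 80 hits in 200 at-bats is exactly .400
# Precompute the binomial coefficients used in every tail sum.
COMBS = [comb(AT_BATS, k) for k in range(NEEDED, AT_BATS + 1)]

# Projected averages for the 20 hitters in the tables (not the top 100).
PROJECTED = [
    .311, .303, .301, .299, .297, .297, .296, .296, .296, .296,
    .296, .295, .294, .292, .291, .291, .290, .290, .290, .290,
]

def prob_400(true_avg: float) -> float:
    """Binomial chance of at least NEEDED hits in AT_BATS at-bats."""
    return sum(
        c * true_avg**k * (1 - true_avg) ** (AT_BATS - k)
        for k, c in zip(range(NEEDED, AT_BATS + 1), COMBS)
    )

def prob_any(averages) -> float:
    """Chance that at least one hitter reaches .400 (independence assumed)."""
    nobody = 1.0
    for avg in averages:
        nobody *= 1.0 - prob_400(avg)
    return 1.0 - nobody

def spread_estimate(averages, spread=0.020, draws=500, seed=0) -> float:
    """Average league-wide chance after jittering each projection."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    total = 0.0
    for _ in range(draws):
        jittered = [a + rng.uniform(-spread, spread) for a in averages]
        total += prob_any(jittered)
    return total / draws

print(f"point projections:  {prob_any(PROJECTED):.2%}")
print(f"spread projections: {spread_estimate(PROJECTED):.2%}")
```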
Craig Edwards can be found on twitter @craigjedwards.
 
								
Aside from Dan’s caveat (where we would expect someone totally unexpected to hit a hot streak, upping the chance considerably) I do wonder about the choice of a binomial distribution here. It seems like you would want to predict the variance over a 60-game stretch compared to a full season. I don’t see how dichotomizing outcomes here makes sense since it is a continuous measure we’re predicting.
The variance of a binomial also changes with sample size since it is n*p*(1-p). I am assuming the variance you are referring to is the within-player variance on hits in a 60-game stretch vs. 162. Since hits are a binary variable and we would know the number of trials (i.e., at bats), that is the motivation for the choice in distribution. We are modelling a set of trials here.
However, if you wanted to weigh the probability of some points more or less based on preconceived beliefs about the variance, you could put a prior on the binomial distribution. The conjugate prior for the binomial is the continuous beta distribution which is defined by 2 shape parameters, so that may be what you are looking for here.
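To make the beta-prior idea concrete, here is a minimal sketch. Putting a beta prior on a hitter's true average turns the hit total into a beta-binomial variable, whose upper tail can be compared with the plain binomial. The prior Beta(300, 700), which is centered on .300, is purely a hypothetical choice for illustration, as are the function names.

```python
from math import comb, exp, lgamma

def log_beta(a: float, b: float) -> float:
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def binomial_tail(n: int, k_min: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

def beta_binomial_tail(n: int, k_min: int, a: float, b: float) -> float:
    """P(X >= k_min) when p ~ Beta(a, b) and X | p ~ Binomial(n, p)."""
    return sum(
        comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))
        for k in range(k_min, n + 1)
    )

# Hypothetical prior: Beta(300, 700) has mean .300 and a spread of
# roughly 15 points of batting average around it.
fixed = binomial_tail(200, 80, 0.300)
uncertain = beta_binomial_tail(200, 80, 300, 700)
print(f"fixed .300 talent:     {fixed:.3%}")
print(f"uncertain talent near .300: {uncertain:.3%}")
```

Allowing for uncertainty in the talent estimate raises the chance of 80 or more hits, which is the same direction as the ZiPS comparison quoted in the article.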
I am clearly confused about the methodology here. Is he running trials for each PA for each player and reporting the percentage of times the player winds up above .400? That wasn’t what I thought he was doing, but that would be one way to do it.
The binomial is the multiple-trial generalization of a Bernoulli. Think about 1 at-bat: a player either gets a hit or he doesn’t. This can be modeled by a Bernoulli random variable, and the only parameter associated with a Bernoulli distribution is p, the probability of success (or in this case, the probability of a hit, which is one and the same as batting average if we are limiting trials to hits and outs).
Now, consider 200 at-bats. We want to see the portion of the time a player gets at least 80 hits given his true talent level. The binomial distribution does exactly this. Think about it like you have an unfair coin where heads occurs 30% of the time and tails 70%, and flip it 200 times. You want to see the probability of 80 or more heads, just like we want the probability of getting 80 or more hits for a .300 BA player. I got it to be 0.163%. I used R to find it quickly; just type: 1-pbinom(0.4*200-1, 200, 0.3)