A Mathematical Approach to Predicting the Home Run Derby

Tonight, sluggers from around the league will repeatedly hit baseballs a very long way. Yes, I am technically describing the Home Run Derby, but in 2019 baseball terms, we might as well call it “Monday.”

Even in this era of three true outcomes, juiced baseballs, and many, many home runs, I’m still looking forward to tonight’s Derby. It’s a fun event, and for a sport that so desperately needs to follow through on its promise to “let the kids play,” the Home Run Derby is one of those opportunities for baseball to showcase how fun it truly is.

On a different level, I am also excited to see Vladimir Guerrero Jr., Pete Alonso, and Josh Bell hit jacks 500 feet; Carlos Santana attempt to win in front of the home crowd (a la Todd Frazier or Bryce Harper); and Joc Pederson show off that smooth swing.

But I am also enthused by the very format of the Derby. First introduced in 2015, the bracket-style competition adds to the drama. These current rules have been in place since 2016:

“Eight players participate in the derby in a bracket-style, single-elimination timed event. Each player has four minutes to hit as many home runs as possible. Hitters are awarded an additional 30 seconds if they hit two home runs over 440 feet (130 m). Hitters are also allowed one 45 second timeout to stop the clock (two in the finals).

The eight competing players are seeded 1-8 based on their home run totals. While the lower seed hits first, the higher seed hits second in all rounds. The round ends if the higher seed exceeds the total of the first hitter. In the event of a tie, three sets of tiebreakers are employed: first, a 90-second swing-off (with no timeouts nor bonus time awarded); second, each player gets three swings; whoever hits more home runs in the three swings will be declared the winner; thereafter, sudden death swings will occur until the tie is broken.”

One added benefit of the bracket-style Derby is the March Madness-like prediction contest. This year, Major League Baseball is offering a $250,000 prize to the winner of their online bracket competition. So while I do very much enjoy watching the Home Run Derby and gawking at these hitters’ raw abilities, I also enjoy filling out my bracket.

Before filling out my 2019 bracket, I considered whether there was a more effective method of predicting the Home Run Derby than gut feel. The answer to a question like this is almost always “yes.” I decided to dive into the data, trying to answer a follow-up question: What makes one hitter a better Home Run Derby hitter than another?

Immediately, there are shortcomings in answering the question, stemming from both the format and the rules. The Home Run Derby is unlike any regular game situation. Simply put, it’s batting practice to the extreme, so we’re already at a disadvantage when using regular season data: Regular season pitchers are trying to get hitters out; Home Run Derby pitchers are trying to get shelled. A second issue is the head-to-head format. I attempted to run a regression to predict the number of home runs hit by any given player, but we must remember that the player who hits second has to beat the first player’s total by only one. For example, if Player A hits 10 home runs in his four minutes, and Player B hits 11 home runs all in one minute, Player B wins. We don’t know how many home runs Player B would have hit in his full four minutes, and there isn’t enough data out there to run a regression on a home-runs-per-four-minutes pace.
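The censoring problem described above can be made concrete with a small sketch. The function name and Player B’s “full pace” totals are hypothetical, chosen only for illustration; Player A’s total of 10 comes from the example in the text, and tiebreakers are ignored.

```python
def observed_second_total(first_total: int, second_full_pace: int) -> int:
    """Home runs recorded for the hitter who bats second in a round.

    The round stops as soon as the second hitter passes the first hitter,
    so the recorded total can never exceed first_total + 1.
    Tiebreaker swing-offs are ignored in this sketch.
    """
    return min(second_full_pace, first_total + 1)


# Player A hits 10, as in the example above. Player B's full four-minute
# pace is unknowable in practice; 25, 11, and 7 are made-up values.
print(observed_second_total(10, 25))
print(observed_second_total(10, 11))
print(observed_second_total(10, 7))
```

Two very different hypothetical paces (25 and 11) leave the same recorded total, which is why a regression on raw totals can understate what winning second hitters were capable of.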

Nonetheless, I went ahead with the study, considering numerous variables before settling on one: barrel rate. (This is barrels per batted ball event.) Even with the limitation that caps a higher seed at one more home run than his opponent’s total of x (that is, x+1), there was still a moderate correlation (r=0.555) between a hitter’s barrel rate (at the time of the Derby) and the number of home runs he hit in the first round (n=24). These data are from Derby batters who participated under the current rules (2016-present).

But if we strip all the higher seeds from the data (severely limiting our sample size in the process), the correlation becomes weaker (r=0.345).
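As a rough check on how much weight those two correlations can bear, here is a minimal sketch that converts each r into a t statistic. The r values and n=24 are from the text; the sample size of 12 for the lower-seed-only group is my assumption (four higher seeds dropped from each of three Derbies), since it isn’t stated. The function name is hypothetical.

```python
import math


def correlation_t_stat(r: float, n: int) -> float:
    """t statistic for testing whether a Pearson correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# r and n for the full first-round sample are reported in the text.
t_all = correlation_t_stat(0.555, 24)

# Assumption: dropping the four higher seeds from each of three Derbies
# leaves n = 12. The article does not state this sample size.
t_lower_seeds = correlation_t_stat(0.345, 12)

print(round(t_all, 2), round(t_lower_seeds, 2))
```

For reference, the two-sided 5% critical values of the t distribution are 2.074 with 22 degrees of freedom and 2.228 with 10. Under these assumptions the full-sample correlation clears that bar and the lower-seed-only correlation does not.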

Regardless, barrel rate appeared to be the variable that correlated most strongly with a hitter’s success in the Derby, and I do have a theory to potentially explain why this is the case. Barrels are a result of two input factors: launch angle and exit velocity. That is, if a hitter hits a batted ball at an optimal launch angle with the right amount of power, it will result in a barrel. Here’s a fuller explanation:

“The Barrel classification is assigned to batted-ball events whose comparable hit types (in terms of exit velocity and launch angle) have led to a minimum .500 batting average and 1.500 slugging percentage since Statcast was implemented Major League wide in 2015. […] To be Barreled, a batted ball requires an exit velocity of at least 98 mph. At that speed, balls struck with a launch angle between 26-30 degrees always garner Barreled classification. For every mph over 98, the range of launch angles expands.”

If a hitter is more successful at barreling up actual pitchers’ pitches compared to their peers who are also participating in the Derby, shouldn’t they be relatively successful at barreling up batting practice pitchers’ pitches, too? In other words, I’d imagine all hitters’ barrel rates go up against a BP pitcher. But, if they all go up by the same amount, the best hitters at hitting barrels remain the best hitters at hitting barrels. For example, Alonso’s barrel rate against in-game major league pitchers is 18% while Santana’s is 9%. Let’s say they each get a 25-percentage-point bump when facing a BP pitcher. Alonso’s barrel rate would now be 43%, while Santana’s would now be 34%. In absolute terms, Alonso’s edge over Santana is the same nine points it was before.

(Granted, some hitters are likely better BP hitters than others. We can’t reasonably assume that all hitters receive the same barrel rate bump when facing a BP pitcher versus a real pitcher. That was just for the example to help explain my theory.)

In addition to trying to predict the number of home runs each hitter will hit, I also looked at something simpler: How often did the player with the higher barrel rate win? I found that across the 21 matchups that have been played over the past three years, the player with the higher barrel rate won 15 times, a 71% winning percentage.

Of course, not all barrel rate differentials are created equal. I’d feel much more confident in selecting a hitter whose barrel rate is five points higher than his opponent’s than one whose barrel rate is 0.5 points higher. In fact, of the 11 matchups that featured a barrel rate differential of at least five points, the batter with the higher barrel rate won every time. And, after running a chi-square test of association, we find that this is statistically significant, with a p-value of just .01:

Barrel Rate Differential
Differential (pp) | Higher Barrel% Wins | Lower Barrel% Wins | Higher Barrel% Win%
0.0-1.9 | 1 | 3 | 0.250
2.0-3.9 | 2 | 1 | 0.667
4.0-4.9 | 1 | 2 | 0.333
5.0+ | 11 | 0 | 1.000
Totals | 15 | 6 | 0.714
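For readers who want to check the test, here is a minimal sketch of a Pearson chi-square test on the table above, written with only the standard library. I am assuming the test was run on the full four-by-two table of bins, which the text doesn’t spell out; the function names are my own. The tail-probability helper uses the closed form that holds for exactly three degrees of freedom.

```python
import math

# Rows are the differential bins from the table above:
# (higher-barrel-rate wins, lower-barrel-rate wins)
observed = [(1, 3), (2, 1), (1, 2), (11, 0)]


def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip(table, row_totals):
        for count, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (count - expected) ** 2 / expected
    return stat


def chi_square_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


stat = chi_square_statistic(observed)
p_value = chi_square_sf_df3(stat)  # (4 - 1) * (2 - 1) = 3 degrees of freedom
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

Under that assumption the result lands close to the .01 reported above. One caveat: several expected counts here are well below five, so the chi-square approximation is rough at this sample size.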

What does this mean in terms of predicting the 2019 Home Run Derby? Here are the first-round matchups:

2019 Home Run Derby, First Round
Matchup | Barrel Rate | Barrel Advantage
Matt Chapman (1) | 12.4% | +3.9 pp
Vladimir Guerrero Jr. (8) | 8.5% | -3.9 pp
Alex Bregman (4) | 5.6% | -4.9 pp
Joc Pederson (5) | 10.5% | +4.9 pp
Josh Bell (3) | 14.8% | -0.1 pp
Ronald Acuña Jr. (6) | 14.9% | +0.1 pp
Pete Alonso (2) | 17.7% | +8.3 pp
Carlos Santana (7) | 9.4% | -8.3 pp
Through games played on Saturday, July 6.

One of our four first-round matchups has a barrel rate differential of at least five points. The 11-0 factoid will be put to the test right from the start. If we continue this process of selecting the hitter with the higher barrel rate to win each round, this is what our bracket would look like:

Home Run Derby Predictions
Round 1: Chapman (1) vs. Guerrero (8); Bregman (4) vs. Pederson (5); Bell (3) vs. Acuña Jr. (6); Alonso (2) vs. Santana (7)
Round 2: Chapman (1) vs. Pederson (5); Acuña Jr. (6) vs. Alonso (2)
Round 3: Chapman (1) vs. Alonso (2)
Champion: Pete Alonso (2)
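The “pick the higher barrel rate” rule behind that bracket is simple enough to sketch in a few lines. The barrel rates and seeds are the ones listed in the first-round table; the variable and function names are my own.

```python
# Barrel rates (%) from the first-round table above.
barrel_rate = {
    "Matt Chapman": 12.4,
    "Vladimir Guerrero Jr.": 8.5,
    "Alex Bregman": 5.6,
    "Joc Pederson": 10.5,
    "Josh Bell": 14.8,
    "Ronald Acuña Jr.": 14.9,
    "Pete Alonso": 17.7,
    "Carlos Santana": 9.4,
}

# Bracket order by seed: 1 vs. 8, 4 vs. 5, 3 vs. 6, 2 vs. 7.
field = [
    "Matt Chapman", "Vladimir Guerrero Jr.",
    "Alex Bregman", "Joc Pederson",
    "Josh Bell", "Ronald Acuña Jr.",
    "Pete Alonso", "Carlos Santana",
]


def play_round(players):
    """Advance the hitter with the higher barrel rate from each adjacent pair."""
    pairs = zip(players[::2], players[1::2])
    return [max(pair, key=barrel_rate.get) for pair in pairs]


rounds = [field]
while len(rounds[-1]) > 1:
    rounds.append(play_round(rounds[-1]))

for number, players in enumerate(rounds[:-1], start=1):
    print(f"Round {number}:", ", ".join(players))
champion = rounds[-1][0]
print("Champion:", champion)
```

Note that the rule ignores the size of the gap, so the 0.1-point edge in the Bell vs. Acuña matchup counts exactly as much as the 8.3-point edge in Alonso vs. Santana.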

There is a lot of chalk here. Admittedly, seed position might be a confounding variable. As stated above, hitters who have hit more home runs during the regular season are rewarded with the higher seeds. That is why Vladimir Guerrero Jr. is seeded eighth; he has hit only eight homers this year. Since home runs tend to be barrels by the Statcast definition, hitters who receive higher seeds also tend to be those with higher barrel rates. In fact, over the past three Home Run Derbies, the higher seed has won 14 of 21 matchups, a 67% winning percentage. That’s only slightly off from picking the hitter with the higher barrel rate (15 of 21). The sample isn’t large enough to draw any statistically significant distinction between those figures.
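To give a sense of how little 21 matchups can distinguish those two records, here is a minimal sketch that puts an approximate 95% Wilson score interval around each win rate. This is my own illustration, not part of the original analysis, and it treats the two records as if they were independent samples even though they come from the same 21 matchups. A proper paired comparison would need matchup-level results that aren’t listed here.

```python
import math


def wilson_interval(wins: int, n: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a win proportion."""
    p = wins / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


barrel_low, barrel_high = wilson_interval(15, 21)  # higher barrel rate wins
seed_low, seed_high = wilson_interval(14, 21)      # higher seed wins
print(f"barrel rate: {barrel_low:.2f} to {barrel_high:.2f}")
print(f"seed:        {seed_low:.2f} to {seed_high:.2f}")
```

Each interval is more than 30 percentage points wide, and the two overlap almost entirely, which is consistent with the point that one extra correct pick in 21 tells us very little.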

At the end of the day, we’ll just have to wait and see how these predictions do, and whether the trends that I have identified continue to hold. Enjoy the Home Run Derby tonight, and let’s all hope Pete Alonso is the batter who takes home the crown.

We hoped you liked reading A Mathematical Approach to Predicting the Home Run Derby by Devan Fink!

Devan Fink is a Contributor at FanGraphs. You can follow him on Twitter @DevanFink.

London Yank

This is a fun idea. You might do better to explore multiple explanatory variables (e.g., fly ball %, exit velocity, etc.) and then fit a linear model with lm() to predict HR per round. For example, in R it would take the form of:

# derby is a hypothetical data frame; the column names are placeholders
hr.model <- lm(HR ~ barrel_pct + fly_ball_pct + exit_velo, data = derby)

You can use the step(hr.model) function to drop parameters that don't increase your explanatory power, and the predict() function to make predictions for each player. If you can find a model with good explanatory power you can then make HR estimates for each player with error bars.