FanGraphs Prep: Ups, Downs, and Rolling Averages
This is the seventh in a series of baseball-themed lessons we’re calling FanGraphs Prep. In light of so many parents suddenly having their school-aged kids learning from home, our hope is that these units offer a thoughtfully designed, baseball-themed supplement to the schoolwork your student might already be doing. The first, second, third, fourth, fifth, and sixth units can be found here, here, here, here, here, and here.
Overview: A short unit centered on calculating rolling averages. The mean, median, and mode are fundamental concepts in math. But when we’re dealing with a dataset spread out over weeks, months, or years, calculating a single average for the entire dataset hides the data’s peaks and valleys. For a baseball player, those are the hot and cold streaks that everyone goes through during the season.
Learning Objectives:
- Identify and apply a rolling average.
- Explain how changing an interval affects interpretation.
- Consider the potential uses of a rolling average in baseball.
Target Grade-Level: 9-10
Daily Activities:
Day 1
Khris Davis famously hit .247 four seasons in a row from 2015–2018. If we take his total hits and total at-bats over those four seasons, it’s no surprise that his combined batting average is .247.
Year | At-bats | Hits | AVG |
---|---|---|---|
2015 | 392 | 97 | 0.247 |
2016 | 555 | 137 | 0.247 |
2017 | 566 | 140 | 0.247 |
2018 | 576 | 142 | 0.247 |
Total | 2089 | 516 | 0.247 |
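As a quick check of the arithmetic in the table, here is a minimal Python sketch. The variable names are illustrative assumptions; the numbers are copied from the table above. It computes the combined average as total hits divided by total at-bats, which weights each season by its at-bats rather than simply averaging the four season averages.

```python
# Season totals from the table above: year -> (at-bats, hits).
seasons = {
    2015: (392, 97),
    2016: (555, 137),
    2017: (566, 140),
    2018: (576, 142),
}

# Batting average for each individual season: hits / at-bats.
season_avgs = {year: hits / ab for year, (ab, hits) in seasons.items()}

# Combined average: total hits over total at-bats across all four seasons.
total_ab = sum(ab for ab, _ in seasons.values())
total_hits = sum(hits for _, hits in seasons.values())
combined_avg = total_hits / total_ab

for year, avg in season_avgs.items():
    print(year, f"{avg:.3f}")
print("Total", total_ab, total_hits, f"{combined_avg:.3f}")
```

Every line printed should round to 0.247, matching the table.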
Does this mean that Davis hit .247 throughout the entirety of those four seasons? Of course not; intuitively, we know that these season-long averages gloss over the ups and downs a player goes through during the year.
If we break up Davis’s 2018 batting average into his monthly splits, we can get a better sense of how he performed throughout the year.
Month | At-bats | Hits | AVG |
---|---|---|---|
March/April | 111 | 26 | 0.234 |
May | 75 | 18 | 0.240 |
June | 95 | 21 | 0.221 |
July | 99 | 32 | 0.323 |
August | 103 | 23 | 0.223 |
Sept/Oct | 93 | 22 | 0.237 |
Total | 576 | 142 | 0.247 |
We see that outside of a really hot month in July, Davis’s batting average was below his season average in five of the six months of the season. But months are an easy yet pretty arbitrary way to break up the season, and they don’t really tell us much more about a player’s ups and downs. It’s not like Davis suddenly realized the calendar had turned to July and started hitting really well.
To really get a feel for how a player fared as the season wore on, we can use a rolling average. Instead of calculating the average for an entire dataset, a rolling average calculates the average over a given interval. If we wanted to express this as an equation, it would look something like this:
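One way to write this, as a sketch: let $x_i$ be the value recorded on day (or game) $i$, let $n$ be the length of the interval, and let $t$ be the most recent day in the window. The symbols $x_i$, $H_i$, and $AB_i$ are notation assumed here for illustration.

$$
\text{rolling average}_t = \frac{x_{t-n+1} + x_{t-n+2} + \cdots + x_t}{n} = \frac{1}{n}\sum_{i=t-n+1}^{t} x_i
$$

For a rate statistic such as batting average, the same idea is applied to the totals inside the window, where $H_i$ and $AB_i$ are the hits and at-bats in game $i$:

$$
\text{rolling AVG}_t = \frac{\sum_{i=t-n+1}^{t} H_i}{\sum_{i=t-n+1}^{t} AB_i}
$$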
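For a rolling average over an interval of n days (or games), we take the average of the values over the previous n days in our sample. For a rate statistic like batting average, that means totaling the hits and at-bats over the interval and then dividing. Let’s look at a sample from Davis’s 2018 season. The data below is a selection of his per-game performance between June 15 and July 15, 2018: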
Date | At-Bats | Hits |
---|---|---|
6/15/2018 | 4 | 0 |
6/16/2018 | 3 | 0 |
6/17/2018 | 2 | 0 |
6/19/2018 | 4 | 0 |
6/22/2018 | 3 | 1 |
6/22/2018 | 5 | 2 |
6/23/2018 | 5 | 0 |
6/24/2018 | 3 | 0 |
6/25/2018 | 4 | 0 |
6/26/2018 | 3 | 0 |
6/27/2018 | 5 | 1 |
6/28/2018 | 4 | 1 |
6/29/2018 | 3 | 2 |
6/30/2018 | 3 | 1 |
7/1/2018 | 4 | 2 |
7/3/2018 | 4 | 1 |
7/4/2018 | 4 | 2 |
7/6/2018 | 4 | 1 |
7/7/2018 | 5 | 2 |
7/8/2018 | 5 | 3 |
7/9/2018 | 3 | 1 |
7/10/2018 | 5 | 1 |
7/11/2018 | 5 | 2 |
7/12/2018 | 4 | 1 |
7/13/2018 | 2 | 0 |
7/14/2018 | 3 | 0 |
7/15/2018 | 1 | 1 |
If we wanted to calculate a 10-game rolling batting average for July 10, we’d sum the at-bats over the 10 games ending on that date (40), sum the hits over the same 10 games (16), and divide hits by at-bats to get a batting average (.400).
Calculate the 10-game rolling batting average for each date from June 26 through July 15.
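If you want to check your work, the calculation can be sketched in Python. This is a minimal illustration rather than the method FanGraphs uses: the function name `rolling_batting_average` and its `window` parameter are assumptions made for this example, and the data is copied from the table above with the year dropped from each date.

```python
# Per-game (date, at-bats, hits) rows copied from the table above.
# 6/22 appears twice in the table, so it appears twice here.
games = [
    ("6/15", 4, 0), ("6/16", 3, 0), ("6/17", 2, 0), ("6/19", 4, 0),
    ("6/22", 3, 1), ("6/22", 5, 2), ("6/23", 5, 0), ("6/24", 3, 0),
    ("6/25", 4, 0), ("6/26", 3, 0), ("6/27", 5, 1), ("6/28", 4, 1),
    ("6/29", 3, 2), ("6/30", 3, 1), ("7/1", 4, 2), ("7/3", 4, 1),
    ("7/4", 4, 2), ("7/6", 4, 1), ("7/7", 5, 2), ("7/8", 5, 3),
    ("7/9", 3, 1), ("7/10", 5, 1), ("7/11", 5, 2), ("7/12", 4, 1),
    ("7/13", 2, 0), ("7/14", 3, 0), ("7/15", 1, 1),
]


def rolling_batting_average(games, window=10):
    """Return (date, AVG) pairs for each game that ends a full window."""
    results = []
    for end in range(window - 1, len(games)):
        # The `window` games ending with (and including) game `end`.
        span = games[end - window + 1 : end + 1]
        at_bats = sum(ab for _, ab, _ in span)
        hits = sum(h for _, _, h in span)
        # Guard against a window with no at-bats (not the case in this sample).
        avg = hits / at_bats if at_bats else float("nan")
        results.append((games[end][0], avg))
    return results


for date, avg in rolling_batting_average(games):
    print(date, f"{avg:.3f}")
```

The first full 10-game window ends on June 26, so the output starts there, and the 7/10 line should match the .400 worked out above.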
It looks like Davis ended the month of June in a deep slump but quickly got things turned around as the calendar turned to July. That’s what we might have assumed based on his monthly splits, but calculating his rolling averages for this time period gives us a better picture of what his peaks and valleys actually looked like.
Day 2
Taking a player’s rolling average for any statistic on a given date in a season isn’t really that useful. Remember, over a small span of time, a player can have a wide range of outcomes that might not reflect their “true talent,” as we learned in our last FanGraphs Prep lesson on regression to the mean. The true usefulness of rolling averages becomes clear when we start graphing the rolling averages over a long time period. Here’s what Khris Davis’ batting average looks like when each game from 2018 is graphed individually.
It’s really messy, and hard to make heads or tails of. It looks like there are some peaks and valleys but it’s difficult to come to any conclusions by looking at the data this way. Here’s Davis’ 10-game rolling batting average for his 2018 season.
All that noisy data suddenly becomes much smoother and easier to interpret. We can finally see the ups and downs Davis went through during that season. Luckily, FanGraphs has a tool that can help us calculate and graph these rolling averages easily. Here’s Khris Davis’ player graphing tool, which allows you to play around with different intervals for your rolling average.
What happens when you increase the rolling average interval to 25 games? Does the shape of the line change? What happens when you decrease the rolling average interval to 5 games? How might changing the interval change our interpretation of the data?
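To see the effect of the interval in numbers, here is a rough Python sketch. It uses only the June 15 to July 15 sample from Day 1, not the full season that the FanGraphs tool graphs, and the helper names `rolling_avgs` and `spread` are assumptions made for this example.

```python
# (at-bats, hits) per game for June 15 - July 15, 2018, from the Day 1 table.
games = [
    (4, 0), (3, 0), (2, 0), (4, 0), (3, 1), (5, 2), (5, 0), (3, 0), (4, 0),
    (3, 0), (5, 1), (4, 1), (3, 2), (3, 1), (4, 2), (4, 1), (4, 2), (4, 1),
    (5, 2), (5, 3), (3, 1), (5, 1), (5, 2), (4, 1), (2, 0), (3, 0), (1, 1),
]


def rolling_avgs(games, window):
    """Rolling batting average for every full window of `window` games."""
    out = []
    for end in range(window, len(games) + 1):
        span = games[end - window:end]
        at_bats = sum(ab for ab, _ in span)
        hits = sum(h for _, h in span)
        out.append(hits / at_bats)
    return out


def spread(values):
    """Gap between the highest and lowest value: a crude measure of bumpiness."""
    return max(values) - min(values)


for window in (5, 10, 25):
    avgs = rolling_avgs(games, window)
    print(f"{window:>2}-game: low {min(avgs):.3f}, high {max(avgs):.3f}, "
          f"spread {spread(avgs):.3f}")
```

In this sample, the 5-game line swings between .050 and .444, while the 10-game line stays between .083 and .410. The 25-game interval leaves only three data points, and they sit close together. Shorter intervals react faster to a hot or cold stretch but are noisier; longer intervals are smoother but slower to show a change.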
Experiment with using different statistics in the tool or adding multiple statistics to a single graph. How might you use rolling averages to describe how a player is performing during a season?
Day 3
Some industries use rolling averages to create trend lines to help them project future performance. Using what you learned in our last FanGraphs Prep lesson on regression to the mean, explain whether this would work for baseball statistics. If the rolling average interval were large enough, would that affect how reliable a potential projection might be?
Jake Mailhot is a contributor to FanGraphs. A long-suffering Mariners fan, he also writes about them for Lookout Landing. Follow him on BlueSky @jakemailhot.
Will all of these FanGraphs Prep lessons be archived somewhere easily accessible at the conclusion of this series?
You can find them all here: https://blogs.fangraphs.com/category/fangraphs-prep/
Awesome, thanks!