The Meaning of Small Sample Data
We’re a week into the Major League season, which means most regular hitters have roughly 20 to 25 at-bats, and each team’s best pitcher has thrown maybe 13 or 14 innings. These are the smallest of small samples, and almost anything is possible over the course of five or six games. Right now, we have things like Jose Iglesias leading the American League in Batting Average and Kevin Kiermaier leading the AL in Slugging (.941). Among the many dominant pitching performances from the first week, you’ll find names like Aaron Harang, Tommy Milone, and Jason Marquis.
For years, the analytical community has strongly advised against reading anything into early-season results, turning the phrase “Small Sample Size” into a term you’ll even hear on broadcasts. We have an entire entry in the FanGraphs Library devoted to sample size, and another on the related concept of regression to the mean. If you’re reading FanGraphs, odds are you already know that you shouldn’t jump to conclusions based on a week’s worth of data. The Braves are not the best team in the National League. The Tigers aren’t the ’27 Yankees. Over any given week, weird stuff is going to happen, and we just notice it more at the start of the season because it’s the only thing that has happened so far; if you look at any seven-day stretch throughout the year, you’ll find similarly odd results.
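To put a rough number on how noisy a week can be, here is a minimal sketch in Python. It assumes a hypothetical hitter whose true talent is a .270 average, and it treats each of his 25 at-bats as an independent trial with the same hit probability. That is a simplification (real at-bats depend on the pitcher, the park, and plenty else), and the function name and talent level are made up for illustration.

```python
from math import comb


def prob_at_least(hits: int, at_bats: int, true_avg: float) -> float:
    """Exact binomial probability of collecting at least `hits` hits
    in `at_bats` independent at-bats, given a fixed hit probability."""
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits, at_bats + 1)
    )


# Hypothetical hitter: true .270 talent, 25 at-bats in the first week.
AT_BATS = 25
TRUE_AVG = 0.270

# Chance of a "hot" week: 10 or more hits, i.e. a .400 average or better.
p_hot = prob_at_least(10, AT_BATS, TRUE_AVG)

# Chance of a "cold" week: 4 or fewer hits, i.e. a .160 average or worse.
p_cold = 1 - prob_at_least(5, AT_BATS, TRUE_AVG)

print(f"P(.400 or better over {AT_BATS} AB): {p_hot:.3f}")
print(f"P(.160 or worse over {AT_BATS} AB): {p_cold:.3f}")
```

Under these assumptions, neither outcome is rare for a single hitter, and with a few hundred regulars in the league, some of them should be expected to post extreme lines in any given week without their underlying talent having changed at all.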