The Meaning of Small Sample Data

We’re a week into the Major League season, which means most regular hitters have roughly 20 to 25 at-bats, and each team’s best pitcher has thrown maybe 13 or 14 innings. These are the smallest of small samples, and almost anything is possible over the course of five or six games. Right now, we have things like Jose Iglesias leading the American League in batting average and Kevin Kiermaier leading the AL in slugging percentage (.941). Among the many dominant pitching performances from the first week, you’ll find names like Aaron Harang, Tommy Milone, and Jason Marquis.

For years, the analytical community has strongly advised against reading anything into early-season results, turning “Small Sample Size” into a phrase you’ll even hear on broadcasts. We have an entire entry in the FanGraphs Library devoted to sample size, and another on regression to the mean, a related concept. If you’re reading FanGraphs, you’re probably aware that you shouldn’t jump to conclusions based on a week’s worth of data. The Braves are not the best team in the National League. The Tigers aren’t the ’27 Yankees. Over any given week, weird stuff is going to happen, and we just notice it more at the start of the season because it’s the only thing that has happened yet; if you look at any seven-day stretch throughout the year, you’ll find similarly odd results.

But there’s a problem with just saying “Small Sample Size” all the time: it forces you to draw a line in the sand. The terminology implies that at some point a sample magically stops being small, worthless beforehand but useful afterwards. This assumption has been somewhat reinforced by the very-useful-but-often-misinterpreted research by Russell Carleton (and others) on what are generally called stabilization points: the number of trials (usually measured in plate appearances, balls in play, or, for pitchers, batters faced) at which a given statistic needs to be regressed only about halfway back to the mean.

Unfortunately, the “Small Sample Size” creed and the availability of published numbers for these stabilization points have led to the idea that data is useless before that point and useful after it, which is wrong on both ends. In reality, every small piece of data contains a little bit of signal and a lot of noise, and as you stack small pieces of data on top of each other, the noise begins to cancel out, making the signal more visible. With one data point, you have so much noise that you can’t hope to find the signal, but your odds improve marginally at data point number two, then again at number three, and so on, until you’ve collected enough information that the randomness largely cancels itself out.
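To make the signal-and-noise point concrete, here is a minimal sketch that treats each plate appearance as an independent trial with a fixed true rate (a simplifying assumption; real plate appearances are not identically distributed). Under that binomial model, the typical gap between an observed rate and the true rate is one standard deviation, sqrt(p(1 - p) / n), so the noise shrinks a little with every added trial rather than vanishing at a threshold. The 20% true strikeout rate is a hypothetical value chosen for illustration.

```python
import math


def observed_rate_sd(true_rate: float, trials: int) -> float:
    """Standard deviation of an observed rate over `trials` independent
    trials, assuming a binomial model with a fixed true rate."""
    return math.sqrt(true_rate * (1.0 - true_rate) / trials)


# Hypothetical hitter with a true 20% strikeout rate.
TRUE_K_RATE = 0.20

for pa in (1, 2, 3, 10, 25, 59, 60, 100, 600):
    sd = observed_rate_sd(TRUE_K_RATE, pa)
    print(f"{pa:>4} PA: noise (1 SD) around the true K% = {sd:.3f}")
```

The printed noise falls steadily, and the 59 PA and 60 PA rows are nearly identical; nothing special happens at any single plate appearance.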

In other words, the climb to a reasonable sample size is more like taking the stairs than riding an elevator. The fact that strikeout rate’s stabilization point is 60 plate appearances doesn’t mean that you have no information at 59 PA and plenty of information at 60 PA. That one extra trip to the plate doesn’t magically make the previous 59 useful. You had almost as much information at 59 as you did at 60, and you should treat the 59 PA sample much the same as the 60 PA sample, even though one is past the stabilization point and the other is not.
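One common way to formalize this, offered here as a sketch rather than as the method behind any particular projection system, is to weight an observed rate over n trials by n / (n + k), where k is the stabilization point, and regress the rest of the way toward the mean. At n = k the weight is exactly one half, matching the "regressed about halfway" definition. The league-average strikeout rate of 20% and the observed rate of 30% below are hypothetical inputs.

```python
def reliability(trials: int, stabilization_point: int) -> float:
    """Weight given to the observed rate, assuming the n / (n + k) form
    in which the weight equals 0.5 exactly at the stabilization point."""
    return trials / (trials + stabilization_point)


def regressed_rate(observed: float, league_mean: float,
                   trials: int, stabilization_point: int) -> float:
    """Shrink an observed rate toward the mean by (1 - reliability)."""
    r = reliability(trials, stabilization_point)
    return league_mean + r * (observed - league_mean)


K_STABILIZATION_PA = 60   # strikeout-rate stabilization point cited above
LEAGUE_K_RATE = 0.20      # hypothetical league-average strikeout rate
OBSERVED_K_RATE = 0.30    # hypothetical hitter's observed strikeout rate

for pa in (20, 59, 60, 61, 200):
    r = reliability(pa, K_STABILIZATION_PA)
    est = regressed_rate(OBSERVED_K_RATE, LEAGUE_K_RATE, pa, K_STABILIZATION_PA)
    print(f"{pa:>3} PA: weight on observed = {r:.3f}, regressed K% = {est:.3f}")
```

Under this model the weight moves from about 0.496 at 59 PA to 0.500 at 60 PA, a difference far too small to justify treating the two samples differently.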

So while drawing conclusions from the first week’s worth of results is a bad idea, so too is discarding the small bits of 2015 data we do have as if they tell us absolutely nothing. They certainly don’t tell us enough that our opinions should be radically different from what they were a week ago, but in some cases, the week-one performances were extreme enough to move the needle noticeably. And because we now have rest-of-season projections on the site from both ZiPS and Steamer, we can identify a few instances where an extreme performance over five or six games does tell us enough to slightly adjust our expectations for the rest of the season.

No one had a more extreme week-one performance than Adrian Gonzalez. He’s currently hitting .609/.667/1.391, good for an .838 wOBA and a 478 wRC+. Thirteen teams still have fewer home runs this season than Gonzalez does by himself. Obviously, he’s not going to keep this up, and we shouldn’t take his opening week to mean that Gonzalez is going to launch 40 homers again. He’s 33 years old, after all, and his power has been waning for the last few years. But the fact that he was physically capable of hitting five home runs in six days tells us that Gonzalez is unlikely to be playing hurt or to have suffered an off-season decline that hadn’t yet shown up in his performance record.

His week-one power spike makes the worst-case outcomes, which drag down the mean forecast, less likely, and thus his projection can already start climbing. That’s exactly what we see when we compare his pre-season and rest-of-season ZiPS numbers: Gonzalez’s wOBA is now projected at .358 for the rest of the season, up from .341 at the beginning of the year. That 17-point increase is by far the largest upward move of any player to start the year, which makes sense, given that no one has come close to hitting as well as Gonzalez has in the season’s first six games.
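The mechanism described above can be illustrated with a toy Bayesian update. The sketch assumes a hypothetical hitter whose forecast is a mix of three scenarios (healthy, mild decline, playing hurt), each with a prior probability, a projected wOBA, and a likelihood of producing a huge opening week. Every number is invented for illustration and is not taken from ZiPS, Steamer, or Gonzalez's actual forecast; the only point is that down-weighting the low-wOBA scenarios raises the mean.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    prior: float       # hypothetical prior probability of this scenario
    woba: float        # hypothetical projected wOBA if the scenario is true
    likelihood: float  # hypothetical chance of a huge opening week under it


def mean_woba(scenarios, weights):
    """Probability-weighted average wOBA across scenarios."""
    return sum(w * s.woba for s, w in zip(scenarios, weights))


def posterior_weights(scenarios):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalized = [s.prior * s.likelihood for s in scenarios]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


SCENARIOS = [
    Scenario("healthy", prior=0.60, woba=0.360, likelihood=0.030),
    Scenario("mild decline", prior=0.30, woba=0.330, likelihood=0.015),
    Scenario("playing hurt", prior=0.10, woba=0.280, likelihood=0.002),
]

before = mean_woba(SCENARIOS, [s.prior for s in SCENARIOS])
after_weights = posterior_weights(SCENARIOS)
after = mean_woba(SCENARIOS, after_weights)

print(f"forecast before the hot week: {before:.3f}")
for s, w in zip(SCENARIOS, after_weights):
    print(f"  {s.name:<13} prior {s.prior:.2f} -> posterior {w:.3f}")
print(f"forecast after the hot week:  {after:.3f}")
```

No individual scenario's wOBA changes in this toy model; the mean rises only because the scenarios that drag it down become less probable. Real projection systems are considerably more involved.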

Now, keep in mind that a .358 wOBA forecast isn’t so different from a .341 wOBA forecast that we should dramatically alter our perception of him as a player. But 17 points of wOBA over 600 plate appearances is worth about eight runs, or almost an entire win. On a WAR/600 basis, ZiPS saw Gonzalez as a +3.1 WAR player before the year started, but after a week’s worth of games, it now projects him as a +3.9 WAR/600 player for the rest of the year. Gonzalez going bananas in the first week doesn’t mean he’s going to keep it up, but it does mean enough to add essentially one win to the Dodgers’ projected total for the season.
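The runs figure follows from a standard back-of-the-envelope conversion: divide the wOBA difference by the wOBA scale to get runs per plate appearance, then multiply by plate appearances. The sketch assumes a wOBA scale of about 1.25 and roughly 10 runs per win; both are approximate league-level constants whose exact values change from season to season, so treat the outputs as ballpark figures.

```python
def woba_diff_to_runs(woba_diff: float, plate_appearances: int,
                      woba_scale: float = 1.25) -> float:
    """Convert a wOBA difference into runs over a given number of PA.

    wOBA is scaled to resemble OBP, so dividing by the wOBA scale
    (assumed ~1.25 here; the real value varies by season) gives runs per PA.
    """
    return woba_diff / woba_scale * plate_appearances


RUNS_PER_WIN = 10.0  # rough rule of thumb; the seasonal value varies

runs = woba_diff_to_runs(0.358 - 0.341, 600)
wins = runs / RUNS_PER_WIN
print(f"runs: {runs:.1f}, wins: {wins:.2f}")
```

With these assumed constants, the 17-point jump works out to a little over eight runs and about 0.8 wins, in line with the move from +3.1 to +3.9 WAR/600.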

Of course, I’m highlighting Gonzalez because his is the most notable change; most players have seen their projections move far less. The second-largest positive wOBA jump belongs to Miguel Cabrera, whose ZiPS forecast has increased by only eight points. Miggy has shown that he’s probably healthy enough at the moment that the parts of his forecast in which he breaks down are less likely to occur early in the year, and we can be somewhat more confident than we were a week ago that he’s still one of the game’s elite hitters. It’s not a huge move, but if we don’t think we’ve learned anything from Cabrera torching the ball in the first week of the season, then we’re reacting too slowly to the data.

Reacting too slowly is certainly better than reading far too much into these kinds of samples, so if you had to choose between not changing your opinion at all and basing your opinion solely on a tiny number of data points, you’d be better off not changing. But because guys like Dan Szymborski and Jared Cross have created algorithms that do the work for us, we don’t have to settle for either of those two flawed options. By incorporating the most recent data into our projections while still leaning heavily on a player’s past track record to guide his forecast, we can climb the stairs to a reasonable sample size rather than ignoring everything until some magical platform is reached and then drawing conclusions.

The 2015 data we have now shouldn’t be the basis of any strong conclusions, and that will hold true for quite a while. But it’s also not entirely worthless, and it shouldn’t be treated as if it is. When we see things like Jim Johnson or Joe Kelly throwing 96 mph, or Anthony Gose elevating and driving the ball for the first time in his life, we should recognize that these results matter a little bit. It doesn’t mean that these guys are going to be great this year, but that outcome is slightly more likely now than it was before. Incorporate the new data at a reasonable pace, and you’ll end up better off than if you ignore it entirely.





Dave is the Managing Editor of FanGraphs.

hscer

Regardless, I have decided that Bryce Harper will hit .261/.346/.522 all year. (Just kidding–although he could.)

I feel like there should immediately be a link to this piece on the Sample Size page of the Library.