Small Sample Usefulness

Over the last decade or so, the main sabermetric truisms have been, in no particular order: we like hitters with plate discipline and power, bunting is bad, the modern-day bullpen is inefficient, and don’t make decisions based on small sample sizes. The last of these comes up often early in a season, when strange things happen, like Mike Hampton striking out 10 batters in a game or Emilio Bonifacio hitting .600 for a week. We trot out the old “it’s early, don’t make any rash judgments” line and work to convince people that what they’ve seen so far isn’t likely to continue.

However, like most truisms, this one is often taken to an illogical extreme. People have begun to lean on “small sample size” as a crutch, using it to defend their original position against evidence that should at least make them wonder whether they are wrong. The evidence might not be overwhelming, but as it piles up, staying wedded to your preseason expectations is just as ignorant as overreacting to the performance.

Let’s use Victor Martinez as an example. I talked about him whacking the ball in April the other day. He’s had a great month, hitting .388/.438/.624 and generally being one of the best hitters in baseball. Of course, we can be pretty sure that Martinez won’t keep hitting this well; one month is still a small sample.

However, regardless of what your preseason projection for Martinez was, you should now be quite a bit more optimistic about his performance over the rest of the year than you were on Opening Day. Dan Szymborski released an updated ZiPS projection that accounts for April data (thanks Dan, great stuff!), and ZiPS now thinks Martinez will hit .305/.380/.467 the rest of the season, up from a preseason projection of .293/.366/.447. The small sample we’re not supposed to get excited about has raised his rest-of-season projection by 34 points of OPS, from .813 to .847. That’s a significant change.
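The 34-point figure is plain OPS arithmetic: OPS is on-base percentage plus slugging percentage, so the change is the OBP gain plus the SLG gain. A quick check, using only the two ZiPS lines quoted above (the helper name `ops` is just for illustration):

```python
# OPS is on-base percentage (OBP) plus slugging percentage (SLG).
def ops(obp: float, slg: float) -> float:
    return obp + slg


# Rest-of-season ZiPS lines quoted above (AVG/OBP/SLG); only OBP and SLG feed OPS.
preseason = ops(0.366, 0.447)
updated = ops(0.380, 0.467)

# "Points" of OPS are thousandths; round to hide floating-point noise.
change_in_points = round((updated - preseason) * 1000)
print(f"{preseason:.3f} -> {updated:.3f}: +{change_in_points} points")
```

Of the 34 points, 14 come from OBP (.366 to .380) and 20 from SLG (.447 to .467).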

April is only one month of the season. Things won’t end the way they stand now. We do have to be careful about drawing conclusions from small samples. However, let’s not fall into the opposite trap, either: there is useful information to be gleaned from the beginning of the season. Pretending that nothing has changed is just as uninformed as pretending that the current performances will be sustained.
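One way to see why neither extreme holds up is a simple weighted average that treats the preseason projection as if it were worth a fixed number of plate appearances and blends it with what the player has actually done. This is a minimal sketch, not how ZiPS works: the prior weight of 500 PA and the 100 April plate appearances are hypothetical values chosen only for illustration, and only OBP is blended.

```python
# Minimal "regress toward the projection" sketch. NOT the ZiPS method;
# the weights below are hypothetical and exist only to illustrate the idea.

PRIOR_PA = 500  # hypothetical: how many PA the preseason projection is "worth"


def blend(prior_rate: float, observed_rate: float, observed_pa: int,
          prior_pa: int = PRIOR_PA) -> float:
    """Weighted average of a projected rate and an observed rate."""
    total = prior_pa + observed_pa
    return (prior_rate * prior_pa + observed_rate * observed_pa) / total


def observed_weight(observed_pa: int, prior_pa: int = PRIOR_PA) -> float:
    """Share of the blended estimate that comes from the observed data."""
    return observed_pa / (prior_pa + observed_pa)


# Preseason OBP projection (.366) and April OBP (.438) from the article;
# the 100 plate appearances are a hypothetical stand-in.
updated_obp = blend(0.366, 0.438, observed_pa=100)
print(f"blended OBP: {updated_obp:.3f}")

# The observed data gains weight with every plate appearance, but a single
# month never swamps the projection.
for pa in (25, 100, 400):
    print(pa, f"{observed_weight(pa):.3f}")
```

With these made-up weights, the blended estimate lands between the projection and the April line rather than at either end, and the share coming from observed play grows from about 5% at 25 PA to about 44% at 400 PA.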

Don’t hang onto your preseason projections like they’re gospel. You’ve got new information in front of you. Use it.

Dave is the Managing Editor of FanGraphs.

34 Comments
Ryan B
14 years ago

When does a small sample size cease to be small? I’m sure everyone can agree that a week of Bonifacio is small, but what if one week had turned into two? Or three? At what point does it become meaningful?

lookatthosetwins
14 years ago
Reply to Ryan B

There isn’t a specific point when things become meaningful. Every bit of information carries some meaning. One day of going 4 for 4 might add half a point to a projected OPS; one week might add 5 points. These numbers are completely made up, but the point is that every game and every at-bat is meaningful; each just becomes more so as the sample size increases.