‘Stabilizing’ Statistics: Interpreting Early Season Results
by Piper Slowinski
May 6, 2011

As I’m sure many of you are aware, early season baseball analysis can be a difficult thing. It’s tempting for saberists to scream “Small sample size!” whenever someone makes a definitive statement about a player, and early season results should always be viewed with a heavy dose of skepticism. After all, it’s a heck of a long schedule: the season started over a month ago, but we’re still less than 20% of the way finished.

With most players, we have years and years of data on them, whether in the majors or minors, so why should we trust their results over a mere 100 plate appearances? More data almost always leads to better predictions, so at this point in the season, trusting 2011 results over a player’s past history is a dangerous thing.

At the same time, completely ignoring 2011 results is a horrible idea too. Some players do make dramatic improvements in their game from year to year, and there are always players who age at a different rate than expected: young players who develop fast (or slow) and old players who age quickly (or slowly). Some of a player’s early season results might be the product of a slump or streak, but sometimes there’s also an underlying skill level change tied in with that slump or streak.

So how do we untangle what’s random variation and what’s a skill level change? Scouting information is huge when evaluating players in small samples, but sadly, not many of us are scouts. But stats can still help; you just have to know where to look.

This is common sense to anyone who’s played fantasy baseball, but some statistics are more fluky than others. Even very casual baseball fans can recognize that ERA and Wins bounce up and down from year to year, and players’ batting averages fluctuate like crazy over the course of a season.
And while some statistics shouldn’t be trusted even over the course of a full season, there are some statistics that stabilize quite rapidly. Thanks to research by Pizza Cutter (which can always be found in the FanGraphs Library), we can see that there are four statistics that have stabilized so far in 2011 for most players: swing and contact rates for position players (50-100 PA), and strikeout and groundball rates for starting pitchers (150 BF).

When I say “stabilize”, I don’t mean that these rates won’t change at all over the remaining course of the season. Instead, all it means is that once a player approaches these sample sizes, you can consider that there’s something more than just random variation going on: there’s some underlying change in a player’s approach/skill level/process/etc. in play as well. Matt Garza isn’t guaranteed to finish the year with a 12 K/9 rate because his strikeout rate has “stabilized”, but at the same time, I wouldn’t be surprised if his final strikeout rate is higher than what it’s been in the past.

With this in mind, let’s take a quick look at some of the early season standouts in each of these stats:

Swing Rate – 50 PA

Jose Bautista is only swinging at 33% of pitches thrown to him, which isn’t all that surprising considering that pitchers are acting like he’s the reincarnation of Barry Bonds and only throwing him strikes 34% of the time. But Bautista doesn’t even have the lowest swing rate in the AL; that belongs to Carlos Santana at 31%. This can’t be a good long-term strategy: while Santana is walking a lot, he’s also getting thrown strikes 43% of the time and striking out at a higher rate than last season. His plate approach still needs some refining.

On the other end of the spectrum, Vladimir Guerrero is giving new meaning to the phrase “swing at anything”. He’s swinging at 64% of pitches he sees, which is crazy high even for him (career 60% swing rate).
Other players with notably high rates: Alfonso Soriano (58%) and Robinson Cano (57%).

Contact Rate – 100 PA

The list of players with low contact rates shouldn’t surprise anyone. Adam Dunn, Carlos Pena, and Nelson Cruz have all made contact on only 66% of their swings, but that rate isn’t a large aberration for any of them; they’re simply sluggers who strike out a lot. Mike Stanton is looking to join their group, though, making contact 68% of the time.

There aren’t any big surprises on the other side of the list either. There’s nobody dramatically performing better than expected, and the list is topped by slap hitters like Michael Brantley, Ichiro, and Denard Span.

Strikeout Rate – 150 BF

Now we switch over to the pitchers, and there are immediately some odd results so far this season. Matt Garza leads the majors with nearly 12 K/9? While Garza has always had the stuff to be a dominant pitcher, he’s never struck out more than 8.3 per nine over the course of a season before.

And on the opposite end of the spectrum, there are a number of pitchers with low strikeout totals so far this season. Wade Davis, Clay Buchholz, and Jordan Zimmermann are all young starters who have struck out over 6 batters per nine in past seasons, but are averaging only around 4.5 strikeouts per nine (or fewer, in Davis’ case) this season. These pitchers are just barely over the 150 BF threshold, but I’d keep my eyes on them just in case.

Groundball Rate – 150 BF

As if there wasn’t already enough reason to be worried about John Lackey, it turns out his groundball rate has plummeted this year from 45% to 33%. Meanwhile, his rotation-mate Jon Lester has increased his groundball rate for the third year in a row, bringing it all the way up to 58% so far this year. The largest increase in the majors, though, comes from the enigmatic Charlie Morton, who has increased his groundball rate from 47-50% all the way to 64%.
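For readers who like to tinker, the thresholds used in the sections above can be written down as a small lookup. This is only a rough sketch: the four sample sizes come from this article, but the function names are made up for illustration, and the blending formula (weighting the new rate by n / (n + threshold)) is a generic shrinkage heuristic assumed here, not something taken from Pizza Cutter’s research.

```python
# Sample-size thresholds quoted in this article
# (PA = plate appearances, BF = batters faced).
STABILIZATION_POINTS = {
    "swing_rate": 50,        # PA
    "contact_rate": 100,     # PA
    "strikeout_rate": 150,   # BF
    "groundball_rate": 150,  # BF
}


def has_stabilized(stat: str, sample_size: int) -> bool:
    """Return True once the sample reaches the article's threshold for this stat."""
    return sample_size >= STABILIZATION_POINTS[stat]


def shrunk_estimate(stat: str, observed: float, prior: float, sample_size: int) -> float:
    """Blend an observed early-season rate with a prior expectation.

    ASSUMPTION (not from the article): the observed rate gets weight
    n / (n + threshold), so it counts for exactly half at the threshold
    and for more as the sample grows. This is a hypothetical heuristic
    for illustration only.
    """
    threshold = STABILIZATION_POINTS[stat]
    weight = sample_size / (sample_size + threshold)
    return weight * observed + (1 - weight) * prior


if __name__ == "__main__":
    # Guerrero's 64% swing rate against his 60% career mark, evaluated at
    # a hypothetical 50 PA (his actual PA count is not given here).
    print(has_stabilized("swing_rate", 50))
    print(round(shrunk_estimate("swing_rate", 0.64, 0.60, 50), 3))
```

Under that assumed weighting, a hitter sitting exactly at the 50 PA mark would land halfway between his early-season swing rate and his career rate, which matches the spirit of “something more than random variation” without treating the new number as gospel.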
Are all of these players with dramatic increases or decreases in their stats going to continue to perform at this rate over the rest of the season? No, of course not. But in each of these cases, the sample size has grown large enough that we can realistically consider that their skill level may be different than what we originally projected for them this season. Only time will tell in all of these cases, but don’t ignore the early returns. There’s value to be found in them if you look in the right places.