Saber-Friendly Tip #1: The Linguistics of BABIP

May 20, 2011

Through some conversations with colleagues, I’ve recently had a bunch of thoughts floating around in my head about how best to present sabermetric stats to an audience. I posted some of these thoughts recently in an article, and I’m planning to continue listing tips every now and then. And of course, a big thanks to Sky Kalkman’s old series at Beyond the Boxscore for the title inspiration.

Batting Average on Balls In Play (BABIP) is one of the mainstays of sabermetric analysis. In fact, I’d suggest it’s one of the most commonly used saber-stats; it’s important whether you’re talking about batters or pitchers, and it’s useful in explaining why players aren’t performing as you’d otherwise expect. If you’re trying to analyze a player and talk about how they will perform going forward, how can you not talk about BABIP?

But despite being such an important statistic, BABIP is met with skepticism by many people at first. You mean to tell me that batters don’t have control over where they hit the ball? Why should I believe that there isn’t a large amount of skill involved in BABIP? To say that there’s a large amount of variation and luck involved in BABIP (and therefore, batting average) seems counterintuitive to people. After all, many baseball fans grew up with the idea that hitting for a high average is purely a skill, not the product of skill and some luck.

So recently, I’ve started trying something a little bit different: presenting BABIP as a percentage. And so far, I think it’s helping.

In other words, instead of writing out a sentence like, “Carlos Santana has a .233 BABIP, much lower than his .277 BABIP from last season, suggesting that his batting average should increase going forward,” I’m starting to write my analyses like so:

When Carlos Santana has put the ball in play this season, he’s only had balls fall for hits 23% of the time. The league average rate for a hitter is normally around 29-31%, while Santana had 28% of balls in play fall for hits last season. Since hitters have little control over whether they hit a ball right at a fielder or slightly in the gap, Santana should have more balls fall for hits going forward and therefore increase his batting average.
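For anyone who wants to automate the conversion, here is a minimal sketch in Python. It assumes the standard BABIP definition, (H - HR) / (AB - K - HR + SF), which this article doesn’t spell out, and the function names (`babip`, `as_percent`, `describe`) are hypothetical helpers of my own, not part of any stats library. The stat line in the last usage comment is made up for illustration; it is not Santana’s actual line.

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Standard BABIP definition (an assumption; not stated in the article):
    (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    if balls_in_play <= 0:
        raise ValueError("no balls in play")
    return (hits - home_runs) / balls_in_play


def as_percent(rate):
    """Slide the decimal point over two places and round, e.g. 0.233 -> '23%'."""
    return f"{round(rate * 100)}%"


def describe(name, rate):
    """Phrase the stat passively: balls 'fell for hits' rather than
    the hitter 'having' a BABIP."""
    return (
        f"When {name} has put the ball in play, "
        f"{as_percent(rate)} of those balls have fallen for hits."
    )


# Usage with the two BABIP figures quoted above:
#   as_percent(0.233)  gives "23%"
#   as_percent(0.277)  gives "28%"
# Usage with a made-up stat line (30 H, 5 HR, 120 AB, 25 K, 2 SF):
#   describe("a hitter", babip(30, 5, 120, 25, 2))
```

The rounding is deliberate: dropping to a whole percentage loses a little precision, but that is the trade you are making for readability with a mainstream audience.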

I think by using the percentage you accomplish two main things: you rid your article of an acronym and a decimal-heavy stat (both of which can turn people off), and you disconnect BABIP from batting average. As I mentioned above, people grew up thinking of batting average as a skill-driven stat, so when they hear “Batting Average on Balls In Play”, they implicitly connect the stat with a skill. Why shouldn’t better players have higher BABIPs? And why shouldn’t better pitchers have lower BABIPs against them? When you’re used to thinking of batting average as a skill, it’s tough not to automatically associate BABIP with skill too.

Our normal language surrounding BABIP reinforces that skill connection too. “Carlos Santana has a .222 batting average; he has a .233 BABIP.” It is something he has done, acquired as a result of his skill and performance. But when you use a percentage instead, your language becomes more passive and you imply a sense of uncertainty. Instead of saying a player is actively “hitting” or has “produced” a .350 BABIP, you’re saying that 35% of his balls in play fell for hits. It’s no longer the hitter that’s driving these balls in play; some balls simply fell in while others didn’t. Your semantics match the purpose of the statistic, which helps the reader better understand your point.

In the end, you should present BABIP however you think best serves the audience you’re trying to reach. At a site that’s already saber-heavy, it’s obviously fine to use BABIP since most readers already understand the stat and it makes your articles more concise. But if you’re trying to reach out to a more mainstream audience, or trying to explain BABIP to someone who’s never heard of it before, it’s not a bad idea to slide that decimal point over two places and then round. Using a percentage instead of BABIP does more justice to the concept linguistically, and you might find your audience more immediately receptive to your point.
