Expected BABIP for Pitchers

May 22, 2008

Recently on FanGraphs, we’ve been referring to a stat called xBABIP, or Expected Batting Average on Balls in Play, to help explain a pitcher’s current BABIP. There have been a few questions about what this stat means, so I thought it’d be as good a time as any to explain the ins and outs of this particular metric.

The initial concept of BABIP is that pitchers do not have control over what happens to balls once they are hit into the field of play.

BABIP typically fluctuates from year to year around a baseline of about .300. If a pitcher has a particularly low or high BABIP, we may say he’s been lucky or unlucky, respectively. Things are of course not quite this simple, but for the most part the rule holds true.

Enter batted ball data: we know how many line drives, fly balls, and ground balls a pitcher allows into play. Line drives fall for hits most often, and ground balls fall for hits more often than fly balls. The mix of batted ball types a pitcher allows is going to affect his overall BABIP.

BABIP by Type (2007):
Fly Balls – .15
Ground Balls – .24
Line Drives – .73

To find a pitcher’s expected BABIP, the formula is going to look something like this:
expected BABIP = .15 * FB% + .24 * GB% + .73 * LD%
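
As a rough illustration, the weighted formula above can be sketched in Python. The function name and the sample pitcher's batted ball mix are made up for this example; only the three 2007 rates come from the list above.

```python
# 2007 league BABIP by batted ball type, as quoted in the article.
BABIP_BY_TYPE = {"fb": 0.15, "gb": 0.24, "ld": 0.73}


def expected_babip(fb_pct, gb_pct, ld_pct):
    """Weighted xBABIP. Inputs are fractions of balls in play (0.20 means 20%)."""
    return (
        BABIP_BY_TYPE["fb"] * fb_pct
        + BABIP_BY_TYPE["gb"] * gb_pct
        + BABIP_BY_TYPE["ld"] * ld_pct
    )


# Hypothetical pitcher: 37% fly balls, 44% ground balls, 19% line drives.
print(round(expected_babip(0.37, 0.44, 0.19), 3))
```

A mix like this one lands very close to the .300 baseline, which is what you would hope for from a fairly typical batted ball profile.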

For more accuracy, you could remove home runs from the batted ball percentages, taking 92% of them from fly balls and 8% from line drives. You could even account for infield fly balls and remove them from total fly balls, but the formula above will get you pretty far.
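
One way to read the home run adjustment is to work from raw batted ball counts, subtract the home runs, and then recompute the percentages. The sketch below takes that approach; the function name, the count-based inputs, and the sample numbers are assumptions for illustration, not a description of how FanGraphs computes it.

```python
def remove_home_runs(fb, gb, ld, hr, hr_from_fb=0.92, hr_from_ld=0.08):
    """Return (FB%, GB%, LD%) as fractions after taking home runs out.

    Assumes fb, gb, ld are raw counts that still include home runs, and
    that 92% of home runs were fly balls and 8% were line drives.
    """
    fb_adj = fb - hr_from_fb * hr
    ld_adj = ld - hr_from_ld * hr
    total = fb_adj + gb + ld_adj
    return fb_adj / total, gb / total, ld_adj / total


# Hypothetical season: 200 fly balls, 250 ground balls, 100 line drives, 25 home runs.
fb_pct, gb_pct, ld_pct = remove_home_runs(200, 250, 100, 25)
print(round(fb_pct, 3), round(gb_pct, 3), round(ld_pct, 3))
```

The adjusted percentages can then be plugged into the expected BABIP formula in place of the raw ones.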

A couple of years ago, Dave Studeman calculated that adding .12 to LD% was good enough for a ballpark estimate of a player’s expected BABIP. This is what you’ll often see writers on FanGraphs refer to as xBABIP.
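
The shortcut is simple enough to write as a one-liner. In this sketch the function name is made up, and LD% is passed as a fraction (0.22 for 22%).

```python
def quick_xbabip(ld_pct):
    """Ballpark xBABIP: line drive rate (as a fraction) plus .12."""
    return ld_pct + 0.12


# A pitcher with a 22% line drive rate.
print(f"{quick_xbabip(0.22):.3f}")
```

For a 22% line drive rate this works out to .340.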

The best way to use this statistic is to check whether a pitcher’s current BABIP is supported by his batted ball data. For instance, a pitcher might have a high line drive percentage and a high BABIP. His xBABIP would be high as well, and you could say: “Yes, his high line drive percentage is responsible for his high BABIP.”

While this is useful for looking at past performance, the difference between xBABIP and BABIP should not be used to evaluate future performance. This is because LD% and BABIP are somewhat independent of each other. While there is some correlation between them, it isn’t strong enough to suggest that they will always track each other.

LD% is itself highly variable. It would be difficult to say that a pitcher with a BABIP of .300 and an LD% of 22% (xBABIP of .340) should do considerably worse going forward, because you really don’t know what his LD% is going to be the rest of the season. His xBABIP of .340 describes the BABIP you would have expected from his batted balls so far; it is not a forecast of his BABIP in the future. Typically, a pitcher’s expected BABIP in the future will be around the original baseline of .300.

8 Comments

Brett

17 years ago

Cool.

So one of the main points Voros originally made is that BABIP correlates very poorly from year to year. But we see that there is a relationship between BABIP and hit type, specifically LD%, as its coefficient makes it the dominant term in your equation. Does this mean that LD% doesn’t correlate well year to year?
