Sloan Analytics: Rosenheck on BABIP

March 6, 2013

Last weekend’s MIT Sloan Sports Analytics Conference included a number of Evolution of Sport presentations. Among the best was a study of BABIP factors titled “Hitting ‘Em Where They Are,” by Dan Rosenheck. He is the sports editor of The Economist and a writer for The New York Times’s Keeping Score column on sports statistics, and he gave an overview of his study prior to presenting it on Day Two of the conference.

——

Dan Rosenheck: “It was a great surprise to find out that one of the distinguished presenters on the Baseball Analytics panel was Voros McCracken. His discovery, in 1999, was that BABIP allowed by starting pitchers is, at the very least, extremely noisy and hard to predict from year to year. It was a revolution in sabermetrics and opened the door to a vast amount of research. It changed the way many of us understand the game.

“The BABIP question has been the Great White Whale of the sabermetric enterprise. It is the mystery that, 14 years later, has continued to defy the best efforts of quantitative analysts using publicly available data. Tom Tango’s FIP assumes that all pitchers have exactly league-average BABIP ability. Even a small increase in predictive ability on that question leads to a huge increase in the accuracy with which you can predict how valuable players will be.
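For readers who want to see why FIP is silent on balls in play, here is a minimal sketch of the commonly published form of the formula. The function name, the sample stat line, and the default constant of 3.10 are illustrative assumptions; the real constant is recalculated each season so that league FIP matches league ERA.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant=3.10):
    """Fielding Independent Pitching in its commonly published form.

    No term depends on hits allowed on balls in play, which is why the
    metric implicitly treats every pitcher as league average on BABIP.
    `constant` is an assumed placeholder; it varies by season.
    """
    return (13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts) / innings + constant


# Hypothetical stat line, for illustration only.
example_fip = fip(home_runs=20, walks=50, hit_by_pitch=5, strikeouts=180, innings=200)
print(round(example_fip, 3))
```

Two pitchers with the same home runs, walks, hit batters, strikeouts and innings get the same FIP no matter how many of their balls in play fall for hits.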

“I studied a bunch of variables I thought might have something to do with hit suppression on balls in play. I came up with two — both FanGraphs stats — that seem to have significant predictive power. The first is pop up rate. The second is z-contact, which is when batters swing at a strike — balls in the strike zone — thrown by a pitcher. What percent of those times does the batter make contact? It turns out that, just like inducing pop ups, it reduces BABIP and correlates consistently year to year. Getting batters to swing and miss at your strikes has strong predictive power on hit suppression.
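As a rough illustration of the quantities described above, the sketch below computes BABIP from a standard stat line and z-contact from zone swing counts. The function names and the sample numbers are hypothetical; the BABIP formula is the standard one, and the z-contact calculation follows the definition given in the interview (contact made per swing at a pitch in the strike zone).

```python
def babip(hits, home_runs, at_bats, strikeouts, sac_flies):
    """Batting average on balls in play: (H - HR) / (AB - K - HR + SF)."""
    balls_in_play = at_bats - strikeouts - home_runs + sac_flies
    return (hits - home_runs) / balls_in_play


def z_contact(zone_swings, zone_contacts):
    """Share of swings at in-zone pitches on which the batter makes contact."""
    return zone_contacts / zone_swings


# Hypothetical season totals allowed by one pitcher.
season_babip = babip(hits=180, home_runs=20, at_bats=750, strikeouts=170, sac_flies=5)
season_z_contact = z_contact(zone_swings=400, zone_contacts=340)
print(round(season_babip, 3), round(season_z_contact, 3))
```

A lower z-contact value means more swings and misses on strikes, which is the trait the study links to lower BABIP.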

“I came up with a simple model with two curved fits, using data from 2005-2011, with an R-squared of .15. It accounted for 15 percent of the variance in BABIP for starting pitchers relative to the rest of that team’s starting rotation. That factors out defense and ballpark.
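The interview does not give the equation itself, so the sketch below covers only the two pieces that are described: the target variable (a starter's BABIP minus the pooled BABIP of his rotation mates) and the R-squared used to score predictions. The data layout, function names, and numbers are assumptions made for illustration, not Rosenheck's code or data.

```python
def relative_babip(rotation, pitcher):
    """One starter's BABIP minus the pooled BABIP of his team's other starters.

    `rotation` maps pitcher name -> (hits_on_balls_in_play, balls_in_play).
    Comparing against teammates holds defense and ballpark roughly constant.
    """
    hits, bip = rotation[pitcher]
    other_hits = sum(h for name, (h, b) in rotation.items() if name != pitcher)
    other_bip = sum(b for name, (h, b) in rotation.items() if name != pitcher)
    return hits / bip - other_hits / other_bip


def r_squared(actual, predicted):
    """Share of variance in `actual` explained by `predicted`: 1 - SSE / SST."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1 - sse / sst


# Hypothetical three-man rotation: (hits on balls in play, balls in play).
rotation = {"A": (140, 500), "B": (150, 500), "C": (160, 500)}
gap = relative_babip(rotation, "A")  # .280 for A versus .310 pooled for B and C
print(round(gap, 3))
```

The same `r_squared` function can score predictions on seasons the model never saw, which is what an out-of-sample check amounts to.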

“Fifteen percent might not sound like a lot, and the data is noisy, but it’s a lot relative to zero, which is what FIP will tell you. This little equation correctly identifies every single major BABIP outlier of the last decade. If you look at its leader boards, the guys who most often appear as being projected to have the lowest BABIPs relative to their team, using only data from prior seasons — no cheating — it is Tim Wakefield, Ted Lilly, Barry Zito, Johan Santana, Matt Cain. It is the famous exceptions, one right after the other, after the other.

“The second thing is that it works out of sample. I calculated this equation in March 2012 on data from 2005-2011, and when I applied it to the 2012 season, the R-squared actually went up. It predicted the out-of-sample data even better than the in-sample data. There’s no over-fitting, no cheating or spurious relationships. This is real.

“The third thing is magnitude. You could have a 15 percent R-squared with a very narrow range of predictions. Let’s say you have the best guy at five points below his teammates and the worst at five points above. That might marginally improve your forecast, but it’s not game changing. This equation gets the magnitudes right. It can forecast very big outliers. For the guys who have the lowest BABIPs (Chris Young when he was with the Padres, Jered Weaver now, some of the Ted Lilly seasons), it’s projecting 30, 40 points of BABIP below their teammates. Huge magnitudes, far above what you would see in any of the standard projection systems like ZiPS, Steamer or PECOTA. I don’t think any of them are projecting anything close to 40 points of differential. And it’s getting them right.

“The reason the R-squared went up last year is that it made a very bold prediction that Jered Weaver was going to have a BABIP over 40 points lower than his teammates. It got it right to within .001. That’s lucky, and just one great prediction, but overall it’s not just improving your accuracy at the margins. It’s identifying big outliers to a big degree.

“I will post my data online, so if anyone wants to poke holes in it, all the better for our understanding of this troublesome phenomenon. I think the best avenue for future research is looking at this equation — at basically the favorite and least favorite pitchers — and asking, ‘What do they have in common?’ The guys who have high pop up rates and low z-contact rates are the guys projected to be good hit suppressors, so what do they throw? How hard do they throw? Are they deceptive? And vice versa for the pitchers the equation doesn’t like.

“I had two hypotheses. I thought tall pitchers, like Young and Weaver, might be good at this. I also thought guys who throw a lot of changeups might be good at this. Cole Hamels and Johan Santana come up very high and they’re great changeup artists. But, in fact, the height and changeup percentage in my high and low BABIP samples were identical.

“I don’t have any piercing insights as to what the guys who are good at this are doing to be good at this. Fortunately, the data is available to everybody and the internet has plenty of smart people who can move our understanding of this issue even further forward.”
