Some of you may recall that before being promoted from a FanGraphs Community Research writer to an actual FanGraphs writer, my primary focus was on the relationship between batted ball types (infield fly balls, in particular) and BABIP for pitchers. At the time, I’d been leaving park factors out of the equation in a [vain] attempt to keep things simple, but now I want to give them a bit of attention.
Now, Guts! is a great resource on FanGraphs, but it does leave out BABIP, HR/FB, and — I believe — something else I’d like to talk to you about for a second. If you’re a big fan of the batted ball stats here, this bit of information might be completely earth-shattering, leaving you sobbing in a heap on the floor, pondering how your life will never be the same again: IFFB% may not mean what you think it does. Now, FB%, for example — that’s defined as fly balls divided by batted balls, right? Many of us might therefore assume that IFFB% equals infield fly balls divided by batted balls… but it doesn’t. IFFB% is actually infield fly balls divided by fly balls. This means that IFFB% doesn’t tell you much about a player unless you have the context of his FB% to go with it. It also means that IFFB% * FB% equals what you probably thought IFFB% was, which is IFFB/(Batted Balls).
Hopefully you’ll be able to read this clearly through your tears: I’m going to introduce a new, not-officially-FanGraphs-sanctioned term here: IFFB% * FB% = PU%. PU%, or popup percentage — again, it’s what you probably thought IFFB% meant — is the percentage of batted balls that are infield flies. This leaves OFFB%, or outfield fly balls, as the remainder of FB% (i.e., FB% = PU%+OFFB%).
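To make the arithmetic concrete, here is a minimal Python sketch of those definitions. The helper name `rate` and the sample counts (500 batted balls, 200 fly balls, 20 infield flies) are made up for illustration and don't come from any real pitcher.

```python
def rate(part: int, whole: int) -> float:
    """Return part/whole as a fraction (0.10 means 10%)."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return part / whole


# Hypothetical counts, invented for illustration only.
batted_balls = 500
fly_balls = 200
infield_flies = 20

fb_pct = rate(fly_balls, batted_balls)      # FB%: fly balls / batted balls
iffb_pct = rate(infield_flies, fly_balls)   # IFFB%: infield flies / fly balls
pu_pct = iffb_pct * fb_pct                  # PU% = IFFB% * FB%
offb_pct = fb_pct - pu_pct                  # OFFB% is what's left of FB%

# PU% computed straight from the counts should agree with IFFB% * FB%.
direct_pu_pct = rate(infield_flies, batted_balls)

print(f"FB%={fb_pct:.1%} IFFB%={iffb_pct:.1%} PU%={pu_pct:.1%} OFFB%={offb_pct:.1%}")
```

With these counts, a 10% IFFB% works out to only a 4% PU%, which is why IFFB% on its own can mislead without FB% next to it.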
So, without further ado, here’s a sortable list of the park factors I came up with for the 2009-2012 seasons, with the exceptions noted at the bottom:
* Twins’ factors based on 2010-2012 data only
** Marlins Park excluded due to 2012 being first year (insufficient sample size)
*** Citi Field’s walls were moved closer in 2012
Park factors are halved (based on the assumption that a player will play half of their games there).
If you hadn’t heard, the Padres and Mariners will be moving the fences in a bit this year, by the way. My apologies if I neglected to mention any significant park dimension changes that happened between 2009 and 2012.
If you’re like me, you might find this table interesting; if you’re a normal person, skip right ahead:
Correlations Between Park Factors
An obligatory refresher for those who haven’t taken statistics in a while (or ever): correlation coefficients (“r”) range between -1 and 1. A correlation of “0” means the two factors being compared have no apparent connection, whereas “1” indicates the two factors move together in a perfectly linear way, and “-1” means they move perfectly linearly in opposite directions.
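For anyone who wants to see how that "r" gets computed, here is a small sketch of the standard Pearson correlation in plain Python. The function name `pearson_r` and the toy lists are my own placeholders, not actual park factor data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of the products of deviations, and each sum of squared deviations.
    co_dev = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sq_dev_x = sum((x - mean_x) ** 2 for x in xs)
    sq_dev_y = sum((y - mean_y) ** 2 for y in ys)
    if sq_dev_x == 0 or sq_dev_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return co_dev / sqrt(sq_dev_x * sq_dev_y)


# Toy data: the second list rises in lockstep with the first, the third falls.
up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])      # close to 1
down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])    # close to -1
```

In practice you would feed it two columns of park factors (say, LD% factors and BABIP factors, one entry per park) to get numbers like the ones in the table.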
I bolded the connections I thought were the most interesting. Now, to discuss them more in-depth:
High LD% factor = high BABIP factor
This should come as no surprise to those of you who read my first Community article, in which I pointed out LD% and [what I’m now calling] PU% as the two main factors for explaining pitcher BABIPs. LD% for a pitcher is hard to predict from year to year, and park factors aren’t entirely consistent on a yearly basis either, but many of the line drive park factors do make a lot of sense, and you can reasonably expect the factors behind them to exert their influence yearly.
Specifically, let’s look at the top two parks in terms of high LD%: Colorado and Texas. What do they have in common? Well, the most obvious thing is thin air: Colorado due to its altitude, and Texas presumably due to heat and perhaps dryness. Thin air, of course, offers less resistance to a batted ball, but it also should theoretically allow for less break on pitches. Most of the stadia (that’s fancy talk for “stadiums”) at the low end of the list also make sense, having thick marine air. KC is an exception… but then again, its BABIP factor isn’t in line with those of its surrounding teams on the LD% list. This could have something to do with scorer’s bias issues, such as the one discussed here. Another possible contributor to LD% differences is the batter’s eye in each stadium.
High PU% = low BABIP
This shouldn’t be a shocker to those of you who’ve read my previous work. Popups are pretty close to automatic outs. Let’s talk about how stadium characteristics might influence PU%. The first thing that comes to mind is that a greater amount of foul territory should lead to a higher PU%; that’s because a foul IFFB is only recorded if caught.
Another possible factor, judging by the Rays’ home field being firmly at the top of the list, is the dome factor. You might think the whitish background of the dome against a popup might not be so conducive to catching it, but perhaps the lack of sun and wind helps to make up for that. And it’s not like fielders in non-domed parks never have to deal with whitish backgrounds — clouds and haze are a thing, after all.
High HR/FB equals high BABIP, high OFFB%, and low PU%?
It’s worth reminding you at this point that home runs are excluded from consideration in BABIP, but not in batted ball stats. That’s one reason why fly ball pitchers tend to have lower BABIPs — they may allow more HR, but those don’t count as a knock against their BABIPs. The other reason is that fly balls, especially popups, make for easier putouts.
So, if HR aren’t part of BABIP, why would HR/FB have an apparent strong-ish connection to BABIP? The most obvious explanation is that a high HR/FB is a sign of harder contact being made, for whatever reason, which you might expect to lead to a higher BABIP. Of course, you would also expect a higher HR/FB in small stadia, where perhaps more balls are bouncing uncatchably off outfield walls and dropping for hits.
Now, PU% and OFFB% generally move together, both moving against BABIP, whereas HR/FB moves together with BABIP. That’s why I found it interesting that HR/FB splits PU% and OFFB%: it moves with OFFB% but against PU%. I think that’s easy to explain in the context of an individual pitcher, but maybe not so much in the context of park factors. I’d like to hear your theories on it.
Oh, but before we make too much of this, I should tell you that the HR/FB factor appears to be the most prone to fluctuation of the bunch.
Putting it all together, kind of
As I am wont to do, I’ve regressed some of the various park factors to see how they might be able to explain each park’s BABIP factor:
BABIP = 0.48*LD% + 0.37*GB% – 0.05*PU% + 0.11*OFFB% + 0.09*HR/FB
The formula itself is, since it only applies to park factors, as useful as a poopie-flavored lollipop (Patches O’Houlihan) but it does have a 0.696 correlation to a stadium’s BABIP factor, meaning it can explain nearly half of the differences in BABIP factors (with a 0.484 R-squared). The park it has the hardest time explaining — by far — is Fenway, no doubt largely thanks to The Green Monster’s extreme BABIP-boosting ways. Take Boston out of the mix, and the correlation shoots to 0.772 (0.596 R-squared). Remove the second-biggest outlier, Kauffman Stadium in KC (with its suspiciously-low LD% factor) and the correlation goes to 0.816, explaining 2/3 of the differences. The exclusion of these outliers lends itself to the creation of a formula not tainted by them, which you probably don’t care about, yet here it is anyway:
BABIP = 0.52*LD% + 0.28*GB% – 0.03*PU% + 0.12*OFFB% + 0.11*HR/FB
That one achieves a 0.829 correlation to the remaining BABIP park factors (0.687 R-squared).
That can be whittled down to:
BABIP = 0.552*LD% + 0.320*GB% + 0.124*HR/FB
…which has a 0.821 correlation to BABIP factor, but if you remove any of those three factors, the correlation takes a major hit (though PU% and OFFB% together can mostly compensate for the loss of GB%).
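Here is a rough sketch of how you might plug numbers into that three-factor version. I'm assuming the inputs are halved park factors on a scale where 1.00 is neutral (the coefficients sum to roughly 1, which fits that reading); the function names and the sample park are hypothetical.

```python
def predicted_babip_factor(ld: float, gb: float, hr_fb: float) -> float:
    """Three-factor formula from the text: 0.552*LD% + 0.320*GB% + 0.124*HR/FB.

    Inputs are assumed to be park factors where 1.00 means neutral.
    """
    return 0.552 * ld + 0.320 * gb + 0.124 * hr_fb


def r_squared(r: float) -> float:
    """Square a correlation to get the share of variance explained."""
    return r * r


# A perfectly neutral park should land very close to 1.00.
neutral = predicted_babip_factor(1.00, 1.00, 1.00)

# Made-up park that boosts line drives and HR/FB by 5% each.
hitter_friendly = predicted_babip_factor(1.05, 1.00, 1.05)

# Squaring the 0.821 correlation quoted above gives the share explained.
explained = r_squared(0.821)

print(f"neutral={neutral:.3f} hitter_friendly={hitter_friendly:.3f} R^2={explained:.3f}")
```

The same squaring is how the correlations quoted earlier pair up with their R-squared values, e.g. 0.696 with 0.484.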
I haven’t talked about what might contribute to a park’s GB% factor… well, groundskeeping might have a bit to do with it, but my guess is that it’s mainly due to less foul territory, and therefore fewer easy foul ball outs.
For those who are curious, the formula I used to calculate each factor was:
(Home Pitching + Home Batting) / (Away Pitching + Away Batting) * 100
… which is a pretty standard park factor formula. I then halved it like so: 0.5 + Factor/2, with Factor expressed as a ratio (1.0 being neutral) rather than on the 100 scale. This is based on the assumption that the player plays half their games away at a neutral-factor stadium. When you consider that some teams play in divisions full of non-neutral opponent stadiums (e.g., Texas faces a bunch of pitcher’s parks), that’s probably not such a safe assumption to make, buuut it’s how park factors are done, and it’s a topic for a different conversation.
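As a sketch of that calculation in Python: I'm assuming each of the four inputs is the rate of whatever stat you're building a factor for (LD%, say) in that split, and that halving is applied to the ratio form of the factor. The function names and sample rates are invented for illustration.

```python
def raw_park_factor(home_pitching: float, home_batting: float,
                    away_pitching: float, away_batting: float) -> float:
    """(Home Pitching + Home Batting) / (Away Pitching + Away Batting) * 100."""
    away_total = away_pitching + away_batting
    if away_total == 0:
        raise ValueError("away totals must be nonzero")
    return (home_pitching + home_batting) / away_total * 100


def halved_park_factor(raw_factor: float) -> float:
    """Halve a 100-scale factor: convert to a ratio, then 0.5 + ratio/2.

    Assumes the other half of the games are played in neutral parks.
    """
    ratio = raw_factor / 100
    return 0.5 + ratio / 2


# Hypothetical LD% splits for one team (fractions, not real data).
raw = raw_park_factor(home_pitching=0.21, home_batting=0.22,
                      away_pitching=0.20, away_batting=0.20)
halved = halved_park_factor(raw)

print(f"raw factor={raw:.1f}, halved factor={halved:.4f}")
```

In this made-up case, a raw factor of about 107.5 shrinks to a halved factor of about 1.0375, since only half of the player's games get the park's boost.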
Steve is a robot created for the purpose of writing about baseball statistics. One day, he may become self-aware, and...attempt to make money or something?