What Factors Affect HR/FB Rate?

February 16, 2011

In the past few weeks, home run per fly ball rate has been a hot topic here at FanGraphs and around the baseball blogosphere. Dave Cameron sparked a lively discussion with this post about Matt Cain, and then showed that Giants’ pitching coach Dave Righetti has been able to get better than average HR/FB rates out of a number of his starters. Another interesting article suggests that HR/FB rate may have something to do with pitch movement.

At this point, it seems clear that HR/FB rate may not be completely luck. It may have a lot of luck involved, but there are some factors that can increase or decrease this statistic. Using the vast FanGraphs database and some regression modeling, we can hunt for these factors and find out just how much they matter.

The data examined is from 2002 to 2009, leaving last season out of the data set in order to test the model’s forecasting ability later. This data is for starting pitchers only, and only those who threw 80 or more innings. Also, pitchers who changed teams during the season were excluded.
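As a rough illustration of those sample restrictions, here is a minimal Python sketch. The record layout (`PitcherSeason` and its field names) and the example rows are hypothetical and are not the FanGraphs schema; only the four filter rules come from the description above.

```python
from dataclasses import dataclass


@dataclass
class PitcherSeason:
    # Hypothetical record layout, for illustration only.
    name: str
    season: int
    innings: float
    is_starter: bool
    teams: int  # number of teams pitched for in that season


def in_sample(row: PitcherSeason) -> bool:
    """Apply the four restrictions described in the text."""
    return (
        2002 <= row.season <= 2009   # 2010 is held out for forecasting
        and row.is_starter           # starting pitchers only
        and row.innings >= 80        # 80 or more innings
        and row.teams == 1           # did not change teams mid-season
    )


# Made-up rows, one failing each rule in turn.
rows = [
    PitcherSeason("Pitcher A", 2005, 190.0, True, 1),
    PitcherSeason("Pitcher B", 2010, 200.0, True, 1),
    PitcherSeason("Pitcher C", 2007, 62.0, True, 1),
    PitcherSeason("Pitcher D", 2008, 150.0, True, 2),
    PitcherSeason("Pitcher E", 2003, 85.0, False, 1),
]
sample = [r for r in rows if in_sample(r)]
```

Only "Pitcher A" survives all four filters in this toy example.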

There were numerous independent variables tested, and countless models examined. For brevity, I won’t take you through every iteration of the process, but will instead tackle categories of variables.

Park Factor

This is probably the most obvious variable to include. Many studies have shown that a player’s HR/FB rate depends on his home park, and this analysis concurs. The variable’s coefficient is positive and significant at the 99-percent level, which means that if park factor truly had no relationship with the dependent variable (HR/FB), a coefficient this far from zero would turn up by chance less than 1 percent of the time.

The only surprise here is that the coefficient for park factor has a relatively low elasticity, which means it does not have a large effect on HR/FB rate. This is probably because the home park factor only accounts for about half of the games a pitcher starts. The variable could be improved by making it a weighted average of all of the parks the pitcher played in over the course of the season, but that is simply too time-consuming for this analysis.
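For anyone who wants to try the weighted version, a minimal sketch might look like the following. This was not done for this analysis. Weighting by innings pitched in each park is an assumption (fly balls allowed would be another reasonable weight), and the park names and factors are made up.

```python
def weighted_park_factor(innings_by_park, park_factors):
    """Innings-weighted average park factor across every park pitched in.

    innings_by_park: dict mapping park name -> innings pitched there
    park_factors:    dict mapping park name -> park factor (100 = neutral)
    """
    total_innings = sum(innings_by_park.values())
    if total_innings <= 0:
        raise ValueError("pitcher has no innings")
    weighted_sum = sum(
        innings * park_factors[park]
        for park, innings in innings_by_park.items()
    )
    return weighted_sum / total_innings


# Hypothetical pitcher: half his innings at home, half on the road.
innings = {"Home Park": 100.0, "Road Park 1": 50.0, "Road Park 2": 50.0}
factors = {"Home Park": 110.0, "Road Park 1": 95.0, "Road Park 2": 100.0}
blended = weighted_park_factor(innings, factors)
```

In this made-up case the blended factor sits between the home factor and neutral, which is the dilution the paragraph above describes.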

Pitcher Skill

Using xFIP as an independent variable yields a significant coefficient, but that is too vague an answer. What exactly about xFIP matters for HR/FB rate?

It turns out that both K/9 and BB/9 yield significant coefficients at the 99-percent level. The former has a negative coefficient and the latter a positive one, meaning that more strikeouts and fewer walks are associated with a lower HR/FB.

For K/9, this shows that pitchers who miss bats regularly may be able to suppress their HR/FB rates. This is not very surprising – if hitters are unable to put the ball in play against a pitcher, they may be less likely to square up a pitch and drive it when they do make contact.

For BB/9, the interpretation is a little less obvious. The theory put forth here is that walk rate measures a pitcher’s control. A pitcher with good control is less likely to throw balls out of the strike zone, hence the low walk rate, and is also less likely to groove one down the middle.

Plate Discipline

For this section, FanGraphs’ variables for swing rates in and out of the zone, and contact rates in and out of the zone were tested, as well as first pitch strike percentage and swinging strike percentage.

Of all of these variables, only one showed evidence of a relationship to HR/FB rate: O-Contact%. This variable had a significant negative coefficient, meaning that the more often that hitters were making contact on pitches outside the strike zone, the lower the HR/FB rate was for the pitcher.

This tells us that balls hit outside the strike zone are simply less likely to be home runs. This does not mean they are not likely to be hits, or even extra base hits, but they are not leaving the park at the same rate as balls hit within the zone.

This variable is significant at the 99-percent level and has a large elasticity, meaning that it strongly affects HR/FB rate. If a pitcher could consistently cause hitters to make contact with pitches outside the strike zone, he may be able to hold down his home run rate. Can pitchers continually bait hitters into swinging at pitches that they can hit, but not hit well? That’s an area for more research.

Batted Ball Percentages

The four batted ball types were examined as independent variables: fly ball percentage, line drive percentage, ground ball percentage, and infield fly ball percentage. Of these four, the only variable which proved to have a significant effect on HR/FB is IFFB%.

Including IFFB% yields a negative coefficient which is significant at the 99-percent level. This makes a ton of sense. Infield flies have a zero probability of being home runs. Together with O-Contact%, these variables help remove batted balls which have little or no chance of leaving the ball park from a pitcher’s HR/FB percentage.
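A quick arithmetic sketch of why popups pull HR/FB down. It assumes IFFB% is the share of all fly balls that are infield flies, and that infield flies never leave the park. The two pitchers and their rates are hypothetical.

```python
def hr_per_fb(hr_per_outfield_fly, iffb_share):
    """HR/FB implied by a given HR rate on outfield flies.

    iffb_share is infield flies divided by all fly balls (0 <= share < 1).
    Infield flies are assumed to have zero chance of being home runs.
    """
    if not 0 <= iffb_share < 1:
        raise ValueError("iffb_share must be in [0, 1)")
    return hr_per_outfield_fly * (1 - iffb_share)


# Two hypothetical pitchers who are equally homer-prone on outfield flies
# but induce different numbers of popups.
few_popups = hr_per_fb(0.12, 0.05)
many_popups = hr_per_fb(0.12, 0.15)
```

The pitcher with more popups posts the lower HR/FB even though nothing about his outfield fly balls is different.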

What is more interesting here is that none of the other three batted ball variables have a significant coefficient. While it has been shown that fly ball pitchers usually have better HR/FB rates than ground ball pitchers, the results of this model suggest that it may have more to do with the increase in popups, rather than an ability to get a higher percentage of outs on fly balls to the outfield.

Pitch Velocity

The variables tested had to do with the frequency and velocity of pitch types in a hurler’s arsenal. This category produced truly interesting results.

From the pitch type variables, one makes it to the final model: average fastball velocity. The statistic FBv was shown to have a negative effect on HR/FB rate, significant at the 90-percent level. The better fastball a pitcher has, the better his HR/FB rate.

At first this seems counterintuitive because fast pitches travel farther when hit. However, this result shows that HR/FB rate may have more to do with being able to square up a pitch. High velocity pitches may simply be tougher to make solid contact with, allowing pitchers who throw harder to get away with more mistakes.

Putting It All Together

After countless regression models and variable combinations, the final model is thus:

Elasticity is defined as the percent change in y for a one-percent change in x. For this use, you can interpret the K/9 elasticity as a five percent increase in strikeout rate translating into a one percent decrease in HR/FB rate.
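To make the arithmetic concrete, here is a small sketch. It assumes the common convention of evaluating a linear model's elasticity at the sample means (coefficient times mean of x over mean of y); whether that is exactly how the figures here were computed is an assumption. The value of -0.2 is simply what the five-percent-to-one-percent example implies, not a reported estimate.

```python
def elasticity_at_means(coefficient, mean_x, mean_y):
    """Percent change in y per one-percent change in x, for a linear
    model, evaluated at the sample means (an assumed convention)."""
    return coefficient * mean_x / mean_y


def implied_pct_change(elasticity, pct_change_x):
    """Approximate percent change in y for a given percent change in x."""
    return elasticity * pct_change_x


# Implied by the example in the text: a 5 percent rise in K/9 goes with
# a 1 percent drop in HR/FB, so the elasticity is about -1 / 5 = -0.2.
k9_elasticity = -0.2
hr_fb_change = implied_pct_change(k9_elasticity, 5.0)
```

Note that these are percent changes of the rate itself, not percentage points: a one percent decrease moves a 10.0 percent HR/FB to 9.9 percent.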

The r-squared for the model is 10 percent. Yes, this is a low r-squared. No, it does not invalidate the model. The significance of the independent variables is what is important here, and all of those values are more than acceptable. The fact that the r-squared is only 10 percent after fitting six significant independent variables shows that HR/FB is still very much impacted by randomness or other outside factors. This model gets us closer to the heart of HR/FB rate, but it does not explain all of its variability, and perhaps nothing can. This does suggest that, while pitchers have some ability to impact their HR/FB rates, there are significant variables that they do not control, which explains the year-to-year fluctuations for pitchers who undergo no other noticeable changes.

The next logical question is “does the model work?” Using this model to project 2010 HR/FB rates yields a smaller sum of squared errors and mean squared error than a naive estimator that assigns every pitcher a 10.6 percent HR/FB rate. So, preliminarily, “yes, it does.”
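The comparison itself is simple to sketch. The 10.6 baseline comes from the text; the actual and projected HR/FB values below are made up purely to show the mechanics and are not the 2010 results.

```python
def sum_squared_error(actual, predicted):
    """Sum of squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("inputs must be the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def mean_squared_error(actual, predicted):
    """Sum of squared errors divided by the number of observations."""
    return sum_squared_error(actual, predicted) / len(actual)


NAIVE_HR_FB = 10.6  # constant guess for every pitcher, in percent

# Made-up HR/FB rates (percent) for four hypothetical pitchers.
actual = [8.0, 12.0, 10.0, 9.0]
model_projection = [9.0, 11.0, 10.5, 9.5]
naive_projection = [NAIVE_HR_FB] * len(actual)

model_sse = sum_squared_error(actual, model_projection)
naive_sse = sum_squared_error(actual, naive_projection)
model_beats_naive = model_sse < naive_sse
```

Because both estimators are scored on the same pitchers, comparing sum of squared errors and comparing mean squared error always give the same verdict.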

In the coming days, this model will be used to further the discussion about HR/FB rate and to examine whether it is possible to continually outperform the mean.

Bradley Woodrum

“What is more interesting here is that none of the other three batted ball variables have a significant coefficient.”

What specifically do you mean by “significant”? I see you included BB/9 with an 85% significance — how low were the other variables?

I ask because the typical cut-offs for significance (99%, 95%, and 90%) are annoyingly arbitrary and disgustingly misleading. My curiosity compels me to discover what you decided for a cut-off point.

D4P

Reply to Bradley Woodrum

On a related note, did you include each of the four batted ball type percentages in the model simultaneously, or just one at a time?

Jesse Wolfersberger

Reply to D4P

Excellent questions. I went with an 80% cutoff. That’s lower than some use, but including a semi-significant variable creates fewer problems than excluding something that turns out to be relevant.

For testing, I included the variables together, one at a time, and nearly every combination in-between. Tons of iterations, trust me.

notdissertating

Instead of only looking at p-values, you might instead ask if you can reject reasonably large effects. The elasticities you present are easily interpretable, and at least for those you present in the table seem to be pretty precisely estimated. In other words 0 is not necessarily the most interesting null hypothesis here.

I’d also be interested to see results using standardized versions of all the regressors as an alternative to elasticities.

Finally, I am uncomfortable with cherry-picking regression specifications based on statistical significance (at least I think this is what you did). This can lead to severe bias and also invalidates all of your p-values. This is somewhat resolved by the out-of-sample 2010 prediction, but it would be better to pre-specify just a few different specifications and test those.

Still, really interesting, and good work.