What I Hate About Line Drives

January 7, 2009

This is my first post at FanGraphs, and I would like to thank David Appelman for inviting me on board. I have previously written for Seamheads.com and StatSpeak.net, and I frequent “The Book” blog. If you’d like to know some more about my background, check out this article I wrote a few months ago.

Today I am going to start off by climbing up on my soapbox to address one of my pet peeves: the use of line drive (LD) rates as a predictor of Batting Average on Balls in Play (BABIP). The standard practice is to estimate BABIP as LD/Balls in Play + .12. It is claimed that LD rates are more stable than BABIP from year to year, and that when the observed BABIP differs from the predicted value by a large margin, this indicates a future regression to the mean.
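As a quick illustration, the rule of thumb can be written out in a few lines of Python. This is only a sketch: the function name and the sample counts are made up, and the only thing taken from the text is the formula LD/Balls in Play + .12.

```python
def babip_from_ld(line_drives: int, balls_in_play: int, offset: float = 0.12) -> float:
    """Estimate BABIP as line drive rate plus a constant offset (LD/BIP + .12)."""
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    return line_drives / balls_in_play + offset


# Hypothetical batter (not real data): 90 line drives on 450 balls in play.
estimate = babip_from_ld(90, 450)
print(f"{estimate:.3f}")
```

A batter with a 20% line drive rate would be assigned a BABIP of about .320 under this rule, regardless of how his ground balls and fly balls are distributed.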

I’m in the process of updating my park factors for 2008, along with adding the 1999, 1955 and 1953 seasons that the folks at RetroSheet have included in their most recent release. I’ve added a couple more categories: foul flies and line drives. Now, I’ve never heard anyone mention park factors when using LD rates, but in fact they are quite large. I might guess that there could be different opinions of what is a line drive from one ballpark to another, or maybe it’s the air or the hitting background. I limited my LD factors to 2003-2008, when the RetroSheet data has complete information on whether a ball is a line drive, ground ball, fly ball or popup on every batted ball, including hits. In Arlington, a batter is 18% more likely to have a batted ball coded as a LD, which may have helped Milton Bradley to the 2nd highest LD rate in 2008 – while in Minneapolis, it’s 20% less likely. Four of the lowest six LD rates belong to Michael Bourn, Geoff Blum, Ty Wigginton and Hunter Pence, and Minute Maid Park has the second lowest LD park factor at 0.82. This is not saying that Houston batters hit fewer line drives – it’s that Houston and its opponents both have 18% fewer balls scored as liners in Houston than they do on the road.

PARK_ID PARK_NAME            First   Last    PAw     LDf	   
PHI12   Veterans Stadium     2003    2003    4768   1.23	   
ARL02   Ballpark Arlington   2003    2008   26850   1.18	   
TOK01   Tokyo Dome           2004    2008     283   1.13	   
CIN09   Great American       2003    2008   28827   1.11	   
DEN02   Coors Field          2003    2008   29158   1.10	   
STL10   Busch Stadium III    2006    2008   13967   1.09	   
KAN06   Kauffman Stadium     2003    2008   27530   1.09	   
WAS11   Nationals Park       2008    2008    4790   1.09	   
TOR02   Rogers Centre        2003    2008   27513   1.08	   
SFO03   Phone Co Park        2003    2008   29439   1.07	   
MON02   Stade Olympique      2003    2004    7684   1.07	   
STL09   Busch Stadium II     2003    2005   14280   1.06	   
STP01   Tropicana Field      2003    2008   27830   1.06	   
DET05   Comerica Park        2003    2008   28008   1.06	   
PHI13   Citizens Bank Park   2004    2008   24640   1.06	   
MIL06   Miller Park          2003    2008   29354   1.06	   
WAS10   RFK Stadium          2005    2007   14885   1.05	   
OAK01   Oakland Coliseum     2003    2008   26719   1.03	   
SEA03   Safeco Field         2003    2008   26683   1.01	   
CHI12   Comiskey Park II     2003    2008   28644   1.00	   
NYC16   Yankee Stadium       2003    2008   28722   1.00	   
MIA01   Dolphin Stadium      2003    2008   29849   1.00	   
CLE08   Jacobs Field         2003    2008   28136   0.99	   
BAL12   Camden Yards         2003    2008   29103   0.99	   
PIT08   P.N.C. Park          2003    2008   27652   0.98	   
PHO01   Bank One Ballpark    2003    2008   28810   0.98	   
SJU01   Hiram Bithorn        2003    2004    2598   0.98	   
SAN01   Jack Murphy          2003    2003    4943   0.98	   
LOS03   Dodger Stadium       2003    2008   29555   0.98	   
CHI11   Wrigley Field        2003    2008   28663   0.96	   
SAN02   PetCo Park           2004    2008   24432   0.95	   
NYC17   Shea Stadium         2003    2008   29299   0.92	   
BOS07   Fenway Park          2003    2008   28311   0.86	   
ATL02   Turner Field         2003    2008   29016   0.86	   
ANA01   Anaheim Stadium      2003    2008   26490   0.86	   
HOU03   Minute Maid Park     2003    2008   28271   0.82	   
MIN03   Metrodome            2003    2008   28048   0.80
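
One way to read a factor like Houston’s 0.82 is as a ratio of two LD rates: the rate for both teams in games at the park, divided by the rate for the same team and its opponents in that team’s road games. The sketch below assumes that simple home/road ratio with no further adjustments, which may not match the exact method behind the table. The function name and the counts are invented for illustration.

```python
def ld_park_factor(home_ld: int, home_bip: int, road_ld: int, road_bip: int) -> float:
    """Ratio of LD rate in a team's home games to LD rate in its road games.

    Counts are assumed to include both the home team and its opponents.
    This is a simplified home/road ratio, not necessarily the author's method.
    """
    if home_bip <= 0 or road_bip <= 0 or road_ld <= 0:
        raise ValueError("balls in play and road line drives must be positive")
    home_rate = home_ld / home_bip
    road_rate = road_ld / road_bip
    return home_rate / road_rate


# Invented counts: 16.4% liners at home versus 20.0% on the road.
factor = ld_park_factor(home_ld=1640, home_bip=10000, road_ld=2000, road_bip=10000)
print(f"{factor:.2f}")
```

A factor below 1.00 means fewer balls are coded as line drives at that park than in the same teams’ games elsewhere.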

Point Two – are line drives really more predictive? It’s said that if a player’s BABIP is not close to his LD+.12, it’s because of luck, and this should be expected to correct itself next season. Expect the overachiever to come back to Earth.

For all the batters from 2003-2008, in non-bunt plate appearances, I added up the base hits, line drives, ground balls, fly balls and popups. I compared the BABIP predicted by LD+.12 to the observed one in each season, which showed a root mean square (RMS) error of .045. Then I compared each year’s predicted value to the next year’s observed value, and the RMS was .048 – slightly larger. For pitchers, the RMS was .039 in the same season and .039 in the next. If the gaps were mostly luck that corrects itself, this year’s LD+.12 should track next year’s BABIP better than it tracks this year’s. It doesn’t, so I don’t see the evidence of future regression.
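
For readers who want to try the comparison themselves, here is a minimal sketch of the RMS error calculation. It assumes one predicted and one observed BABIP per player-season and weights every player-season equally; whether the figures above were weighted by playing time is not stated, so treat that as an assumption. The sample values are made up.

```python
import math
from typing import Sequence


def rms_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square of (predicted - observed), unweighted."""
    if len(predicted) != len(observed) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up example: three player-seasons of LD+.12 estimates and observed BABIP.
predicted = [0.300, 0.320, 0.290]
observed = [0.340, 0.290, 0.290]
print(f"{rms_error(predicted, observed):.3f}")
```

To test next-season prediction, the same function would be fed this year’s estimates paired with next year’s observed values for the same players.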

Complete line drive data is only available since 2003, and for a few seasons in the 1990s. In the seasons when it was not available, a “true talent level” of BABIP can be estimated by using a rolling weighted mean of past data, commonly referred to as Marcel. I used a seasonal weight of 0.7 – the most recent season is weighted at 1.00, the one before that at 0.70, two seasons back at 0.49, and so on, with each earlier year weighted 0.7 times the one after it. In this test, I did not use any regression to the league mean. The RMS of LD+.12 compared to the Marcel for the same season was .048 for batters, .046 for pitchers. The RMS of the Marcel compared to the observed BABIP in the NEXT season was .041 for batters, .039 for pitchers. Historical BABIP data is better than the current season’s LD rate.
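
The weighting scheme can be sketched as follows. The 0.7 decay and the lack of regression to the league mean come from the description above; applying the weights to hits and balls in play separately before dividing is my assumption about how the mean is formed, and the function name and sample seasons are hypothetical.

```python
from typing import Sequence, Tuple


def marcel_babip(seasons: Sequence[Tuple[float, float]], decay: float = 0.7) -> float:
    """Weighted BABIP from past seasons, most recent season first.

    Each season is (hits_on_balls_in_play, balls_in_play). The most recent
    season gets weight 1.0, the one before it `decay`, then `decay` ** 2, etc.
    No regression to the league mean is applied.
    """
    weighted_hits = 0.0
    weighted_bip = 0.0
    for years_back, (hits, bip) in enumerate(seasons):
        weight = decay ** years_back
        weighted_hits += weight * hits
        weighted_bip += weight * bip
    if weighted_bip <= 0:
        raise ValueError("need at least one season with balls in play")
    return weighted_hits / weighted_bip


# Hypothetical batter: BABIPs of .300, .280 and .320 over the last three seasons.
history = [(150, 500), (140, 500), (160, 500)]
print(f"{marcel_babip(history):.3f}")
```

Because the weights are applied to counts, a season with few balls in play pulls the estimate less than a full season does.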

If LD data is available, so are GB, FB & PU. I tried a more complex model using .15*FB+.24*GB+.73*LD to estimate BABIP. This worked much better at reducing the mean errors, even surpassing historical BABIP. For batters, the yearly RMS came down from .048 to .036, for pitchers from .041 to .031.
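
Here is a small sketch of that three-term estimator. The coefficients are the ones quoted above. I am assuming that FB, GB and LD are each expressed as a share of all balls in play (fly balls, ground balls, line drives and popups, excluding home runs) and that popups contribute no hits, since no popup term appears in the formula. The function name and counts are invented.

```python
def babip_from_batted_balls(fb: int, gb: int, ld: int, pu: int) -> float:
    """Estimate BABIP as .15*FB + .24*GB + .73*LD, each as a rate per ball in play.

    Assumes popups (pu) count in the denominator but add no expected hits,
    and that the counts exclude home runs.
    """
    balls_in_play = fb + gb + ld + pu
    if balls_in_play <= 0:
        raise ValueError("need at least one ball in play")
    expected_hits = 0.15 * fb + 0.24 * gb + 0.73 * ld
    return expected_hits / balls_in_play


# Invented batted-ball profile: 150 FB, 200 GB, 90 LD, 10 PU.
print(f"{babip_from_batted_balls(fb=150, gb=200, ld=90, pu=10):.3f}")
```

Unlike LD+.12, this form lets a groundball hitter and a flyball hitter with the same line drive rate receive different estimates.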

Still, you can’t assume that every batter has the same rate of hits on their ground balls. Some batters hit more balls to the left side than the right, some run fast and some run slow. Instead of trying to profile each batter on each type of batted ball, I will continue to use Marcel to weight each batter’s historical BABIP in my projections.

On the other hand, DIPS theory states that a pitcher has little control over the outcome once a ball has been put into play. There is clearly an ability to be a flyball or groundball pitcher. Line drives are considered mistakes, and that may be evidenced by looking at the six-year totals, which show the lowest LD rates belonging to Mariano Rivera, Fausto Carmona and Derek Lowe, while the highest belong to guys like John Van Benschoten, Edwin Jackson and Tony Armas Jr. Using the FB-GB-LD estimator on the six-year totals drops the pitchers’ RMS all the way down to .016.

Even so, some pitchers consistently defy the estimates. Roger Clemens, Brian Bannister, Chien-Ming Wang, Carlos Zambrano, Dan Haren, Brandon Webb, Chris Young and Greg Maddux all do at least .020 better than estimated. On the other end, Zach Duke, Sidney Ponson and Glendon Rusch all underperform by at least .020. Is it the ballpark? Is it their defense? The batters they faced? Or is it their own skill or lack of it?

Here’s my plan (I won’t have the answers next week). I want to compile park factors for each type of batted ball in each ballpark – what is the normalized rate of hits for flyballs to left in Dodger Stadium? Then do a WOWY analysis of fielders, showing the rate at which each fielder allows more or fewer hits than expected on each groundball, flyball, line drive and popup. Finally, do the same for each batter’s rates. Then go back and look at how many times each pitcher faced each batter, with which fielders, and in which ballparks. Once those are controlled for, see how many hits, plus or minus, are left over for each pitcher.
