Park Factors and ERA Estimators: Part III
by Bill Petti
February 28, 2012

When we last left the question of park factors' effect on ERA estimators, we found that the estimators performed best in hitters' parks when looking at starting pitchers. FIP and xFIP performed better than tERA or SIERA when predicting the next year's ERA for this group of pitchers. For the other park types, the pattern looked similar to what we generally see: SIERA generally performs best, while all estimators provide better leverage over a pitcher's YR2_ERA than YR1_ERA does.

But what if we want to predict how pitchers with certain batted-ball profiles (fly ball vs. ground ball) will perform in different parks? If we're trying to predict how C.J. Wilson (lifetime 1.68 GB/FB ratio) will perform moving from Texas to Anaheim, or how Michael Pineda's (0.81 GB/FB ratio) move from pitcher-friendly Safeco to hitter-friendly Yankee Stadium will turn out, in which estimator(s) should we have more faith? That is the focus of Part III.

I used the same methodology as Part II to determine park type. I then coded each pitcher as ground ball or fly ball based on their GB/FB ratio. A pitcher's GB/FB is one of the most consistent metrics (for starting pitchers, the year-over-year correlation is 0.87, the highest of all outcome metrics), so there was little concern about a pitcher changing their batted-ball profile between seasons. A GB/FB greater than 1 was coded as ground ball; less than 1 was coded as fly ball. In the end, 1,387 season pairs were included in the analysis:

| Park Type | Fly ball | Ground ball |
| --- | --- | --- |
| Hitter | 157 | 71 |
| Pitcher | 144 | 99 |
| Neutral | 650 | 266 |
| Total | 951 | 436 |

Here's what I found:

Generally, estimators perform better for ground ball pitchers. Across all parks, estimators pick up additional power relative to YR1_ERA when they're applied to ground ball pitchers, with SIERA doubling the predictive power of YR1_ERA.
But when we split the parks into hitter- and pitcher-friendly, we find that tERA performs best for both types of pitchers in hitters' parks; SIERA works best in pitchers' parks, at least doubling the predictive power of YR1_ERA for both fly ball and ground ball pitchers.

Here are the results for all park types:

All Parks: Fly ball (n=951), Ground ball (n=436)

| Metric | FB R | FB RSQR | FB RMSE | FB Sig | GB R | GB RSQR | GB RMSE | GB Sig |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ERA | 0.315 | 10% | 1.137 | .01 level | 0.310 | 10% | 1.167 | .01 level |
| FIP | 0.365 | 13% | 1.115 | .01 level | 0.392 | 15% | 1.129 | .01 level |
| xFIP | 0.375 | 14% | 1.110 | .01 level | 0.404 | 16% | 1.123 | .01 level |
| tERA | 0.363 | 13% | 1.116 | .01 level | 0.422 | 18% | 1.113 | .01 level |
| SIERA | 0.400 | 16% | 1.098 | .01 level | 0.449 | 20% | 1.097 | .01 level |

The results are right in line with what we find overall for ERA estimators, with the estimators picking up a decent amount of power for ground ball pitchers relative to YR1_ERA.

When we restrict this to hitter-friendly parks, we find that the relative power of the estimators increases, but their overall effectiveness declines. Except, that is, for tERA. tERA actually picks up 2 percentage points of variance explained in YR2_ERA for fly ball pitchers (15%, up from 13%). And while all estimators decline when applied to ground ball pitchers in hitters' parks, tERA declines the least (18% to 15%), matches its fly ball R-squared, and explains roughly seven times the variance in YR2_ERA that YR1_ERA does.

Let's use Pineda's move to Yankee Stadium for our first example. A good place to start is to examine his tERA, given his batted-ball profile and the park type. Last year, his ERA was 3.74. His tERA was 3.42.
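The "relative power" comparisons come down to squaring R and dividing by the YR1_ERA baseline. As a rough sketch, the snippet below does that for the ground ball columns of the All Parks table. The R values are copied from the table; the function and variable names are illustrative assumptions, not anything from the original analysis.

```python
# R values for ground ball pitchers, copied from the All Parks table.
ALL_PARKS_GB_R = {
    "ERA": 0.310,
    "FIP": 0.392,
    "xFIP": 0.404,
    "tERA": 0.422,
    "SIERA": 0.449,
}


def relative_power(r_values, baseline="ERA"):
    """Return each metric's R-squared divided by the baseline's R-squared.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    base_rsq = r_values[baseline] ** 2
    return {metric: (r ** 2) / base_rsq for metric, r in r_values.items()}


ratios = relative_power(ALL_PARKS_GB_R)
for metric, ratio in ratios.items():
    print(f"{metric}: {ratio:.2f}x the variance explained by YR1_ERA")
```

For SIERA the ratio comes out at about 2.1, which is the "doubling" noted for ground ball pitchers across all parks.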
Hitter Parks: Fly ball (n=157), Ground ball (n=71)

| Metric | FB R | FB RSQR | FB RMSE | FB Sig | GB R | GB RSQR | GB RMSE | GB Sig |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ERA | 0.272 | 7% | 1.023 | .01 level | 0.141 | 2% | 1.172 | |
| FIP | 0.367 | 13% | 0.989 | .01 level | 0.301 | 9% | 1.129 | .05 level |
| xFIP | 0.335 | 11% | 1.001 | .01 level | 0.288 | 8% | 1.134 | .05 level |
| tERA | 0.388 | 15% | 0.980 | .01 level | 0.382 | 15% | 1.094 | .01 level |
| SIERA | 0.333 | 11% | 1.002 | .01 level | 0.332 | 11% | 1.117 | .01 level |

When we switch over to pitcher-friendly parks, we find that ERA estimators are far better at predicting ground ball pitchers' performances. xFIP and SIERA more than doubled the R-squared of YR1_ERA, and FIP came close, with SIERA accounting for almost 30% of the variance in ERA in the second year.

Previously, the effectiveness of the estimators declined when moving to pitcher-friendly parks. That pattern holds here, at least for fly ball pitchers. Not so for ground ball pitchers. Here, we find the strongest R-squared numbers outside of starters in hitter-friendly parks (see Part II).

And what about C.J. Wilson? His move to Anaheim suggests that we start with SIERA for clues as to how he will perform this year. Last season, Wilson posted a 2.94 ERA and a 3.44 SIERA.

Pitcher Parks: Fly ball (n=144), Ground ball (n=99)

| Metric | FB R | FB RSQR | FB RMSE | FB Sig | GB R | GB RSQR | GB RMSE | GB Sig |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ERA | 0.238 | 6% | 1.152 | .01 level | 0.337 | 11% | 1.134 | .01 level |
| FIP | 0.267 | 7% | 1.143 | .01 level | 0.458 | 21% | 1.070 | .01 level |
| xFIP | 0.301 | 9% | 1.131 | .01 level | 0.487 | 24% | 1.052 | .01 level |
| tERA | 0.268 | 7% | 1.143 | .01 level | 0.425 | 18% | 1.090 | .01 level |
| SIERA | 0.340 | 12% | 1.116 | .01 level | 0.527 | 28% | 1.024 | .01 level |

Finally, we find that in neutral parks the estimators again resemble their performance across all parks. There's a little explanatory power picked up for ground ball pitchers versus fly ball pitchers, but overall there weren't many surprises.
Neutral Parks: Fly ball (n=650), Ground ball (n=266)

| Metric | FB R | FB RSQR | FB RMSE | FB Sig | GB R | GB RSQR | GB RMSE | GB Sig |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ERA | 0.335 | 11% | 1.158 | .01 level | 0.328 | 11% | 1.168 | .01 level |
| FIP | 0.384 | 15% | 1.135 | .01 level | 0.390 | 15% | 1.138 | .01 level |
| xFIP | 0.402 | 16% | 1.126 | .01 level | 0.423 | 18% | 1.120 | .01 level |
| tERA | 0.375 | 14% | 1.140 | .01 level | 0.418 | 17% | 1.123 | .01 level |
| SIERA | 0.427 | 18% | 1.117 | .01 level | 0.474 | 22% | 1.089 | .01 level |

ERA estimators are just that: estimators. Even the very best explain only ~30% of the variation we see year to year in actual ERA. That being said, they offer a better barometer of a pitcher's true performance and likely future performance than ERA alone. We gain some additional leverage by breaking up the population of pitchers, since much can be masked in a large aggregation.

In this case, we find that tERA provides the best estimation for both ground ball and fly ball pitchers in hitter-friendly parks, relative to ERA and to the other estimators. SIERA, however, takes the crown for both types of pitchers in pitcher-friendly parks. Generally, SIERA is the leading estimator for ground ball hurlers, since it at least doubles the variance explained relative to YR1_ERA across all park scenarios.
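For readers who want to try a similar breakdown on their own data, the two steps described earlier (coding pitchers by GB/FB ratio, then scoring a YR1 metric against YR2_ERA) might be sketched as follows. This is only a sketch under stated assumptions: the function names are hypothetical, the sample numbers are made up, and the RMSE is taken from a simple least-squares line, which may not be how the RMSE column in the tables was produced.

```python
import math


def batted_ball_type(gb_fb_ratio):
    """Code a pitcher by GB/FB ratio: above 1 is ground ball, below 1 is fly ball."""
    if gb_fb_ratio > 1:
        return "ground ball"
    if gb_fb_ratio < 1:
        return "fly ball"
    return None  # a ratio of exactly 1 is not covered by the article's rule


def evaluate(yr1_metric, yr2_era):
    """Return (R, R-squared, RMSE) for predicting YR2_ERA from a YR1 metric.

    Assumption: RMSE is the residual error of a one-variable
    least-squares fit of YR2_ERA on the YR1 metric.
    """
    n = len(yr1_metric)
    mean_x = sum(yr1_metric) / n
    mean_y = sum(yr2_era) / n
    sxx = sum((x - mean_x) ** 2 for x in yr1_metric)
    syy = sum((y - mean_y) ** 2 for y in yr2_era)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(yr1_metric, yr2_era))
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sq_err = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(yr1_metric, yr2_era)
    )
    rmse = math.sqrt(sq_err / n)
    return r, r ** 2, rmse


# Made-up season pairs, for illustration only (not data from the study).
yr1_siera = [3.2, 3.8, 4.1, 4.6, 3.5]
yr2_era = [3.4, 3.6, 4.4, 4.3, 3.9]
r, rsq, rmse = evaluate(yr1_siera, yr2_era)
print(f"R={r:.3f}  RSQR={rsq:.0%}  RMSE={rmse:.3f}")

print(batted_ball_type(1.68))  # C.J. Wilson's lifetime GB/FB ratio
print(batted_ball_type(0.81))  # Michael Pineda's GB/FB ratio
```

Running `evaluate` separately on each park type and batted-ball group would give one row of R, RSQR, and RMSE per metric, in the same layout as the tables above.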