Park Factors and ERA Estimators: Part II
by Bill Petti
February 22, 2012

In the first part of this series, I looked at the effect that park factors have on various ERA estimators. The original question I attempted to answer was whether certain estimators are better suited for predicting performance depending on whether a park is hitter-friendly or pitcher-friendly. The short answer was that ERA estimators did a much better job in hitter-friendly parks than in pitcher-friendly parks, relative to YR1_ERA.

One question I didn’t answer was whether the effectiveness of estimators in various types of parks also varies by pitcher role (i.e., starters versus relievers). Generally speaking, ERA estimators perform better when you restrict the analysis to starters only, since relievers tend to be more volatile year over year. The question is whether this same pattern holds once park factors are taken into account.

As predicted, ERA estimators do a better job predicting performance for starters than for relievers. The current data set includes 533 pairs of starter seasons and 640 pairs of reliever seasons in which the pitchers threw in the same parks in the first and second years, and did so as starters or relievers in both years. Before segmenting by park type, we see results that are consistent with previous analysis regarding ERA estimators and their predictive power for starters and relievers. (In the tables, R is the correlation between the year-one metric and YR2_ERA, RSQR is the share of YR2_ERA variance explained, RMSE is the root-mean-square error, and Sig is the significance level.)

All Parks: Starters (n=533), Relievers (n=640)

| Metric | Starters R | Starters RSQR | Starters RMSE | Starters Sig | Relievers R | Relievers RSQR | Relievers RMSE | Relievers Sig |
|---|---|---|---|---|---|---|---|---|
| ERA | 0.361 | 13% | 0.841 | .01 level | 0.265 | 7% | 1.293 | .01 level |
| FIP | 0.415 | 17% | 0.820 | .01 level | 0.320 | 10% | 1.271 | .01 level |
| xFIP | 0.416 | 17% | 0.820 | .01 level | 0.310 | 10% | 1.280 | .01 level |
| tERA | 0.405 | 16% | 0.824 | .01 level | 0.346 | 12% | 1.260 | .01 level |
| SIERA | 0.414 | 17% | 0.820 | .01 level | 0.361 | 13% | 1.251 | .01 level |

The results by park type were similar, in that the estimators performed better when focusing on starting pitchers. Additionally, the estimators did the best job relative to YR1_ERA for starters throwing in hitter-friendly parks.
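For readers who want to reproduce the R, RSQR, and RMSE columns on their own season pairs, here is a minimal sketch. The article does not spell out the exact procedure, so this assumes a simple linear regression of YR2_ERA on a single year-one metric, with RMSE taken over the regression residuals and divided by n. The function name `evaluate_estimator` and the sample numbers are hypothetical and are not data from the study.

```python
import math

def evaluate_estimator(yr1_metric, yr2_era):
    """Return (r, r_squared, rmse) for predicting YR2_ERA from one year-1 metric.

    Assumption: simple least-squares regression of yr2_era on yr1_metric.
    """
    n = len(yr1_metric)
    if n != len(yr2_era) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 pairs")

    mean_x = sum(yr1_metric) / n
    mean_y = sum(yr2_era) / n
    sxx = sum((x - mean_x) ** 2 for x in yr1_metric)
    syy = sum((y - mean_y) ** 2 for y in yr2_era)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(yr1_metric, yr2_era))

    # Pearson correlation between the year-1 metric and year-2 ERA.
    r = sxy / math.sqrt(sxx * syy)

    # Least-squares line, then root-mean-square error of its residuals.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    squared_errors = [(y - (intercept + slope * x)) ** 2
                      for x, y in zip(yr1_metric, yr2_era)]
    rmse = math.sqrt(sum(squared_errors) / n)

    return r, r * r, rmse

# Made-up numbers purely for illustration.
yr1_fip = [3.20, 3.85, 4.10, 4.60, 3.55, 4.95]
yr2_era = [3.40, 3.70, 4.35, 4.20, 3.90, 4.80]
r, r_squared, rmse = evaluate_estimator(yr1_fip, yr2_era)
print(f"R={r:.3f}  RSQR={r_squared:.0%}  RMSE={rmse:.3f}")
```

Running the same function once per metric (YR1_ERA, FIP, xFIP, tERA, SIERA) against the same YR2_ERA values would give one table row per metric.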
I used the following criteria to classify pitchers as starters: >=100 innings pitched, >=15 games started, and >=50% of all appearances as games started. The goal was to narrow the focus to pitchers who spend the majority of their time starting games, rather than those simply racking up innings pitched. I then took the original collection of 1400 season pairs and further restricted the population to those pitchers who were classified as starters or relievers in both seasons of the dyad. That reduced the population for this study to 1173 total season pairs: 533 for starters and 640 for relievers.

Additionally, I relaxed the coding criteria just a tad for hitter and pitcher parks (essentially, lowering the park factor cutoff for hitter-friendly parks and raising it for pitcher-friendly parks). This was done to increase the sample sizes, since segmenting by pitcher type significantly decreased the sample I had to work with. Here’s the breakdown by pitcher and park type:

| Type of Park | Starters | Relievers |
|---|---|---|
| Neutral | 360 | 410 |
| Hitter | 85 | 115 |
| Pitcher | 88 | 115 |
| Total | 533 | 640 |

Let’s look at the results for each park type in more detail.

Hitter-friendly Parks

Hitter Parks: Starters (n=85), Relievers (n=115)

| Metric | Starters R | Starters RSQR | Starters RMSE | Starters Sig | Relievers R | Relievers RSQR | Relievers RMSE | Relievers Sig |
|---|---|---|---|---|---|---|---|---|
| ERA | 0.386 | 15% | 0.708 | .01 level | 0.191 | 4% | 1.268 | .05 level |
| FIP | 0.472 | 22% | 0.676 | .01 level | 0.324 | 10% | 1.222 | .01 level |
| xFIP | 0.478 | 23% | 0.674 | .01 level | 0.218 | 5% | 1.261 | .05 level |
| tERA | 0.458 | 21% | 0.682 | .01 level | 0.358 | 13% | 1.206 | .01 level |
| SIERA | 0.460 | 21% | 0.681 | .01 level | 0.257 | 7% | 1.248 | .01 level |

As with the previous analysis, we find that ERA estimators perform better for starting pitchers in hitter-friendly parks. The estimators’ advantage over YR1_ERA in YR2_ERA variance explained goes from 3-4 percentage points across all parks to 6-8 points in hitter-friendly environments. Additionally, their RMSEs all decrease. FIP and xFIP outperformed tERA and SIERA, but by only 1 and 2 percentage points, respectively. Generally speaking, the estimators all performed about the same for starters.
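As an aside on the sample construction, the starter criteria and the both-seasons filter described above could be sketched as follows. The three thresholds come from the text. Everything else is an assumption made for illustration: the `PitcherSeason` fields, the function names, and in particular treating every non-starter as a reliever, since the reliever criteria are not restated in this article.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PitcherSeason:
    # Hypothetical record layout; field names are not from the article.
    innings_pitched: float
    games: int           # total appearances
    games_started: int

def classify_role(season: PitcherSeason) -> str:
    """Apply the article's starter criteria: >=100 IP, >=15 GS, >=50% of appearances started."""
    started_share = season.games_started / season.games if season.games else 0.0
    is_starter = (
        season.innings_pitched >= 100
        and season.games_started >= 15
        and started_share >= 0.5
    )
    # Assumption: anyone who is not a starter is treated as a reliever.
    return "starter" if is_starter else "reliever"

def role_for_pair(year1: PitcherSeason, year2: PitcherSeason) -> Optional[str]:
    """Return the shared role if it is the same in both seasons, else None (pair dropped)."""
    role1, role2 = classify_role(year1), classify_role(year2)
    return role1 if role1 == role2 else None

# Illustrative seasons, not real pitchers.
workhorse = PitcherSeason(innings_pitched=195.0, games=32, games_started=32)
swingman = PitcherSeason(innings_pitched=110.0, games=40, games_started=15)
print(classify_role(workhorse), classify_role(swingman))
print(role_for_pair(workhorse, swingman))
```

The swingman example shows why the third criterion matters: he clears the innings and games-started thresholds but starts fewer than half of his appearances, so the pair would be dropped from the study population under this sketch.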
Relievers, however, were a completely different story. Not only were the results less robust in terms of variance explained, but FIP and tERA emerged as the stronger estimators. FIP doubled the variance explained by xFIP, and tERA almost did the same relative to its batted-ball colleague, SIERA.

Pitcher-friendly Parks

Pitcher Parks: Starters (n=88), Relievers (n=115)

| Metric | Starters R | Starters RSQR | Starters RMSE | Starters Sig | Relievers R | Relievers RSQR | Relievers RMSE | Relievers Sig |
|---|---|---|---|---|---|---|---|---|
| ERA | 0.379 | 14% | 0.868 | .01 level | 0.177 | 3% | 1.122 | not significant |
| FIP | 0.380 | 14% | 0.868 | .01 level | 0.228 | 5% | 1.110 | .05 level |
| xFIP | 0.410 | 17% | 0.856 | .01 level | 0.246 | 6% | 1.105 | .01 level |
| tERA | 0.418 | 17% | 0.852 | .01 level | 0.198 | 4% | 1.118 | .05 level |
| SIERA | 0.423 | 18% | 0.850 | .01 level | 0.283 | 8% | 1.094 | .01 level |

In pitcher-friendly parks, we found that the explanatory power of the estimators relative to YR1_ERA was roughly a wash. While FIP suffered a bit of a decline relative to its performance across all parks for starters, the rest of the estimators performed about the same. tERA and SIERA each picked up 1 percentage point in variance explained, but all estimators saw their RMSEs increase.

Results for relievers in pitcher-friendly parks were worse than for all parks. xFIP and SIERA came in first, with both at least doubling the explanatory power of YR1_ERA. Both estimators were also significant at the .01 level, while the other two were significant at the .05 level. (Oddly enough, YR1_ERA was not significant at the .01 or .05 level.)

The first conclusion is that the general pattern uncovered in the original analysis holds when we segment pitchers by type: ERA estimators perform much better when used to predict performance in a hitter-friendly park. Second, as we have seen previously, estimating reliever performance year over year is much more difficult than estimating starter performance. While we do gain a bit of leverage for relievers throwing in hitter-friendly parks, there is still a ton of variance left unaccounted for by the ERA estimators.
Despite these findings, it is still unclear whether we should be leveraging, say, xFIP to better evaluate how a pitcher will perform in a new, hitter-friendly park. While xFIP does perform better in those parks overall, it may be that flyball pitchers’ future ERA is more affected by park factors and, therefore, SIERA or tERA may be more appropriate for that kind of analysis. That will be the focus of Part III.
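A note on the Sig column: the article does not say which significance test was used. One rough way to sanity-check the reported levels is a two-sided test of zero correlation using Fisher’s z-transformation with a normal approximation, sketched below. The choice of test and the function names `correlation_p_value` and `significance_label` are assumptions; the R values and n are taken from the pitcher-friendly parks table for relievers.

```python
import math
from statistics import NormalDist

def correlation_p_value(r, n):
    """Approximate two-sided p-value for H0: true correlation is zero.

    Assumption: Fisher z-transformation, z = atanh(r) * sqrt(n - 3),
    compared against a standard normal distribution.
    """
    if n <= 3 or not -1 < r < 1:
        raise ValueError("need n > 3 and -1 < r < 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def significance_label(r, n):
    """Map the p-value onto the labels used in the article's tables."""
    p = correlation_p_value(r, n)
    if p < 0.01:
        return ".01 level"
    if p < 0.05:
        return ".05 level"
    return "not significant"

# Relievers in pitcher-friendly parks (n=115), R values from the table.
reliever_r = {"ERA": 0.177, "FIP": 0.228, "xFIP": 0.246, "tERA": 0.198, "SIERA": 0.283}
for metric, r in reliever_r.items():
    print(metric, round(correlation_p_value(r, 115), 3), significance_label(r, 115))
```

Under this approximation the labels should line up with the table: YR1_ERA falls just short of the .05 threshold with 115 season pairs, which is consistent with the "oddly enough" result noted above.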