Park Factors and ERA Estimators: Part II

February 22, 2012

In the first part of this series, I looked at the effect that park factors have on various ERA estimators. The original question was whether certain estimators are better suited to predicting performance depending on whether a park is hitter-friendly or pitcher-friendly. The short answer was that, relative to YR1_ERA, ERA estimators did a much better job in hitter-friendly parks than in pitcher-friendly parks.

One question I didn’t answer was whether the effectiveness of estimators in various types of parks also varies by pitcher role (i.e., starters versus relievers). Generally speaking, ERA estimators perform better when the analysis is restricted to starters, since relievers tend to be more volatile year over year. The question is whether this pattern still holds once park factors are taken into account. As expected, it does: ERA estimators do a better job predicting performance for starters than for relievers.

The current data set includes 533 pairs of starter seasons and 640 pairs of reliever seasons in which the pitcher pitched in the same parks in both years and held the same role (starter or reliever) in both years. In each table, a metric's first-year value is used to predict second-year ERA (YR2_ERA), so the ERA row serves as the YR1_ERA baseline. Before segmenting by park type, the results are consistent with previous analyses of ERA estimators' predictive power for starters and relievers:

All Parks	Starters (n=533)				Relievers (n=640)
Metric	R	RSQR	RMSE	Sig	R	RSQR	RMSE	Sig
ERA	0.361	13%	0.841	.01 level	0.265	7%	1.293	.01 level
FIP	0.415	17%	0.820	.01 level	0.320	10%	1.271	.01 level
xFIP	0.416	17%	0.820	.01 level	0.310	10%	1.280	.01 level
tERA	0.405	16%	0.824	.01 level	0.346	12%	1.260	.01 level
SIERA	0.414	17%	0.820	.01 level	0.361	13%	1.251	.01 level
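The R, RSQR, and RMSE columns can be computed from paired seasons roughly as follows. This is a minimal sketch, assuming each row reflects a simple linear regression of YR2_ERA on the first-year value of the metric, with RMSE taken as the root-mean-square residual of that fit; the article does not spell out its exact procedure, and the function name `predictive_stats` and the toy data are hypothetical.

```python
import math
from statistics import mean


def predictive_stats(yr1_metric, yr2_era):
    """Return (r, r_squared, rmse) for a linear fit of YR2_ERA on a YR1 metric.

    Assumption: RMSE is the root-mean-square residual of an ordinary
    least-squares fit, dividing by n (not n - 2).
    """
    n = len(yr1_metric)
    if n != len(yr2_era) or n < 3:
        raise ValueError("need at least three paired seasons of equal length")
    mx, my = mean(yr1_metric), mean(yr2_era)
    sxx = sum((x - mx) ** 2 for x in yr1_metric)
    syy = sum((y - my) ** 2 for y in yr2_era)
    sxy = sum((x - mx) * (y - my) for x, y in zip(yr1_metric, yr2_era))
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation (the R column)
    slope = sxy / sxx               # least-squares slope
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(yr1_metric, yr2_era))
    return r, r * r, math.sqrt(sse / n)


# Hypothetical toy data: YR1 FIP and YR2 ERA for five pitchers.
yr1_fip = [3.10, 3.85, 4.40, 2.95, 4.10]
yr2_era = [3.40, 3.70, 4.90, 3.20, 3.80]
r, rsq, rmse = predictive_stats(yr1_fip, yr2_era)
print(f"R={r:.3f}  RSQR={rsq:.0%}  RMSE={rmse:.3f}")
```

Under this assumption RSQR is simply R squared, which matches every row of the tables (for example, 0.361 squared is about 13%).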

The park-type results followed the same pattern: the estimators performed better for starting pitchers. Additionally, the estimators' advantage over YR1_ERA was largest for starters pitching in hitter-friendly parks.

I used the following criteria to classify pitchers as starters: at least 100 innings pitched, at least 15 games started, and games started making up at least 50% of all appearances. The goal was to focus on pitchers who spend the majority of their time starting games rather than those who simply rack up innings. I then took the original collection of 1,400 season pairs and restricted it to pitchers who were classified as starters or relievers in both seasons of the pair. That reduced the population for this study to 1,173 season pairs: 533 for starters and 640 for relievers. I also relaxed the coding criteria slightly for hitter and pitcher parks, lowering the park factor cutoff for hitter-friendly parks and raising it for pitcher-friendly parks. This was done to increase the sample sizes, since segmenting by pitcher type significantly shrank the sample I had to work with.
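The starter rule and the same-role restriction translate directly into a filter. This is a minimal sketch, assuming one record per pitcher-season with innings pitched, games, and games started; the record type and function names are hypothetical. The article does not state its reliever rule, so the sketch treats zero-start seasons as relief seasons purely as a stand-in, and anything in between is dropped as a mixed-role season.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PitcherSeason:
    # Hypothetical record layout for one pitcher-season.
    innings_pitched: float
    games: int
    games_started: int


def role(season: PitcherSeason) -> Optional[str]:
    """Return 'starter', 'reliever', or None for a mixed-role season."""
    gs_share = season.games_started / season.games if season.games else 0.0
    # Starter rule from the article: >=100 IP, >=15 GS, >=50% of games started.
    if (season.innings_pitched >= 100
            and season.games_started >= 15
            and gs_share >= 0.5):
        return "starter"
    # ASSUMPTION: the article does not state its reliever rule; treating
    # seasons with zero starts as relief seasons is a hypothetical stand-in.
    if season.games_started == 0:
        return "reliever"
    return None


def pair_role(yr1: PitcherSeason, yr2: PitcherSeason) -> Optional[str]:
    """Keep a season pair only when both seasons have the same role."""
    r1, r2 = role(yr1), role(yr2)
    return r1 if r1 is not None and r1 == r2 else None


ace = PitcherSeason(innings_pitched=205.0, games=32, games_started=32)
closer = PitcherSeason(innings_pitched=64.0, games=66, games_started=0)
swingman = PitcherSeason(innings_pitched=112.0, games=38, games_started=14)
print(role(ace), role(closer), role(swingman))  # starter reliever None
print(pair_role(ace, closer))                    # None (role changed)
```

The swingman fails the 15-start threshold despite clearing 100 innings, which is exactly the kind of innings-padding profile the starter rule is meant to exclude.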

Here’s the breakdown by pitcher and park type:

Type of Park	Starters	Relievers
Neutral	360	410
Hitter	85	115
Pitcher	88	115
Total	533	640

Let’s look at the results for each park type in more detail.

Hitter-friendly Parks

Hitter Parks	Starters (n=85)				Relievers (n=115)
Metric	R	RSQR	RMSE	Sig	R	RSQR	RMSE	Sig
ERA	0.386	15%	0.708	.01 level	0.191	4%	1.268	.05 level
FIP	0.472	22%	0.676	.01 level	0.324	10%	1.222	.01 level
xFIP	0.478	23%	0.674	.01 level	0.218	5%	1.261	.05 level
tERA	0.458	21%	0.682	.01 level	0.358	13%	1.206	.01 level
SIERA	0.460	21%	0.681	.01 level	0.257	7%	1.248	.01 level

As in the previous analysis, ERA estimators perform better for starting pitchers in hitter-friendly parks. Their advantage over YR1_ERA in YR2_ERA variance explained grows from 3-4 percentage points across all parks to 6-8 points in hitter-friendly environments, and every estimator's RMSE drops (from about 0.82 to about 0.68). FIP and xFIP outperformed tERA and SIERA, but by only 1 and 2 percentage points, respectively. Generally speaking, the estimators all performed about the same for starters.
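The "advantage in variance explained" is the gap between each estimator's RSQR and the RSQR of YR1_ERA. The sketch below recomputes it from the starter R values in the tables, assuming RSQR is R squared rounded to a whole percent (which matches every reported row); the helper name `rsq_advantage` is hypothetical.

```python
def rsq_advantage(r_by_metric, baseline="ERA"):
    """Percentage-point gain in variance explained over the baseline metric.

    RSQR is taken as round(100 * R**2), matching how the tables report it.
    """
    rsq = {m: round(100 * r * r) for m, r in r_by_metric.items()}
    return {m: rsq[m] - rsq[baseline] for m in rsq if m != baseline}


# Starter R values, copied from the All Parks and Hitter Parks tables.
all_parks = {"ERA": 0.361, "FIP": 0.415, "xFIP": 0.416, "tERA": 0.405, "SIERA": 0.414}
hitter_parks = {"ERA": 0.386, "FIP": 0.472, "xFIP": 0.478, "tERA": 0.458, "SIERA": 0.460}

print(rsq_advantage(all_parks))     # {'FIP': 4, 'xFIP': 4, 'tERA': 3, 'SIERA': 4}
print(rsq_advantage(hitter_parks))  # {'FIP': 7, 'xFIP': 8, 'tERA': 6, 'SIERA': 6}
```

The two outputs give the 3-4 and 6-8 point ranges quoted above.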

Relievers, however, were a completely different story. Not only were the results weaker in terms of variance explained, but FIP and tERA emerged as the stronger estimators. FIP explained twice as much variance as xFIP (10% versus 5%), and tERA nearly did the same relative to its batted-ball colleague, SIERA (13% versus 7%).

Pitcher-friendly Parks

Pitcher Parks	Starters (n=88)				Relievers (n=115)
Metric	R	RSQR	RMSE	Sig	R	RSQR	RMSE	Sig
ERA	0.177	3%	1.122	not sig.
FIP	0.380	14%	0.868	.01 level	0.228	5%	1.110	.05 level
xFIP	0.410	17%	0.856	.01 level	0.246	6%	1.105	.01 level
tERA	0.418	17%	0.852	.01 level	0.198	4%	1.118	.05 level
SIERA	0.423	18%	0.850	.01 level	0.283	8%	1.094	.01 level

For starters in pitcher-friendly parks, the estimators' explanatory power relative to YR1_ERA was roughly unchanged from the all-parks results. While FIP declined a bit relative to its performance for starters across all parks (14% versus 17% of variance explained), the rest of the estimators performed about the same: xFIP held steady, and tERA and SIERA each gained 1 percentage point. All of the estimators, however, saw their RMSEs increase.

Results for relievers in pitcher-friendly parks were worse than across all parks. xFIP and SIERA came out on top, each at least doubling the explanatory power of YR1_ERA (6% and 8% versus 3%). Both were also significant at the .01 level, while FIP and tERA were significant only at the .05 level. (Oddly enough, YR1_ERA was not significant at either the .01 or the .05 level.)
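The significance labels can be approximated from R and the sample size alone. The sketch below uses the Fisher z-transformation with a normal approximation; this is a swapped-in technique, since the article does not say which test produced its labels. Applied to the reliever rows for pitcher-friendly parks (n = 115), it reproduces the table's labels, including YR1_ERA falling short of the .05 level. The function name `sig_level` is hypothetical.

```python
import math


def sig_level(r, n):
    """Two-tailed significance label for a Pearson correlation r from n pairs.

    Fisher z approximation: under the null hypothesis of zero correlation,
    z = atanh(r) * sqrt(n - 3) is treated as standard normal.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal p-value
    if p < 0.01:
        return ".01 level"
    if p < 0.05:
        return ".05 level"
    return "not sig."


# Reliever R values in pitcher-friendly parks, copied from the table.
relievers_pitcher_parks = {"ERA": 0.177, "FIP": 0.228, "xFIP": 0.246,
                           "tERA": 0.198, "SIERA": 0.283}
for metric, r in relievers_pitcher_parks.items():
    print(metric, sig_level(r, 115))
```

With only 115 reliever pairs, an R of about 0.18 sits just under the .05 threshold, which is why YR1_ERA misses significance while the slightly higher FIP and tERA correlations clear it.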

The first conclusion is that the general pattern uncovered in the original analysis holds when we segment pitchers by type: ERA estimators perform much better when used to predict performance in a hitter-friendly park. Second, as we have seen previously, estimating reliever performance year over year is much more difficult than estimating starter performance. While we do gain a bit of leverage for relievers pitching in hitter-friendly parks, a great deal of variance remains unaccounted for by the ERA estimators.

Despite these findings, it is still unclear whether we should be leveraging, say, xFIP to better evaluate how a pitcher will perform in a new, hitter-friendly park. While xFIP does perform better in those parks overall, it may be that fly-ball pitchers' future ERA is more affected by park factors, in which case SIERA or tERA may be more appropriate for that kind of analysis.

That will be the focus of Part III.

Comments

Corey: Very interesting, nice work. What were the results for neutral parks?

Bill Petti (reply to Corey): I don’t have it handy, but it was roughly the same as the pitcher-friendly parks.

Reply to Bill Petti: Oh, that’s interesting. I was thinking it would be either roughly halfway between the two or more similar to the hitter-friendly parks.
