Park Factors and ERA Estimators: Part III
When we last left the question of Park Factors' effect on ERA estimators, we found that the estimators performed best in hitters' parks when looking at starting pitchers. FIP and xFIP performed better than tERA or SIERA when predicting the next year's ERA for this group of pitchers. For the other park types, the pattern looked similar to what we generally see: SIERA performs best, and all of the estimators provide better leverage over a pitcher's YR2_ERA than YR1_ERA does.
But what if we want to predict how pitchers with certain batted-ball profiles (fly ball vs. ground ball) will perform in different parks? If we're trying to predict how C.J. Wilson (lifetime 1.68 GB/FB ratio) will perform after moving from Texas to Anaheim, or how Michael Pineda's (0.81 GB/FB ratio) move from pitcher-friendly Safeco to hitter-friendly Yankee Stadium will turn out, in which estimator(s) should we have more faith? That is the focus of Part III.
I used the same methodology as in Part II to determine park type. I then coded each pitcher as ground ball or fly ball based on their GB/FB ratio. A pitcher's GB/FB is one of the most consistent metrics (for starting pitchers, the year-over-year correlation is 0.87, the highest of all outcome metrics), so there was little concern about a pitcher changing their batted-ball profile between seasons. A GB/FB greater than 1 was coded as ground ball; less than 1 was coded as fly ball. In the end, 1,387 season pairs were included in the analysis:
Park Type | Fly ball | Ground ball |
---|---|---|
Hitter | 157 | 71 |
Pitcher | 144 | 99 |
Neutral | 650 | 266 |
Total | 951 | 436 |
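The coding step described above is simple enough to sketch. This is a minimal illustration, not the author's code: it assumes each season pair is available as a hypothetical `(park_type, gb_fb)` tuple, and because the article doesn't say how a GB/FB of exactly 1.0 was handled, it assumes such pairs are dropped. `batted_ball_type` and `tabulate` are illustrative names.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def batted_ball_type(gb_fb: float) -> Optional[str]:
    """Code a pitcher by GB/FB ratio: above 1 is ground ball, below 1 is fly ball.

    Assumption: a ratio of exactly 1.0 is not covered by the article's rule,
    so this sketch returns None (the pair is excluded).
    """
    if gb_fb > 1:
        return "Ground ball"
    if gb_fb < 1:
        return "Fly ball"
    return None


def tabulate(season_pairs: Iterable[Tuple[str, float]]) -> Counter:
    """Count season pairs by (park type, batted-ball type), skipping uncodable pairs."""
    counts: Counter = Counter()
    for park_type, gb_fb in season_pairs:
        bb_type = batted_ball_type(gb_fb)
        if bb_type is not None:
            counts[(park_type, bb_type)] += 1
    return counts


# Hypothetical rows; the 1.68 and 0.81 ratios are the Wilson and Pineda figures quoted above.
sample = [("Neutral", 1.68), ("Hitter", 0.81), ("Pitcher", 1.0)]
print(tabulate(sample))
```

Run over the real season pairs, the counts from a function like this would fill in the park-type by batted-ball-type table.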
Here’s what I found:
Generally, estimators perform better for ground ball pitchers. Across all parks, estimators pick up additional power relative to YR1_ERA when they're applied to ground ball pitchers, with SIERA doubling the predictive power of YR1_ERA. But when we split the parks into hitter- and pitcher-friendly, we find that tERA performs best for both types of pitchers in hitters' parks, while SIERA works best in pitchers' parks, at least doubling the predictive power of YR1_ERA for both fly ball and ground ball pitchers.
Here are the results for all park types:
Metric (All Parks) | Fly ball R (n=951) | Fly ball RSQR | Fly ball RMSE | Fly ball Sig | Ground ball R (n=436) | Ground ball RSQR | Ground ball RMSE | Ground ball Sig |
---|---|---|---|---|---|---|---|---|
ERA | 0.315 | 10% | 1.137 | .01 level | 0.310 | 10% | 1.167 | .01 level |
FIP | 0.365 | 13% | 1.115 | .01 level | 0.392 | 15% | 1.129 | .01 level |
xFIP | 0.375 | 14% | 1.110 | .01 level | 0.404 | 16% | 1.123 | .01 level |
tERA | 0.363 | 13% | 1.116 | .01 level | 0.422 | 18% | 1.113 | .01 level |
SIERA | 0.400 | 16% | 1.098 | .01 level | 0.449 | 20% | 1.097 | .01 level |
The results are right in line with what we find overall for ERA estimators, with the estimators picking up a decent amount of power, relative to YR1_ERA, for ground ball pitchers.
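Each table reports R, R-squared (RSQR) and RMSE for a YR1 metric against YR2_ERA. Below is a minimal sketch of how those three numbers can be computed from paired seasons. It assumes RMSE is the residual standard error of a simple least-squares regression of YR2_ERA on the YR1 metric; the article doesn't state its definition, though within each table the RMSE values fall as R rises in the way this formula implies. `fit_stats` is a hypothetical helper, and the sample numbers are made up.

```python
import math
from typing import Sequence, Tuple


def fit_stats(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Return (R, R-squared, RMSE) for a simple OLS regression of y (YR2_ERA) on x (a YR1 metric).

    Assumption: RMSE is the residual standard error, sqrt(SSE / (n - 2)).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length samples with at least 3 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    # For the least-squares line, the residual sum of squares equals syy * (1 - r^2).
    sse = max(syy * (1 - r * r), 0.0)  # clamp tiny negative values from rounding
    rmse = math.sqrt(sse / (n - 2))
    return r, r * r, rmse


# Hypothetical YR1 SIERA and YR2 ERA values for five pitchers.
yr1_siera = [3.10, 3.85, 4.20, 2.95, 4.60]
yr2_era = [3.40, 3.70, 4.55, 3.30, 4.10]
print(fit_stats(yr1_siera, yr2_era))
```

Because RMSE here depends on R only through 1 - R², the metric with the highest R in a group always has the lowest RMSE, which is the pattern each table shows.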
When we restrict this to hitter-friendly parks, we find that the relative power of the estimators increases, but their overall effectiveness declines. Except, that is, for tERA. tERA actually picks up 2 percentage points of variance explained in YR2_ERA for fly ball pitchers (15%, versus 13% across all parks). And while all estimators decline when applied to ground ball pitchers in hitters' parks, tERA posts the same R-squared (15%) for them as it does for fly ball pitchers, and it explains about seven times as much of the variance in YR2_ERA as YR1_ERA does.
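The "seven times" comparison is a ratio of R-squared values, and the 2% pickup is a difference in R-squared. A quick check using the unrounded R values from the hitter-park table below (the rounded percentages give 15/2 = 7.5); the helper names are hypothetical:

```python
def r_squared_ratio(r_estimator: float, r_baseline: float) -> float:
    """How many times more variance an estimator explains than a baseline such as YR1_ERA."""
    return r_estimator ** 2 / r_baseline ** 2


def r_squared_gain_points(r_new: float, r_old: float) -> float:
    """Change in variance explained, in percentage points."""
    return 100 * (r_new ** 2 - r_old ** 2)


# Ground ball pitchers in hitter parks: tERA (R = 0.382) vs. YR1_ERA (R = 0.141).
print(round(r_squared_ratio(0.382, 0.141), 1))  # 7.3
# Fly ball pitchers: tERA in hitter parks (R = 0.388) vs. all parks (R = 0.363).
print(round(r_squared_gain_points(0.388, 0.363), 1))  # 1.9
```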
Let’s use Pineda’s move to Yankee Stadium for our first example. A good place to start is to examine his tERA, given his batted-ball profile and the park type. Last year, his ERA was 3.74. His tERA was 3.42.
Metric (Hitter Parks) | Fly ball R (n=157) | Fly ball RSQR | Fly ball RMSE | Fly ball Sig | Ground ball R (n=71) | Ground ball RSQR | Ground ball RMSE | Ground ball Sig |
---|---|---|---|---|---|---|---|---|
ERA | 0.272 | 7% | 1.023 | .01 level | 0.141 | 2% | 1.172 | n.s. |
FIP | 0.367 | 13% | 0.989 | .01 level | 0.301 | 9% | 1.129 | .05 level |
xFIP | 0.335 | 11% | 1.001 | .01 level | 0.288 | 8% | 1.134 | .05 level |
tERA | 0.388 | 15% | 0.980 | .01 level | 0.382 | 15% | 1.094 | .01 level |
SIERA | 0.333 | 11% | 1.002 | .01 level | 0.332 | 11% | 1.117 | .01 level |
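The Sig column presumably comes from a standard test of whether each correlation differs from zero; the article doesn't say which test was used. The sketch below assumes the Fisher z-transformation with a normal approximation (a stand-in for the exact t-test on r) and uses only Python's standard library. `sig_level` is a hypothetical helper that labels anything above .05 as not significant ("n.s.").

```python
import math
from statistics import NormalDist


def sig_level(r: float, n: int) -> str:
    """Label a correlation's two-sided significance in the style of the Sig column.

    Assumption: Fisher z-transformation, z = atanh(r) * sqrt(n - 3),
    compared against a standard normal distribution.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r) * math.sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    if p < 0.01:
        return ".01 level"
    if p < 0.05:
        return ".05 level"
    return "n.s."


# Ground ball pitchers in hitter parks (n=71), R values from the table above.
for metric, r in [("ERA", 0.141), ("xFIP", 0.288), ("tERA", 0.382)]:
    print(metric, sig_level(r, 71))
# Prints: ERA n.s., xFIP .05 level, tERA .01 level
```

With only 71 ground ball pitchers in hitter parks, an R near 0.14 is too weak to distinguish from zero, which is why YR1_ERA's row is the one that misses significance.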
When we switch over to pitcher-friendly parks, we find that ERA estimators are far better at predicting ground ball pitchers' performances. Every estimator beat YR1_ERA by a wide margin: FIP roughly doubled its R-squared, xFIP more than doubled it, and SIERA accounted for almost 30% of the variance in YR2_ERA. Previously, the effectiveness of the estimators declined when moving to pitcher-friendly parks. That pattern holds here, at least for fly ball pitchers. Not so for ground ball pitchers. Here, we find the strongest R-squared numbers outside of starters in hitter-friendly parks (see Part II).
And what about C.J. Wilson? His move to Anaheim suggests that we start with SIERA for clues as to how he will perform this year. Last season, Wilson posted a 2.94 ERA and a 3.44 SIERA.
Metric (Pitcher Parks) | Fly ball R (n=144) | Fly ball RSQR | Fly ball RMSE | Fly ball Sig | Ground ball R (n=99) | Ground ball RSQR | Ground ball RMSE | Ground ball Sig |
---|---|---|---|---|---|---|---|---|
ERA | 0.238 | 6% | 1.152 | .01 level | 0.337 | 11% | 1.134 | .01 level |
FIP | 0.267 | 7% | 1.143 | .01 level | 0.458 | 21% | 1.070 | .01 level |
xFIP | 0.301 | 9% | 1.131 | .01 level | 0.487 | 24% | 1.052 | .01 level |
tERA | 0.268 | 7% | 1.143 | .01 level | 0.425 | 18% | 1.090 | .01 level |
SIERA | 0.340 | 12% | 1.116 | .01 level | 0.527 | 28% | 1.024 | .01 level |
Finally, we find that in neutral parks the estimators again resemble their performance across all parks. They pick up a little explanatory power for ground ball pitchers versus fly ball pitchers, but overall there weren't many surprises.
Metric (Neutral Parks) | Fly ball R (n=650) | Fly ball RSQR | Fly ball RMSE | Fly ball Sig | Ground ball R (n=266) | Ground ball RSQR | Ground ball RMSE | Ground ball Sig |
---|---|---|---|---|---|---|---|---|
ERA | 0.335 | 11% | 1.158 | .01 level | 0.328 | 11% | 1.168 | .01 level |
FIP | 0.384 | 15% | 1.135 | .01 level | 0.390 | 15% | 1.138 | .01 level |
xFIP | 0.402 | 16% | 1.126 | .01 level | 0.423 | 18% | 1.120 | .01 level |
tERA | 0.375 | 14% | 1.140 | .01 level | 0.418 | 17% | 1.123 | .01 level |
SIERA | 0.427 | 18% | 1.117 | .01 level | 0.474 | 22% | 1.089 | .01 level |
ERA estimators are just that: estimators. Even the very best explain only about 30% of the year-to-year variation in actual ERA. That being said, they offer a better barometer of a pitcher's true performance and likely future performance than ERA alone.
We gain some additional leverage by breaking up the population of pitchers, since much can be masked in a large aggregation. In this case, we find that tERA provides the best estimate for both ground ball and fly ball pitchers in hitter-friendly parks, relative to ERA and to the other estimators. SIERA, however, takes the crown for both types of pitchers in pitcher-friendly parks. Generally, SIERA is the leading estimator for ground ball hurlers, since it at least doubles the variance explained relative to YR1_ERA in every park scenario.
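These conclusions can be condensed into a lookup over the tables' R values. This is a minimal sketch, not the author's tool: the dictionary copies the R columns from the four tables above, `best_estimator` is a hypothetical name, a GB/FB of exactly 1.0 is rejected because the article's coding rule doesn't cover it, and the function only picks the estimator with the highest R for a group. It does not produce an ERA projection.

```python
from typing import Dict, Tuple

# R values copied from the four tables above, keyed by (park type, batted-ball type).
R_BY_GROUP: Dict[Tuple[str, str], Dict[str, float]] = {
    ("All", "Fly ball"): {"ERA": 0.315, "FIP": 0.365, "xFIP": 0.375, "tERA": 0.363, "SIERA": 0.400},
    ("All", "Ground ball"): {"ERA": 0.310, "FIP": 0.392, "xFIP": 0.404, "tERA": 0.422, "SIERA": 0.449},
    ("Hitter", "Fly ball"): {"ERA": 0.272, "FIP": 0.367, "xFIP": 0.335, "tERA": 0.388, "SIERA": 0.333},
    ("Hitter", "Ground ball"): {"ERA": 0.141, "FIP": 0.301, "xFIP": 0.288, "tERA": 0.382, "SIERA": 0.332},
    ("Pitcher", "Fly ball"): {"ERA": 0.238, "FIP": 0.267, "xFIP": 0.301, "tERA": 0.268, "SIERA": 0.340},
    ("Pitcher", "Ground ball"): {"ERA": 0.337, "FIP": 0.458, "xFIP": 0.487, "tERA": 0.425, "SIERA": 0.527},
    ("Neutral", "Fly ball"): {"ERA": 0.335, "FIP": 0.384, "xFIP": 0.402, "tERA": 0.375, "SIERA": 0.427},
    ("Neutral", "Ground ball"): {"ERA": 0.328, "FIP": 0.390, "xFIP": 0.423, "tERA": 0.418, "SIERA": 0.474},
}


def best_estimator(park_type: str, gb_fb: float) -> Tuple[str, float]:
    """Return (estimator with the highest R, its R-squared as a multiple of YR1_ERA's)."""
    if gb_fb == 1:
        raise ValueError("the article codes only GB/FB ratios above or below 1")
    bb_type = "Ground ball" if gb_fb > 1 else "Fly ball"
    rs = R_BY_GROUP[(park_type, bb_type)]
    # Compare only the estimators, not the YR1_ERA baseline itself.
    candidates = {metric: r for metric, r in rs.items() if metric != "ERA"}
    best = max(candidates, key=candidates.get)
    return best, candidates[best] ** 2 / rs["ERA"] ** 2


# A fly ball pitcher (GB/FB 0.81, like Pineda) moving to a hitters' park.
print(best_estimator("Hitter", 0.81))
```

For Pineda's profile in a hitter-friendly park, the lookup returns tERA, matching the choice made in the example above.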
Bill leads Predictive Modeling and Data Science consulting at Gallup. In his free time, he writes for The Hardball Times, speaks about baseball research and analytics, has consulted for a Major League Baseball team, and has appeared on MLB Network's Clubhouse Confidential as well as several MLB-produced documentaries. He is also the creator of the baseballr package for the R programming language. Along with Jeff Zimmerman, he won the 2013 SABR Analytics Research Award for Contemporary Analysis. Follow him on Twitter @BillPetti.
So what do you think we should expect from Mat Latos in 2012, then? Having pitched in Petco his whole career, he had a shining 3.48 career SIERA, which drops all the way to a 2.87 career tERA. Now he's moving to a park that StatCorner rates with a park factor of 103 for both LHB and RHB wOBA, so hitter-friendly-ish. There's a career 0.46 ERA difference in his home/road splits.
Any thoughts on what his 2012 ERA might look like?