Testing Projections for 2011
by Matt Swartz
February 9, 2012

Each year, baseball fans and commentators across the nation make bold predictions about what they expect in the coming year. They frequently make outlandish claims like "Adam Dunn is going to hit 50 home runs in Comerica Park!" or "This is the year that Joe Mauer finally hits .400!" but such predictions are far more likely to land high than low. Sure, if you said Jose Bautista was going to summon greatness going into 2010, you looked pretty smart, but anyone who predicts performance seriously knows that you need to hedge your bets. While projection systems are frequently accused of being overly pessimistic about whoever your Home Nine are, on average they land high about as often as they land low. This field of "projection systems" grows by the year, but there are significant differences between them. Today, I'll evaluate their 2011 projections for hitters and pitchers.

First, let's peek at the candidates:

MARCEL: Tom Tango's free projection system, intentionally using a simple formula as a challenge to forecasters.
PECOTA: Baseball Prospectus' projection system, available by subscription, run by Colin Wyers.
OLIVER: The Hardball Times' projection system, available by subscription, run by Brian Cartwright.
ZIPS: Baseball Think Factory's free projection system, run by Dan Szymborski.
CAIRO: Revenge of the RLYW's free projection system, run by "SG."
STEAMER: Free projection system, run by Jared Cross and his former students, Dash Davidson and Peter Rosenbloom.

You can learn more about these projection systems here.

HITTERS

The projection systems differ significantly in their standard deviations of wOBA, with some hitting projection systems taking notably more risk in estimating player performance. The riskier a projection system, the more likely it is to be wrong by a lot, which hurts its performance, particularly with respect to its root mean square error (RMSE).
Thus, riskier projection systems may be right more often, but when they're wrong, they're very wrong. So, before we do anything, let's rank the projection systems in terms of how risky they are:

Projection   StDev of wOBA
Oliver       .0309
Steamer      .0289
ZiPS         .0287
Cairo        .0283
PECOTA       .0278
Marcel       .0234

Marcel is going to have fewer "big misses" than Oliver will, so we'll want to look at both RMSE (which punishes risky guesses) and correlation (which rewards better player rankings), as well as average absolute error (which falls somewhere in between in how much it punishes risky projections). Here is the RMSE table, weighted by PA and only including players with at least 200 PA. As you can see, PECOTA, a relatively safe projection system, comes out ahead, even ahead of Marcel, which is safer still. I'll also include a row for "last year's stats" to see how predictive they are.

Projection          RMSE
PECOTA              .0317
ZiPS                .0318
Oliver              .0321
Steamer             .0322
Marcel              .0330
Cairo               .0333
Last Year's Stats   .0388

Oliver fared pretty well, despite its risky nature. It takes a step forward when you look at average absolute error. Average absolute error and root mean square error differ in how much they punish big misses. Take System A, which misses on Player X by 20 points of wOBA and misses on Player Y by the same amount. Take System B, which nails Player X exactly but misses on Player Y by 30 points. Average absolute error will favor System B, but RMSE will favor System A.

Projection   AAE
ZiPS         .0244
Steamer      .0247
Oliver       .0247
PECOTA       .0248
Marcel       .0257
Cairo        .0264
Last Year    .0303

ZiPS is the champion of AAE, with its somewhat risky projections. It may be wrong by more when it's wrong, but it's right more often. If we jump ahead and look at correlation, we get a whole new winner. Correlation is different because, for the most part, all it cares about is how well you rank the players.
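The System A/System B example can be checked in a few lines of Python; the numbers below are the hypothetical wOBA misses described above, not real projection data:

```python
import math

# System A misses both players by 20 points of wOBA (.020);
# System B nails Player X but misses Player Y by 30 points (.030).
errors_a = [0.020, 0.020]
errors_b = [0.000, 0.030]

def aae(errors):
    """Average absolute error: the mean of |projection - actual|."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean square error: punishes big misses more than AAE does."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

print(aae(errors_a), aae(errors_b))    # AAE favors System B (.015 < .020)
print(rmse(errors_a), rmse(errors_b))  # RMSE favors System A (.020 < ~.0212)
```

Squaring the errors is what makes RMSE so hard on risky systems: one 30-point miss outweighs two 20-point misses, even though the total error is smaller.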
If you projected Ryan Braun to have a .530 wOBA and Adrian Beltre to have a .430 wOBA, you would have had a great projection year as measured by correlation, despite the fact that Braun's wOBA was closer to .430 and Beltre's was closer to .380. Correlation just wants you to rank the guys well. Using correlation, we get the following rankings:

Projection   Correl.
Oliver       .6151
ZiPS         .6139
PECOTA       .6136
Steamer      .6039
Cairo        .5685
Marcel       .5614
Last Year    .4740

Oliver comes out in front if you use correlation. Despite its perhaps overly aggressive estimates of talent level, scaling back your Oliver projections might have been the best way to predict hitters.

PITCHERS

What about pitchers? The leaderboard looks quite different there. Following some of my previous work, I include some ERA estimators among the pitching projections. This time, I convert them into projections by regressing ERA in 2011 against the 2010 and 2009 versions of these ERA estimators. This produced the following formulas:

SIERA_proj = .59*SIERA('10) + .26*SIERA('09) + 0.47
xFIP_proj  = .65*xFIP('10)  + .24*xFIP('09)  + 0.29
FIP_proj   = .43*FIP('10)   + .30*FIP('09)   + 0.94
tERA_proj  = .38*tERA('10)  + .29*tERA('09)  + 1.08

The projections have the following standard deviations of ERA among all pitchers with at least 40 IP in 2011:

Projection   StDev of ERA
ZiPS         .7322
PECOTA       .7238
Oliver       .6356
Cairo        .5314
Steamer      .5207
Marcel       .4453
SIERA_proj   .4188
xFIP_proj    .3854
FIP_proj     .3829
tERA_proj    .3807

Starting off with RMSE, which should punish riskier projections, we see that it does exactly that:

Projection          RMSE
Steamer             .8324
Cairo               .8736
SIERA_proj          .8746
xFIP_proj           .9014
FIP_proj            .9033
tERA_proj           .9050
Marcel              .9066
PECOTA              1.024
ZiPS                1.030
Oliver              1.042
Last Year's Stats   1.282

ZiPS, PECOTA, and Oliver all had the riskiest projections, and all fared the worst. Interestingly, despite being riskier than the scaled-back ERA estimators, Steamer and Cairo outperformed them at RMSE. What about average absolute error?
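A quick aside on those regression formulas: in code, each is just a weighted blend of the last two seasons of an estimator plus an intercept. A minimal sketch, using made-up SIERA values for a hypothetical pitcher:

```python
# Coefficients from the regressions above: 2011 ERA regressed on the
# 2010 and 2009 values of each ERA estimator.
COEFFS = {
    "SIERA": (0.59, 0.26, 0.47),
    "xFIP":  (0.65, 0.24, 0.29),
    "FIP":   (0.43, 0.30, 0.94),
    "tERA":  (0.38, 0.29, 1.08),
}

def project_era(estimator, val_2010, val_2009):
    """Turn an ERA estimator's 2010 and 2009 values into a 2011 projection."""
    w10, w09, intercept = COEFFS[estimator]
    return w10 * val_2010 + w09 * val_2009 + intercept

# Hypothetical pitcher with a 3.50 SIERA in 2010 and a 3.80 SIERA in 2009:
print(round(project_era("SIERA", 3.50, 3.80), 3))  # 3.523
```

Note that the two season weights sum to well under one, so each projection is pulled toward the intercept; that regression toward the mean is why these converted estimators show smaller standard deviations of ERA than the full projection systems in the table above.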
The rankings look similar, though a few projections swap places.

Projection          AAE
Steamer             .7067
SIERA_proj          .7281
Cairo               .7331
FIP_proj            .7333
xFIP_proj           .7360
tERA_proj           .7361
Marcel              .7474
ZiPS                .7749
PECOTA              .7905
Oliver              .8009
Last Year's Stats   .8766

Steamer again comes out ahead. Moving to correlation, we see the same type of thing, though surprisingly, Marcel does better and Oliver does worse, even though correlation is indifferent to how conservative a system's projections are.

Projection          Correl.
Steamer             .4581
Cairo               .4213
SIERA_proj          .4089
xFIP_proj           .3763
Marcel              .3744
FIP_proj            .3739
tERA_proj           .3715
PECOTA              .3705
ZiPS                .3701
Last Year's Stats   .3265
Oliver              .3163

On all three metrics, Steamer comes out ahead. I asked Jared Cross what was making his projections so good, and he explained that he was using velocity (as well as handedness) in his pitcher projections, and that was giving them a leg up. He wasn't the only person to suggest something like this. I only started thinking seriously about it recently, but I think it really is the "next big thing" in pitcher projections. Unlike hitter projections, where the winner seems to come down to which metric you use to test them, pitcher projections come back Steamer on all three tests. Perhaps more interestingly, the better-known systems such as Oliver, PECOTA, and ZiPS, despite doing the best on hitters, fare the worst with pitchers. Perhaps being good at projecting both pitchers and hitters is as rare as players being good at doing both on the field. Of course, these are all just one-year tests, so there is a lot of luck involved for any of these systems. However, as each of them moves forward to its next race, this is where they stand.