Projection vs Projection
It’s almost opening day, and it seems like everyone is talking about projections.
When considering a projection, there are really two questions to be answered: what is the player’s “True Talent Level” right now, and how will he perform next year? Between now and the end of next year, his talent level may well change, as he’s a year older and might recover from or succumb to injuries. Even then, there’s still the random variance of a single-season performance. In this article I’d like to explore how some of the major projection systems perform when predicting different subgroups of players.
I tested the following projections: PECOTA (2006-2009), ZiPS (2006-2009), CHONE (2007-2009), and my own Oliver (2006-2009).
By wOBA
The first test was to group the yearly projections to the nearest .010 of wOBA, and then see how each group of players actually performed. There were 468 players who had projections from all four systems and at least 350 plate appearances in the major leagues in the following season. As 2009 is yet to be played, and CHONE is not available for 2006, these projection-to-next-season comparisons cover the 2007 and 2008 seasons. All four projections were tested on the same 468 players. The observed results were unadjusted major league stats, so that the results of the test would not be influenced by which park factors or MLE formulas I chose to normalize stats.
To read the results, take the .380 row. CHONE projected a group of players to have a wOBA between .375 and .385, averaging .380. Of those, 25 had 350 or more PAs in MLB in the following season, and those 25 players had an average wOBA of .363, so at that level CHONE was .017 high. Oliver was .008 high on 21 projections, PECOTA .027 high on 26, and ZiPS .014 high on 26. The last line of the table shows the root mean square error (weighted by number of players). Oliver had the lowest rms error at .006, followed by CHONE at .011 and PECOTA and ZiPS at .012 each.
Projected wOBA | CHONE Players | CHONE Error | Oliver Players | Oliver Error | PECOTA Players | PECOTA Error | ZiPS Players | ZiPS Error |
0.250 | 0 | 0.000 | 0 | 0.000 | 0 | 0.000 | 1 | -0.067 |
0.260 | 0 | 0.000 | 0 | 0.000 | 1 | -0.041 | 1 | -0.018 |
0.270 | 2 | -0.057 | 1 | 0.001 | 3 | -0.013 | 1 | -0.043 |
0.280 | 2 | -0.018 | 4 | -0.036 | 2 | -0.045 | 4 | -0.022 |
0.290 | 8 | -0.033 | 9 | -0.017 | 11 | -0.030 | 13 | -0.020 |
0.300 | 14 | -0.005 | 23 | -0.010 | 20 | -0.013 | 20 | -0.012 |
0.310 | 29 | -0.006 | 33 | -0.002 | 31 | -0.007 | 19 | 0.003 |
0.320 | 44 | -0.005 | 53 | -0.005 | 37 | 0.002 | 51 | 0.000 |
0.330 | 74 | 0.004 | 81 | -0.002 | 58 | 0.003 | 56 | 0.000 |
0.340 | 91 | 0.000 | 87 | -0.003 | 66 | 0.004 | 66 | 0.002 |
0.350 | 57 | 0.004 | 68 | 0.001 | 80 | -0.004 | 74 | 0.001 |
0.360 | 50 | 0.009 | 48 | -0.003 | 56 | 0.011 | 55 | 0.012 |
0.370 | 34 | 0.011 | 21 | -0.004 | 33 | 0.012 | 36 | 0.012 |
0.380 | 25 | 0.017 | 21 | 0.008 | 26 | 0.027 | 26 | 0.014 |
0.390 | 9 | 0.003 | 10 | -0.002 | 17 | 0.014 | 19 | 0.020 |
0.400 | 13 | 0.019 | 5 | 0.020 | 15 | 0.019 | 7 | 0.017 |
0.410 | 7 | 0.017 | 2 | 0.011 | 5 | 0.017 | 4 | 0.019 |
0.420 | 4 | 0.037 | 1 | -0.049 | 5 | 0.027 | 6 | 0.029 |
0.430 | 2 | 0.047 | 1 | 0.001 | 1 | -0.035 | 5 | 0.041 |
0.440 | 2 | -0.009 | 0 | 0.000 | 1 | 0.018 | 3 | 0.023 |
0.450 | 1 | 0.025 | 0 | 0.000 | 0 | 0.000 | 1 | 0.026 |
rms | 468 | 0.011 | 468 | 0.006 | 468 | 0.012 | 468 | 0.012 |
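The rms line can be checked directly from the table. The sketch below is a minimal illustration, assuming "weighted by number of players" means each bin's squared error is multiplied by its player count before averaging. The function names `nearest_bin` and `weighted_rms` are hypothetical helpers, not part of any projection system, and the rows are the CHONE columns copied from the table above (zero-player rows omitted).

```python
import math

# (players, error) pairs copied from the CHONE columns of the table above.
# Rows with zero players are left out because they carry no weight.
CHONE_ROWS = [
    (2, -0.057), (2, -0.018), (8, -0.033), (14, -0.005), (29, -0.006),
    (44, -0.005), (74, 0.004), (91, 0.000), (57, 0.004), (50, 0.009),
    (34, 0.011), (25, 0.017), (9, 0.003), (13, 0.019), (7, 0.017),
    (4, 0.037), (2, 0.047), (2, -0.009), (1, 0.025),
]


def nearest_bin(woba, width=0.010):
    """Round a projected wOBA to the nearest multiple of `width`."""
    return round(round(woba / width) * width, 3)


def weighted_rms(rows):
    """Root mean square of per-bin errors, weighted by players per bin."""
    total_players = sum(players for players, _ in rows)
    sum_sq = sum(players * err ** 2 for players, err in rows)
    return math.sqrt(sum_sq / total_players)


print(nearest_bin(0.377))                  # a .377 projection lands in the .380 row
print(round(weighted_rms(CHONE_ROWS), 3))  # 0.011, matching the rms line for CHONE
```

Under that assumption the CHONE column reproduces the published .011, which suggests the weighting reading is at least consistent with the table.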
By Age
The same 468 players, same rules, but now the players are grouped by age. The combined rms error is about the same for all four systems, at .007 for Oliver and .008 for the other three. CHONE and ZiPS are a few points of wOBA high for most ages. Oliver underprojects younger (pre-peak) players by .005 to .010 of wOBA, and overprojects older players by about the same amount. PECOTA is the opposite, being a little high for the younger players and a little low for the older ones. Oliver shows the lowest total error (bias) at -.002, but because its error correlates with age, Oliver also shows the highest r2 between age and error, at .206 (for ages 21-35, which have 12 or more players each).
Age | Players | PA | CHONE | Oliver | PECOTA | ZiPS |
19 | 1 | 411 | -0.004 | -0.007 | 0.016 | -0.022 |
20 | 3 | 1485 | 0.017 | 0.022 | 0.026 | 0.014 |
21 | 12 | 6587 | 0.003 | -0.002 | 0.005 | 0.006 |
22 | 23 | 12205 | 0.002 | -0.006 | 0.011 | 0.004 |
23 | 38 | 21423 | 0.001 | -0.009 | 0.001 | 0.001 |
24 | 36 | 20677 | -0.002 | -0.009 | 0.002 | 0.002 |
25 | 37 | 20538 | 0.000 | -0.006 | 0.001 | 0.002 |
26 | 39 | 21891 | 0.011 | 0.005 | 0.014 | 0.016 |
27 | 44 | 23580 | 0.003 | -0.004 | 0.007 | 0.007 |
28 | 35 | 19038 | -0.010 | -0.011 | -0.008 | -0.005 |
29 | 34 | 17434 | 0.010 | 0.001 | 0.006 | 0.008 |
30 | 32 | 18491 | 0.007 | -0.006 | 0.003 | 0.004 |
31 | 37 | 19013 | 0.020 | 0.008 | 0.011 | 0.015 |
32 | 24 | 13975 | 0.002 | -0.004 | -0.004 | 0.000 |
33 | 18 | 9702 | 0.003 | 0.004 | -0.001 | 0.003 |
34 | 17 | 8545 | 0.005 | 0.004 | 0.012 | 0.014 |
35 | 13 | 7063 | -0.001 | -0.003 | -0.003 | -0.002 |
36 | 7 | 3714 | 0.000 | 0.001 | -0.004 | 0.001 |
37 | 5 | 2295 | 0.007 | 0.010 | -0.011 | 0.005 |
38 | 5 | 2580 | -0.010 | 0.009 | 0.008 | 0.000 |
39 | 6 | 2699 | 0.009 | 0.008 | 0.023 | 0.007 |
40 | 1 | 548 | 0.026 | 0.031 | 0.037 | 0.036 |
41 | 1 | 434 | 0.016 | -0.061 | -0.005 | 0.011 |
rms | 468 | 254328 | 0.008 | 0.007 | 0.008 | 0.008 |
bias | 468 | 254328 | 0.004 | -0.002 | 0.004 | 0.005 |
r2 | 468 | 254328 | 0.031 | 0.206 | 0.037 | 0.033 |
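The bias and r2 lines can be sketched the same way. The article does not spell out either calculation, so this is an illustration under two assumptions: bias is the player-weighted mean of the per-age errors, and r2 is the squared, unweighted Pearson correlation between age and error over the ages with 12 or more players. The helper names are hypothetical, and the data are the Oliver column copied from the table above.

```python
# (age, players, Oliver error) copied from the age table above.
OLIVER_BY_AGE = [
    (19, 1, -0.007), (20, 3, 0.022), (21, 12, -0.002), (22, 23, -0.006),
    (23, 38, -0.009), (24, 36, -0.009), (25, 37, -0.006), (26, 39, 0.005),
    (27, 44, -0.004), (28, 35, -0.011), (29, 34, 0.001), (30, 32, -0.006),
    (31, 37, 0.008), (32, 24, -0.004), (33, 18, 0.004), (34, 17, 0.004),
    (35, 13, -0.003), (36, 7, 0.001), (37, 5, 0.010), (38, 5, 0.009),
    (39, 6, 0.008), (40, 1, 0.031), (41, 1, -0.061),
]


def weighted_bias(rows):
    """Mean signed error, weighted by the number of players at each age."""
    total_players = sum(players for _, players, _ in rows)
    return sum(players * err for _, players, err in rows) / total_players


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# Keep only ages with 12 or more players, which works out to ages 21-35.
core = [(age, err) for age, players, err in OLIVER_BY_AGE if players >= 12]
ages = [age for age, _ in core]
errors = [err for _, err in core]

print(round(weighted_bias(OLIVER_BY_AGE), 3))  # -0.002
print(round(r_squared(ages, errors), 3))       # 0.206
```

Both figures agree with the Oliver column of the table, so these readings are plausible, though the other systems' lines would need the same check before treating the assumptions as confirmed.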
In the final part of this series, I’ll look at how minor league performances are evaluated.
Brian got his start in amateur baseball, as the statistician for his local college summer league in Johnstown, Pa., which also hosts the annual All-American Amateur Baseball Association. A longtime APBA and Strat-o-Matic player, he still tends to look at everything as a simulation. He has also written for StatSpeak and SeamHeads. You can contact him at brian.cartwright2@verizon.net
Just a few suggestions:
I think two other “projection” systems should always be added to these “projection surveys.” The first is just a mean of each player’s last three years. This would serve as a baseline test. Secondly, Marcels should be included, given its simplicity. My feeling on this one is that if you can’t do better than Marcels by a reasonable degree then you really need to evaluate whether it is worth your time to do the projection.
Finally, though I understand this is much harder, I like the look you are taking on your age correlation. Since all of these systems are very likely to be close in overall accuracy, the most interesting and meaningful factor is bias. You should consider breaking out more categories – a recent (if dense) study (linked to from and discussed on The Book Blog) indicated PECOTA likely has a bias overvaluing speed. I’d like to see a lot more done in evaluating biases of projection systems as this sort of thing could lead to an understanding of some real effect in baseball.