How Good Is That Projection?

March 17, 2009

FanGraphs is now presenting five different player projection systems: Bill James, ZiPS (Dan Szymborski), CHONE (Sean Smith), Marcel (Tom Tango) and Oliver (Brian Cartwright). The natural question from you, the readers, is “Just how good is this projection?”

First, we need to understand the importance of sample size. Season statistics are just a sample of a player’s true talent. You might catch a player during a hot or cold streak, and without enough data, be misled into forming an incorrect perception of the player. On one hand, the more data, the better. On the other, we’re trying to capture a moving target. By the time we get a sufficient sample, the player’s true ability might well have changed.

My first test for a sufficient sample size was to compare wOBAs in all pairs of consecutive single seasons of unadjusted major league batting stats in my projections database, weighting each pair by the smaller of the two seasons’ plate appearance totals. I could presumably get more accurate results by normalizing with park factors, but I did not want to bias these results by using any of my proprietary formulas.

Sample (PA)	Players	wOBA error
0	94	0.150
50	105	0.117
100	186	0.072
150	331	0.063
200	563	0.058
250	704	0.053
300	519	0.047
350	1044	0.041
400	1114	0.041
450	1122	0.039
500	1063	0.038
550	592	0.034
600	314	0.033
650	185	0.031
700	30	0.028
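The comparison behind the table above can be sketched roughly as follows. This is only an illustration, not the code actually used: it assumes that "mean error" means the mean absolute wOBA difference, that each pair is both weighted and binned by the smaller of its two PA totals, and that bins are 50 PAs wide. The function name and the sample pairs are hypothetical.

```python
from collections import defaultdict


def binned_weighted_error(pairs, bin_width=50):
    """Weighted mean absolute wOBA difference, grouped by sample size.

    pairs: iterable of (pa_year1, woba_year1, pa_year2, woba_year2),
    one tuple per player with two consecutive seasons.
    Returns {bin_floor: (pair_count, weighted_mean_error)}.
    """
    # Per bin: [sum of weight * error, sum of weights, number of pairs]
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for pa1, woba1, pa2, woba2 in pairs:
        weight = min(pa1, pa2)  # assumption: weight by the smaller PA total
        if weight <= 0:
            continue
        bucket = (weight // bin_width) * bin_width
        acc = totals[bucket]
        acc[0] += weight * abs(woba1 - woba2)
        acc[1] += weight
        acc[2] += 1
    return {
        bucket: (count, err_sum / weight_sum)
        for bucket, (err_sum, weight_sum, count) in sorted(totals.items())
    }


# Hypothetical consecutive-season pairs, invented for illustration only.
sample_pairs = [
    (620, 0.350, 640, 0.380),
    (655, 0.400, 610, 0.370),
    (120, 0.300, 400, 0.360),
]
result = binned_weighted_error(sample_pairs)
for bucket, (count, error) in result.items():
    print(bucket, count, round(error, 3))
```

Weighting by the smaller PA total means a pair is only trusted as far as its thinner season allows, which is why a 120 PA season paired with a 400 PA season lands in the 100 bin.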

A sample of 350 PAs is the minimum necessary for a decent wOBA calculation, with a mean error of .041. This is where the graph begins to flatten, with the reduction in error between 300 and 350 PAs being almost the same as between 350 and 550. A sample of 600 is preferable, although the error continues to drop at higher numbers of PAs. This means that even with 600 plate appearances in both years, any player has roughly a 1 in 3 chance of having a wOBA the next year at least 33 points higher or lower than this year.
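The "1 in 3" figure can be checked with a short calculation. This sketch assumes the year-to-year wOBA differences are normally distributed around zero, which the article does not state. If .033 is read as a standard deviation, the chance of a miss larger than .033 is about 0.32. If it is read as a mean absolute error, the implied standard deviation is larger and so is the chance. The function name is hypothetical.

```python
import math


def prob_outside(threshold, sd):
    """P(|X| > threshold) for X ~ Normal(0, sd)."""
    return 1.0 - math.erf(threshold / (sd * math.sqrt(2.0)))


error_600 = 0.033  # from the 600 PA row of the table above

# Reading 1: .033 is the standard deviation of the difference.
p_if_sd = prob_outside(error_600, error_600)

# Reading 2: .033 is a mean absolute error; for a normal distribution
# the standard deviation is then MAE * sqrt(pi / 2).
sd_from_mae = error_600 * math.sqrt(math.pi / 2.0)
p_if_mae = prob_outside(error_600, sd_from_mae)

print(round(p_if_sd, 3), round(p_if_mae, 3))
```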

Only a few players manage to get 600 PAs in a single season, and they only rarely get to 700. To get a better sample of a batter’s true talent, more than one season is necessary. The more seasons that are used, the further back in time we reach, increasing the probability that the number we are trying to measure has changed since then. One method to minimize this effect is to give a diminishing weight to past seasons. Tom Tango’s Marcel uses three seasons, weighted 5-4-3. Dividing each by 5, so that the most recent year is weighted as 1.0, gives 1.0, 0.8, 0.6. If all three seasons have an equal number of PAs, then last year is 42% of the total, year 2 is 33%, and year 3 is 25%. When developing Oliver, I ran tests which showed that when using an unlimited number of years, a factor of 0.7 (1.0, 0.7, 0.49, 0.34, 0.24, etc.) minimized the error of each year’s projection compared to the next season’s actual stats. However, with many years, last year’s stats can be as little as 30% of the total sample. I did not feel that this allowed Oliver to be responsive enough to meaningful changes in a player’s yearly stats, and have since lowered my weighting factor for past seasons to 0.5 (1.0, 0.5, 0.25, etc.), which puts last year at approximately 50% of the sample.
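The weighting arithmetic in this paragraph can be reproduced with a few lines. This is a sketch that assumes every season has the same number of PAs, so each season's share of the sample depends only on its weight. The function names are hypothetical and not part of Marcel or Oliver.

```python
def decay_weights(factor, n_years):
    """Weights 1, f, f^2, ... for the most recent n_years seasons."""
    return [factor ** k for k in range(n_years)]


def most_recent_share(weights):
    """Fraction of the total sample contributed by the latest season."""
    return weights[0] / sum(weights)


def limiting_share(factor):
    """Latest season's share with unlimited years of data.

    The geometric series 1 + f + f^2 + ... sums to 1 / (1 - f),
    so the latest season's share approaches 1 - f.
    """
    return 1.0 - factor


# Marcel: three seasons weighted 5-4-3.
marcel = [5, 4, 3]
marcel_shares = [w / sum(marcel) for w in marcel]
print([round(s, 2) for s in marcel_shares])

# Decay factor 0.7: the first five weights and the long-run share.
print([round(w, 2) for w in decay_weights(0.7, 5)])
print(round(limiting_share(0.7), 2), round(limiting_share(0.5), 2))
```

With a long career, the share of the most recent season approaches one minus the decay factor, which is where the 30% and 50% figures come from.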

Comparing season-to-season projections results in the following mean errors:

Sample (PA)	Players	wOBA error
0	214	0.026
100	620	0.026
200	993	0.024
300	1188	0.022
400	1011	0.021
500	1166	0.020
600	1343	0.017
700	1524	0.016
800	1400	0.015
900	1163	0.014
1000	652	0.014
1100	367	0.014
1200	252	0.014
1300	108	0.012
1400	14	0.010

The year to year errors for the projections are only half that of comparing actual stats, but that is understandable because half of the projection is last year’s stats. The error curve starts flattening at 600 PAs, and after 900 there is virtually no reduction in error.

Presumably, the addition of data from previous years was to give us a more accurate estimate of a player’s “true talent” at this point in time. That is not the same question as what that player’s “true talent” will be at the end of next year, or what next year’s stats will be. The data can be further massaged by taking a player’s upward or downward trends the past few years combined with the average change for a player of his age to estimate where he will be a year from now.

So, to combine these two, I took all Oliver projections with a sample size of 900 or more PAs, and compared them to the following single season of 350 or more PAs. The first set of numbers is copied from the first table above, showing the mean error of one single season to the next; the second set is the projections of size 900 or more compared to the next season.

Sample (PA)	Players (season)	wOBA error (season)	Players (Oliver 900+)	wOBA error (Oliver 900+)
350	1044	0.041	68	0.042
400	1114	0.041	101	0.040
450	1122	0.039	117	0.036
500	1063	0.038	166	0.035
550	592	0.034	177	0.033
600	314	0.033	194	0.037
650	185	0.031	171	0.035
700	30	0.028	62	0.038

This seems like a problem: the mean errors of comparing Oliver to the following season were no better than comparing just the previous season to the next! Does this mean that all the work in developing the projection was for nothing? Fortunately, the answer is NO. What we have here is an equation x - y = z, where x is the projection, y is the following season’s actual stats, and z is the difference between them. We have worked to reduce the mean error of x in order to reduce z, but y is still the same. The result of a computation is only as accurate as its least accurate input. No matter how perfect any one projection system is, as long as its mean error is less than that of comparing any two consecutive seasons, that increased accuracy will be masked by the noise inherent in a single season measurement. That’s why Marcel the Monkey looks so smart: this test cannot show if any other system is better.
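The masking effect can be illustrated numerically. This is only a sketch under assumptions the article does not make explicit: the projection's error and the single-season noise are treated as independent, so their spreads combine as the square root of the sum of squares. The .033 season noise is borrowed from the 600 PA row of the first table purely as an example, and the projection error values are hypothetical.

```python
import math


def combined_error(projection_sd, season_sd):
    """Spread of z = x - y when the errors in x and y are independent."""
    return math.sqrt(projection_sd ** 2 + season_sd ** 2)


season_sd = 0.033  # illustrative single-season noise, not a measured value

# Hypothetical projection errors, from no better than one season to perfect.
observed = {
    projection_sd: combined_error(projection_sd, season_sd)
    for projection_sd in (0.033, 0.020, 0.010, 0.0)
}
for projection_sd, z in observed.items():
    print(f"projection error {projection_sd:.3f} -> observed error {z:.4f}")
```

Under these assumptions, cutting the projection's own error in half, from .020 to .010, changes the observed error by only about four points of wOBA, and even a perfect projection could not score better than the single-season noise.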

Besides, when using only major league statistics, the major tools available to build a projection are past season weighting, park factors, aging, and regression to the mean. I think you could expect any competent system to give virtually the same results. Most projections also incorporate minor league data, which necessitates having to calculate the difference in level of competition from each minor league to the majors, and that’s likely where more variance will be seen between systems.

Next time, comparing accuracies of Oliver, ZiPS, CHONE and PECOTA.

7 Comments

bikozu

16 years ago

Cool. Some good stuff in here that I haven’t read/seen previously. That factor of .7 thing is pretty interesting too.

When will we get park factored wOBA (the one used on this site that includes baserunning etc) on the projection page? This information could be useful.

Brian Cartwright

Reply to bikozu

Reminds me of one of the points I need to lead the comparative article with: Oliver’s wOBAs are park adjusted, representing the player in a neutral park. Each batting component (SI, DO, TR, HR, BB, SO) is normalized by ballpark, league and age, then reassembled into a batting line, from which the new rate stats, including wOBA, are finally calculated. To the best of my knowledge, ZiPS and CHONE are adjusted to the player’s home park at the time the projection was made. Marcel does not park adjust. Oliver will have lower wOBAs than ZiPS or CHONE for players in Colorado, Milwaukee, Philadelphia, Cincinnati, etc., but I am also working on a new formula for HR factors which will not dock the premier power hitters as much, but the medium and lower guys more so, when they play in a small park.
