Yesterday in my Fan Projections 2010 recap I reported that fans do a worse job projecting players on their favorite team than other fans do at projecting those players. This is an interesting finding: these fans probably have more information about those players, but in spite of that do a worse job projecting them. Why is that?
Commenters suggested a couple of reasons that I will look at here. Samuel E. Giddins writes:
Any chance it’s a selection bias? Only better players get rated more often, and people are less likely to predict a collapse than a breakout?
Samuel is right that better players get ranked more often, but I am not sure this leads to any selection bias. If you look at the graphs from yesterday you can see that players across the spectrum were over-projected, not just the good ones (the blob is below the red line throughout, not just for high-WAR players). That people are more likely to predict a breakout than a collapse is probably a fundamental reason for the over-projection, but I am not sure that it is a selection bias.
One possible source of bias: what if forecasters who project a lot of non-favorite-team players are simply more accurate forecasters than those who only project players on their favorite team?
This is an interesting idea, and it could very well be the case. I know that David Appelman has the identity of the individual projectors, but all I have is the composite data. Anyway, J. Cross could be right that the pool of hometown projectors (I think hometown is a better way to describe this group than favorite-team, which I was using yesterday) might be very different from the pool of non-hometown projectors. Tango suggests a way of addressing this potential bias.
Along the same lines, for any given player there were more non-hometown projectors than hometown projectors. For my post I looked at players with at least 10 hometown projectors. This left 206 players who had, on average, 27 hometown projectors and 51 non-hometown projectors. More projectors could mean more accurate projections.
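The cutoff step described above can be sketched like this (a minimal example; the player names and projector counts are made up, not from the actual Fan Projections data):

```python
# Hypothetical (player, hometown_projectors) pairs; values are illustrative only.
projections = [("Player A", 27), ("Player B", 8), ("Player C", 51)]

# Keep only players projected by at least 10 hometown fans,
# the cutoff used for the 206-player sample in the post.
qualified = [name for name, n in projections if n >= 10]
```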
I wonder if this might have something to do with playing time projections. When I was looking at some of the fangraphs PT projections I noticed that there were bigger differences there than in rate stats, usually.
Perhaps the fans of the team are more “sure” of who is going to start and award them a lot of PAs and that means that they overestimate playing time on average and that is why they are going over on WAR? … I wonder how this would look using WAR/PA or something else which could account for playing time.
A couple of other commenters, Newcomer and AustinRHL, also asked about playing time. The idea is that you could over-project because you assigned too much playing time or because you assigned too high a rate (or both). It turns out the hometown fans do a better job of predicting playing time than others (RMSE of 186 versus 192 and average absolute error of 133 PAs versus 138). Hometown fans also predict a lower number of PAs than others do (5 PAs fewer on average). But they do a worse job of predicting rates, projecting too high a wOBA and UZR/150. It looks like the over-projected rates outweigh the better playing time projections for the hometown fans.
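The two accuracy measures quoted above can be written out as follows (a minimal sketch; the error values are made up, not from the Fan Projections data):

```python
import math

def rmse(errors):
    """Root mean squared error: penalizes large misses more heavily."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mean_abs_error(errors):
    """Average absolute miss, in the same units as the stat (PAs here)."""
    return sum(abs(e) for e in errors) / len(errors)

# Each error is projected PA minus actual PA for one player
# (illustrative numbers only).
pa_errors = [150.0, -40.0, 90.0, -200.0, 10.0]
```

Because RMSE squares each miss before averaging, it weights big misses more than the average absolute error does, which is why the two figures for the same group (e.g. 186 versus 133 PAs for hometown fans) differ.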
So it seems like it is the opposite of what Craig suggests: hometown fans do well at predicting playing time, but are overly optimistic with rates. This suggests that it would be best to take a hybrid projection of playing time from hometown fans (who probably have a better idea of the depth chart) and rate stats from everyone else.
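A minimal sketch of that hybrid idea (the player, the stat values, and the field names here are all hypothetical, not from the actual data):

```python
# Hypothetical projections for one player; numbers are made up.
hometown = {"Player X": {"pa": 620, "woba": 0.380}}
everyone_else = {"Player X": {"pa": 590, "woba": 0.355}}

def hybrid(hometown_proj, other_proj):
    """Combine hometown playing-time projections with everyone else's rates."""
    combined = {}
    for player, h in hometown_proj.items():
        o = other_proj[player]
        combined[player] = {
            "pa": h["pa"],      # hometown fans: better read on the depth chart
            "woba": o["woba"],  # non-hometown fans: less optimistic rates
        }
    return combined
```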
I’d put this down to the simple trait of most (all?) sports fans: optimism. No one wants to see their team’s stars regress, or their prospects not pan out, and will focus more on positives than on negatives.
However, it is very interesting in the context of the argument that fans of a certain team are higher on that team’s players because they know more about their abilities, rather than being biased. With these numbers, it certainly seems that bias is a greater influence than any kind of special insight they might have.
My guess is the reason hometown fans come out worse is a combination of this optimism and, maybe, the selection bias in projectors that J. Cross explained.
And on a slightly different topic, Lou Struble asks:
Any chance you’re going to do a post on which team’s fans had the most unrealistic projections from last year? I know it might be difficult because not all the players received enough votes for projections, but it’d be interesting and I’d bet it’d generate a lot of discussion.
Lou is right that there is quite a bit of variation in the number of players ranked by at least 10 hometown fans: the Rockies, Marlins and Astros didn’t have any players who reached the cutoff, while the Red Sox, Yankees and Mets each had 12 and the Mariners had 17. Here are the 19 teams with at least five players who reached the cutoff, ranked by mean error (a measure of over/under projection), but also with the mean absolute error (a measure of accuracy).
| Team | Players | Mean Error | Mean Absolute Error |
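The distinction between the last two columns can be sketched as follows (a minimal example with made-up WAR errors; mean error captures the direction of bias, mean absolute error captures accuracy):

```python
def mean_error(errors):
    """Signed average miss: positive means over-projection on balance."""
    return sum(errors) / len(errors)

def mean_abs_error(errors):
    """Unsigned average miss: overall accuracy, regardless of direction."""
    return sum(abs(e) for e in errors) / len(errors)

# A team whose fans are unbiased but inaccurate: misses cancel out, so the
# mean error is zero while the mean absolute error stays large
# (WAR errors, illustrative only).
war_errors = [2.0, -2.0, 1.5, -1.5]
```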