The Humility of Statistical Projection

It’s “projection week” here at FanGraphs, which is a nice coincidence, since I was going to post about projections, anyway. While I dabble with my own projections (which probably will never see the light of day), no one wants to hear about that. Instead, I’ve just assembled some (very) non-technical reminders that might be helpful when looking at projections.

I’ve often heard the complaint that projections are “arrogant,” “put too much faith in the numbers,” or the classic “they rely on what a player has already done, but they don’t tell you what a player will do.” I want to emphasize that projection systems are not based on esoteric “tricks,” but rather on the fact that we don’t know very much about the player from the numbers.

Projection is not divination. I’ve sometimes heard that projection systems aren’t worth looking at because “after all, they projected an .800 OPS for player x and he ended up with an .850 OPS.” That’s a straw man, but it gets at the general point: projections are not prophetic divinations of the future, but attempts to measure the “true talent” of players at any given point in time. The “general formula” for player performance is: true talent + luck + environment. (I’ll table discussion of parks and aging for now.)

The problem is that we don’t know, at least from the raw stats, what exactly is “luck” and what represents a player’s “true talent.” Moreover, “luck” doesn’t just mean things like BABIP rates. Even a player getting 700 PA in a season will have varying levels of performance around his true talent, what we call “hot streaks” or “cold streaks.” (Cf. Willie Bloomquist, April 2009.) Singling out these streaks raises the question: how do we distinguish the “streaks” from the “true talent” parts of the seasons from which the projections draw? Projection systems use different methods; here I’ll mention basic factors that are used by most good projection systems. These may be old hat, but they are worth discussing because of how often they are passed over.

Regression to the mean. This is a very important concept, so important that I’m leery of screwing up the explanation. The best introductory piece I’ve read is one by Dave Studeman. In short: given a lack of any other information about a player, our “best guess” is that he’s an average member of (some particular) population. The more data we have on the player, the more we can separate him from the “average” population. This is one place where sample size issues come into play. [Note that there is a great deal of debate about how to regress, e.g., what the “population” should be. For examples, search at The Book Blog or Baseball Think Factory.]
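To make the idea concrete, here is a minimal sketch of one common way to regress a rate stat: add a block of "phantom" population-average plate appearances to the player's real ones. The population mean (.330) and the size of the phantom block (200 PA) are assumptions picked only for illustration, not values from any particular system, and the function name is hypothetical.

```python
def regress_to_mean(observed_rate, pa, population_mean, regression_pa):
    """Blend an observed rate with the population mean.

    regression_pa behaves like a block of phantom population-average
    plate appearances added to the player's real ones.
    """
    return (observed_rate * pa + population_mean * regression_pa) / (pa + regression_pa)


# Assumed values, chosen only for illustration.
POPULATION_MEAN = 0.330
REGRESSION_PA = 200

# The same .400 wOBA observed over two different sample sizes.
small_sample = regress_to_mean(0.400, 50, POPULATION_MEAN, REGRESSION_PA)
large_sample = regress_to_mean(0.400, 600, POPULATION_MEAN, REGRESSION_PA)

print(f"50 PA estimate:  {small_sample:.4f}")
print(f"600 PA estimate: {large_sample:.4f}")
```

With only 50 PA the estimate stays close to the population mean; with 600 PA the player's own record dominates. Which population to use, and how many phantom PA to add, is exactly the part people argue about.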

Weighted average. Say a projection involves the last three years of performance. Do you simply take the three-year average? Well, no: true talent can change from year to year, so more recent years are weighted more heavily (5-4-3 for hitters and 5-3-2 for pitchers are common weights). Playing time matters as well. Alex Gordon had a .321 major-league wOBA in 2009, and a .344 in 2008. Do we automatically assume that .321 is closer to his true talent? No, because the .321 was in only 189 PA, while the .344 was in 571 PA.
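As a rough sketch of how recency and playing time can be combined, the snippet below multiplies each season's year weight by its PA. That combination rule is an assumption made for illustration, and only the first two of the 5-4-3 hitter weights are used because only two Gordon seasons are quoted here. It is not a full projection: there is no regression and no aging adjustment.

```python
def weighted_rate(seasons, year_weights):
    """Average a rate stat across seasons, weighting by recency and PA.

    seasons: list of (rate, pa) tuples, most recent season first.
    year_weights: recency weights in the same order.
    """
    numerator = 0.0
    denominator = 0.0
    for (rate, pa), weight in zip(seasons, year_weights):
        numerator += weight * pa * rate
        denominator += weight * pa
    return numerator / denominator


# Alex Gordon's wOBA and PA as quoted above: 2009 first, then 2008.
gordon = [(0.321, 189), (0.344, 571)]
estimate = weighted_rate(gordon, [5, 4])

print(f"Weighted wOBA: {estimate:.4f}")
```

Even with 2009 carrying the larger year weight, the result sits nearer .344 than .321, because 2008 supplies roughly three times as many plate appearances.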

This isn’t all there is to projection, but you’d be surprised how much work those basic concepts do. Tom Tango’s Marcel works entirely from a weighted average, regression, and a very basic age adjustment, and it hangs in with the “big boys” pretty well. No projection system will ever be perfect, of course. Part of that is the influence of “luck” and the limited samples we have from all players. Part of it is also that some players don’t have that much information available on them. Players develop differently.

The point is that we simply don’t know ahead of time which players will be exceptions. Projection systems generally do better when judged on how they project groups of players, rather than on individual successes or failures, as in the case of Matt Wieters (ahem). The point I’ve been trying to make in a roundabout way is that regression, weighted averages, generic aging curves, etc. might miss on certain players, but are based on studies that show how most players would do. They are humble confessions of ignorance on an individual level, but are still the best overall bet. Expecting anything more leads to folly.

One might express the difference as that between making a conservative, diversified investment and “just knowing” that Enron stock will continue to rise. Tough choice.

More later this week on “breakouts,” “outliers,” and other traps.

Matt Klaassen reads and writes obituaries in the Greater Toronto Area. If you can't get enough of him, follow him on Twitter.

13 years ago

Don’t forget that when dealing with continuous probability distributions, such as projecting a player’s worth, the probability of any individual point being correct is 0.

Not to mention younger players will obviously have a bigger standard deviation in their projections.

And of course, if each individual player has a 1 in 10 chance of hitting their 90th percentile mark or higher (obvious statement alert), then 1 out of 10 guys will do just that without any special circumstance causing it (such as increased playing time due to injuries, or constant favorable matchups, etc). Just because Marco Scutaro came out of nowhere to hit for a .351 wOBA doesn’t mean the system’s broken. That’s statistics.
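A quick simulation makes the same point. This is only a sketch under a simplifying assumption: each player's season is an independent draw whose percentile, measured against his own projected distribution, is uniform between 0 and 1. The function name, the player count, and the seed are hypothetical.

```python
import random


def count_top_decile_seasons(n_players, seed=0):
    """Count players whose season lands at or above their own 90th percentile."""
    rng = random.Random(seed)
    # rng.random() stands in for the percentile of one player's season.
    return sum(1 for _ in range(n_players) if rng.random() >= 0.9)


n_players = 10_000
hits = count_top_decile_seasons(n_players)
print(f"{hits} of {n_players} players beat their 90th percentile")
```

Roughly one player in ten clears the bar even though nothing in the simulation gives any of them a special reason to.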

13 years ago
Reply to  Matt Klaassen

It’s just obvious that people do not understand the point of using statistics and metrics to project.

They think when guys try to use numbers, that said number-users are creating a certainty in their mind. Untrue. They’re trying to maximize the likelihood of a good result.

In 2009, 284 players received enough playing time for 300+ PA. Most guys will give you relatively close to what you expect (A.J. Pierzynski, for instance). Some will have inexplicable bad years offensively (Rollins). Some blow past their highest expectations, like Mike Bourn.

13 years ago
Reply to  JoeR43

Or, in the wonderful words of MGL

“Let’s say that we had a sophisticated device for measuring the result of a coin flip. Let’s call it, the “looking at the coin lying on the floor” device. OK, we flip a coin 50 times and it comes up 28 heads and 22 tails. No big deal, right? Now we flip it again 50 times and it comes up 23 heads and 27 tails. Oh my God, there must be something wrong with our measuring device!

Get my point? I hope so.”
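For anyone who wants to check how ordinary those two results are, here is a small sketch that computes the exact binomial probabilities for a fair coin. The helper name is hypothetical, and the only assumption is that the flips are independent with a 50% chance of heads.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of k or more heads in n independent flips."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# 28 or more heads in 50 flips, as in the first batch.
p_28_or_more = prob_at_least(28, 50)

# 23 or fewer heads in 50 flips, as in the second batch.
p_23_or_fewer = 1 - prob_at_least(24, 50)

print(f"P(28+ heads in 50): {p_28_or_more:.3f}")
print(f"P(23 or fewer heads in 50): {p_23_or_fewer:.3f}")
```

Neither outcome is rare for a fair coin, which is the whole joke: the measuring device is fine, and so is the coin.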