Nate Silver and Imperfect Modeling
by Dave Cameron
November 7, 2012

If you’re reading FanGraphs, you’re probably familiar with Nate Silver. He’s known nationally now for his political projections at FiveThirtyEight, but of course he made his name on the internet writing about baseball, creating the PECOTA projections, and penning some of the best articles about the economics of baseball written over the last decade.

Even if you’re not a political junkie, it was hard to get away from discussions about Nate Silver over the last few weeks. The final few weeks of the election saw a Nate vs Pundits fight that looked like something straight out of Moneyball. Last night, my Twitter feed probably had more references to Nate Silver than to either Barack Obama or Mitt Romney. Needless to say, the performance of his model was a major storyline during last night’s election, especially if you were following it through the eyes of people who write about baseball for a living.

If you haven’t already heard, Nate’s model did pretty darn well. As in, he got every projection right, going 49 for 49 in the states that have projected winners and nailing the fact that Florida was basically a coin flip.

But I’m not writing this post to talk about how Nate Silver is a witch, or to bleed political discussion over into yet another area of your life. Instead, I think there’s an important takeaway here that applies to baseball and what we do at FanGraphs: imperfect models with questionable inputs can still be quite useful.

Nate’s model was similar in structure to many other polling aggregators, including one from Princeton that was even more aggressive in its conclusions. In general, the argument against these models is that the inputs they were using — the polls themselves — were of questionable value, and that they were essentially guessing at things like voter turnout based on assumptions that might not hold true anymore.
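To make the aggregator idea concrete, here is a minimal sketch of the simplest possible version: a sample-size-weighted average of poll results. All of the numbers are hypothetical, and this deliberately ignores the adjustments (pollster quality, recency, house effects) that distinguish a real model like Nate’s from a naive average.

```python
# A toy polling aggregator: weight each poll's reported share of the
# vote by its sample size. Hypothetical numbers for illustration only.

def weighted_poll_average(polls):
    """polls: list of (candidate_share_pct, sample_size) tuples."""
    total_n = sum(n for _, n in polls)
    return sum(share * n for share, n in polls) / total_n

# Three made-up polls of the same race, with different sample sizes.
polls = [(51.0, 800), (49.5, 1200), (50.5, 1000)]
avg = weighted_poll_average(polls)  # a sample-size-weighted estimate
```

Even this crude version illustrates the core point of aggregation: any single poll is noisy and possibly biased, but combining several imperfect measurements yields a more stable estimate than any one of them alone.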
Even Nate acknowledged the truth in some of these criticisms: polling data can be problematic, and the people collecting it can have biases that skew the results one way or another. No one thinks polling data is perfect, nor should last night’s results convince us that Nate’s model perfectly corrects for those biases. The “it’s a projection, not a prediction” line cuts both ways – we can’t argue that the model could still have been right had the results gone differently while also believing that last night’s results prove the model was clearly right to begin with. The critiques of the model that were true a few days ago are still true today. Criticisms of Nate’s methodology remain valid; a perfect result in one election does not prove that the model is without flaw.

But, hopefully, we can note that a model does not have to be perfect to be useful, and perhaps we can move away from the idea that imperfect — and even biased — data should be discarded until it can be perfected.

In baseball, we deal with a lot of biased data and imperfect models. Colorado is a perfect example. The raw numbers from games played a mile high can’t be taken at face value because of the atmosphere, and changes to the environment — such as the introduction of the humidor — make applying park factors to that data a bit of a guessing game. We’ve seen offensive levels in Denver shift back and forth over the years, and we certainly don’t have a perfect way of explaining or accounting for those shifts. If we were to project the 2013 run environment at Coors Field, we’d have to deal with a lot of moving parts, many of which require assumptions we can’t test, and a decent amount of uncertainty would surround that projection. But that doesn’t mean we shouldn’t try.
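The park-factor adjustment described above can be sketched in a few lines. The factor itself (1.15 here) is a hypothetical estimate, and that is exactly where the uncertainty lives: the adjustment is simple, but the input is a guess that shifts over time.

```python
# A hedged sketch of a multiplicative park-factor adjustment.
# The 1.15 factor is hypothetical; estimating it well is the hard part.

def park_adjust(raw_runs, park_factor):
    # A park factor above 1.0 means the park inflates offense,
    # so we deflate the raw total to get a neutral-context figure.
    return raw_runs / park_factor

# Made-up example: 850 raw runs scored in a park we believe
# inflates offense by roughly 15%.
neutral_runs = park_adjust(raw_runs=850, park_factor=1.15)
```

The adjusted figure is only as good as the estimated factor, which is the point of the Coors Field example: the correction is imperfect, but applying an imperfect correction still beats taking the raw, biased number at face value.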
Whether it’s FIP, UZR, ZiPS, the Fans Scouting Report, or especially WAR, pretty much every statistical model that we host here on FanGraphs contains some inputs that can be legitimately questioned and requires some assumptions that don’t always hold. These models are imperfect, and the data that goes into them can be biased. But that doesn’t mean that the alternative — discarding them and accepting any conclusion as equally valid — is an improvement.

That’s essentially where the pundits went wrong with Nate’s model. They didn’t like the conclusions, and some of them raised valid concerns about polling data and whether Nate’s adjustments added or subtracted from simpler, more transparent techniques. But discarding the model entirely was silly, and pretending the race was a toss-up was simply wrong. Throwing out the imperfect model with biased data was worse than taking it at face value. In reality, we shouldn’t do either.

The models showed their usefulness last night, but they’re still not perfect, and we shouldn’t blindly accept every conclusion they spit out in the future. But we don’t need to discard these models simply because we’ve figured out where their weak points are, either. It’s not an either/or situation. We can be informed by imperfect models without being slaves to them.

WAR can inform our opinion of Mike Trout’s value relative to Miguel Cabrera without us turning the MVP Award into the Whoever Has The Highest WAR Award. We can acknowledge the shortcomings of defensive metrics and park factors while also applying the lessons they can teach us in an intelligent way. We can note that FIP doesn’t work very well for Jim Palmer without using that as a reason to keep evaluating pitchers by ERA instead. Last night was undoubtedly a win for data-based analysis, but let’s be honest, the results don’t always turn out that well.
Just as we shouldn’t have discarded Nate’s model had the results been different, we shouldn’t believe his model is perfect because the results did line up with what he projected. His model is still imperfect, but it’s also still useful. Let’s not let the perfect be the enemy of the good.

If we want a takeaway from the Nate Silver vs Pundits argument, let’s note that the pundits went wrong when they discarded his insights because they didn’t like the results and because they assumed the data was too biased to be useful. If a model doesn’t occasionally challenge our preconceived notions of what’s true, it’s not helpful to begin with, and even a model with problematic datasets can still provide useful information that can help inform our decisions.

The takeaway from last night shouldn’t be “always trust Nate Silver” or “always trust the data”. The takeaway should be that even mediocre data is often better than no data, and when you put mediocre data in the hands of smart people who understand its limitations and adjust accordingly, it can become quite useful indeed.