Nate Silver and Imperfect Modeling
by Dave Cameron
November 7, 2012

If you’re reading FanGraphs, you’re probably familiar with Nate Silver. He’s known nationally now for his political projections at FiveThirtyEight, but of course he made his name on the internet writing about baseball, creating the PECOTA projections, and penning some of the best articles about the economics of baseball written over the last decade.

Even if you’re not a political junkie, it was hard to get away from discussions about Nate Silver over the last few weeks. The final few weeks of the election saw a Nate vs Pundits fight that looked like something straight out of Moneyball. Last night, my Twitter feed probably had more references to Nate Silver than to either Barack Obama or Mitt Romney. Needless to say, the performance of his model was a major storyline during last night’s election, especially if you were following it through the eyes of people who write about baseball for a living.

If you haven’t already heard, Nate’s model did pretty darn well. As in, he got every projection right, going 49 for 49 in the states that have projected winners and nailing the fact that Florida was basically a coin flip.

But I’m not writing this post to talk about how Nate Silver is a witch, or to bleed political discussion over into yet another area of your life. Instead, I think there’s an important takeaway here that applies to baseball and what we do at FanGraphs: imperfect models with questionable inputs can still be quite useful.

Nate’s model was similar in structure to many other polling aggregators, including one from Princeton that was even more aggressive in its conclusions. In general, the argument against these models is that the inputs they were using — the polls themselves — were of questionable value, and that they were essentially guessing at things like voter turnout based on assumptions that might not hold true anymore.
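To make the aggregator idea concrete, here is a minimal sketch of the simplest possible version: a sample-size-weighted average of poll results. All of the numbers are hypothetical, and this deliberately ignores the adjustments (pollster quality, recency, house effects) that distinguish a real model like Nate’s from a naive average.

```python
# A toy polling aggregator: weight each poll's reported share of the
# vote by its sample size. Hypothetical numbers for illustration only.

def weighted_poll_average(polls):
    """polls: list of (candidate_share_pct, sample_size) tuples."""
    total_n = sum(n for _, n in polls)
    return sum(share * n for share, n in polls) / total_n

# Three made-up polls of the same race, with different sample sizes.
polls = [(51.0, 800), (49.5, 1200), (50.5, 1000)]
avg = weighted_poll_average(polls)  # a sample-size-weighted estimate
```

Even this crude version illustrates the core point of aggregation: any single poll is noisy and possibly biased, but combining several imperfect measurements yields a more stable estimate than any one of them alone.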
Even Nate acknowledged the truth in some of these criticisms: polling data can be problematic, and the people collecting it can have biases that skew the results one way or another. No one thinks polling data is perfect, nor should last night’s results convince us that Nate’s model perfectly corrects for those biases. The “it’s a projection, not a prediction” line cuts both ways – we can’t argue that the model could still have been right had the results gone differently while also believing that last night’s results prove the model was clearly right to begin with. The critiques of the model that were true a few days ago are still true today. Criticisms of Nate’s methodology remain valid; a perfect result in one election does not prove that the model is without flaw.

But, hopefully, we can note that a model does not have to be perfect to be useful, and perhaps we can move away from the idea that imperfect — and even biased — data should be discarded until it can be perfected.

In baseball, we deal with a lot of biased data and imperfect models. Colorado is a perfect example. The raw numbers from games played a mile high can’t be taken at face value because of the atmosphere, and changes to the environment — such as the introduction of the humidor — make applying park factors to that data a bit of a guessing game. We’ve seen offensive levels in Denver shift back and forth over the years, and we certainly don’t have a perfect way of explaining or accounting for those shifts. If we were to project the 2013 run environment at Coors Field, we’d have to deal with a lot of moving parts, many of which require assumptions we can’t test, and a decent amount of uncertainty would surround that projection. But that doesn’t mean we shouldn’t try.
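The park-factor adjustment described above can be sketched in a few lines. The factor itself (1.15 here) is a hypothetical estimate, and that is exactly where the uncertainty lives: the adjustment is simple, but the input is a guess that shifts over time.

```python
# A hedged sketch of a multiplicative park-factor adjustment.
# The 1.15 factor is hypothetical; estimating it well is the hard part.

def park_adjust(raw_runs, park_factor):
    # A park factor above 1.0 means the park inflates offense,
    # so we deflate the raw total to get a neutral-context figure.
    return raw_runs / park_factor

# Made-up example: 850 raw runs scored in a park we believe
# inflates offense by roughly 15%.
neutral_runs = park_adjust(raw_runs=850, park_factor=1.15)
```

The adjusted figure is only as good as the estimated factor, which is the point of the Coors Field example: the correction is imperfect, but applying an imperfect correction still beats taking the raw, biased number at face value.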
Whether it’s FIP, UZR, ZiPS, the Fans Scouting Report, or especially WAR, pretty much every statistical model that we host here on FanGraphs contains some inputs that can be legitimately questioned and requires some assumptions that don’t always hold. These models are imperfect, and the data that goes into them can be biased. But that doesn’t mean that the alternative — discarding them and accepting any conclusion as equally valid — is an improvement.

That’s essentially where the pundits went wrong with Nate’s model. They didn’t like the conclusions, and some of them raised valid concerns about polling data and whether Nate’s adjustments added or subtracted from simpler, more transparent techniques. But discarding the model entirely was silly, and pretending the race was a toss-up was simply wrong. Throwing out the imperfect model with biased data was worse than taking it at face value. In reality, we shouldn’t do either.

The models showed their usefulness last night, but they’re still not perfect, and we shouldn’t blindly accept every conclusion they spit out in the future. But we don’t need to discard these models simply because we’ve figured out where their weak points are, either. It’s not an either/or situation. We can be informed by imperfect models without being slaves to them.

WAR can inform our opinion of Mike Trout’s value relative to Miguel Cabrera without us turning the MVP Award into the Whoever Has The Highest WAR Award. We can acknowledge the shortcomings of defensive metrics and park factors while also applying the lessons they can teach us in an intelligent way. We can note that FIP doesn’t work very well for Jim Palmer without using that as a reason to keep evaluating pitchers by ERA instead. Last night was undoubtedly a win for data-based analysis, but let’s be honest, the results don’t always turn out that well.
Just as we shouldn’t have discarded Nate’s model had the results been different, we shouldn’t believe his model is perfect because the results did line up with what he projected. His model is still imperfect, but it’s also still useful. Let’s not let the perfect be the enemy of the good.

If we want a takeaway from the Nate Silver vs Pundits argument, let’s note that the pundits went wrong when they discarded his insights because they didn’t like the results and because they assumed the data was too biased to be useful. If a model doesn’t occasionally challenge our preconceived notions of what’s true, it’s not helpful to begin with, and even a model with problematic datasets can still provide useful information that can help inform our decisions.

The takeaway from last night shouldn’t be “always trust Nate Silver” or “always trust the data”. The takeaway should be that even mediocre data is often better than no data, and when you put mediocre data in the hands of smart people who understand its limitations and adjust accordingly, it can become quite useful indeed.