The White Sox and Beating Projections

by Dave Cameron
February 18, 2013

There are a lot of projection systems floating around the nerdy baseball universe. Here on FanGraphs, we host a lot of them, including ZiPS, Steamer, Oliver, Marcel, and the Fans projections, and then there are other systems like CAIRO and PECOTA that are hosted elsewhere. Of all the baseball projection systems, PECOTA is probably the most famous because it was created by Nate Silver, and Nate Silver is now pretty famous for his post-baseball career.

So, when PECOTA releases its annual projections, mainstream writers pay attention. And Chicago writers, particularly, like to talk about PECOTA’s projections, mainly to remind everyone how wrong they’ve been about the White Sox. For instance, here’s a piece by a local radio anchor that trots out all the usual ad hominem attacks about geeks and their numbers. And here’s another one of this year’s entries, which just gives up on factual information completely:

“What is it about the White Sox’s rosters and farm system that Baseball Prospectus doesn’t like? To answer that question, I decided to do research on who writes these inaccuracies year after year. What I found shocked and disturbed me. It’s Nate Silver. My whole world of reality collapsed at that moment. How could it be the guy I religiously read for pinpoint accuracy in politics? How could it be that Silver is an accuracy genius in politics, but yet when it comes to the White Sox he transforms into the accuracy of a Republican pollster? After composing myself, I discovered a possible reason. Silver lived in Chicago for many years near Wrigleyville and is rumored to be a Cubs fan. Maybe being a Cubs fan is a weighted bias even Silver’s methodology can’t overcome.”

I’m not here to defend PECOTA (BP can do that if they’d like), but I will just insert some facts into the discussion. Like, for instance, that Silver grew up in Michigan as a Tigers fan, not a Cubs fan.
Or, that Silver hasn’t been in charge of the system since 2009, and the code has been essentially rewritten since he left. And, of course, it would be remarkably silly for any forecaster to create a system that intentionally downgrades the projections of a specific franchise, since that would simply make the system less accurate and hurt his own credibility. The idea that PECOTA has some kind of anti-White Sox bias because Silver went to the University of Chicago and attended some Cubs games is worthy of the tin foil hat brigade.

That said, I do think it’s interesting that the White Sox have regularly outperformed PECOTA’s expectations, and I think it’s worth actually investigating, as opposed to what Michael Tomaso did. So, let’s investigate the White Sox’s overall performance since 2005. Thanks to this helpful link from Mark Gonzalez of the Chicago Tribune, we can see PECOTA’s projections for the White Sox each of the last eight years next to their actual record for that season.

Year    Projected Wins   Projected Losses   Actual Wins   Actual Losses   Difference in Wins
2005    80               82                 99            63              19
2006    82               80                 90            72              8
2007    73               89                 72            90              -1
2008    77               85                 89            74              12
2009    73               89                 79            83              6
2010    79               83                 88            74              9
2011    82               80                 79            83              -3
2012    78               84                 85            77              7
Total   624              672                681           616             57

The White Sox have won 52.5% of their games over the last eight years, while PECOTA projected them to win 48.1% of those games during the same time period. On a per-season basis, that works out to a seven-win difference, and beating your projections by seven wins per year for an eight-year stretch is pretty impressive. The question is why they’ve been able to do that.

One possible option is that PECOTA’s just not a great forecasting tool, of course, but that probably shouldn’t be the conclusion that we default to simply based on one franchise’s results.
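For anyone who wants to check the arithmetic behind those percentages, here is a minimal Python sketch. The season rows are copied from the table above; the `SEASONS` list and the `summarize` helper are illustrative names of my own, not anything PECOTA or the Tribune uses.

```python
# Season rows copied from the table above:
# (year, projected wins, projected losses, actual wins, actual losses)
SEASONS = [
    (2005, 80, 82, 99, 63),
    (2006, 82, 80, 90, 72),
    (2007, 73, 89, 72, 90),
    (2008, 77, 85, 89, 74),
    (2009, 73, 89, 79, 83),
    (2010, 79, 83, 88, 74),
    (2011, 82, 80, 79, 83),
    (2012, 78, 84, 85, 77),
]


def summarize(seasons):
    """Aggregate projected vs. actual results (hypothetical helper)."""
    proj_w = sum(row[1] for row in seasons)
    proj_l = sum(row[2] for row in seasons)
    act_w = sum(row[3] for row in seasons)
    act_l = sum(row[4] for row in seasons)
    gap = act_w - proj_w  # wins above projection, summed over all seasons
    return {
        "projected_pct": proj_w / (proj_w + proj_l),
        "actual_pct": act_w / (act_w + act_l),
        "total_gap": gap,
        "gap_per_season": gap / len(seasons),
    }


summary = summarize(SEASONS)
print(f"Projected W%:   {summary['projected_pct']:.3f}")
print(f"Actual W%:      {summary['actual_pct']:.3f}")
print(f"Total gap:      {summary['total_gap']} wins")
print(f"Gap per season: {summary['gap_per_season']:.1f} wins")
```

Run as written, this should reproduce the 48.1% and 52.5% figures and the roughly seven-win annual gap. Note that the actual record covers 1,297 games rather than 1,296, because the 2008 row adds up to 163 games.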
It’s also possible that the White Sox have just done something over the last eight years that is hard to forecast ahead of time, and these are things we can actually test and potentially identify.

So, let’s just start with the White Sox win-loss record, and whether or not it matches up with their runs scored and runs allowed. We’ve already noted that the Pale Hose have won at a .525 clip over the last eight years, but has their run differential backed up that kind of record?

During their last 1,297 games, they’ve scored 5,991 runs and allowed 5,825 runs, or an average of 4.62 runs scored per game and 4.49 runs allowed per game. Just from the fact that they’ve outscored their opponents, we can already see that they’ve outplayed their projections, and the entire difference hasn’t been due to the timing of when those runs scored. However, it’s still worth noting the magnitude of the difference, so we can put those RS/RA totals into the pythagenpat calculation and see that they correspond to a .513 expected winning percentage. The gap between playing at a .513 and .525 level isn’t enormous, but over nearly 1,300 games, it adds up to an extra 16 wins.

In other words, of the seven-win-per-year difference that we started with, two of those wins (29%) can simply be explained by the timing of run scoring. There’s simply no real way for any forecasting tool, mathematical or gut-based, to know in advance the distribution of how runs are going to be divvied up throughout the year, but that distribution can have a big impact on a team’s final record. Even if there were an absolutely perfect forecast for the White Sox over the last eight years, it would have missed by two wins per season simply due to the fact that run distribution is outside of the realm of forecasting.

Projecting aggregate totals with some degree of success is possible, but no one is capable of knowing whether a team is going to score exactly five runs each game or whether they’re going to alternate 10-run games with shutouts.
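The run-differential step above can be sketched in a few lines of Python. This is a rough illustration rather than the exact calculation used for the article: it assumes the common pythagenpat form, where the exponent is runs per game (both teams combined) raised to the 0.287 power. That 0.287 constant is my assumption, since the text does not say which variant was used, and the function and variable names are hypothetical.

```python
def pythagenpat_pct(runs_scored, runs_allowed, games, power=0.287):
    """Expected winning percentage from run totals.

    Assumes the pythagenpat form: exponent = (total runs per game) ** power.
    The default power of 0.287 is an assumption; other variants use
    slightly different constants.
    """
    runs_per_game = (runs_scored + runs_allowed) / games
    exponent = runs_per_game ** power
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Totals quoted in the text for 2005-2012.
GAMES = 1297
RUNS_SCORED = 5991
RUNS_ALLOWED = 5825
ACTUAL_WINS = 681
SEASONS = 8

expected_pct = pythagenpat_pct(RUNS_SCORED, RUNS_ALLOWED, GAMES)
expected_wins = expected_pct * GAMES
timing_wins = ACTUAL_WINS - expected_wins  # wins not explained by run totals

print(f"Expected W%:            {expected_pct:.3f}")
print(f"Wins from run timing:   {timing_wins:.1f}")
print(f"Timing wins per season: {timing_wins / SEASONS:.1f}")
```

With unrounded inputs the timing gap comes out a little under the 16 wins you get from subtracting the rounded .513 from .525, but it still lands at roughly two wins per season.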
We know what’s more likely, based on historical data and normal distribution, and the best anyone can do is to assume that a team will score and allow runs in something like a normal distribution going forward.

So, we’ve basically explained 30% of the White Sox variation from their PECOTA projections. What could explain the other 70%? Don Cooper and the White Sox training staff are probably a large chunk of that. Last year, Jeff Zimmerman had a really great post on 10 year DL trends, and I’m going to steal two images from that post and put them here:

The overall health of the White Sox during the last decade has been pretty staggering. Look specifically at the blue pitcher injury bars. From 2002 to 2011, the White Sox pitchers lost fewer than 2,000 days to the DL, while most teams were over 3,000, a lot of teams were over 4,000, and the Rangers were up over 6,000. The White Sox had a remarkable run of pitcher health, and as new GM Rick Hahn told a group of FG readers and authors in Phoenix a few years ago, the organization views Cooper and the training staff as one of the main reasons the team has been competitive during this stretch.

Team forecasts are essentially a collection of individual forecasts realigned to account for expected playing time levels. Because specific pitcher injuries are hard to predict, forecasting systems rely on a player’s own personal track record and normal regression to the mean, which accounts for the fact that there is a chance each player will get injured and miss a chunk of time during the season. The White Sox pitchers have continually spent a fraction of the time on the DL that any forecasting system would have projected, and so the team’s innings have been reallocated from replacement level scrubs to the team’s highest quality arms.

As a result, the White Sox have had the best pitching staff in baseball since 2005, coming in with both an ERA- and FIP- of 93 during that time.
They may often get overlooked because of the hitter’s haven they play in, but Chicago has consistently put together results that were better than many high profile staffs, even if they did it with depth and endurance rather than splashy aces.

Chicago has only had 14 pitchers throw at least 100 innings as a starter for them over the last eight years, and of those 14, only three (Orlando Hernandez, Clayton Richard, and Philip Humber) could be described as below average Major League starters during their time in Chicago. That’s remarkable. Even other teams that have focused heavily on pitching during this run have ended up giving long runs to lousy pitchers, simply due to the fact that pitchers break down, and teams either live through terrible performances trying to get them fixed or have terrible replacements come up from the minors.

The White Sox simply haven’t had that problem. They might not have had a rotation fronted by Roy Halladay or Cliff Lee, but they also didn’t give 265 innings to Adam Eaton and his 136 ERA-. While most of the analysis about a pitching rotation’s strength focuses on how good the first few starters are, the contributions of the guys at the back end can make a huge difference as well. And, no team has gotten more value from their back-end starters than the White Sox, primarily because they’ve been able to keep them healthy and avoid the roller coaster of minor league fill-ins that most teams inevitably have to endure.

Quantifying exactly how many wins Cooper and Herm Schneider have been worth to the franchise is a more difficult task than looking at the distribution of runs, but just for fun, let’s add an extra 200 days per year on the DL for the White Sox pitching staff, which would bring them to that 2002-2011 league norm that Zimmerman demonstrated last year.
There are roughly 185 days in a Major League season, so we’re basically talking about the difference of not losing one pitcher for a full year, or about 180 to 200 innings pitched. The average White Sox pitcher during the 2005-2012 timeframe has averaged +2.9 WAR per 180 innings. If we assume that another pitching coach/training staff would have lost that pitcher to injury each season, and the White Sox would have had to replace that pitcher with a collection of replacement level arms, we’ve now accounted for roughly three more wins per season, or an additional 43% of the White Sox variation from their PECOTA projected records.

There are some assumptions in there that might not be true, but it seems pretty clear that the White Sox track record of pitcher health has added something on the order of several wins per season to the team’s record during the time we’re discussing. Maybe it’s two wins instead of three. I’m not going to argue for this being a precise calculation, but it’s clearly a major factor, and probably an even larger factor than the run distribution we talked about earlier.

Between pitcher health and run distributions, we can probably assume that those two factors account for 50-75% of the team’s variation from the PECOTA projections over the last eight years. One of the two factors is basically not forecastable, while the other is something that the White Sox absolutely deserve a lot of credit for, and it should be noted that this appears to be a sustainable advantage for the organization. If you think the White Sox projections from any system are too low, pointing to Cooper and Schneider’s success at keeping pitchers healthy is almost certainly your best answer to why the team may continue to outperform those projections once again.

But also, please just keep in mind that projections are not predictions.
They are a snapshot of what we think a team’s median true talent level might be, and it should be understood that there’s a pretty sizable margin for error based on things that projection systems simply can’t forecast, and also the errors that come from having imperfect information or imperfect calculations. The standard deviation in wins from the good forecasting systems is somewhere in the range of eight wins, meaning that a team forecast for 77 wins could reasonably be expected to win anywhere from 69 to 85 games without it saying anything about the model having a breakdown.

There are simply too many uncontrollable and unpredictable variables to get so precise with our preseason forecasts, and simply because of how bell curves work, there are always going to be teams that fall at the tail end of the distribution, making the forecasts look wildly wrong in retrospect. Most years, there’s one team that beats its forecast by 15 wins, sometimes even 20. Last year, we had two, with Baltimore and Oakland winning far more than anyone expected. Pretty much any team forecast for north of 70 wins has some chance at making the playoffs if the stars align.

A 77-win forecast for the White Sox, from any projection system, shouldn’t be taken as a death knell for their season, especially with what we know about their apparent ability to keep their pitchers healthy. But, it’s still worth knowing the consensus projections for the White Sox this year have them as something like the 10th best team in the American League. That doesn’t mean that they can’t win, but it does mean that it’s less likely that they’re going to make the playoffs than most of their competitors.

You don’t need a perfect model in order to extract useful information. Projection systems are not perfect models, and as the White Sox have shown, there are important variables that are not being measured as well as they could be.
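To put rough numbers on the bell curve point above, here is a small Python sketch. It rests on assumptions the text only implies: that forecast errors follow a normal distribution with the eight-win standard deviation quoted above, that errors are independent across teams, and that there are 30 teams. The function names are illustrative.

```python
import math

SD_WINS = 8.0  # standard deviation of forecast error quoted in the text
N_TEAMS = 30   # assumption: number of MLB teams


def normal_tail(z):
    """P(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def prob_beat_by(margin, sd=SD_WINS):
    """Chance one team beats its forecast by at least `margin` wins,
    assuming normally distributed forecast errors."""
    return normal_tail(margin / sd)


def expected_teams(margin, sd=SD_WINS, teams=N_TEAMS):
    """Expected number of teams per season beating forecasts by `margin`,
    assuming errors are independent across teams."""
    return teams * prob_beat_by(margin, sd)


# Share of teams expected to finish within one standard deviation,
# e.g. between 69 and 85 wins for a 77-win forecast.
within_one_sd = 1 - 2 * normal_tail(1.0)

print(f"Within one SD of forecast:  {within_one_sd:.1%}")
print(f"P(beat forecast by 15+):    {prob_beat_by(15):.1%}")
print(f"Expected teams at 15+/year: {expected_teams(15):.2f}")
print(f"Expected teams at 20+/year: {expected_teams(20):.2f}")
```

Under those assumptions, a little over two-thirds of teams should land inside the 69-to-85 style band, and about one team per season should beat its forecast by 15 or more wins, which lines up with the pattern described above.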
That said, don’t throw the baby out with the bath water, and don’t assume that the people behind the projections are biased against your team just because they created a system that spits out a number you don’t like. Instead of going on the attack, try to figure out why the projection system says what it does, and see if there’s a reasonable argument for why it might be missing something.

With the White Sox, there is definitely a reasonable argument that projection systems underrate their ability to keep their pitchers healthy. The run distribution thing is probably just randomness, and I wouldn’t advise counting on that going forward, but the White Sox may very well outperform their projections again this year. Instead of decrying the systems as useless, maybe just use that information as reason to throw Don Cooper a parade.