The Year in Projecting

Hello. There is still regular-season baseball taking place, even literally right now, but I am an impatient person. So, here’s a post about the year in projections, even though the year isn’t finished. The year is basically finished, and that’s good enough for me. We could re-visit in a month, but I don’t know if I’ll see the point.

You know, at least anecdotally, that it hasn’t been the best year for team projections. We had the Rangers projected as one of the worst teams in baseball, and they’re currently leading their division. We had the Twins projected as one of the worst teams in baseball, and they’re still alive in the wild-card race. We had the Royals projected as just about a .500 team, and they’ve got the best record in the American League. Then there are the Mariners, and the Red Sox, and so on and so forth. It seems like it hasn’t been a banner season for the numbers.

We can do something with that. We can compare, firstly, actual winning percentage against projected winning percentage from just before the season. Do that, and you get a simple R^2 value of 0.27. On its own, that seems like a fairly weak relationship, but it’s better to try to put this in context. Thankfully, I have some team-projection data stretching back to 2005, and while many of those projections came from other sources, all projection systems ultimately work in similar ways, so we can use this. Here’s how the projections have done over time, comparing actual performance to expected performance:
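If you want to try this at home, here’s a minimal sketch of that R^2 calculation, assuming it’s just the squared Pearson correlation between the two columns (which is what a simple one-variable R^2 works out to). The function name `r_squared` and the sample numbers are made up for illustration; they are not the real projections or records.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squares around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("one of the sequences has no variation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical winning percentages for five teams, for illustration only.
projected = [0.55, 0.52, 0.50, 0.47, 0.45]
actual = [0.53, 0.56, 0.46, 0.50, 0.44]
print(round(r_squared(projected, actual), 2))
```

Run it once per season with all 30 teams and you’d have the points for a graph like the one described here.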


On that graph, you see a couple peaks, representing particularly unremarkable regular seasons. This season is nowhere near the peaks. As a reminder, this year’s value stands at 0.27. The average over the whole window is about 0.36. So, in that sense, this has been a bad year for the projections, indeed. They’ve under-performed their usual baseline.

That much, you all probably could’ve guessed. But now let’s look at this again, only in place of actual winning percentage, let’s substitute BaseRuns winning percentage. To be brief, we’re under no delusion that we perfectly understand everything about baseball, but to this point no team has shown a consistent ability to over-perform its BaseRuns record. (Or the opposite.) It seems like mostly randomness, and randomness can’t be predicted, by definition. We shouldn’t expect projections to capture all the noise. Here’s a different version of the above graph:


It’s almost a carbon copy of the first graph, except for 2015. As noted, between actual and projected winning percentage, there’s an R^2 of 0.27. But between BaseRuns and projected winning percentage, there’s an R^2 of 0.40, and that’s the biggest gap between the two measures in any season in the sample. For the above graph, the overall average since 2005 is 0.39, so in that sense this has been a totally normal year for the team projections. They’ve done as well as usual on the BaseRuns. They’ve done worse than usual on actual, meaningful, real-world record.

So that gets to this recent post by Dave. BaseRuns has proven to be a pretty good estimate of performance. Teams, historically, haven’t strayed very far from their BaseRuns records. This year, relatively speaking, has been nuts. Consider the following graph, tracking the average differences between actual records and BaseRuns records. This is expressed below in terms of winning percentage.


Used to be, the average difference was a hair above three wins over a full season. It topped out at 3.8, in 2009. This year, the average difference is on track to be 5.5 wins. Not only is that the biggest; it’s the biggest by a huge, huge amount. As Dave said, this really is the year that BaseRuns hasn’t been up to the task. It hasn’t been nearly as accurate as before.
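For anyone wondering how a gap in winning percentage turns into wins, here’s a rough sketch. I’m assuming the "average difference" is the mean absolute gap between each team’s actual and BaseRuns winning percentage, scaled to a 162-game season (that’s what "on track" implies with the season unfinished). The function name and the sample numbers are hypothetical.

```python
def mean_abs_gap_in_wins(actual_pct, baseruns_pct, games=162):
    """Average absolute gap between two sets of winning percentages,
    expressed in wins over a season of `games` games."""
    if len(actual_pct) != len(baseruns_pct) or not actual_pct:
        raise ValueError("need two non-empty sequences of the same length")
    gaps = [abs(a - b) * games for a, b in zip(actual_pct, baseruns_pct)]
    return sum(gaps) / len(gaps)


# Hypothetical teams, for illustration only.
actual = [0.580, 0.510, 0.450]
baseruns = [0.540, 0.520, 0.490]
print(round(mean_abs_gap_in_wins(actual, baseruns), 1))
```

Using the absolute value matters here. Without it, the over-performers and under-performers would cancel out and the average would sit near zero every year.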

That doesn’t mean the equation is no longer applicable — this isn’t proof that BaseRuns is broken. Weird things do happen, and sometimes they clump together. Before this year, BaseRuns did the worst in 2009. Then it did the best in 2010. This doesn’t have to be the beginning of something, and even if you wanted to believe in a trend, there’s not much evidence of anything going on before this year. At this point, it’s just a blip. An interesting blip, a blip to monitor moving forward, but you shouldn’t get ahead of yourself. Baseball usually doesn’t make the math obsolete overnight.

But maybe teams are learning about clustering. Maybe they’re way ahead of the rest of us. That’s why this is a thing to pay attention to. Next year’s numbers will be revealing. It’ll be a long wait.

Switching gears real quick, I figured I might as well put the following in this same post. We knew in the first few months this year’s projections weren’t looking great, especially in the American League. But, have things gotten more normal over time? I decided to split at July 31, right after the trade deadline. Things we knew on July 31: actual team record, projected team record, and projected team record based on season-to-date statistics. Thing we know now: team performance since the beginning of August. This is mostly just for fun, but let’s look at some plots.

First, team performance since August 1 against team record at the deadline:


All right, there’s some relationship, but it’s obviously noisy. Now, team performance since August 1 against projected team record at the deadline:


What do you know — that’s much stronger. Still very obviously far from perfect, but it beats the hell out of the first plot. Anecdotally, some wins for the late-season projections: the Mariners, the Red Sox, the Dodgers, and the Indians. The projections never gave up on these teams, and lately they’ve played more like they were supposed to.

Not that that excuses the first few months. That’s why I said this is mostly for fun. Lastly, team performance since August 1 against projected team record at the deadline, based on season-to-date statistics. If I’m not mistaken, this would fold in various trade acquisitions.


It sucks! It’s a sample of one season, but it sucks. If I run a multi-variable regression, this tells us basically nothing, and actual team record tells us only a fifth as much as projected team record. Because we’re looking at only one year, there’s not a whole lot we can conclude based on this, but consider it a little more evidence that you can never afford to just abandon what the projections are saying. That’s an over-statement. This is sports. You can totally afford to abandon the projections. Just, expect to be wrong, if being wrong matters to you.
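For the curious, here’s a minimal sketch of what a multi-variable regression like that could look like, using ordinary least squares solved through the normal equations in plain Python. This is an assumption about the general approach, not a record of the exact calculation behind the numbers above. The function name `ols` and every value in the example are hypothetical.

```python
def ols(rows, ys):
    """Ordinary least squares with an intercept.

    rows: one tuple of predictors per team; ys: one outcome per team.
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0, *row] for row in rows]
    k = len(X[0])
    # Augmented normal-equation system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(X, ys))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical teams, for illustration only. Each tuple is
# (record at deadline, projected record, season-to-date-stats projection).
teams = [(0.56, 0.52, 0.54), (0.48, 0.53, 0.47), (0.51, 0.49, 0.52),
         (0.44, 0.50, 0.46), (0.58, 0.55, 0.53), (0.47, 0.46, 0.49)]
since_august = [0.53, 0.55, 0.48, 0.51, 0.56, 0.45]
intercept, b_actual, b_projected, b_to_date = ols(teams, since_august)
print(b_actual, b_projected, b_to_date)
```

Since all three predictors are winning percentages on the same scale, comparing the coefficient sizes gives a rough read on which input is doing the work. With only 30 teams and one season, though, those coefficients will bounce around a lot.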

Overall, the projections contain good information. They do to some extent tell the future, and while this year the projections haven’t been great at projecting record, they’ve been much better with BaseRuns, and deviations between BaseRuns and record seem at this point to be almost entirely random. That’s going to be a thing to watch, because it’s possible that baseball teams are figuring out how to sequence, or something along those lines. The industry is always ahead of the non-industry, and this would be a big area of study. Lots of teams would be interested in figuring out how to beat the expectations. But you don’t want to defer too much to the experts, because they don’t know that much more. This year could absolutely be a blip, and we’ll know more in another 12 months.

Myself, I’m mostly happy with where the projections are. There’s no sign of their getting better, on the team level. Maybe that comes as a surprise, I don’t know. No real progress, over the years. I like how much they’re able to say, and I also like how much uncertainty remains. There must exist some sort of sweet spot, where we can know just enough without knowing too much. I have to think that we’re in it. I imagine we’ll stay in it for the foreseeable future.

Jeff made Lookout Landing a thing, but he does not still write there about the Mariners. He does write here, sometimes about the Mariners, but usually not.

merv thronemerry
7 years ago

The Twins would be closer to the top if they had brought up Berrios a couple of weeks ago…