The Meaning of the Standings So Far
by Jeff Sullivan
June 8, 2015

Last week, writing for JABO, I examined a huge difference between the American League and the National League. The NL, for the most part, has played out about as expected, to this point. There aren’t many tremendous surprises. In the AL, though, things have gone insane, with a strongly inverse relationship between team performance and preseason projected team performance. You could say the AL is somewhat upside-down, given what we thought it would be, and that’s fun! That’s weird! Who doesn’t like weird baseball?

Toward the bottom, I embedded the following plot of information from 2014:

For one season of data, that’s team winning percentage over the first two months, and then team winning percentage over the remaining four months. A year ago, there was hardly any relationship, but I wanted to look at more. That’s what this is about. Here, we’ll examine 10 years, instead of just one.

The idea, generally: how predictive is early-season performance? We’re a little past the one-third mark, meaning we’re into the middle bit, which leads up to the stretch run.

Now, today is June 8, meaning we’ve had a full slate of games played through June 7. So for simplicity, I examined the standings through and then after June 7, using Baseball-Reference’s standings page, for the past decade. While the actual starting date of the season bounces around, this shouldn’t meaningfully change the numbers. June 7 is always around the same point in the season. Results:

There is some relationship. Of course there is; there has to be, because two months of baseball aren’t completely and utterly random. In the broadest possible terms, teams that start well tend to do better than teams that start poorly, because good teams are more likely to start well, and bad teams are more likely to start poorly. This is elementary stuff. At the same time, I don’t know, you might be surprised by how weak the relationship is.
I don’t know what correlation you might’ve expected, but that’s a small number, and the slope of the line isn’t impressive either. For every 100-point difference in early winning percentage, you see a corresponding 37-point difference in remaining winning percentage. It’s pretty apparent that, while the early results are meaningful, teams don’t just keep playing like that through the end of the year. There’s a lot of shaking up.

Which brings us to the next plot. In the past, I’ve dug and recovered preseason team projections stretching back to 2005. They’ve come from a few different sources, and they’ve evolved over time, but projections haven’t changed that much, and they’ve always been based around expected team depth charts. So how meaningful are those preseason team projections, compared against early-season performance?

This plot is a lot like the plot above, only instead of early-season winning percentage on the x-axis, we have projected team winning percentage. Then you’ve got actual winning percentage over the rest of the season, after June 7. Just in case it’s not clear, I want to emphasize this: here we’re seeing a comparison of team performance against preseason projections, not updated in-season projections, like the ones we have available today. Those factor in performance changes, injuries, and transactions. So. Results!

Maybe you’re surprised, and maybe you’re not. There’s a much stronger relationship between team performance after June 7 and preseason projected team performance, even though those preseason projections are missing 2+ months of information. You can see the r-squared for yourself. And there’s also the slope. The slope of the first plot: 0.37. The slope here: 0.93. For every 100-point difference on the x-axis, there’s a 93-point difference on the y-axis. Those projections have held up well. Obviously, it’s still not an outstanding relationship, but it’s a hell of a lot better.
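
To make the two slopes a bit more concrete, here is a minimal Python sketch that turns a 100-point gap on each x-axis into an expected rest-of-season gap, first in winning-percentage points and then in wins. Only the slopes (0.37 and 0.93) come from the article. The figure of 105 games remaining after June 7 is an assumption for illustration, and the function names are made up for this example.

```python
# Slopes quoted in the article for the two fitted lines.
EARLY_RECORD_SLOPE = 0.37  # x-axis: actual Win% through June 7
PROJECTION_SLOPE = 0.93    # x-axis: preseason projected Win%

# Assumption (not from the article): about 105 of 162 games remain after June 7.
GAMES_REMAINING = 105


def expected_gap(x_gap_points: float, slope: float) -> float:
    """Expected rest-of-season Win% gap, in points, between two teams
    that sit `x_gap_points` apart on the x-axis."""
    return slope * x_gap_points


def points_to_wins(gap_points: float, games: int = GAMES_REMAINING) -> float:
    """Convert a Win% gap in points (1 point = .001) into wins over `games`."""
    return gap_points / 1000 * games


for label, slope in [("early record", EARLY_RECORD_SLOPE),
                     ("preseason projection", PROJECTION_SLOPE)]:
    gap = expected_gap(100, slope)
    print(f"100 points of {label}: {gap:.0f} points, "
          f"about {points_to_wins(gap):.1f} wins the rest of the way")
```

Under that games-remaining assumption, a 100-point edge in early record is worth roughly four wins the rest of the way, while a 100-point edge in preseason projection is worth closer to ten.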
The point being: if you’re trying to figure out how a team is going to do over the rest of the year, you’re better off looking at the projections than you are looking at win/loss record. You’re better off looking at the preseason projections than you are looking at win/loss record, even though those projections are missing sometimes important data. Of course, the best plan would be to blend performance and projections, but the projections are king, here.

For the sake of providing recent examples, this is why I don’t think the Twins have disproven anything. This is why it’s too soon to give up on the Mariners and Red Sox. If, in March, you had a given opinion of a team, then that opinion shouldn’t be too different in early June. (Provided the opinion was statistically sound.)

Sometimes people want to know about the extremes. So, comparing early-season performance and the team projections, I identified the 25 biggest over- and under-achievers, through June 7. The relevant information:

25 biggest early-season over-achievers
Actual Win% through 6/7: .597
Projected Win% through 6/7: .473
Actual Win% after 6/7: .493

25 biggest early-season under-achievers
Actual Win% through 6/7: .374
Projected Win% through 6/7: .504
Actual Win% after 6/7: .496

The over-achievers got worse by more than 100 points. The under-achievers got better by more than 120 points. They didn’t regress completely to the preseason projections, but remember, those don’t fold in injuries, or transactions, where successful teams add and unsuccessful teams sell. The regression is strong.

The results are undeniable, and though everyone wants to think certain individual teams are exceptions, the burden of proof is on them. You need pretty compelling reasons to throw the projections away, and those reasons have to go beyond “the team’s played differently for two months.”

What should go without saying is that the games that have happened have happened, and they count.
The Mariners and Red Sox have put themselves in a hole; the Twins and Astros have put themselves in good positions. That’s why you see big changes in team playoff odds, even though the projections themselves don’t budge very much. But, to this point, the Twins and Astros have the two best records in the American League. The rest of the way, they’re projected for the 15th- and 9th-best records. The previous sentence is no less important than the sentence before it.
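
As a footnote to the over- and under-achiever numbers quoted earlier, here is a small Python sketch of how far each group moved back toward its projection. The six winning percentages are the ones listed in the article, and the "Projected Win%" line is treated as the preseason projection. The `Group` class and the `share_regressed` helper are names invented for this example, not anything from the original analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    name: str
    early: float      # actual Win% through 6/7 (from the article)
    projected: float  # projected Win% (from the article)
    rest: float       # actual Win% after 6/7 (from the article)


def share_regressed(g: Group) -> float:
    """Fraction of the early gap (early minus projected) that disappeared
    after 6/7. 1.0 would mean the group played exactly to its projection;
    0.0 would mean it kept playing at its early pace."""
    return (g.early - g.rest) / (g.early - g.projected)


GROUPS = [
    Group("over-achievers", early=0.597, projected=0.473, rest=0.493),
    Group("under-achievers", early=0.374, projected=0.504, rest=0.496),
]

for g in GROUPS:
    change = (g.rest - g.early) * 1000  # change in Win% points
    print(f"{g.name}: {change:+.0f} points, "
          f"{share_regressed(g):.0%} of the early gap closed")
```

By this measure the over-achievers gave back about 84% of their early edge over the projections, and the under-achievers made up about 94% of their early deficit. These are the 25 most extreme cases in each direction, so the shares shouldn't be read as a general-purpose blending weight.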