The Year BaseRuns Failed

by Dave Cameron

August 25, 2015

Around here, you know that we spend a lot of time working with metrics that attempt to strip noise out of results. Oftentimes, we’re less concerned with what has happened and more concerned with what is going to happen, and these component metrics often do a better job of isolating a player’s or a team’s overall contribution to the results, while removing some of the factors that led to those results but aren’t likely to continue in the future.

At the team level, the most comprehensive component metric we host is called BaseRuns, which evaluates a team’s quality based on all the plays they were involved in, without regard for the sequence in which those events occurred. BaseRuns essentially gives us a context-neutral evaluation of a team’s performance, assuming that the distribution of hits and runs isn’t really something a team has a lot of control over. BaseRuns can be thought of as the spiritual successor to Bill James’ application of the Pythagorean theorem to baseball: pythag strips sequencing out of the conversion of runs to wins, but doesn’t do anything to strip the sequencing effects out of turning specific plays into runs scored and runs allowed.
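
For readers who want to see the runs-to-wins step in concrete form, here is a minimal Python sketch of the classic Pythagorean expectation with an exponent of 2. This is only the pythag half of the story, not the BaseRuns calculation itself, and the function names and the sample run totals are illustrative assumptions rather than anything taken from our data.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Classic Pythagorean expectation: RS^x / (RS^x + RA^x)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Convert the expected winning percentage into wins over a season."""
    return games * pythagorean_win_pct(runs_scored, runs_allowed, exponent)


if __name__ == "__main__":
    # Hypothetical team: 700 runs scored, 650 runs allowed (made-up totals).
    print(round(expected_wins(700, 650), 1))
```

Note that the inputs here are actual runs scored and allowed, which already carry whatever sequencing luck went into them; that is the part BaseRuns tries to remove.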

Historically, BaseRuns has worked really well. For the years we have historical BaseRuns data (2002 to 2014), one standard deviation was right around four wins, and the data appears to be normally distributed; 73% of team-seasons have fallen within one standard deviation, 97% of team-seasons have fallen within two standard deviations, and no team had ever exceeded three standard deviations. There have been years here and there where a team sequenced their way to an extra 11 or 12 wins, but they weren’t very common, and that was usually the only break from the norm in that season.
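
As a rough sanity check on the "normally distributed" description, the sketch below computes the share of outcomes a true normal distribution would put within one, two, and three standard deviations, and compares it to the 73% and 97% figures above. The 390 team-season count assumes 30 teams over the 13 seasons from 2002 to 2014; the function name is my own.

```python
import math


def normal_coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


# Observed shares quoted in the article for 2002 to 2014.
observed = {1: 0.73, 2: 0.97}
team_seasons = 30 * 13  # assumption: 30 teams in each of 13 seasons

if __name__ == "__main__":
    for k in (1, 2, 3):
        theory = normal_coverage(k)
        seen = observed.get(k)
        seen_text = f"{seen:.0%}" if seen is not None else "n/a"
        print(f"within {k} SD: normal {theory:.1%}, observed {seen_text}")
    # Expected number of team-seasons beyond three SD under a normal model.
    print(round(team_seasons * (1 - normal_coverage(3)), 2))
```

Under a normal model, you would expect roughly one team-season in that sample to land beyond three standard deviations, so having none is not surprising.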

Until this year. Here is the year by year standard deviation in BaseRuns wins versus actual wins for every year that we have the data.

As you can see, the numbers are pretty close to the four win standard deviation in almost every year, with a slight increase in 2009 (4.76) and then a sharp drop-off in 2010 (2.83) before it re-stabilized right back around 4.00 or so again. Outside of that weird 2010 season where BaseRuns was amazingly prescient — every team was +/- five wins, except the Astros, who were +8 — one standard deviation has been between 3.6 and 4.8 wins in every season, more regularly in that 3.9 to 4.3 range.

Then there’s this year. The current standard deviation in actual wins versus BaseRuns is 5.7 wins, and that’s with only about 75% of the season in the books. Keep in mind, this is not a projection of how far teams will finish from their expected records over a full 162-game season; this is how far they’ve already deviated in their first ~125 contests. The Royals and Twins have already added 10 wins to their ledger through the power of sequencing, while the Cardinals are nine wins up because of the ordering of their events. On the other end of the spectrum, the A’s have already squandered 13 wins because of their horrific sequencing of events, which would be the largest deviation from BaseRuns by any team in any season for which we have the data.

Three teams have previously outperformed their BaseRuns by 12 wins, but that was 12 wins over 162 games. The A’s have underperformed by 13 wins in 126 games; if they sustained this pace over the rest of the year, they’d end up 17 wins off their BaseRuns total by the time the season ended. The Royals and Twins are both on pace to end up at +13 wins, something no team has done in the 13 previous years for which we have BaseRuns data. The Cardinals would end up at +12, tying the previous record for largest deviation.
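
The on-pace numbers above are simple proportional scaling, which can be sketched as follows. The A’s game count (126) comes from the text; for the Royals, Twins, and Cardinals I am assuming roughly 125 games played, since exact totals aren’t listed here, and the function name is hypothetical.

```python
def full_season_pace(deviation_wins, games_played, season_games=162):
    """Scale a to-date deviation from BaseRuns to a full-season pace."""
    return deviation_wins * season_games / games_played


if __name__ == "__main__":
    # (team, wins above or below BaseRuns, games played)
    # Games played for KCR, MIN, and STL are assumed to be about 125.
    teams = [("OAK", -13, 126), ("KCR", 10, 125), ("MIN", 10, 125), ("STL", 9, 125)]
    for name, dev, games in teams:
        print(name, round(full_season_pace(dev, games)))
```

This is a straight-line extrapolation, not a forecast; it simply assumes the remaining games deviate at the same rate as the games already played.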

If we extrapolate the current deviations from BaseRuns out to 162 games for all teams, then the final season standard deviation for the league would be 7.4 wins, nearly double what it has been in previous years. Of course, the teams that have defied BaseRuns so far probably won’t continue to do so at the same rate over the final six weeks of the season, but even if every team plays exactly as BaseRuns would expect over the rest of the year, we’re still likely to end the year with three teams at least 10 games off their BaseRuns record, and the Cardinals could easily make it four. Heck, the Reds, Marlins, and Rangers are all close enough that it wouldn’t be entirely crazy if any of them ended up +10 or -10 by the end of the year, so theoretically, we could have seven teams end the year with double-digit differences between their expected records and their actual records.
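
The 7.4-win figure follows from the fact that multiplying every team’s deviation by the same factor multiplies the standard deviation by that factor too. Here is a small sketch of that arithmetic, assuming every team has played about 125 games; the list of deviations used to check the scaling property is made up purely for illustration.

```python
import math
import statistics


def extrapolate_sd(current_sd, games_played, season_games=162):
    """Scale a to-date standard deviation to a full-season pace."""
    return current_sd * season_games / games_played


if __name__ == "__main__":
    # 5.7 wins is the current league-wide figure; 125 games is an approximation.
    print(round(extrapolate_sd(5.7, 125), 1))

    # Check the scaling property on made-up deviations (not real team data).
    sample = [-13, -4, -2, 0, 1, 3, 9, 10]
    factor = 162 / 125
    scaled = [d * factor for d in sample]
    print(math.isclose(statistics.pstdev(scaled), factor * statistics.pstdev(sample)))
```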

In the previous 390 team-seasons that we’ve tracked BaseRuns for, that has happened a grand total of nine times, or about 2.3% of the time. This year, we’re looking at somewhere between 10% and 20% of teams finishing at the very far edges of the spread. It’s been a weird year, to say the least.
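
For anyone checking the percentages, this short sketch converts counts of double-digit deviations into shares. The historical count (nine of 390) is from the text; the this-year counts of three through seven teams out of 30 are just the scenarios described above, and the helper name is my own.

```python
def share_pct(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


if __name__ == "__main__":
    # Historical rate of double-digit deviations, 2002 to 2014.
    print(f"historical: {share_pct(9, 390):.1f}%")
    # Scenarios for this year, out of 30 teams.
    for teams in range(3, 8):
        print(f"{teams} teams: {share_pct(teams, 30):.1f}%")
```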

Most likely, this is just the opposite of what happened in 2010, where sequencing didn’t really matter much at all, and the standard deviation dropped to its lowest point before shooting right back up to normal the next year. This is probably just a blip, the kind of thing that happens given enough years, and probably doesn’t mean that teams have figured out how to cluster their good performances together in a sustainable way.

But it’s worth keeping an eye on, at least. While no one has ever really been able to show that organizations can continually build teams that specialize in sequencing skills, it is certainly possible to imagine ways in which those skills could be found in the future. A team that could figure out how to build reliably dominant bullpens, for instance, would very likely beat a context-neutral estimation of their win totals on a regular basis, since the runs relievers allow (or don’t allow) have an outsized impact on win-loss records. But because bullpen performances are highly volatile, it’s a difficult strategy to execute on an annual basis.

We certainly shouldn’t discard BaseRuns as an effective model of team quality simply because this year’s results haven’t conformed to expectations. Historical precedent still suggests that teams can mostly only control the quantity of positive and negative events they’re involved in, and not the timing of when those events occur. But the timing can make a big difference, and this year, a lot of teams have sequenced their way to very different outcomes than expected.
