Low Scoring Teams: Better Than You Might Expect
by Dave Cameron
January 18, 2013

The Seattle Mariners offense has been awful the last few years. Historically awful, in fact. Over the last decade, the two lowest scoring teams have been the 2010 Mariners (513 runs) and the 2011 Mariners (556 runs). At 619 runs, the 2012 Mariners moved all the way up to just 17th worst in the last 10 years.

After three seasons of offensive ineptitude, it’s not a big surprise that the organization has dramatically shifted its focus, and has spent the winter collecting defensively challenged hitters with power, including Michael Morse, Kendrys Morales, and Raul Ibanez, and taking a flier on the remains of Jason Bay. None of these guys are great players, but they provide the team with something it hasn’t had much of lately, and combined with the adjusted dimensions of Safeco Field, it’s a pretty good bet that the 2013 Mariners are going to score more runs than the line-ups they’ve put on the field the last three seasons.

But the question remains: will those additional runs scored lead to more wins?

To answer that question, I decided to investigate the win estimators we have at our disposal, and see if perhaps there is an enhanced return on additional offense for teams that have scored very few runs. Or, put more simply, does scoring 50 more runs help an offensively challenged team more than preventing 50 runs from scoring would?

For the answer, we turn to the pythag family of win estimators, of which Bill James’ pythagorean expectation is the most famous. You’ve probably heard about his tool, usually just referred to as “pythag” or “pythag record”, which takes a team’s runs scored and runs allowed and converts them into an expected win-loss record by squaring both numbers and dividing squared runs scored by the sum of the two squares.
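As a rough illustration of that calculation, here is a minimal sketch of basic pythag with the exponent fixed at 2. The function names and the 162-game schedule constant are assumptions made for this example, not anything from an official tool. The inputs are the 2012 Mariners’ 619 runs scored and the 651 runs allowed that serve as the baseline later in this piece.

```python
GAMES_IN_SEASON = 162  # assumed full MLB schedule length


def pythag_win_pct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Expected winning percentage: RS^x / (RS^x + RA^x), with x fixed at 2 by default."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(win_pct: float, games: int = GAMES_IN_SEASON) -> int:
    """Convert a winning percentage into a rounded win total."""
    return round(win_pct * games)


baseline = pythag_win_pct(619, 651)
print(f"{baseline:.3f}", expected_wins(baseline))  # 0.475 77
```

Because both terms are raised to the same power, the result depends only on the ratio of runs scored to runs allowed, not on their absolute levels.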
For most teams, in most situations, pythag holds up pretty well, but for mathematical reasons that you can read about here if you’re interested, it breaks down at the extremes. So, two noted sabermetricians (David Smyth, inventor of BaseRuns, and the man who goes by Patriot) independently found that pythag could be improved upon by using a variable exponent, rather than simply squaring runs scored and runs allowed for every team in every situation. That formula has come to be known as pythagenpat, and it is generally accepted as the most accurate win estimator in that family of tools.

Since it might help clarify why pythagenpat is preferable to pythag for these kinds of discussions, let’s look at how both tools project changes to the Mariners’ run differential in terms of wins and losses. We’ll just start with a baseline of 619 runs scored, using their 2012 season total, and then either add 100 runs scored or take 100 runs away from the runs allowed total.

| Team | RS | RA | Diff | Pythag | Pythagenpat | Pythag Wins | Pat Wins |
|---|---|---|---|---|---|---|---|
| Seattle Mariners | 619 | 651 | -32 | 0.475 | 0.477 | 77 | 77 |
| Add 100 Runs | 719 | 651 | 68 | 0.550 | 0.546 | 89 | 88 |
| Prevent 100 Runs | 619 | 551 | 68 | 0.558 | 0.551 | 90 | 89 |

You can see the slight differences between the two systems start to show up here, as pythagenpat is more conservative by one win in both scenarios, whether you’re adding 100 runs scored or preventing 100 runs. However, it’s interesting to note that both systems actually prefer the run prevention approach, giving slightly higher expected winning percentages if the team prevented 100 runs from scoring rather than if the team added 100 runs of offense.

It gets even more extreme if we make it 200 runs in either direction.

| Team | RS | RA | Diff | Pythag | Pythagenpat | Pythag Wins | Pat Wins |
|---|---|---|---|---|---|---|---|
| Seattle Mariners | 619 | 651 | -32 | 0.475 | 0.477 | 77 | 77 |
| Add 200 Runs | 819 | 651 | 168 | 0.613 | 0.606 | 99 | 98 |
| Prevent 200 Runs | 619 | 451 | 168 | 0.653 | 0.633 | 106 | 103 |

Here, you can start to really see the two diverge, as pythag begins to overestimate the number of wins that a team allowing only about 450 runs would get.
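For readers who want to check the pythagenpat columns, here is a minimal sketch of the variable-exponent idea. The article does not spell out the formula, so this assumes the commonly cited form in which the exponent is runs per game (both teams combined) raised to the power 0.287. That constant, the function name, and the 162-game schedule are assumptions for this example.

```python
GAMES_IN_SEASON = 162  # assumed full MLB schedule length
PAT_CONSTANT = 0.287   # assumed: commonly cited value, not stated in the article


def pythagenpat_win_pct(runs_scored, runs_allowed, games=GAMES_IN_SEASON):
    """Pythag with a floating exponent tied to the run environment."""
    runs_per_game = (runs_scored + runs_allowed) / games
    exponent = runs_per_game ** PAT_CONSTANT  # lower scoring -> smaller exponent
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Scenarios taken from the 200-run table above.
scenarios = {
    "Seattle Mariners": (619, 651),
    "Add 200 Runs": (819, 651),
    "Prevent 200 Runs": (619, 451),
}
for label, (rs, ra) in scenarios.items():
    pct = pythagenpat_win_pct(rs, ra)
    print(f"{label:<18} {pct:.3f} {round(pct * GAMES_IN_SEASON)}")
```

Under these assumptions the sketch should land on the same three-decimal pythagenpat values shown in the table. The exponent drops below 2 in every scenario here, which is why pythagenpat is the more conservative of the two estimators.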
Again, pythagenpat is slightly more conservative, but again, both systems suggest that the team would win more games by preventing runs, rather than increasing their runs scored. The difference isn’t so large that the strategy should be to focus solely on run prevention, but the results from both pythag and pythagenpat suggest that it doesn’t really matter too much which way the organization goes, whether it’s scoring more runs or preventing runs from being scored.

Now, that’s what the models tell us, but I understand that not everyone trusts statistical modeling, so let’s just take this a step further and look at pythag, pythagenpat, and actual winning percentage for every team over the last decade. That gives us a sample of 300 team seasons and 24,294 games played. Over that span, how well did the models predict actual winning percentage from a team’s runs scored and runs allowed? And, are the models breaking down at the extremes, causing us to question whether a run prevented is really as valuable as a run scored to a low offense club?

The easiest way to evaluate those questions is to break the 300 teams down into smaller groups. I started off sorting by runs scored (fewest first), then broke the 300 teams into deciles, or 10 groups of 30. Here is how those 10 groups performed.

| Decile | W | L | RS | RA | Diff | Pythag | Pythagenpat | Actual |
|---|---|---|---|---|---|---|---|---|
| Group One | 69 | 93 | 612 | 730 | -118 | 0.413 | 0.420 | 0.427 |
| Group Two | 75 | 87 | 661 | 718 | -57 | 0.459 | 0.462 | 0.465 |
| Group Three | 75 | 87 | 694 | 747 | -53 | 0.464 | 0.466 | 0.466 |
| Group Four | 79 | 83 | 716 | 744 | -28 | 0.481 | 0.482 | 0.485 |
| Group Five | 80 | 82 | 733 | 734 | -1 | 0.499 | 0.499 | 0.497 |
| Group Six | 80 | 82 | 752 | 772 | -20 | 0.487 | 0.487 | 0.494 |
| Group Seven | 85 | 77 | 775 | 738 | 37 | 0.524 | 0.523 | 0.526 |
| Group Eight | 89 | 73 | 799 | 716 | 83 | 0.555 | 0.552 | 0.551 |
| Group Nine | 85 | 77 | 830 | 782 | 47 | 0.529 | 0.528 | 0.526 |
| Group Ten | 92 | 70 | 888 | 778 | 110 | 0.566 | 0.564 | 0.565 |

Breaking news: Teams that score more runs win more games than teams that don’t, all else being relatively equal.
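The grouping step described above (sort by runs scored, then cut into 10 equal groups) can be sketched as follows. The record layout, the key names, and the choice to compute actual winning percentage from pooled wins and losses are all assumptions for this example. The input records are placeholders, not real team seasons.

```python
from statistics import mean


def runs_scored_deciles(teams, groups=10):
    """Sort team seasons by runs scored (fewest first) and split into equal groups.

    `teams` is a list of dicts with hypothetical keys "rs", "ra", "wins", "losses".
    Any remainder beyond an even split is dropped; 300 teams split evenly into 30s.
    """
    ordered = sorted(teams, key=lambda t: t["rs"])
    size = len(ordered) // groups
    return [ordered[i * size:(i + 1) * size] for i in range(groups)]


def summarize(group):
    """Average runs and pooled actual winning percentage for one group."""
    wins = sum(t["wins"] for t in group)
    losses = sum(t["losses"] for t in group)
    return {
        "rs": mean(t["rs"] for t in group),
        "ra": mean(t["ra"] for t in group),
        "actual": wins / (wins + losses),
    }


# Placeholder records only; these are NOT real team seasons.
placeholder = [
    {"rs": 500 + i, "ra": 700, "wins": 81, "losses": 81}
    for i in reversed(range(300))
]
deciles = runs_scored_deciles(placeholder)
print(len(deciles), len(deciles[0]), summarize(deciles[0]))
```

With real data, each group summary would then be fed through the win estimators to fill in the Pythag and Pythagenpat columns.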
Because we chose our groups by runs scored, the runs allowed number doesn’t fluctuate a great deal until you get to the last two groups, which contain a decent number of teams that play in hitter friendly ballparks, leading both RS and RA to be higher than they are in the first eight groupings. Still, the pattern of run differential being the primary driver of wins and losses holds up pretty well, as both pythag and pythagenpat come pretty darn close to estimating overall group winning percentages for each decile. However, I think a visual aid will help you see that there is a pattern of the models missing slightly for certain types of teams.

For the low run scoring groups, pythag and pythagenpat had a decent amount of divergence, as the floating exponent suggested that the expected win totals were slightly higher, despite their poor offenses, than pythag would suggest. In reality, pythagenpat was right, but not right enough, as the low scoring teams outperformed their pythagenpat too. It ended up being a middle ground between pythag and actual, and only went half as far as it needed to in order to reflect actual wins and losses.

This trend of pythagenpat giving slightly higher winning percentages than traditional pythag holds through the four lowest scoring groups. Actual winning percentage is higher than both estimators in three of those four groups and matches pythagenpat in the other, so it is higher than traditional pythag in all four.

The results are the opposite on the other side of the spectrum, with pythag overshooting actual wins and losses for each of the three highest run scoring groupings, while pythagenpat is very close to the actual at higher runs scored and runs allowed totals.
Or, if you’d prefer to just see the differences by group between actual and the estimators, here’s a breakdown of the gaps by decile:

| Decile | Pythag Minus Actual | Pat Minus Actual |
|---|---|---|
| Group One | -0.014 | -0.007 |
| Group Two | -0.006 | -0.003 |
| Group Three | -0.002 | -0.000 |
| Group Four | -0.005 | -0.004 |
| Group Five | 0.002 | 0.002 |
| Group Six | -0.007 | -0.006 |
| Group Seven | -0.002 | -0.003 |
| Group Eight | 0.004 | 0.001 |
| Group Nine | 0.004 | 0.003 |
| Group Ten | 0.001 | -0.001 |

Here, you can clearly see pythagenpat’s superiority to pythag, even though we’re talking about thousandths of a point in terms of win percentage. For nearly every grouping, pythagenpat is closer to actual than pythag, and at the very lowest end of run scoring, the differences actually become meaningful. For that first decile, which contains the 30 lowest scoring teams of the last 10 years, the difference in wins over a season between pythag (.413, 67 wins), pythagenpat (.420, 68 wins), and the actual (.427, 69 wins) begins to show up in whole numbers.

Here’s where this wraps back into our original question about whether adding offense to a low run scoring club is preferable to adding run prevention, if we assume that an equal number of runs added or saved can be achieved. The models suggested that preventing runs would be slightly preferable, but in testing the models, we actually found that they have slightly underestimated the number of wins that low scoring teams would have, and (very) slightly overrated the number of wins by higher run scoring teams.

This suggests that if there has been a bias in the models over the last decade, it has been a pro-offense bias, suggesting that teams with low numbers of runs scored might be slightly better than even pythagenpat suggests, and are almost certainly better than traditional pythag would lead you to believe.

The differences we’re talking about here are small in magnitude, so please don’t take this as a critique of pythagenpat.
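One way to put a single number on the gap table above is to average the absolute misses for each estimator. This is only a quick sketch using the rounded gaps as printed in that table; the variable names and the choice of mean absolute error as the summary are assumptions for this example.

```python
# (pythag minus actual, pythagenpat minus actual) per decile, copied from the gap table
gaps = [
    (-0.014, -0.007), (-0.006, -0.003), (-0.002, -0.000), (-0.005, -0.004),
    (0.002, 0.002), (-0.007, -0.006), (-0.002, -0.003), (0.004, 0.001),
    (0.004, 0.003), (0.001, -0.001),
]

# Mean absolute miss for each estimator across the ten deciles
pythag_mae = sum(abs(p) for p, _ in gaps) / len(gaps)
pat_mae = sum(abs(q) for _, q in gaps) / len(gaps)

# Count the deciles where pythagenpat is strictly closer, and where the two tie
pat_closer = sum(abs(q) < abs(p) for p, q in gaps)
ties = sum(abs(q) == abs(p) for p, q in gaps)

print(f"pythag MAE {pythag_mae:.4f}, pythagenpat MAE {pat_mae:.4f}")
print(f"pythagenpat closer in {pat_closer} deciles, tied in {ties}")
```

On these rounded figures pythagenpat misses by about .003 on average against roughly .005 for pythag, and it is the closer estimator in seven of the ten deciles with two ties.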
If you’re questioning the model’s accuracy, these numbers should reassure you that it’s fundamentally sound, and that it does do a good job of modeling wins and losses from runs scored and runs allowed. And, because the apparent bias might swing in favor of higher offensive levels, we don’t need to question the original conclusions supported by pythagenpat about whether a run scored or run saved is more valuable to a team with a base of 619 runs.

Traditional pythag, pythagenpat, and actual winning percentage all support the idea that teams with bad offenses should be just as interested in improving their pitching and defense as they are in improving their offense. There is no evidence of additional benefit from improving a bad offense rather than improving a strong run prevention squad.

There is simply no way to look at this data and suggest that there are strong levels of diminishing returns for run prevention, or that the models overrate a bad-offense team’s chances of winning. If anything, the data points to the models slightly underrating those types of teams, confirming the idea that, when it comes to winning more baseball games, a run is a run is a run.

Now, it’s almost certainly easier to improve on a weak offense than it is to improve on a strong run prevention group, or even vice versa. Filling a hole with a moderately useful player is simply not as challenging as upgrading on a productive member of your team, and it’s certainly ingrained in our psyche to focus on fixing what’s broken rather than improving areas that are working just fine. I’m not using this data to say that a team with a bad offense should just be content to keep having bad offensive clubs and focus entirely on preventing runs.

I am saying, however, that if a team makes a conscious decision to trade 20 runs allowed for 15 runs scored, they’re making a bad decision, no matter how bad their offense was the previous year.
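To make that 20-for-15 trade concrete, here is a small sketch that applies it to the 619 runs scored and 651 runs allowed baseline used earlier. It assumes the commonly cited pythagenpat form with a 0.287 constant, which the article does not state. The baseline choice, the function name, and the 162-game schedule are likewise assumptions for this example.

```python
GAMES_IN_SEASON = 162  # assumed full MLB schedule length
PAT_CONSTANT = 0.287   # assumed: commonly cited value, not stated in the article


def pythagenpat_win_pct(runs_scored, runs_allowed, games=GAMES_IN_SEASON):
    """Pythag with a floating exponent tied to the run environment."""
    runs_per_game = (runs_scored + runs_allowed) / games
    exponent = runs_per_game ** PAT_CONSTANT
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


before = pythagenpat_win_pct(619, 651)
# Score 15 more runs, but allow 20 more to get them.
after = pythagenpat_win_pct(619 + 15, 651 + 20)

change_in_wins = (after - before) * GAMES_IN_SEASON
print(f"before {before:.4f}, after {after:.4f}, change {change_in_wins:+.2f} wins")
```

Under these assumptions the expected winning percentage goes down, by roughly half a win over a full season, even though the offense improved.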
What matters is maximizing your ratio of runs scored to runs allowed, not reaching some kind of ideal balance between the two. Making a larger downgrade in pitching and defense in order to fix a bad offense is a trade-off that is likely to result in fewer wins. The same is likely true for swapping out hitters for pitchers, if you had a bad pitching staff last year.

Building a baseball team isn’t about simply improving on weaknesses. Building a baseball team is about putting as many good players on the field as possible, and caring too much about what kinds of good players those are often leads to poor decision making.

Don’t focus so much on scoring more runs or preventing more runs. Just focus on outscoring your opponent. That’s what wins games.