Adjusting Linear Weights for Extreme Environments
by Steve Staude
February 12, 2013

Well, it’s my first assignment as a real writer, having been promoted for my Community Research articles on pitcher BABIPs and ERA estimators, and I’ve been thrown into the deep end of the pool: linear weights. It’s a tricky subject, but I’ll try to walk you through both the problems with linear weights and how they can be overcome.

This article series mainly draws from various works of Tom “Tango,” a.k.a. “tangotiger,” the creator of wOBA and FIP, as well as from David Smyth’s BaseRuns. I’ll go deeper and deeper down the rabbit hole of stat geekishness as the series goes on, eventually emerging with a spreadsheet version of Tango’s Markov run modeler that I made for you all to play with. Where the Markov model mainly shines over wOBA is in extreme run environments, such as unusual offenses or extreme ballparks.

Who cares about extreme run environments? Nerds like me, I guess? Tom Tango cared enough to come up with ways to address the shortcomings of his original wOBA formulation. If you’ve ever wondered how valuable a certain player is to your favorite team, maybe you should care too; that low-OBP slugger might be more valuable to your low-OBP team than wOBA suggests. On the other end, a typical walk last year was worth considerably more to the high-OBP Cardinals than it was to the low-OBP Mariners (around 0.04-0.065 more runs each… which adds up over a season).

Let me pose a hypothetical: if a team’s offense hits one home run per game, and nobody ever reaches base otherwise, how many runs per game would that team average? The answer is 1, right? So why do the linear weights used in wOBA, for example, say a home run is worth “1.4 runs above average, and 1.7 runs above the run value of the out”?
Well, this and other linear weights models of scoring break down in unusual circumstances, because they assume that certain conditions will be at the league average. In this case, the linear weights overestimate how many runners will be on base when the home run is hit, since they go by what’s typical. That’s fine for a team that has a pretty typical number of runners on base for each home run. Yup.

So, what’s a home run really worth? That depends on how many runners are on base, right? So, between 1 and 4 runs, you might say. Of course, the home run hitter deserves full credit for driving himself in, but he doesn’t get to take the credit for runners getting on base ahead of him, only for driving them in. So, we now have to divvy up credit for the runs scored on the homer between the players involved.

How do we do that? Well, you don’t get 1 credit for each RBI and each run you score, because then a team would end up with twice as many credits as actual runs scored. It would be closer to a half credit for each RBI and run, but it’s more complicated than that. Let’s look at the 4 runs produced by a grand slam, for example:

- The hitter gets 1 run credit, for the RBI and the run he scored himself, plus whatever driving in the 3 runners is worth.
- The 3 runners get credit for getting on base, plus credit for the RBI potential of the type of hit (or walk) that got them on base. When they are driven in by a HR, it doesn’t really matter for run-scoring purposes whether they got on via walk, single, or whatever, since a HR drives them all in.

That’s actually kind of complicated, when you think about what goes into “whatever driving in the runners is worth.” That depends on how likely those further down in the order would have been to drive them in instead. That isn’t a simple thing to figure out; it in turn depends on things like:

- the initial base positions of the runners
- the rates of BB, 1B, 2B, 3B and HR (etc.) of subsequent batters
- the speed and base-stealing abilities of those on base
- the number of outs

Then, of course, you also have to consider how many runners are likely to be on base when a home run is hit. This is the most important factor of them all in this equation, if you ask me.

Interactions

If your eyes haven’t glazed over yet, hopefully you see this: different types of hits have different values to different teams. The fact is, a team’s on-base ability, slugging, and speed all interact with each other in the process of scoring runs, such that one factor can add or subtract value from another. I’ll now break down some of the ways the abilities impact each other:

- The more runners that are on base, the more value any subsequent hit has, all else equal, as there are more RBI opportunities. (Up to a point… more on that later.)
- If the team has a very high OBP, it will be able to sustain longer rallies, and will therefore be less dependent on the home run to score runs (i.e., singles, walks, etc. will be more valuable relative to the home run, compared to low-OBP teams).
- On a low-OBP team, however, while a home run is likely to score fewer runs than it would on an otherwise similar high-OBP team, the value of a home run relative to other hit types will be greater, as the team will be less likely to rally.
- Digging even deeper, if a team hits a lot of home runs, the average value of a home run actually drops, because more runners have been cleared from the bases by previous home runs.
- Base running ability becomes more relevant the closer to 1B the runner is, as only during a couple of particular types of batted ball (some grounders and some flyouts) will the speed of a runner on 3B make the difference between a run and a non-run. So, good base running is relatively more important to a low-OBP team, as there will be fewer rallies that allow the runners to advance.
- The abilities of the base runners become less relevant the greater the value of the hit that advances them; home runs and triples automatically clear the bases regardless of the speed of a base runner (only on rare occasions does a slow or stumbling runner on 1B prevent the batter from reaching 3B). So, good base running is also more important to low-slugging teams.
- This one is pretty important: the fewer outs the batter makes, the more opportunities (plate appearances) he allows his teammates and himself to have, which by itself allows the potential for more run-scoring.

Most of these probably seem obvious to you now, yet linear weights formulas ignore, or don’t properly deal with, these interactions. They assume that a walk, or a single, or a home run is each worth a fixed value based on league averages. Some systems, like wOBA, do now recognize that the values aren’t really fixed, and change them annually to reflect the run environments of the time. Tom Tango came up with these, but FanGraphs has the most updated list of wOBA constants here: http://www.fangraphs.com/guts.aspx.

To find the run value of each event (relative to the out), you divide that event’s wOBA factor by the wOBAScale constant of the same year. Below, you’ll see five of the highest and five of the lowest years in terms of run values of singles (R/Single) and home runs (R/HR). The run values include the value of not making an out.
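Before getting to those numbers, here is a minimal Python sketch of the division just described. The function name `run_value` is hypothetical, and the weights and scale below are made-up placeholders chosen only to show the arithmetic; they are not actual FanGraphs constants for any season.

```python
def run_value(woba_weight: float, woba_scale: float) -> float:
    """Run value of an event relative to an out: wOBA weight / wOBAScale."""
    if woba_scale <= 0:
        raise ValueError("wOBAScale must be positive")
    return woba_weight / woba_scale


# Placeholder inputs for illustration only; these are NOT real constants.
# Look up the actual values for a given season on the FanGraphs page.
example_weights = {"1B": 0.90, "HR": 2.00}
example_scale = 1.25

# Convert every placeholder weight into a run value relative to the out.
example_run_values = {
    event: round(run_value(weight, example_scale), 3)
    for event, weight in example_weights.items()
}
print(example_run_values)
```

Swapping in a real season’s weights and wOBAScale should give run values on the same footing as the R/Single and R/HR figures listed for each year.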
| Year | R/Single | R/HR | R/G | AVG | OBP | SLG | OPS | wOBA |
|------|----------|-------|-------|-------|-------|-------|-------|-------|
| 1930 | 0.8214 | 1.723 | 5.546 | 0.296 | 0.356 | 0.435 | 0.791 | 0.356 |
| 1936 | 0.7833 | 1.683 | 5.187 | 0.284 | 0.349 | 0.404 | 0.753 | 0.349 |
| 2000 | 0.7826 | 1.696 | 5.140 | 0.270 | 0.345 | 0.437 | 0.782 | 0.341 |
| 1994 | 0.7627 | 1.686 | 4.923 | 0.270 | 0.339 | 0.424 | 0.763 | 0.333 |
| 1950 | 0.7623 | 1.689 | 4.849 | 0.266 | 0.346 | 0.402 | 0.748 | 0.346 |
| 1908 | 0.6159 | 1.603 | 3.383 | 0.239 | 0.297 | 0.305 | 0.602 | 0.297 |
| 1968 | 0.6233 | 1.603 | 3.417 | 0.237 | 0.299 | 0.340 | 0.639 | 0.291 |
| 1917 | 0.6395 | 1.605 | 3.588 | 0.249 | 0.311 | 0.324 | 0.635 | 0.311 |
| 1943 | 0.6643 | 1.630 | 3.911 | 0.253 | 0.323 | 0.344 | 0.667 | 0.323 |
| 1976 | 0.6739 | 1.623 | 3.995 | 0.255 | 0.320 | 0.361 | 0.681 | 0.315 |

Notice any trends that explain why the values go up or down? Blatant hint: look at runs per game (R/G). More on that in the next article…

So, where do we go from here? If a context-neutral stat for each season is good enough for your needs, then this is the end of the line, and you can just go with the wOBAs listed on FanGraphs. But if you want to see how much a player impacts a particular team, or you want to analyze a hypothetical team (e.g. estimating how many runs your team will score next year based on projected stats), you’ll need to go deeper. In the next part of the series, I’ll tell you more about that.
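For anyone who wants to check the R/G hint against the ten seasons listed above, here is a rough Python sketch that computes a plain Pearson correlation between runs per game and each run value. The `pearson` helper and the `SEASONS` name are my own illustration, not part of any wOBA tooling, and because these are ten hand-picked extreme seasons, the result will likely overstate how tight the relationship is across all years.

```python
from math import sqrt

# (year, R/Single, R/HR, R/G) copied from the ten seasons listed above.
SEASONS = [
    (1930, 0.8214, 1.723, 5.546),
    (1936, 0.7833, 1.683, 5.187),
    (2000, 0.7826, 1.696, 5.140),
    (1994, 0.7627, 1.686, 4.923),
    (1950, 0.7623, 1.689, 4.849),
    (1908, 0.6159, 1.603, 3.383),
    (1968, 0.6233, 1.603, 3.417),
    (1917, 0.6395, 1.605, 3.588),
    (1943, 0.6643, 1.630, 3.911),
    (1976, 0.6739, 1.623, 3.995),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


runs_per_game = [season[3] for season in SEASONS]
r_single = pearson(runs_per_game, [season[1] for season in SEASONS])
r_hr = pearson(runs_per_game, [season[2] for season in SEASONS])

print(f"R/G vs R/Single: {r_single:.3f}")
print(f"R/G vs R/HR:     {r_hr:.3f}")
```

A correlation close to 1 for both columns would be consistent with the hint: as league scoring rises, each single and each home run tends to be worth more runs.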