We Provide Leverage: A Thought Experiment
by Ben Clemens
February 24, 2020

Last week, when giving our playoff odds a quick once-over, I stumbled across something interesting. In translating from player statistics to our projections, we strip out the impact of reliever leverage. That seems intuitively weird, so I wanted to delve into the thinking behind it and see if I could find a workaround.

First, a quick recap of the issue. When we calculate WAR for relievers, we include the impact of leverage. This makes sense: the last reliever off the bench is mostly pitching in blowouts, so their contribution, good or bad, is less important than the closer’s. If you used a dominant reliever in a mop-up role, they’d be far less valuable than if they got to pitch in games where the outcome was uncertain.

How do we adjust for leverage? It’s reasonably straightforward. Take a reliever’s gmLI, which you can find in the Win Probability section. Kirby Yates, for example, had a gmLI of 2.16 last year. gmLI is the average leverage index when a pitcher enters the game. You can find a recap of leverage index here, but it’s essentially a measure of how important a given plate appearance is. A leverage index of 1 means that the situation is exactly as important as the average plate appearance, 2 means the situation is twice as important, and so on.

With a reliever’s gmLI in hand, we use a conversion formula. Take the gmLI, add one, and divide the result by two. That gives you the number to multiply the reliever’s “raw” WAR by to arrive at the WAR you’ll see in our stats.

Let’s use Yates again as an example. His gmLI was 2.16. Adding 1 gives us 3.16. We then divide by 2 and arrive at 1.58. Yates’s “raw” WAR last year (which you can calculate using the method here), which isn’t displayed anywhere on our website, was 2.15. Multiply that by 1.58, and we get the 3.4 number you’ll see on his player page.

Okay, now that we have the basics, let’s completely ignore them.
Or rather, let’s consider how BaseRuns ignores them. BaseRuns is a formula that converts a team’s raw offensive (or defensive) inputs into runs scored (or allowed). It powers our projections; to work out a team’s expected runs scored per game, we take the weighted average of every projected plate appearance (using our Depth Charts playing time projections) and jam the result into the formula. We do the same for pitchers when calculating runs allowed.

On offense, there’s no need to account for leverage. So long as you apportion the plate appearances correctly by lineup spot, you’ll likely do a good job of capturing a team’s offensive performance. Batters with a special skill for coming to the plate in high-leverage situations don’t exist, with the exception of pinch hitters, who bat only a handful of times all year compared to the crush of regular plate appearances.

Starting pitchers are, roughly speaking, the same deal. Aside from being the road pitcher with a talented offense behind you, there’s not really any way to affect your leverage; home pitchers always start in a 0-0 game, and road pitchers either get a few runs or don’t. Besides, it’s not a high-leverage spot; it’s the first inning! So blending the stats without regard for leverage works well.

Relievers are a different story. They’re the one case where leverage plays a key part in a player’s value to the team. If you’re combining how important Mike Trout and Joe Average are to a team’s offense, you can weight by plate appearances and come out just fine. But if you’re comparing how Yates and Michel Baez affected the Padres’ fortunes, a blend does you a disservice.

Baez faced half as many batters as Yates, but he wasn’t half as important. He had a gmLI of 0.72, as compared to Yates’s aforementioned 2.16. Our WAR multiplier technique (0.86 for Baez against 1.58 for Yates) would tell you that a given Baez batter faced was about 54% as important to the Padres as a Yates batter faced. That’s an important part of team construction.
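
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the multiplier described earlier: take gmLI, add one, divide by two. The function and variable names are invented for illustration and are not part of any FanGraphs code; the inputs are the 2019 figures quoted above for Yates and Baez.

```python
def leverage_multiplier(gm_li: float) -> float:
    """WAR multiplier as described in the article: (gmLI + 1) / 2."""
    return (gm_li + 1) / 2


# 2019 figures quoted in the article (names here are illustrative only).
yates_gmli, yates_raw_war = 2.16, 2.15
baez_gmli = 0.72

yates_mult = leverage_multiplier(yates_gmli)
baez_mult = leverage_multiplier(baez_gmli)

# Leverage-adjusted WAR is the raw WAR scaled by the multiplier.
yates_war = yates_raw_war * yates_mult

# How much one Baez batter faced "counts" relative to one Yates batter faced.
relative_importance = baez_mult / yates_mult

print(f"Yates multiplier: {yates_mult:.2f}")
print(f"Yates leverage-adjusted WAR: {yates_war:.1f}")
print(f"Baez multiplier: {baez_mult:.2f}")
print(f"Baez batter relative to Yates batter: {relative_importance:.2f}")
```

The point of the ratio is that a plain batters-faced blend treats that last number as 1.00, which is exactly the information BaseRuns never sees.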
No one expects the Padres to pick relievers out of a hat when it’s time to bring someone into the game. But if we merely blend their statistics based on projected batters faced, that’s essentially what we’re doing. Something’s off with the method, albeit in a small way: reducing Yates to an average-leverage reliever would “cost” the Padres only 1.2 wins, and he was the second-best reliever in baseball last year. Just because the effect is small, however, doesn’t mean we should ignore it.

Unfortunately, it’s not as easy as just telling Doctor BaseRuns, the evil genius behind the curtain, to think about leverage when the projections are being tabulated. If we’re going to consider leverage, we’ll need to do it intelligently. Otherwise, we might be introducing more error into the calculation than we’re trying to remove.

Solving this by manipulating the BaseRuns formula is daunting. There are so many moving pieces that trying to monkey around under the hood is both tedious and confusing. To approach this in a relatable way, we need to turn to the world of theoretical physics.

I know what you’re thinking: put Kirby Yates and Michel Baez in a particle accelerator and smash them together at a meaningful percentage of the speed of light. Aside from being inhumane, that’s also impossible; they’re both far too heavy to achieve such speeds, although if we’re talking about banging schemes, their collision would create quite the resounding din.

No, I’m talking about a gedankenexperiment, a term first coined by Danish physicist Hans Christian Ørsted and later used by such luminaries as Albert Einstein and Stephen Hawking. While it’s a fancy German word, it’s a simple concept. Design an idealized world. Perform a theoretical experiment in that world, with all the messy noise of reality excluded. Generalize back from that idealized world to an overarching concept.
Schrödinger’s cat is an example of such an experiment, and Hawking radiation, the energy that bleeds off of black holes and makes them evaporate (on a cosmic time scale), was discovered this way.

We’re not postulating quantum physics, which makes our job easier. Rather than use BaseRuns, we’ll consider its output: runs scored and allowed, and therefore winning percentage (we use a Pythagorean expected winning percentage to convert BaseRuns into team strength). Also: simulating a full season sounds like a chore. Heck, simulating a full game sounds like more than I want to handle.

Instead, let’s consider a single inning. More specifically, let’s consider a hypothetical ninth inning. The home team has two pitchers: Kirby K-Rate, an elite closer, and Bob Bullpenshuttle, a generic reliever. While we’re assuming things, let’s create some run distributions. The offense scores two runs in the inning 10% of the time, one run 35% of the time, and nothing the remaining 55% of the time. This works out to about five runs per game over the long run.

Our closer is great: he allows one run 15% of the time, two runs 5% of the time, and zero runs the remaining 80% of the time. Our other reliever, well, yikes: they allow one run 45% of the time, two runs 20% of the time, and zero the remaining 35% of the time. Not great, Bob!

Consider a game where the team comes into the ninth inning trailing by a run. The offense’s results go down the left side, and the pitcher results go across the top:

Game Results: Home Team Down 1

Runs      Away 0    Away 1    Away 2
Home 0    Loss      Loss      Loss
Home 1    Tie       Loss      Loss
Home 2    Win       Tie       Loss

Let’s further stipulate that all ties are 50% affairs, as that will help the math. Armed with these, we can create a joint probability grid.
It looks like this when Kirby K-Rate is in the game:

Odds of Home/Away Score Combination

Runs      Away 0    Away 1    Away 2
Home 0    44.00%    8.25%     2.75%
Home 1    28.00%    5.25%     1.75%
Home 2    8.00%     1.50%     0.50%

Merge that grid with the outcomes, and you can work out a win probability: the home team is 22.8% likely to win the game when they enter the ninth inning trailing by a run with Kirby K-Rate on the mound.

Next, let’s start assuming game states. Our theoretical team has a 50% chance of entering the ninth inning tied, a 25% chance of entering it up a run, and a 25% chance of entering it down a run. Further, they always use their best pitcher in a tie game and their worst pitcher if one team is ahead.

We can use the method above to figure out the chances of the team winning in each of the scenarios. When they’re trailing by a run, they win 11.88% of the time with Bob Bullpenshuttle in there to get shelled. When they’re up a run, Bob takes it home 73.1% of the time. And when they’re tied, they win 62.38% of the time with Kirby K-Rate locking down the opposition. Overall, they have a winning percentage of .524, roughly an 85-win pace over a 162-game season.

Now we need to work out a Pythagorean expectation in this world. There’s nothing inherently perfect about the exponent we use in our projection system; it’s called Pythagenpat, and it’s been empirically tested to work in the scoring environments and game lengths that exist in the game today. I’ll save you the hassle of all the various nonsense math I went through in working this out, but an exponent around 0.6 does a good job of minimizing error in this weird one-inning world. It’s not perfect, but it’s not designed to be. Bear that in mind.

Next, let’s plug our team into the Pythagorean formula. If you look closely, you’ll notice that the weights I gave lead to a team that both scores and allows 0.55 runs per inning. Pythagorean expectation says that we have a .500 team. But it’s broken!
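
To make the bookkeeping concrete, here is a rough Python sketch of the toy ninth inning so far. It builds the joint probabilities from the run distributions above, counts ties as half a win, and compares the resulting winning percentage with a Pythagorean estimate using the 0.6 exponent. Every name in it (win_prob, STATES, and so on) is hypothetical, and it assumes the home offense and the pitcher's runs allowed are independent, which is what multiplying the two distributions into a grid implies.

```python
from itertools import product

# Runs-per-inning distributions from the article: {runs: probability}.
OFFENSE = {0: 0.55, 1: 0.35, 2: 0.10}
KIRBY = {0: 0.80, 1: 0.15, 2: 0.05}  # elite closer
BOB = {0: 0.35, 1: 0.45, 2: 0.20}    # generic reliever


def win_prob(pitcher, lead):
    """Home win probability entering the ninth with `lead` (negative = trailing).

    A game still level after the inning counts as half a win.
    """
    total = 0.0
    for (home, p_home), (away, p_away) in product(OFFENSE.items(), pitcher.items()):
        margin = lead + home - away
        if margin > 0:
            total += p_home * p_away
        elif margin == 0:
            total += 0.5 * p_home * p_away
    return total


def expected_runs(dist):
    """Mean runs per inning for a {runs: probability} distribution."""
    return sum(runs * p for runs, p in dist.items())


# (share of games, lead entering the ninth, pitcher used in that state)
STATES = [(0.25, -1, BOB), (0.50, 0, KIRBY), (0.25, 1, BOB)]

actual_wpct = sum(share * win_prob(pitcher, lead) for share, lead, pitcher in STATES)

# Naive Pythagorean inputs: runs allowed weighted only by how often each pitcher appears.
runs_scored = expected_runs(OFFENSE)
runs_allowed = sum(share * expected_runs(pitcher) for share, _, pitcher in STATES)

EXPONENT = 0.6  # the article's rough exponent for this one-inning world
pythag_wpct = runs_scored**EXPONENT / (runs_scored**EXPONENT + runs_allowed**EXPONENT)

print(f"Kirby, down one: {win_prob(KIRBY, -1):.4f}")
print(f"Actual winning percentage: {actual_wpct:.3f}")
print(f"Naive Pythagorean winning percentage: {pythag_wpct:.3f}")
```

Because runs scored and naive runs allowed are equal here, the Pythagorean estimate lands at .500 no matter what exponent is used; the gap to the simulated figure comes entirely from who pitches when.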
We know, for sure, that we have a .524 team. Let’s do something about that.

Given that our baseball doesn’t resemble real baseball, we’ll need to work out the leverage of each situation a new way. I chose to consider leverage as how valuable it was to go from the bad pitcher to the good pitcher. A team that could use the closer in every game would have a winning percentage .193 higher than a team that had to use the bad pitcher. Therefore, the leverage of a given situation is relative to that. It looks like this:

“Leverage” By Situation

Situation    “Leverage” Index
Down One     0.56
Tied         1.22
Up One       0.99

Now that we’ve got those leverage levels, let’s do a simple thing: rather than weight each pitcher solely based on the number of appearances when we’re calculating the runs allowed portion of Pythagorean expectation, let’s multiply the inning weight by the leverage index and re-scale to one, like so:

Simple and Leverage Weightings

Situation    Runs Allowed    Inning Weight    Leverage Weight
Down One     0.85            0.25             0.14
Tied         0.25            0.50             0.61
Up One       0.85            0.25             0.25

Our team allows a leverage-adjusted .483 runs per inning, lower than the simple .55 runs per inning they allow. This accounts for the fact that more of the allowed runs come in unimportant situations than you’d expect if they were distributed randomly. Plug that into our modified Pythagorean formula, and we get an expected winning percentage of .5195, very close to the actual .524 we calculated.

What does this all mean? Well, not much yet. It’s not as though I’m going to snap my fingers and change all of FanGraphs’ carefully built projection systems to incorporate leverage. It doesn’t mean we have the exact formula for it, or anything like that. But it does mean that you can use leverage to improve your game predictions. If we know more about the situations where pitchers are used, we can refine our naive win/loss projections. This seems, to me, like an absolute win.
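
The leverage re-weighting above can be sketched in a few lines of Python. This is one reading of the steps rather than a definitive implementation: leverage in each game state is the win-probability gain from swapping Bob for Kirby, divided by the usage-weighted average of that gain. The names are hypothetical, and the run distributions are restated so the block runs on its own.

```python
from itertools import product

# Runs-per-inning distributions from the article: {runs: probability}.
OFFENSE = {0: 0.55, 1: 0.35, 2: 0.10}
KIRBY = {0: 0.80, 1: 0.15, 2: 0.05}  # elite closer
BOB = {0: 0.35, 1: 0.45, 2: 0.20}    # generic reliever


def win_prob(pitcher, lead):
    """Home win probability entering the ninth with `lead`; a level game counts as half a win."""
    total = 0.0
    for (home, p_home), (away, p_away) in product(OFFENSE.items(), pitcher.items()):
        margin = lead + home - away
        total += p_home * p_away * (1.0 if margin > 0 else 0.5 if margin == 0 else 0.0)
    return total


def expected_runs(dist):
    """Mean runs per inning for a {runs: probability} distribution."""
    return sum(runs * p for runs, p in dist.items())


# name -> (share of games, lead entering the ninth, pitcher actually used)
STATES = {
    "Down One": (0.25, -1, BOB),
    "Tied": (0.50, 0, KIRBY),
    "Up One": (0.25, 1, BOB),
}

# Win-probability gain from using the closer instead of the bad pitcher, by state.
gain = {name: win_prob(KIRBY, lead) - win_prob(BOB, lead)
        for name, (_, lead, _) in STATES.items()}
avg_gain = sum(share * gain[name] for name, (share, _, _) in STATES.items())

# Leverage index: each state's gain relative to the average gain.
leverage = {name: gain[name] / avg_gain for name in STATES}

# Inning weight times leverage index, re-scaled to sum to one.
raw_weight = {name: share * leverage[name] for name, (share, _, _) in STATES.items()}
weight_total = sum(raw_weight.values())
lev_weight = {name: w / weight_total for name, w in raw_weight.items()}

adj_runs_allowed = sum(lev_weight[name] * expected_runs(pitcher)
                       for name, (_, _, pitcher) in STATES.items())
runs_scored = expected_runs(OFFENSE)

EXPONENT = 0.6  # the article's rough exponent for this one-inning world
adj_wpct = runs_scored**EXPONENT / (runs_scored**EXPONENT + adj_runs_allowed**EXPONENT)

for name in STATES:
    print(f"{name}: leverage {leverage[name]:.2f}, weight {lev_weight[name]:.2f}")
print(f"Leverage-adjusted runs allowed: {adj_runs_allowed:.3f}")
print(f"Leverage-adjusted Pythagorean winning percentage: {adj_wpct:.4f}")
```

The re-scaling step is nearly a formality in this setup, since leverage indices defined relative to the usage-weighted average already produce weights that sum to one; it is kept so the sketch still works if the leverage definition changes.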
It’s not a workable system, at least not yet, and it’ll require some thinking and discussion behind the scenes to determine if it’s worth pursuing, and if so how to fold it into BaseRuns. But this thought experiment proves that there’s something there, and I can’t wait to explore it more.

I should add, as a postscript, that I don’t believe that adjustment for leverage is appropriate for existing Pythagorean records. Pythagorean expectation is expressly not a way of telling you how many games a team should have won. It’s a useful tool, and it’s key to our understanding of how many runs constitute a win, but teams do plenty of things not captured in Pythagorean expectations that help them win games. This is one of them, and in the case where we’re projecting records with Pythag, it’s worth considering a modification, but that doesn’t say anything about it overall. I still think it’s a useful statistic, exactly as-is.