# How Can We Predict Stolen Base Talent?

Predicting the ability to steal bases is not something you think you need to do. You did not say to yourself over breakfast, “I wonder if Michael Bourn can steal bases?” You already knew he could. And maybe that’s what made breakfast so delicious.

But if we want to push the frontier of base running, if we want to see the end of the home run era become the beginning of the efficient base running era, we have to do this thing we thought we did not need to do. We have to be able to predict stolen bases.

We recently broached the idea of maximizing base-stealing efficiency. We learned: Base running league-wide has become more efficient and more valuable, but the volume of steal attempts has not grown as we would expect. Judging by the league-wide home-run rate, teams can now afford a stolen-base success rate around 66%, but have instead opted to steal at a comparatively conservative 74% success rate.
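The break-even rate falls out of a one-line expected-value condition. Write $R_{SB}$ for the runs a successful steal adds and $R_{CS}$ for the runs a caught stealing costs (both positive, both from linear weights; the symbols are introduced here for illustration). An attempt that succeeds with probability $p$ breaks even when its expected run value is zero:

$$
p\,R_{SB} - (1 - p)\,R_{CS} = 0
\quad\Longrightarrow\quad
p^{*} = \frac{R_{CS}}{R_{SB} + R_{CS}}
$$

With illustrative values of roughly $R_{SB} \approx 0.2$ and $R_{CS} \approx 0.4$ runs (assumptions, not the figures behind the 66% above), $p^{*} = 0.4/0.6 \approx 0.67$. Both run values move with the run environment, which is why the break-even rate tracks the league's home-run rate.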

In fact, if we look at the matter historically, we see that since 1951 teams have become increasingly conservative about the acceptable rate of SB success, and, likely important to that development, the home run rate has eased upward too:

Why would you steal bases if a homer does the job even better? So, on the whole, this trend makes sense. But with the end of the steroid era, fewer homers have *not* led to lower stolen-base standards, and this has ended what we could consider (by linear weights standards) stolen-base equilibrium:

So how can we measure the opportunity cost of a more passive running game? And how do teams find the right balance?

One of the problems of studying the running game is its complexity.

In my initial post, I wrote about how each team, given its unique home run rate, had a different acceptable SB rate. But, even more specifically, each player has a unique acceptable SB rate given the five players following him in the lineup. (A player will never still be on base when the sixth batter comes up, so only the following five matter, and really only four: by the time the fifth batter is up, the player can still be on base only in a bases-loaded, two-out situation, wherein a steal of home is either impossible or flat-out dumb.)

So this peculiarity breeds a second: that of measuring opportunity cost. Not all base runners are made alike. When we say, for instance, that the Phillies need to attempt more steals, we should not be saying, “Ryan Howard, lace up!” (It should be noted, though, that Howard has a 3:1 SB-to-CS ratio in his career. Also, he should lace up his shoes regardless of his base path intentions. It’s just good fundamentals.)

Base-stealing efficiency is still the gold standard. Teams should not send runners in obviously hopeless situations. In 2012, Johnny Cueto allowed only 1 stolen base and collected 9 pickoffs. Don’t try to steal on Johnny Cueto. In fact, don’t bother getting a lead. Just plop your butt on the bag until the ball’s in play.

But it is reasonable to assume that, because of managerial dictates and players’ personal preferences, the likes of Juan Pierre and Jimmy Rollins passed up stolen-base opportunities in 2012 in which they had a reasonable chance to succeed. Let us pretend all 35 of Rollins’ attempted steals in 2012 were the top 35 chances he had. Let’s say he chose perfectly efficiently, and in each of those situations he had an 86% chance of a successful steal (the success rate at which he finished the season).

We can then suppose the next 35 attempts had a reduced chance of success. Perhaps they became increasingly difficult on an exponential scale. So maybe the next 5 attempts had an 80% chance; the next 5, a 74% chance; the next 3, a 65% chance; the next 2, a 56% chance; and so on, until the final two attempts would have come against Mr. Johnny “10%” Cueto.

If we have, in hand, this kind of sliding scale, one that accurately predicts the success of each additional steal attempt (assuming a perfectly efficient analysis from the runner), we could measure the opportunity cost of additional steal attempts.
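Here is a minimal sketch of that bookkeeping. It assumes illustrative linear-weights values of +0.2 runs for a steal and -0.4 runs for a caught stealing (assumptions, not measured figures) and uses the made-up Rollins schedule from the paragraph above. It walks the attempts best-first and stops at the first one that loses runs in expectation:

```python
# Illustrative run values (assumptions, roughly linear-weights sized).
RUNS_SB = 0.2  # runs gained by a successful steal
RUNS_CS = 0.4  # runs lost by a caught stealing, stored as a positive cost


def expected_runs(p: float) -> float:
    """Expected run value of one attempt that succeeds with probability p."""
    return p * RUNS_SB - (1 - p) * RUNS_CS


def marginal_values(schedule):
    """Expand (count, probability) pairs into one expected value per attempt."""
    return [expected_runs(p) for count, p in schedule for _ in range(count)]


def best_extra_attempts(schedule):
    """Return (number of extra attempts worth taking, runs they add).

    Assumes the schedule is ordered best-first, so we stop at the first
    attempt whose expected value is not positive.
    """
    total, n = 0.0, 0
    for value in marginal_values(schedule):
        if value <= 0:
            break
        total += value
        n += 1
    return n, total


# Hypothetical schedule from the text: 5 attempts at 80%, 5 at 74%, 3 at 65%, 2 at 56%.
rollins_next = [(5, 0.80), (5, 0.74), (3, 0.65), (2, 0.56)]
n, runs = best_extra_attempts(rollins_next)
print(n, round(runs, 2))
```

Under these assumptions the 80% and 74% attempts are worth taking and the 65% attempt is not, so passing on those ten extra attempts would cost about 0.62 runs. Change either run value and the cutoff moves with it.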

And there’s where the matter gets murkier than a river in Hebei. Each player and each team has their own method of and talent for stealing bases. Moreover, player aging makes each data point a moving target.

To put this visually, we have 30 teams, each with their own collective team speed and base-running talent and home-run rates. That means the chance and reward of success for each marginal steal is different for each team. We have 30 different curves, each spaced out along a continuum of steal attempts:

We can see four dimensions here:

**1. Additional steal attempts —** this is each additional steal attempt. We want to create a model wherein we predict the effectiveness of the next steal attempt (read: the next-best steal attempt) so that we can estimate the total run cost of being passive or conservative on the base paths.

**2. Additional run values —** we can determine this using linear weights.

**3. The space between curves —** a team’s home run rate mostly determines this. We discovered that in “The Changing Caught-Stealing Calculus,” and we can adjust for that easily (as we did a little in “The CCCC” article).

**4. THE GREAT MISSING DIMENSION: The shape of the curves, or their elasticity —** this is determined by the players on the roster, their innate steal talents. We need this to move forward.
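To see why the shape of the curve matters so much, here is a toy model. The exponential form is an assumption loosely echoing the "increasingly difficult on an exponential scale" idea above, not a fitted model: the n-th extra attempt succeeds with probability `p0 * exp(-k * n)`, where `p0` is the runner's current success rate and `k` stands in for the unknown elasticity (Dimension 4). The break-even rate plays the role of Dimension 3. The names `marginal_p` and `apex` are hypothetical.

```python
import math


def marginal_p(n: float, p0: float, k: float) -> float:
    """Hypothetical curve: success probability of the n-th extra attempt."""
    return p0 * math.exp(-k * n)


def apex(p0: float, k: float, break_even: float) -> float:
    """Extra attempts until the marginal rate falls to break-even.

    Solves p0 * exp(-k * n) = break_even for n; returns 0 if the runner is
    already at or below break-even.
    """
    if p0 <= break_even:
        return 0.0
    return math.log(p0 / break_even) / k


# Same runner (86%), same ~66% break-even, two made-up elasticities.
for k in (0.05, 0.01):
    print(k, round(apex(p0=0.86, k=k, break_even=0.66), 1))
```

With a steep curve (`k = 0.05`) the profitable attempts run out after about 5.3 more; with a flat one (`k = 0.01`), after about 26.5. Same runner, same home-run environment, very different advice.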

It would be great to have a simple solution to the missing dimension. Maybe there is one and I am simply not seeing it. (Please tell me there is.) Teams could be producing very near the apex, at a point along their curves where the gain from an additional attempt is nil. If Jimmy Rollins’ next 5 attempts have a 50% success rate, then he most likely should not attempt them.

But the curves could also have steeper slopes: the Padres led the league with 155 steals but still had a 77% success rate. Teams knew they would steal, they didn’t want them to steal, they tried to stop them from stealing, but the Padres kept stealing. In fact, they led the league with a 9.3% attempt rate (steal attempts per opportunity). If such a team attempts more steals while still far from the apex, the marginal benefit would decrease at an increasing rate, but the total run output would still improve (up to the point of the apex).

This is the heart of the matter: We want to predict Dimension 4. We want to normalize that dimension so that we can predict the shape of a curve based on a certain statistic; let’s call it StealX, a mystery stat (or stats) that will give us a measure of stealing ability. Then, paired with HR rates, we can decide almost exactly what rate of steals a team should employ in a given season, and how risky its running game should be. Using a next-five-home-run-rates method, we could determine the approximate acceptable risk for each player in the lineup.
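The lineup half of that idea is easy to make concrete, even if the mapping from home-run context to acceptable risk is not. Below is a hypothetical sketch: it gathers the HR rates of the hitters behind a given lineup slot (wrapping around the order) and averages them. The equal default weights and the example weights are placeholders; the text suggests later hitters matter less, but the right weights are unknown.

```python
def following_hr_rates(lineup_hr, slot, window=5):
    """HR rates of the `window` hitters after `slot` (0-based), wrapping the order."""
    n = len(lineup_hr)
    return [lineup_hr[(slot + i) % n] for i in range(1, window + 1)]


def hr_context(lineup_hr, slot, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted mean HR rate of the hitters behind `slot`.

    Equal weights are a placeholder. A real model would likely down-weight
    the fourth and fifth hitters, who only matter in rarer base-out states.
    """
    rates = following_hr_rates(lineup_hr, slot, window=len(weights))
    return sum(w * r for w, r in zip(weights, rates)) / sum(weights)


# Made-up HR-per-PA rates for a nine-man lineup.
lineup = [0.015, 0.020, 0.045, 0.050, 0.040, 0.025, 0.020, 0.015, 0.010]
print(following_hr_rates(lineup, 8))  # the No. 9 hitter wraps to the top of the order
print(round(hr_context(lineup, 0, weights=(1.0, 0.8, 0.6, 0.4, 0.1)), 4))
```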

I would like to think StealX is the Bill James Speed Score or FanGraphs’ Base Running (BsR) numbers, but since both include elements of stolen-base success, using them to predict stolen-base success would be circular, tantamount to bringing multicollinearity into your house and smearing it all over your walls.
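A quick simulation shows the trap. Everything here is synthetic (a seeded random draw, not real player data): a "pure speed" input is generated independently of SB success, so it predicts nothing, while a composite score that folds SB success back in, as Speed Score and BsR partly do, looks impressively predictive for no real reason.

```python
import random


def r_squared(x, y):
    """R^2 of a one-predictor linear fit, i.e. the squared Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


rng = random.Random(2012)
n = 1000
# Simulated SB success rates: the outcome we want StealX to predict.
sb_success = [rng.gauss(0.72, 0.05) for _ in range(n)]
# A "pure speed" input drawn independently, so by construction it predicts nothing.
speed = [rng.gauss(0.0, 0.05) for _ in range(n)]
# A composite score that mixes SB success back in, half and half.
composite = [0.5 * s + 0.5 * p for s, p in zip(speed, sb_success)]

print(round(r_squared(speed, sb_success), 3))      # close to 0
print(round(r_squared(composite, sb_success), 3))  # roughly 0.5, all borrowed from sb_success
```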

Perhaps, as a measure of a team’s self-perceived stolen-base talent, we could look at the rate of steal attempts per opportunity (that is, attempts per time a player was on first or second base with the next base open, excluding steal-of-home opportunities, as defined by Baseball-Reference). Over the last 20 seasons, teams have used very different levels of aggressiveness, so this gives us a decent sample with which to work:

*NOTE: They don’t show up on the x-axis here, but who do you think had the lowest attempt rate over the last two decades? Would you believe it was the 2000 Athletics? Yes, you would believe it. In fact, you knew it — deep in your chest, behind the lungs somewhere — you knew it already.*

The rate at which a team attempts steals could suggest the quality of base-stealing talent on its roster, but I fear one major confounding factor is that most managers are (at least subconsciously) aware of the relationship between home run rates and steal values. So attempt rates may also reflect the manager’s understanding or expectations of the lineup, or the manager’s personal style. Either way, attempt rates have little in the way of a meaningful relationship with stolen-base success (think: .056 R-squared).
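For reference, here are the two rates being compared, computed from raw counts. The attempt count (SB + CS) is the numerator of one and the denominator of the other, but they answer different questions: how often a team runs versus how often running works. The Rollins split (30 SB, 5 CS) follows from the 35 attempts and 86% rate cited earlier; the 350 opportunities are a made-up denominator.

```python
def attempt_rate(sb: int, cs: int, opportunities: int) -> float:
    """Steal attempts (SB + CS) per stolen-base opportunity."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    return (sb + cs) / opportunities


def success_rate(sb: int, cs: int) -> float:
    """Share of steal attempts that succeed."""
    attempts = sb + cs
    if attempts == 0:
        raise ValueError("no attempts")
    return sb / attempts


# Rollins 2012: 35 attempts at an 86% clip works out to 30 SB and 5 CS.
print(round(success_rate(30, 5), 3))
# 350 opportunities is hypothetical, for illustration only.
print(attempt_rate(30, 5, 350))
```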

We could look at extra bases taken (not necessarily extra-base hits), as this can act as a proxy for both speed and base-running awareness. Again, Baseball-Reference has a healthy compendium of this data (though collating it into a single spreadsheet encompassing multiple seasons is like doing dental work on a G.I. Joe). But every which way I regressed these elements on each other produced little more than a .120 R-squared, meaning extra bases taken and similar stats explained, at most, 12% of the variation in either SB success rates or SB attempt rates.
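For anyone who wants to rerun these regressions, here is a minimal one-predictor least-squares fit that reports R-squared as 1 minus the residual share of variance. No data is supplied here; the inputs would be team-season columns such as extra-bases-taken percentage (`x`) and SB success rate (`y`), and the function name `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Least-squares line y = a + b*x, plus R^2 (share of variance in y explained)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residual sum of squares vs. total sum of squares around the mean of y.
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot


a, b, r2 = ols_fit([1, 2, 3, 4], [1, 3, 2, 4])
print(a, b, r2)
```

Read this way, an R-squared of .120 means the fitted line accounts for 12% of the variance in the dependent variable and leaves the other 88% in the residuals.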

Ideally, I would have batter’s-box-to-first-base stopwatch times for players, some concrete speed stat independent of the SB-CS ratio. But I don’t. I do, however, have something else.

I have you. That’s right, Atreyu, I’m looking for your expertise. For the life of me, I cannot fathom the next step (maybe I’m at the edge of the fjord and I should just stop), but I suspect there is some complex, long-forgotten (by me, at least) statistical technique that will breathe new life into extra-bases-taken or stolen-third-base rates.

Here is the data I collected:

But don’t feel bound to this set. If there is something I am overlooking, please let me know.

Why is this important? Why is this worthy of your talent and time?

If we can predict a team’s stolen-base talent, we can predict a more exact stolen-base attempt rate for that team. And if we can do it at the team level, we can do it at the player level and the lineup level just as easily. We can look at a player’s StealX and the following five HR rates and say: Jimmy, get aggressive, try for a steal on 15% of your opportunities.

But first I need your help. First we need to find StealX. We need to predict stolen base talent.

I have a question about the Acceptable SB% in the 2nd graph. Is this the break-even point? Because if so, it suggests to me that teams were stealing too much. An Acceptable SB% equal to the actual SB% would imply that, on net, teams derived no value from SBs.

Correct. It seems that only recently have teams been able to extract any net positive value from SBs. My guess is that the success rate will continue to improve as teams eliminate attempts where the probability of success is <66%.

Right, that’s what I was thinking. Intuitively, you want to keep stealing while the *marginal* success rate is above the break-even point, but the marginal rate will always be lower than the average rate. If the average rate equals the break-even rate, that would imply teams were attempting steals with a probability below the break-even rate, and those attempts were pulling the average down to break-even.
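The point is easy to check with a toy best-first list of per-attempt success chances (the chances are made up for illustration; 0.66 stands in for the approximate break-even rate from the article):

```python
def average_vs_marginal(marginal_ps):
    """For a best-first list of per-attempt success probabilities, yield
    (attempt number, marginal probability, running average probability)."""
    total = 0.0
    for n, p in enumerate(marginal_ps, start=1):
        total += p
        yield n, p, total / n


BREAK_EVEN = 0.66  # approximate break-even rate cited in the article
# Made-up, steadily declining chances for ten attempts, best first.
chances = [0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.50, 0.45]
for n, marginal, average in average_vs_marginal(chances):
    flag = "below break-even" if marginal < BREAK_EVEN else ""
    print(f"{n:2d}  marginal {marginal:.2f}  average {average:.3f}  {flag}")
```

From the sixth attempt on, every steal loses runs in expectation, yet the running average stays above 66% through all ten (0.675 at the tenth). An average sitting right at break-even therefore implies a tail of attempts well below it.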

This is precisely why we want to find some method of estimating marginal steal success rates. No, you don’t want to merely break even on net (à la the Mets and Cubs in 2012 http://www.fangraphs.com/blogs/index.php/the-changing-caught-stealing-calculus-2/), but staying 5% or so above the break-even rate should, theoretically, keep you in the most productive zone. I look at the late 1980s as the ideal era of base-stealing efficiency.

Gotcha. So your contention is that the marginal SB% is currently above the break-even point. I certainly could see that being true; as you allude to in the piece that’s a very difficult thing to analyze.