A Prelude to a Study: Caught Stealing Variables and Assigning Responsibility

Pitcher-catcher batteries are unpredictable and fickle beasts. The mundane statistics commonly used to evaluate batteries (SB, CS, PB, WP) only blur the already fine line between the responsibilities and influences a pitcher and catcher have on each other. Simply put, a stolen base does not give us enough information to identify which of the battery mates is responsible. There are many underlying variables, lurking ever so quietly just beneath the box score, that become more muddled over the course of time.

Baseball, in its purest form, is a matter of safe or out. The same rule applies to a stolen base and a caught stealing. The difference between the two may be a matter of milliseconds, a conjunction of a handful of hidden variables coming together all at once. Yet in the end, it is as simple as whether the baserunner was safe or out.

However, analytically speaking, I want to know why. I want to know why the outcome occurred, more so than the end result itself. With that, we would have more information to predict future outcomes and to assign responsibility to whoever influenced the outcome more: the pitcher or the catcher.

So, next time you pick up a box score, in your local newspaper or when scrolling through the internet, consider these subtle but significant variables when you see a stolen base — as I have:

  1. Baserunner speed and prior success: How fast did this runner make it to the bag? How large of a jump did he have? What kind of lead did he have? Has he had success in the past?
  2. Catcher release time and previous success: How quickly did the catcher catch and release the ball? How fast were his feet? Where was the location of the throw to the bag of interest? Has he caught runners at a high clip in the past?
  3. Pitcher release time and previous success: How fast, from first move, did the pitcher deliver the ball to the plate? Has this pitcher had success picking runners off and limiting the running game in the past?
  4. Handedness of the pitcher: Which side is the pitcher throwing from? Did this affect the runner’s lead at first? How did this affect the runner’s time from jump to bag of interest?
  5. Velocity/type/location of the pitch: Was the pitch a fastball? Where was the pitch located? Did the velocity/pitch type/location give an advantage to the runner or the battery?  Did this allow for the optimal transfer for the catcher? Was this by design, or random? How does the average battery perform with the same velocity/pitch type/location?

This piece was initially intended to explore which battery mates, in 2013, provided the most value to their battery mate using a method called WOWY, which evaluates a catcher’s influence on a pitcher, or vice versa, by comparing their performance together with their performance without each other. It’s a useful method, but it’s not without its flaws. When I tried WOWY with my bBRS numbers (my metric for quantifying how many runs a battery was worth in the running game), the lurking variables above seemed to jump off my spreadsheet.

The problem with that approach is that WOWY inherently assumes previous performance should dictate performance in the scope of the battery of interest.  WOWY works great when attempting to find the responsibility in terms of SB, CS, PB, and WP, mostly because these statistics assume that the catcher is responsible for all outcomes.
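To make the together-versus-apart comparison concrete, here is a minimal sketch of a WOWY-style split for caught-stealing rate. It is a simplified illustration, not the actual bBRS calculation: the function name, the tuple layout, and the pitcher and catcher labels ("P1", "C1", and so on) are all made up for the example.

```python
from collections import defaultdict

def wowy_cs_rate(attempts, pitcher, catcher):
    """Caught-stealing rate for a battery together and for each mate apart.

    `attempts` is a list of (pitcher, catcher, caught) tuples, where
    `caught` is True for a caught stealing and False for a stolen base.
    This layout is a hypothetical one chosen for the sketch.
    """
    # tallies[bucket] = [caught stealing, total attempts]
    tallies = defaultdict(lambda: [0, 0])
    for p, c, caught in attempts:
        if p == pitcher and c == catcher:
            bucket = "together"
        elif p == pitcher:
            bucket = "pitcher_without"   # this pitcher, some other catcher
        elif c == catcher:
            bucket = "catcher_without"   # this catcher, some other pitcher
        else:
            continue                     # play involves neither battery mate
        tallies[bucket][0] += int(caught)
        tallies[bucket][1] += 1

    rates = {}
    for bucket in ("together", "pitcher_without", "catcher_without"):
        cs, n = tallies[bucket]
        rates[bucket] = cs / n if n else None  # None when there are no attempts
    return rates

# Made-up attempts, purely for illustration.
attempts = [
    ("P1", "C1", True), ("P1", "C1", False), ("P1", "C1", True), ("P1", "C1", False),
    ("P1", "C2", False), ("P1", "C2", False), ("P1", "C2", True), ("P1", "C2", False),
    ("P2", "C1", True), ("P2", "C1", False),
    ("P2", "C2", False),
]
rates = wowy_cs_rate(attempts, "P1", "C1")
```

Note that the split only says how each mate did apart from the other. It says nothing about who ran, what was thrown, or how quickly, which is exactly where the lurking variables hide.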

Now, for something like bBRS, a run-based counting statistic, I want to make sure that I assign credit where it is due. I feel that with lots of research there may be a better way to assign concrete responsibility to a battery mate, rather than resigning ourselves to the assumptions inherent in the very basic SB and CS statistics.

Methodology 

Across the 2011 and 2012 seasons there were 7,757 stolen base attempts from first to second base. Of these, 5,641 resulted in a stolen base and 2,116 in a caught stealing. Our population will be all the stolen base attempts from first to second base during this period, primarily because that is as far back as MLB.TV goes.

I cannot go through all 7,757 stolen base attempts, so for the sake of my sanity we will instead draw a sample from the population. First, I pulled all the stolen base attempts from Retrosheet and split them into two groups: stolen bases and caught stealing. From each of those two strata, I randomly selected 50 plays, then combined the 50 stolen bases and 50 caught stealing into one large “representative sample”. The ratio of stolen bases to caught stealing in the sample (50/50) is not proportional to the population (roughly 73/27), but we want to get a good feel for the distinction between the two, so it will serve us well enough.
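The sampling step can be sketched as follows. This is a minimal illustration under stated assumptions: the play records are placeholders rather than real Retrosheet rows, and the function names and the `outcome` field are invented for the example. The weighting helper shows one common way to scale an equal-sized sample back to population proportions; nothing here says the study itself does that.

```python
import random

def stratified_sample(plays, per_stratum=50, seed=0):
    """Draw the same number of random plays from each outcome group."""
    rng = random.Random(seed)            # fixed seed so the draw is repeatable
    strata = {"SB": [], "CS": []}
    for play in plays:
        strata[play["outcome"]].append(play)
    sample = []
    for group in strata.values():
        sample.extend(rng.sample(group, per_stratum))  # without replacement
    return sample

def population_weights(population_counts, per_stratum=50):
    """Number of population plays that each sampled play stands for."""
    return {outcome: n / per_stratum for outcome, n in population_counts.items()}

# Placeholder population matching the article's counts: 5641 SB and 2116 CS.
plays = [{"id": i, "outcome": "SB"} for i in range(5641)]
plays += [{"id": i, "outcome": "CS"} for i in range(5641, 7757)]

sample = stratified_sample(plays)
weights = population_weights({"SB": 5641, "CS": 2116})
success_rate = 5641 / 7757               # population stolen base success rate
```

With these counts, each sampled stolen base stands for about 113 population plays and each sampled caught stealing for about 42, which is the price of forcing a 50/50 split on a population that succeeds roughly 73% of the time.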

I am keeping track of the handedness of the pitcher, the batter, the count, the outs, the inning, the pitch velocity, and the pitch location. I will also attempt to rate the catcher’s throw from the plate to second base, though of course this will be a very rudimentary estimate.

To record times, I am using video software that times with great accuracy. For timing a pitcher’s release, I will time from first move to the point at which the catcher catches the ball. For timing a catcher’s throw, I will time from the moment the pitch is caught to the point at which the throw is caught at second.
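If the times are read off video frames, the two measurements defined above reduce to simple frame arithmetic. The sketch below assumes a 30 frames-per-second video, which is an assumption for illustration and not a stated property of the software; the function name and the frame numbers are likewise hypothetical.

```python
def battery_times(first_move, pitch_caught, throw_caught, fps=30.0):
    """Convert three video frame numbers into the two battery times.

    Returns (pitcher_time, catcher_time, resolution) in seconds, where
    resolution is the length of one frame, i.e. the smallest interval
    the video can distinguish. The 30 fps default is an assumption.
    """
    pitcher_time = (pitch_caught - first_move) / fps    # first move to catcher's glove
    catcher_time = (throw_caught - pitch_caught) / fps  # pitch caught to throw caught at second
    resolution = 1.0 / fps
    return pitcher_time, catcher_time, resolution

# Hypothetical frame numbers for a single play.
pitcher_time, catcher_time, resolution = battery_times(
    first_move=0, pitch_caught=39, throw_caught=99
)
```

At 30 fps the resolution is about 0.033 seconds, so any single timing can be off by roughly that much at each end. That is worth keeping in mind when the gap between safe and out is a matter of milliseconds.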

By doing this I want to quantify the distinction between a caught stealing and a stolen base — and who is accountable for it in the long run. By quantifying the hidden variables that ultimately led to the CS or SB we can perhaps assign responsibility for who deserves credit for base running outcomes in general.

Crowd-sourcing

In any well-thought-out study, it’s probably a good thing to have multiple samples and trials. That’s where you come in:

Below I have linked a Google Doc, available to all, that will serve as the public crowd-sourced sample. This sample is completely different from my own, but it will be used to cross-reference my findings with the public’s.

The goal here is to find out the distinction between the caught stealing and stolen base by uncovering the hidden variables we identified above. In doing so we may be able to make an inference about who is more responsible for catching a baserunner stealing. Hopefully, we can do this together.

https://docs.google.com/spreadsheet/ccc?key=0AvirQLvqvWxfdDc1WFUzaEFURFZWdkdldEU4Tnc0Snc&usp=sharing

Have fun! I am putting some trust in you guys. Don’t let me down.





Max Weinstein is a baseball analyst. He has written for Fangraphs, The Hardball Times, and Beyond the Box Score. Connect with him on Twitter @MaxWeinstein21 or email him here.

6 Comments
Peter Jensen
10 years ago

Max- Shouldn’t everyone be using the same “video software that times with great accuracy” that you are using? Could you include a link to it? Shouldn’t a few people work with the same stolen base subset that you are using in order to get an idea of the repeatability of your methodology? The volunteer viewers should probably note which pitches are pitchouts to confirm the Retrosheet pitch designations.

Peter Jensen
10 years ago
Reply to  Max Weinstein

Max – Is iMovie counting 30 frames per second? If it is, I don’t think that is going to be a small enough interval to make the kind of determinations that you are trying to make. .0333 seconds is a lot in terms of basestealing. An error of one frame is often the difference between success and failure. A 90 MPH throw from the catcher will travel 4.4 feet in a single frame. Also, how are you proposing to collect the information on runner leads and running times from 1st to 2nd? How often do the cameras capture this information?