Rates are obviously important to sabermetrics, particularly when discussing player skill. That’s why we don’t just look at a player’s raw totals of hits, walks, or home runs, and why batting average, and later on-base percentage and slugging percentage, became popular. If we want to break things down more precisely to examine specific skills, we can look at walks, strikeouts, and home runs per plate appearance. That works pretty well, and depending on how careful you want to be, at a certain point practicality outweighs precision. But if you are really trying to look at a player’s skills carefully, is the good ol’ plate appearance always the right denominator?
The answer to the rhetorical question is “no,” as you might suspect. In a way (loosely) analogous to different stats becoming reliable at different sample sizes, different skills need different denominators. Fortunately for a statistically limited person such as myself, the process of deriving those denominators is much simpler than figuring out sample stability! Of course, I’m not smart enough to come up with this stuff myself. This sort of analysis was (as far as I know) originally applied to baseball by the legendary Voros McCracken and extended by Tom Tango and others. I originally read about it a few years ago on the dearly departed Statistically Speaking blog, in posts by Brian Cartwright (the brains behind the OLIVER projections) and Russell Carleton (a.k.a. Pizza Cutter).
For the sake of this post, we’ll work from the perspective of the hitter (although it could go either way). Although not all would agree, I will leave out sacrifice hits (SH) and, of course, times reached on error, since they introduce their own difficulties of context and/or scoring. We will also leave out intentional walks, because those are determined by the situation and the decision of the pitching team. The method we use to work out the “right denominators” is called the binomial method. As the name implies, we break down the plate appearance (given the exceptions noted above regarding sacrifice hits and intentional walks, for the sake of this post a plate appearance will mean AB + uBB [unintentional walks] + HBP + SF) such that at each “step,” two possibilities are available.
[Side note on rates and ratios: on FanGraphs and most other stats sites, most stats are expressed as “rates” (events divided by all opportunities, i.e., events plus non-events), and these are the easiest to understand in many ways. However, analysts like Tango prefer that these sorts of things be expressed as ratios (events divided by non-events). Our own Matt Swartz, on the other hand, says that the rates versus ratios issue is ultimately one of aesthetic preference.
Rather than get sidetracked in this debate, I will simply put both expressions down below.]
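Because a rate and a ratio carry the same information, each can be converted into the other. The minimal sketch below assumes only the definitions used in this post (rate = events / (events + non-events), ratio = events / non-events); the function names are hypothetical and not taken from any stats site.

```python
def rate(events: int, non_events: int) -> float:
    """Events per opportunity: events / (events + non-events)."""
    return events / (events + non_events)


def ratio(events: int, non_events: int) -> float:
    """Events per non-event: events / non-events."""
    return events / non_events


def rate_to_ratio(r: float) -> float:
    """Convert a rate into the equivalent ratio: r / (1 - r)."""
    return r / (1.0 - r)


def ratio_to_rate(q: float) -> float:
    """Convert a ratio back into the equivalent rate: q / (1 + q)."""
    return q / (1.0 + q)


# Hypothetical example: 60 walks against 540 non-walk plate appearances.
r = rate(60, 540)
q = ratio(60, 540)
print(f"rate={r:.3f}  ratio={q:.3f}  ratio_from_rate={rate_to_ratio(r):.3f}")
```

Since the conversion is one-to-one, choosing between the two forms really is a presentation choice, as Swartz suggests.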
[Author’s note, August 22, 2011: Originally, I excluded sacrifice flies (SF) from this analysis, but it bothered me, so I’ve put them back in where relevant.]
At the beginning of a plate appearance (AB + uBB + HBP + SF), before it can end in a play on contact, a walk, or a strikeout, the batter might be hit by a pitch. What is this rate or ratio?
Rate: HBP/(AB+uBB+HBP+SF), Ratio: HBP/(AB+uBB+SF)
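As a quick illustration of this first split, here is a minimal sketch that computes both forms from a season line. The helper name `hbp_step` and the stat line are hypothetical (not a real player); the only assumption is the post’s definition of a plate appearance.

```python
def hbp_step(ab: int, ubb: int, hbp: int, sf: int) -> tuple[float, float]:
    """First binomial split: hit by pitch vs. everything else.

    Plate appearances are defined as in the text: AB + uBB + HBP + SF.
    """
    pa = ab + ubb + hbp + sf
    not_hbp = pa - hbp  # AB + uBB + SF, the ratio's denominator
    return hbp / pa, hbp / not_hbp


# Hypothetical season line, not a real player.
hbp_rate, hbp_ratio = hbp_step(ab=550, ubb=60, hbp=8, sf=5)
print(f"HBP rate={hbp_rate:.4f}  HBP ratio={hbp_ratio:.4f}")
```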
The next step is a bit trickier. It is clear that the “non-contact events” should be split off before the “contact events,” but should walks or strikeouts come next? We could express this as a multinomial, but that would sacrifice the relative simplicity we are going for here. Most versions of this I’ve seen put walks “before” strikeouts, and I agree. If you’re looking for specific grounds for this ordering, the fact that some strikeouts end with contact (foul tips) is one possible reason.
Rate: uBB/(AB+uBB+SF), Ratio: uBB/(AB+SF)
Rate: K/(AB+SF), Ratio: K/(AB-K+SF)
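Each step’s denominator is simply whatever is left over from the step before. A minimal sketch of that bookkeeping, using a hypothetical helper `split` and a hypothetical stat line, also shows why the ordering question above matters: splitting strikeouts off before walks would give them the denominator AB + uBB + SF instead of AB + SF.

```python
def split(pool: int, events: int) -> tuple[float, float, int]:
    """One binomial step: return (rate, ratio, remaining pool).

    rate  = events / pool
    ratio = events / (pool - events)
    Whatever is left over becomes the next step's denominator.
    """
    rest = pool - events
    return events / pool, events / rest, rest


# Hypothetical line (not a real player).
ab, ubb, hbp, sf, k = 550, 60, 8, 5, 110

pool = ab + ubb + hbp + sf                  # AB + uBB + HBP + SF
_, _, pool = split(pool, hbp)               # left: AB + uBB + SF
bb_rate, bb_ratio, pool = split(pool, ubb)  # left: AB + SF
k_rate, k_ratio, pool = split(pool, k)      # left: AB - K + SF

# For comparison only: if strikeouts were split off before walks,
# their rate would use AB + uBB + SF as the denominator instead.
k_rate_alt = k / (ab + ubb + sf)

print(f"uBB rate={bb_rate:.3f}  ratio={bb_ratio:.3f}")
print(f"K   rate={k_rate:.3f}  ratio={k_ratio:.3f}  (K-first rate={k_rate_alt:.3f})")
```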
What if the hitter ends the plate appearance on contact (other than foul tips, which aren’t separated out in the official statistics)? Does it stay in the park or not? We need to separate out home runs. What is the appropriate rate or ratio for home runs?
Rate: HR/(AB-K+SF), Ratio: HR/(AB-K-HR+SF)
[Note that the official statistics as given on the player pages don’t separate out inside-the-park homers; if that information were available, one would want to count those as balls in play.]
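The home run split, including the inside-the-park adjustment described in the note, can be sketched as below. The `iphr` parameter is a hypothetical count of inside-the-park homers; since the official totals don’t break these out, it defaults to 0, and when supplied those homers are moved out of HR and left in the ball-in-play pool.

```python
def hr_step(ab: int, k: int, sf: int, hr: int, iphr: int = 0) -> tuple[float, float]:
    """Split contact (non-strikeout) events into home runs vs. balls in play.

    `iphr` is a hypothetical inside-the-park HR count; those are treated
    as balls in play rather than home runs.
    """
    contact = ab - k + sf       # the rate's denominator
    over_fence = hr - iphr      # only homers that left the park
    return over_fence / contact, over_fence / (contact - over_fence)


# Hypothetical line: AB=550, K=110, SF=5, HR=30 (one of them inside the park).
print(hr_step(ab=550, k=110, sf=5, hr=30))
print(hr_step(ab=550, k=110, sf=5, hr=30, iphr=1))
```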
So if a ball is in play, how often is it a hit or an out? This is just good ol’ BABIP (batting average on balls in play).
Rate: (H-HR)/(AB-K-HR+SF), Ratio: (H-HR)/(AB-K-H+SF)
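For the ratio form, the non-events are outs in play: balls in play minus hits in play, (AB - K - HR + SF) - (H - HR). The HR terms cancel, leaving AB - K - H + SF. A minimal sketch with a hypothetical helper and stat line:

```python
def babip_step(ab: int, k: int, hr: int, sf: int, h: int) -> tuple[float, float]:
    """Split balls in play into hits vs. outs (BABIP and its ratio form)."""
    balls_in_play = ab - k - hr + sf
    hits_in_play = h - hr
    outs_in_play = balls_in_play - hits_in_play  # simplifies to AB - K - H + SF
    return hits_in_play / balls_in_play, hits_in_play / outs_in_play


# Hypothetical line: AB=550, K=110, HR=30, SF=5, H=150.
babip, babip_ratio = babip_step(ab=550, k=110, hr=30, sf=5, h=150)
print(f"BABIP={babip:.3f}  ratio={babip_ratio:.3f}")
```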
Now we get to types of hits in play (excluding inside-the-park homers, given the information we are working with). Once a player gets to first base, he can turn the hit into an extra-base hit or not.
Rate: (2B+3B)/(H-HR), Ratio: (2B+3B)/(H-2B-3B-HR)
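The extra-base split works only on hits in play, so home runs come out of the denominator first, and the ratio’s non-events are singles (H - 2B - 3B - HR). A minimal sketch, with a hypothetical helper name and stat line:

```python
def xbh_step(h: int, hr: int, doubles: int, triples: int) -> tuple[float, float]:
    """Split hits in play into extra-base hits (2B + 3B) vs. singles.

    Home runs are removed first, so only hits in play remain.
    """
    hits_in_play = h - hr
    xbh = doubles + triples
    singles = hits_in_play - xbh  # H - 2B - 3B - HR
    return xbh / hits_in_play, xbh / singles


# Hypothetical line: H=150, HR=30, 2B=32, 3B=3.
xbh_rate, xbh_ratio = xbh_step(h=150, hr=30, doubles=32, triples=3)
print(f"XBH rate={xbh_rate:.3f}  XBH ratio={xbh_ratio:.3f}")
```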
Of the hits where the player can reach second, does he go to third?
Rate: 3B/(2B+3B), Ratio: 3B/2B
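This last split rests on the smallest counts, so zero denominators are a practical concern (a player with no doubles has no defined triples ratio). The sketch below, with a hypothetical helper name, returns `None` in that case; that is an arbitrary choice for illustration, not part of the method.

```python
from typing import Optional


def triple_step(doubles: int, triples: int) -> tuple[Optional[float], Optional[float]]:
    """Of in-play hits reaching at least second base, what share are triples?

    Returns None wherever a denominator is zero.
    """
    xbh = doubles + triples
    rate = triples / xbh if xbh else None
    ratio = triples / doubles if doubles else None
    return rate, ratio


print(triple_step(doubles=32, triples=3))  # hypothetical line
print(triple_step(doubles=0, triples=2))   # ratio undefined without doubles
```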
There, in simplified form, you have it. Each “step” of the plate appearance is broken down in binomial form to determine what the “right” denominator for each skill really is. I would guess this is how most of the more sophisticated projection systems analyze these events. This doesn’t mean the player pages should be revised to conform to this sort of thing. As I alluded to at the beginning of this post, there is a trade-off between precision and practicality. Indeed, Marcel projections stand up pretty well to the big boys despite using PA as the universal denominator and regressing every skill the same amount. But if you want to be more precise…
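One way to check that the binomial breakdown loses nothing relative to per-PA rates: multiplying the conditional rates along each branch of the tree recovers every outcome’s share of plate appearances, and the shares sum to exactly 1. The sketch below assumes a hypothetical helper `outcome_shares`, a hypothetical stat line, and nonzero denominators at every step. It uses `fractions.Fraction` so that the arithmetic is exact.

```python
from fractions import Fraction


def outcome_shares(ab, ubb, hbp, sf, k, h, hr, doubles, triples):
    """Walk the binomial tree and return each terminal outcome's share of PA.

    Each step's rate is conditional on reaching that branch; multiplying
    the rates along a path gives the unconditional per-PA share.
    """
    pa = ab + ubb + hbp + sf
    p_hbp = Fraction(hbp, pa)
    p_bb = Fraction(ubb, ab + ubb + sf)
    p_k = Fraction(k, ab + sf)
    p_hr = Fraction(hr, ab - k + sf)
    p_hit = Fraction(h - hr, ab - k - hr + sf)
    p_xbh = Fraction(doubles + triples, h - hr)
    p_3b = Fraction(triples, doubles + triples)

    reach_bb = 1 - p_hbp                  # not HBP
    reach_k = reach_bb * (1 - p_bb)       # not HBP, not uBB
    reach_contact = reach_k * (1 - p_k)   # ended on contact
    reach_bip = reach_contact * (1 - p_hr)
    reach_hit = reach_bip * p_hit
    reach_xbh = reach_hit * p_xbh
    return {
        "HBP": p_hbp,
        "uBB": reach_bb * p_bb,
        "K": reach_k * p_k,
        "HR": reach_contact * p_hr,
        "out in play": reach_bip * (1 - p_hit),
        "1B": reach_hit * (1 - p_xbh),
        "2B": reach_xbh * (1 - p_3b),
        "3B": reach_xbh * p_3b,
    }


shares = outcome_shares(ab=550, ubb=60, hbp=8, sf=5, k=110,
                        h=150, hr=30, doubles=32, triples=3)
for name, p in shares.items():
    print(f"{name:12s} {float(p):.3f}")
print("total:", sum(shares.values()))
```

The design point is that the conditional denominators isolate each skill without discarding any information: the per-PA view can always be rebuilt from them.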
My thanks to Matt Swartz and Tom Tango for discussions on this topic, although neither should be held responsible for my errors.
Matt Klaassen reads and writes obituaries in the Greater Toronto Area. If you can't get enough of him, follow him on Twitter.