A Long-Needed Update on Reliability
It’s been over a year now since Sean Dolinar and I published our article(s) on reliability and uncertainty in baseball stats. When we wrote them, we intended to run reliability numbers for even more statistics, including pitching statistics, of which we had included none.
That didn’t happen. So a little while ago, when I was honing my Python skills by rewriting our code in, well, Python (it was originally in R), I figured, “Hey, why not go back and do this for a bunch more stats?” That did happen. Sean was/is swamped making the site infinitely better, though, so I was on my own rewriting the code.
In case you need a refresher, never read our original article, and/or don’t want to now, here’s a quick description of reliability and uncertainty: reliability is a coefficient between 0 and 1 that gives a sense of the consistency of a statistic. A higher reliability means that there’s less uncertainty in the measurement. Reliability goes up with sample size, so the reliability for strikeout rate after 100 plate appearances is going to be much lower than the reliability for strikeout rate after 600. Reliability also depends on which stat is being measured. Since strikeout rate is obviously a more talent-based stat than hit-by-pitch rate (well, maybe not for everybody), the reliability is going to be higher for strikeouts given two samples of the same size. You can think of it this way: strikeouts “stabilize” more quickly than hit-by-pitches.
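To make the sample-size effect concrete, here is a minimal Python sketch. It assumes the simple form reliability = n / (n + k), which is the shape the Spearman-Brown formula gives when a sample is lengthened, where k is the sample size at which reliability reaches 0.5. It is an illustration only, not necessarily the calculation behind the numbers in this article, and the k values (and the names `reliability` and `HALF_POINT_PA`) are hypothetical, chosen just to show the pattern.

```python
def reliability(sample_size: float, half_point: float) -> float:
    """Reliability under the assumed n / (n + k) model.

    half_point (k) is the sample size at which reliability equals 0.5.
    """
    if sample_size < 0 or half_point <= 0:
        raise ValueError("sample_size must be >= 0 and half_point must be > 0")
    return sample_size / (sample_size + half_point)


# Hypothetical half-points in plate appearances, for illustration only.
# A smaller k means the stat "stabilizes" more quickly.
HALF_POINT_PA = {"K%": 60.0, "HBP%": 300.0}

for stat, k in HALF_POINT_PA.items():
    for pa in (100, 600):
        print(f"{stat} after {pa} PA: reliability = {reliability(pa, k):.2f}")
```

Under this model, reliability rises toward 1 as plate appearances accumulate, and at any given sample size the stat with the smaller k has the higher reliability.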
Reliability can be used to regress a player’s stats to the mean and then to create error bars around that regressed value, giving a confidence interval for the player’s true talent. To continue with the strikeout example: the more plate appearances a player has recorded, the closer the estimate of his true talent will be to the strikeout rate he’s running at the time. In fact, strikeout rate is so reliable that, after a full season’s worth of plate appearances, a player’s strikeout rate will probably be almost exactly reflective of his true talent. The same cannot be said for many other stats, like line drive rate, which is mostly random; the reliability for LD% never gets very high, even after a full season’s worth of batted balls.
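As a rough sketch of that idea in Python: the regressed estimate below is a reliability-weighted average of the observed rate and the league mean, and the error bars use the classical-test-theory standard error of estimation, SD * sqrt(r * (1 - r)). That choice of standard error is an assumption made for this illustration, not necessarily the exact formula behind the numbers here. The function names and the example inputs are hypothetical, and `league_sd` is assumed to be the spread of observed rates across players at the same sample size.

```python
import math


def regress_to_mean(observed: float, league_mean: float, reliability: float) -> float:
    """Weight the observed rate by reliability and the league mean by the rest."""
    return reliability * observed + (1.0 - reliability) * league_mean


def true_talent_interval(observed, league_mean, league_sd, reliability, z=1.96):
    """Return (low, estimate, high) for true talent.

    Assumes the standard error of estimation, league_sd * sqrt(r * (1 - r)).
    z = 1.96 corresponds to a roughly 95% interval under normality.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    estimate = regress_to_mean(observed, league_mean, reliability)
    std_err = league_sd * math.sqrt(reliability * (1.0 - reliability))
    return estimate - z * std_err, estimate, estimate + z * std_err


# Hypothetical inputs: a 30% strikeout rate, a 20% league mean,
# a 5-point spread across players, and a reliability of 0.9.
low, estimate, high = true_talent_interval(0.30, 0.20, 0.05, 0.9)
print(f"estimate = {estimate:.3f}, interval = ({low:.3f}, {high:.3f})")
```

With a reliability near 1, the estimate sits almost on top of the observed rate; with a reliability near 0, it collapses toward the league mean, which is the behavior described for strikeout rate and line drive rate, respectively.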