Consistency Is Inconsistent

Few baseball terms are misused as frequently as ‘consistency’ and ‘volatility’. Much like some of the antiquated statistics that have fallen by the wayside (looking at you, W/L records and batting average), the terms are often conflated with overall performance. We thought they told us more about a player than they actually did. Pitchers perceived as more consistent are often deemed more valuable than their volatile counterparts, so long as their overall numbers aren’t drastically different. While that perception might make intuitive sense, it fails to hold up under the lens of quantification.

Consistency might keep fans and managers from reaching for the Tums jar, but it should be treated as anecdotal at best, and it certainly should not be used as a performance marker. Why? Well, because consistency itself is inconsistent.

A pitcher who is consistent one year is not guaranteed to achieve a similar level of consistency the next. My interest in consistency was rekindled this week when Buster Olney cited “enigmatic inconsistency” to explain why the Rockies might be willing to trade Ubaldo Jimenez. When Dave Cameron showed that even elite pitchers experience wild fluctuations in their game scores, I thought back to some old studies of mine and felt it prudent to recap my findings on the matter.

For starters, allow me to return to the assertion that consistency itself is inconsistent. That conclusion was drawn from an intra-class correlation (ICC) run on data similar to what Dave showed this week: a standard-deviation-based metric that measures performance variance on a game-to-game basis. An ICC is essentially a year-to-year correlation stretched over a longer period. Instead of running four separate year-to-year correlations for a player over a five-year span, the ICC evaluates the relationship across all five years at once. In this case, I measured the per-pitcher relationship of the deviation metric over a five-year span and found an ICC of just .05, well below any level that would mark consistency as a tangible, repeatable skill.
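For readers who want to see the mechanics, here is a minimal sketch of one common ICC variant, the one-way random-effects ICC(1). Which variant the study used isn’t stated, so treat that choice as an assumption. Each row is one pitcher and each column one season’s value of the game-to-game deviation metric; the function name `icc_oneway` is hypothetical, and pitchers with missing seasons are assumed to have been dropped beforehand.

```python
from statistics import fmean


def icc_oneway(matrix: list[list[float]]) -> float:
    """One-way random-effects ICC(1) for an n-pitchers x k-seasons table."""
    n = len(matrix)
    k = len(matrix[0])
    if n < 2 or k < 2 or any(len(row) != k for row in matrix):
        raise ValueError("need at least 2 pitchers and 2 seasons, with no gaps")

    grand_mean = fmean(x for row in matrix for x in row)
    row_means = [fmean(row) for row in matrix]

    # Between-pitcher mean square: how far each pitcher's average sits from the grand mean.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-pitcher mean square: how much each pitcher bounces around his own average.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(matrix, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Made-up values: three pitchers, five seasons of the deviation metric each.
toy = [
    [14.2, 12.8, 15.1, 13.0, 14.7],
    [11.9, 13.5, 12.2, 14.8, 12.6],
    [15.3, 13.1, 12.7, 14.0, 13.9],
]
print(f"ICC(1) = {icc_oneway(toy):.3f}")
```

An ICC near 1 would mean pitchers hold on to their own level of consistency from season to season; a value near 0, like the .05 reported above, means nearly all of the spread is within-pitcher noise rather than a stable trait.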

Further, the per-game deviation metric did not correlate with any performance marker at a coefficient greater than .10. In other words, consistency is inconsistent, and it doesn’t automatically lead to better performance. So why is the term used so much? The answer seems to lie in personal preference. A pitcher like Jon Garland is thought of as more consistent than, say, Joel Pineiro, and that supposedly makes him more desirable to certain teams in specific competitive positions. Understanding consistency matters because teams may make decisions based on perceived consistency and volatility.

Those assessments, however, are based on past data, and the pattern they describe may not show up in the current season. This speaks to the difference between game-to-game and year-to-year consistency. Some may value game-to-game consistency above all else, while others might not care how the final park-adjusted ERA is arrived at, as long as it is stable over a three-year span.

Think back to the 2009 offseason, when both Jon Garland and Joel Pineiro were available. A team with decent playoff odds might have gravitated toward Garland given his track record of consistency. A long-shot team, meanwhile, might have found Pineiro more attractive. If Pineiro pitched at the low end of his range, the team probably wouldn’t suffer much, since it wasn’t expected to make the playoffs to begin with. But if his performance peaked, the team might be able to sneak into the playoffs. The flaw in this mindset is that both types of pitchers might actually be equally projectable when it comes to year-to-year consistency or volatility.
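The Garland/Pineiro reasoning boils down to a threshold argument, sketched here with made-up numbers. The sketch models a team’s season wins as normally distributed and treats a volatile pitcher as widening that distribution; the win totals, spreads, 90-win playoff line, and the `playoff_odds` helper are all hypothetical.

```python
from statistics import NormalDist


def playoff_odds(expected_wins: float, win_sd: float, wins_needed: float) -> float:
    """P(wins >= wins_needed), modeling season wins as Normal(expected_wins, win_sd)."""
    return 1 - NormalDist(mu=expected_wins, sigma=win_sd).cdf(wins_needed)


WINS_NEEDED = 90  # hypothetical playoff line

for label, expected in [("contender", 93), ("long shot", 84)]:
    steady = playoff_odds(expected, win_sd=4, wins_needed=WINS_NEEDED)    # Garland-type spread
    volatile = playoff_odds(expected, win_sd=7, wins_needed=WINS_NEEDED)  # Pineiro-type spread
    print(f"{label}: steady={steady:.2f}, volatile={volatile:.2f}")
```

With these inputs the wider spread lowers the contender’s odds but raises the long shot’s, which is the logic behind a fringe team gambling on a volatile arm.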

At the beginning of 2010, I ran a study to test that theory. The methodology involved gathering four-year spans of park-adjusted ERA for starting pitchers and comparing the actual fourth-year mark to a Marcel-esque projection built from the first three years. The pitchers were split into five bins based on how the standard deviation of their park-adjusted ERA over the first three years compared with the median: very volatile, volatile, volatile/consistent, consistent, and very consistent. Only the two “very” bins were compared to one another, so that pitchers hovering close to the median weren’t treated as if they were innately different.
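The projection and binning steps might look something like this. The exact Marcel-esque formula isn’t given, so the sketch assumes a Marcel-style weighted average (3/2/1 weights with the most recent season heaviest, scaled by innings) regressed toward a hypothetical league-average ERA, with no age adjustment. It also assumes the five bins are quintiles of the three-year standard deviation, which is a guess; the article only says the bins were set relative to the median. All constants and function names are hypothetical.

```python
from bisect import bisect_right
from statistics import quantiles, stdev

# Hypothetical Marcel-style constants; the study's exact values are not given.
SEASON_WEIGHTS = (1, 2, 3)  # years 1, 2, 3 (oldest to most recent)
LEAGUE_ERA = 4.30           # hypothetical league-average park-adjusted ERA
REGRESSION_IP = 100.0       # hypothetical innings of league-average ERA mixed in

BIN_LABELS = ("very consistent", "consistent", "volatile/consistent",
              "volatile", "very volatile")


def marcel_style_projection(eras: list[float], innings: list[float]) -> float:
    """Project year-4 park-adjusted ERA from years 1-3, listed oldest first."""
    weights = [w * ip for w, ip in zip(SEASON_WEIGHTS, innings)]
    weighted_sum = sum(era * wt for era, wt in zip(eras, weights))
    # Regress toward the league mean by mixing in REGRESSION_IP phantom innings at LEAGUE_ERA.
    return (weighted_sum + LEAGUE_ERA * REGRESSION_IP) / (sum(weights) + REGRESSION_IP)


def three_year_sd(eras: list[float]) -> float:
    """Sample standard deviation of park-adjusted ERA over years 1-3."""
    return stdev(eras)


def assign_bins(sds: list[float]) -> list[str]:
    """Label each pitcher by which quintile his three-year SD falls in (lowest SD = most consistent)."""
    cuts = quantiles(sds, n=5)  # four cut points splitting the group into fifths
    return [BIN_LABELS[bisect_right(cuts, sd)] for sd in sds]
```

Swapping the quintile cuts for distance-from-median cuts would only change `assign_bins`; the projection step is independent of how the bins are drawn.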

The results indicated that at every minimum-start cutoff (5, 10, or 20 starts), the very volatile pitchers outperformed their projections in that fourth season.

The consistent pitchers, despite posting better overall park-adjusted ERAs, performed either in line with or worse than their projections. Further, the projections for the very consistent pitchers had a lower root mean square error (RMSE): their performances were easier to project, and the difference was statistically significant.
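Scoring the fourth season comes down to comparing projections with actual results, bin by bin. The sketch reports each bin’s mean error (actual minus projected, so a negative value means the bin beat its projections, since a lower ERA is better) and its RMSE. The function name is hypothetical, and it leaves out the significance test because the article doesn’t say which test was used.

```python
from collections import defaultdict
from math import sqrt
from statistics import fmean


def projection_errors(rows: list[tuple[str, float, float]]) -> dict[str, tuple[float, float]]:
    """Summarize fourth-season projection error per bin.

    `rows` holds (bin_label, projected_era, actual_era) triples. Returns
    {bin_label: (mean_error, rmse)} with error = actual - projected.
    """
    errors: dict[str, list[float]] = defaultdict(list)
    for label, projected, actual in rows:
        errors[label].append(actual - projected)
    return {
        label: (fmean(errs), sqrt(fmean(e * e for e in errs)))
        for label, errs in errors.items()
    }
```

In these terms, the finding above is a negative mean error for the very volatile bin and a smaller RMSE for the very consistent bin.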

One reason the volatile pitchers outperformed their projections is a selection effect built into the data. If a pitcher proved volatile in the wrong direction (really bad starts), he would be unlikely to keep getting the ball, and he could end up missing the cutoff used in the study. On the flip side, a consistent pitcher who underperforms is likely to be given more slack based on his track record.
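The first half of that selection effect is easy to reproduce in a toy simulation. Here every pitcher’s true fourth-year ERA equals his projection, each start’s ERA is drawn from a normal distribution (equal innings per start, and negative values are allowed, so this is purely illustrative), and a pitcher whose ERA is above 6.00 after five starts loses his rotation spot and falls short of a 10-start cutoff. The distribution, review point, hook threshold, season length, and function name are all assumptions, and the “more slack for consistent pitchers” half of the argument isn’t modeled.

```python
import random
from statistics import fmean

PROJECTED_ERA = 4.00  # every pitcher's true fourth-year ERA equals his projection
SEASON_STARTS = 30    # hypothetical full-season workload
REVIEW_AFTER = 5      # hypothetical point where the team decides whether to keep him
HOOK_ERA = 6.00       # hypothetical ERA that costs him his rotation spot
MIN_STARTS = 10       # hypothetical minimum-starts cutoff for the study


def survivor_gap(per_start_sd: float, n_pitchers: int = 5000, seed: int = 7) -> float:
    """Mean (actual - projected) ERA among pitchers who reach MIN_STARTS."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_pitchers):
        starts = [rng.gauss(PROJECTED_ERA, per_start_sd) for _ in range(REVIEW_AFTER)]
        if fmean(starts) <= HOOK_ERA:
            # Kept in the rotation: the rest of his starts are unbiased draws.
            starts += [rng.gauss(PROJECTED_ERA, per_start_sd)
                       for _ in range(SEASON_STARTS - REVIEW_AFTER)]
        if len(starts) >= MIN_STARTS:  # demoted pitchers (5 starts) drop out here
            gaps.append(fmean(starts) - PROJECTED_ERA)
    return fmean(gaps)


print(f"consistent (per-start SD 1.0): {survivor_gap(1.0):+.3f}")
print(f"volatile   (per-start SD 5.0): {survivor_gap(5.0):+.3f}")
```

Even though nobody in the simulation is truly better than his projection, the volatile survivors come out ahead of their projections on average, because the ones whose early starts went badly were filtered out before reaching the cutoff; the consistent group lands almost exactly on its projections.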

Overall, there is a difference in projectability between very consistent and very volatile pitchers, but it is far less evident in larger samples within the same season. Consistency is easier to project on a year-to-year basis than volatility, but within the same season it doesn’t correlate with overall performance. And per-game consistency is itself inconsistent from year to year. Consistency might be deemed important, but it shouldn’t be the primary barometer for making decisions.

Eric is an accountant and statistical analyst from Philadelphia. He also covers the Phillies at Phillies Nation and can be found on Twitter.


The middle paragraphs got pretty jargony, with quite a bit of hand-waving. Then you started using park-adjusted ERA as a measure of consistency, and shortly after that I stopped caring.

0 for 2 on articles about inconsistency.

Reply to  Telo

OK, constructive criticism… I think writing an article about consistency without visuals (or rigorous computation) is a mistake. For it to mean anything to us, it needs one or the other. Even Dave showed his methodology, as bad as it was. 98% of us don’t know what an ICCSKMYDK value is. And while you proceeded to make a half-assed attempt to explain what it is, we still don’t have any frame of reference or connection to the numbers you then provide. Those paragraphs were unconvincing.

The notion of consistency is extremely tough to define for a pitcher, let alone calculate. Should we look at park-adjusted ERA on a game-by-game basis? I’d say no, since we know that sequencing plays a huge role in short-term ERA fluctuations. Why not use FIP or some other combination of metrics rooted in things the pitcher can control, or at least has more control over?

Throw all of the math and jargon out the window – show me a graph over the last two years of a pitcher’s K%, BB%, FIP, SwStr%, average FB velocity – I don’t care what exactly – but pick a few things that they have direct control over, things that make sense (then you can add ERA to that and see just how much more volatile that number is) and let’s just LOOK at it. What do we see? Does everyone kind of look the same? Who looks different? Since we have such a tough time quantifying this, let’s throw out the numbers. That would be a nice intro piece.

I think you got caught in no man’s land between mathematically rigorous and accessible to the masses, which is exactly where you don’t want to be. Pick one or the other, because the crappy explanations of your calculations, with no real data or visuals, give us very little to work with.