The Enigma: My Journey Through Statistical Artifacts in Pursuit of Hot Streaks

A warning up top: This article is about seeking and not finding, about the unique ways that data can mislead you. The hero doesn’t win in the end – unless the hero is stochastic randomness and I’m the villain, but I don’t like that telling of the tale. It all started with an innocuous question: Can we tell which types of hitters are streaky?
I approached this question in an article about Michael Harris II’s rampage through July and August. I took a cursory look at it and set it aside for future investigation after not finding any obvious effects right away. To delve more deeply, I had to come up with a definition of streakiness to test, and so I set about doing so.
My chosen method was to look at 20-game stretches to determine hot and cold streaks, then look at performance in the following 20 games to see which types of players were more prone to “stay hot” or “stay cold.” I started throwing out definitions and samples: the 2021-2024 seasons, minimum 400 plate appearances on the season as a whole, overlapping sampling (so check games 1-20 vs. 21-40, 2-21 vs. 22-41, and so on), wOBA as my relevant offensive statistic, 50 points of wOBA deviation against seasonal average to convey hot or cold, 40-PA minimum per 20-game set to avoid weird pinch-hitting anomalies, throw out games with no plate appearances to skip defensive replacements. The list goes on and on.
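To make those rules concrete, here is a minimal Python sketch of the windowing for a single player-season. It is an illustration under stated assumptions, not the code behind the study: the `Game` record, the function names, and the constants are all hypothetical; wOBA is simplified to a per-game sum of linear-weight values divided by plate appearances (the real denominator excludes a few PA types); and treating a deviation of exactly 50 points as hot or cold is my guess at the cutoff.

```python
from typing import List, NamedTuple, Optional, Tuple


class Game(NamedTuple):
    """Hypothetical per-game record for one hitter."""
    pa: int          # plate appearances in the game
    woba_num: float  # sum of wOBA linear-weight values earned in the game


WINDOW = 20          # games per stretch
MIN_WINDOW_PA = 40   # minimum PA inside each 20-game stretch
MIN_SEASON_PA = 400  # minimum PA on the season as a whole
THRESHOLD = 0.050    # 50 points of wOBA against the seasonal average


def woba(games: List[Game]) -> Optional[float]:
    """Simplified wOBA: total linear-weight value divided by total PA."""
    pa = sum(g.pa for g in games)
    return sum(g.woba_num for g in games) / pa if pa else None


def label(window_woba: float, season_woba: float) -> str:
    """Classify a stretch relative to the season line (inclusive cutoff assumed)."""
    diff = window_woba - season_woba
    if diff >= THRESHOLD:
        return "hot"
    if diff <= -THRESHOLD:
        return "cold"
    return "neutral"


def streak_pairs(season: List[Game]) -> List[Tuple[str, str]]:
    """Return (first-stretch label, following-stretch label) for every
    overlapping pair: games 1-20 vs. 21-40, 2-21 vs. 22-41, and so on."""
    # Drop games with no plate appearances (defensive replacements).
    played = [g for g in season if g.pa > 0]
    if sum(g.pa for g in played) < MIN_SEASON_PA:
        return []
    season_woba = woba(played)

    pairs = []
    for start in range(len(played) - 2 * WINDOW + 1):
        first = played[start:start + WINDOW]
        second = played[start + WINDOW:start + 2 * WINDOW]
        # Skip pairs where either stretch falls under the PA minimum.
        if min(sum(g.pa for g in first), sum(g.pa for g in second)) < MIN_WINDOW_PA:
            continue
        pairs.append((label(woba(first), season_woba),
                      label(woba(second), season_woba)))
    return pairs
```

Counting how often a "hot" first stretch is followed by a "hot" second stretch, grouped by hitter type, would then be one way to ask the streakiness question. Note that because the stretches overlap, neighboring pairs share most of their games and are far from independent observations.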






