April Hitting Stats Mean Nothing… Except When They Kinda Do
As part of my exhausting shtick, I like to respond “April!” to questions in my chats involving player performances in the season’s early going. This is effective shorthand when someone wants to know if, say, George Springer is a bust because he’s put up a .480 OPS in his first two weeks in the majors. It’s also dead wrong. April stats, in their proper context, are meaningful.
“But Dan, a few weeks of baseball is a tiny sample!” That’s correct, but you have to take into consideration the underlying reasons projections can prove to be inaccurate. It’s not just that things change, though they do — pitcher X learns a sweet knuckle-curve or batter Y realizes that not hitting everything into the ground might be good — it’s that it’s challenging to gauge where players stand in the first place. Players’ stats themselves aren’t even perfect at this. Tim Anderson hit .322 in 2020, but that doesn’t actually mean his mean batting average projection should have been .322. We don’t actually know if a theoretical player was “truly” a .322 hitter, a .312 hitter who got lucky, a very unlucky .342 hitter, or a .252 hitter who made a deal with a supernatural or extraterrestrial entity. A .300 hitter isn’t observed, they’re inferred.
The way most, if not all, in-season projections (or any projections, really) function is by applying what we call Bayesian inference. We won’t get into a full-blown math class, but in essence, it simply means that we update our hypotheses to take new data into account. And for players, data comes in all the time: every pitch or swing of the bat is new information about a player. It’s valuable information, too, as only the last handful of seasons have much predictive value and recent performance is the most useful.
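To make the "update our hypotheses" idea concrete, here is a minimal sketch of one common textbook approach, a beta-binomial update of a batting average. This is an illustration only, not how any particular projection system works. The function name, the .270 preseason estimate, and the 500 at-bat "weight" on that estimate are all made-up assumptions.

```python
def update_batting_average(prior_mean, prior_weight, hits, at_bats):
    """Return the posterior mean batting average under a Beta prior.

    prior_mean   -- preseason estimate of the player's true average (assumed)
    prior_weight -- how many at-bats of evidence that estimate is worth (assumed)
    hits, at_bats -- what the player has actually done so far this season
    """
    # Express the prior as Beta(alpha, beta) pseudo-counts of hits and outs.
    alpha = prior_mean * prior_weight
    beta = (1 - prior_mean) * prior_weight

    # Bayesian update: add the observed hits and outs to the pseudo-counts.
    post_alpha = alpha + hits
    post_beta = beta + (at_bats - hits)

    # The posterior mean is the updated best guess at the true average.
    return post_alpha / (post_alpha + post_beta)


# Hypothetical player: projected at .270, then hits .400 (20-for-50) in April.
estimate = update_batting_average(0.270, 500, hits=20, at_bats=50)
print(f"Updated estimate: {estimate:.3f}")
```

With these invented inputs, the estimate moves from .270 to roughly .282: nowhere near .400, but not nothing either. A smaller `prior_weight` would let April move the needle more, and picking that weight sensibly is where the real work lies.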