Not everyone is interested in projecting the future, but one common thread in much of modern analytics is the attempt to describe a volatile thing, such as a play in baseball, using something less volatile, such as an underlying ability. This era arguably began with Voros McCracken’s DIPS research, which he released 20 years ago to a wider audience than just us Usenet dorks. Voros’ thesis has been modified with new information, and people tend to say (mistakenly) that he was arguing that pitchers had no control over balls in play, but DIPS and BABIP changed how we looked at the pitcher/defense interaction more than any peripheral-type number that preceded them.
One of the things I want to try to project is what types of performance lead to the so-called Three True Outcomes (home run, walk, strikeout) rather than just tallying those outcomes. For example, what types of performance lead to strikeouts? I’m not just talking about velocity and stuff, but the batter-pitcher interactions at the plate: things like a pitcher’s contact percentage. For pitchers with at least 100 batters faced in consecutive years from 2002 onward, contact percentage has a year-to-year r^2 with itself (0.53) that is similar to or greater than that of walk rate (0.26) or strikeout rate (0.51). Contact rate alone has an r^2 of 0.37 with the following year’s strikeout rate.
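For anyone who wants to reproduce this kind of year-to-year comparison, here is a minimal Python sketch. It is not the code behind the numbers above; the `seasons` layout, the `bf` key, and the function names are all assumptions made for illustration. It pairs each qualifying pitcher season with the same pitcher's next season and computes the squared Pearson correlation.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r^2 is undefined when either sequence is constant")
    return cov * cov / (var_x * var_y)


def consecutive_season_pairs(seasons, stat_now, stat_next=None, min_bf=100):
    """Pair a stat in year N with a stat in year N+1 for the same pitcher.

    `seasons` is a hypothetical layout: a dict mapping (pitcher_id, year)
    to a dict holding 'bf' (batters faced) plus one key per rate stat.
    Only pitchers with at least `min_bf` batters faced in both years count.
    """
    stat_next = stat_next or stat_now
    xs, ys = [], []
    for (pitcher_id, year), row in seasons.items():
        following = seasons.get((pitcher_id, year + 1))
        if following is None:
            continue
        if row["bf"] < min_bf or following["bf"] < min_bf:
            continue
        xs.append(row[stat_now])
        ys.append(following[stat_next])
    return xs, ys
```

Calling `r_squared(*consecutive_season_pairs(seasons, "contact"))` would give the stat's correlation with itself, while passing `stat_next="so_rate"` (again, a made-up key) would compare this year's contact rate to next year's strikeout rate.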
As it turns out, you can explain actual strikeout rate from this synthetic estimate quite accurately, with an r^2 in the low 0.8 range.

Statcast-era data works slightly better: the version of zSO that uses it has an r^2 of 0.84, while the version built for seasons that predate Statcast is at 0.80. Cross-validating using repeated random subsampling (our data is limited, as there’s no “other” MLB to compare it to) yields the same results.
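Repeated random subsampling just means splitting the data into random training and test sets many times, fitting on one and scoring on the other each time, then averaging. The sketch below is a simplified illustration rather than the actual zSO model: it assumes a single predictor and an ordinary least-squares line, and the function names, split count, and test fraction are arbitrary choices.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant in the training split")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def holdout_r2(xs, ys, slope, intercept):
    """R^2 of a fitted line on held-out data: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        raise ValueError("held-out target is constant")
    return 1 - ss_res / ss_tot


def repeated_subsampling_r2(xs, ys, n_splits=200, test_fraction=0.25, seed=0):
    """Average held-out R^2 over many random train/test splits."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    indices = list(range(len(xs)))
    n_test = max(2, int(len(indices) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        test, train = indices[:n_test], indices[n_test:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        scores.append(
            holdout_r2([xs[i] for i in test], [ys[i] for i in test], slope, intercept)
        )
    return sum(scores) / len(scores)
```

If the averaged held-out r^2 lands close to the in-sample figure, that suggests the fit is not simply memorizing the sample, which is what "yields the same results" is getting at.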
Like the various x measures in Statcast, these numbers shouldn’t be taken as projections in themselves. While zSO projects future strikeout rate slightly more accurately than the actual rate itself does, a mixture of both gets a better r^2 (0.59 for the sample outlined above) than either does on its own. Looking at zSO alone as a useful leading indicator, however, gives us an idea of which players may be outperforming or underperforming their strikeout rates so far this season. All numbers are through Wednesday night.