BABIP is a really hard thing to predict for pitchers. There have been plenty of attempts, sure, but nothing all that conclusive — probably because pitchers have a negligible amount of control over it. So naturally, when I found something that I thought might be able to model and estimate pitcher BABIP to a high degree of accuracy, I was very excited.
My original idea was to figure out the BABIP — as well as other batted ball stats — of individual pitches from details about the pitch itself. Velocity, movement, sequencing, and a multitude of other factors that are within the pitcher’s control play into the likelihood that a pitch will fall for a hit (even if to a very small degree). But much more than all of those, pitch location seems to be the most important factor (as well as one of the easiest to measure).
I got impressively meaningful results by plotting BABIP, GB%, FB%, wOBA on batted balls, and other stats based on horizontal and vertical location of the pitch. So I came up with models to find the probability that any batted ball would fall for a hit, with the only inputs being the horizontal and vertical location, and the models worked very well. I even gave different pitch types different models, since there were differences between, for example, fastballs and breaking balls. I found the “expected” BABIP of each of a pitcher’s pitches, and then I found the average of all of those expected BABIPs. Theoretically, this is the BABIP that the pitcher should have allowed.
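To make the procedure concrete, here is a minimal sketch of the idea in Python. It is not the actual model behind these results: it stands in a simple binned lookup table for whatever was really fit, and the bin width, the units (feet), the record layout, the pitch-type codes, and every function name are assumptions made for illustration. The handful of sample rows are invented, not real data.

```python
from collections import defaultdict

BIN_FT = 0.5  # assumed bin width, in feet, for both directions


def bin_key(pitch_type, px, pz):
    """Map a pitch to a (pitch type, horizontal bin, vertical bin) cell."""
    return (pitch_type, int(px // BIN_FT), int(pz // BIN_FT))


def fit_location_model(batted_balls):
    """Estimate the hit probability in each location cell, per pitch type.

    Each record is (pitcher, pitch_type, px, pz, is_hit), where px and pz
    are the horizontal and vertical pitch location (hypothetical layout).
    """
    hits = defaultdict(int)
    total = defaultdict(int)
    for _, pitch_type, px, pz, is_hit in batted_balls:
        key = bin_key(pitch_type, px, pz)
        total[key] += 1
        hits[key] += is_hit
    return {key: hits[key] / total[key] for key in total}


def expected_babip(model, batted_balls, fallback):
    """Average the per-pitch hit probabilities for each pitcher."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for pitcher, pitch_type, px, pz, _ in batted_balls:
        # Cells the model never saw fall back to a supplied baseline rate.
        sums[pitcher] += model.get(bin_key(pitch_type, px, pz), fallback)
        counts[pitcher] += 1
    return {pitcher: sums[pitcher] / counts[pitcher] for pitcher in sums}


# Invented rows for illustration only: "FF" = fastball, "CU" = curveball.
sample = [
    ("A", "FF", 0.1, 2.6, 1),
    ("A", "FF", 0.2, 2.7, 0),
    ("A", "CU", -0.6, 1.6, 0),
    ("B", "FF", 0.3, 2.8, 1),
    ("B", "CU", -0.7, 1.7, 1),
]

model = fit_location_model(sample)
league_rate = sum(row[4] for row in sample) / len(sample)
xbabip = expected_babip(model, sample, fallback=league_rate)
print(xbabip)
```

In practice the lookup table would presumably be fit on league-wide batted balls and then applied to each pitcher, rather than fit and scored on the same few rows as in this toy run. A smoother model could replace the bins without changing the averaging step.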