Accepting Randomness

Most of the conversations about the Dan Haren trade boil down to how a person feels about pitcher evaluation. There are clearly still a lot of people who simply believe that whatever happens is the pitcher’s responsibility, so if he gives up a bunch of hits and some home runs, he’s doing something wrong and that should be held against him. To them, a high BABIP or HR/FB rate is evidence that he’s throwing too many hittable pitches, or that his stuff has deteriorated, or that his command isn’t as good as it was, or that there’s some other explanation we haven’t yet figured out. But, whatever it is, it’s definitely something, and it’s definitely real.

These opinions are generally held because of an outright refusal to accept randomness. The idea that something could happen repeatedly, without cause, is very hard for a lot of people to swallow. But it’s true, and it’s a very important concept to buy into when trying to project the future performance of baseball players. Random happens.

For instance, did you know that the NFC has won 14 consecutive coin-tosses in the Super Bowl? Since 1997, the AFC has been on the losing side of the flip every single time. The odds of that happening are 1 in 16,384, and yet, it’s happened. Do you think the NFL is weighting coins? Do you think the AFC is perpetually hiring players who are terrible at guessing coin flips? Or do you think it’s just luck?
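The 1 in 16,384 figure is just one half multiplied by itself 14 times. Here is a minimal sketch of that arithmetic, assuming a fair coin and independent flips; the function name `streak_probability` is only an illustration.

```python
from fractions import Fraction


def streak_probability(n_flips: int, p: Fraction = Fraction(1, 2)) -> Fraction:
    """Chance that one particular side wins n_flips independent flips in a row.

    Assumes every flip is independent and has the same win probability p.
    """
    return p ** n_flips


nfc_streak = streak_probability(14)
print(nfc_streak)  # exact fraction for 14 straight wins by one named conference
```

Note that this is the chance of one named conference winning 14 straight. If either conference doing it would have caught our attention, the chance doubles to 1 in 8,192.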

I’d imagine that most of us agree that it’s the latter. Because a coin has no ability to control what side it lands on, we are willing to agree that the result of a flip is random. However, as a culture, we don’t like to apply that same belief to people. They can make choices, adapt, and do things that affect the outcomes they are involved in, and so many of us assume that nothing that happens to a person is ever random.

Haren’s BABIP has been abnormally high in four of the last five months, dating back to last September. For many people, that’s enough to say that there’s a pattern that rules out any kind of randomness, and that the fact that he’s been giving up hits for what amounts to 2/3 of a season is evidence enough that he’s doing something wrong. However, when you look at the actual odds of that happening by random chance to some pitcher in MLB, you’ll find that it’s not unusual at all.

Using the binomial distribution, we can see that the odds of a pitcher with a true talent level BABIP of .300 randomly posting a .350+ BABIP in any given month (of 115 BIP) are about 10 percent. Thus, the odds of that same pitcher posting a .350+ BABIP in at least four out of five months are about 1 in 2,200. Those seem like really long odds (though nothing compared to the Super Bowl coin, of course) until you remember just how many different five-month stretches of pitching there are in Major League Baseball, especially once you introduce selective endpoints, where the time frame is defined by looking for the beginnings of a potential pattern.
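The binomial calculation above can be sketched in a few lines. This is a rough reconstruction under stated assumptions, not necessarily the exact method used for the numbers in the post: every ball in play is treated as an independent trial with a .300 hit probability, a ".350+ month" is taken to mean at least 41 hits on 115 balls in play, and "four out of five" is read as four or more. The helper `binom_tail` is a name introduced here for illustration.

```python
from math import ceil, comb


def binom_tail(n: int, k_min: int, p: float) -> float:
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


BIP_PER_MONTH = 115   # balls in play per month, from the post
TRUE_BABIP = 0.300    # assumed true talent level
HIGH_BABIP = 0.350    # what counts as an abnormally high month

# Smallest hit total that yields a BABIP of .350 or better on 115 BIP.
min_hits = ceil(HIGH_BABIP * BIP_PER_MONTH)

# Chance of one high month, computed without rounding.
p_month = binom_tail(BIP_PER_MONTH, min_hits, TRUE_BABIP)

# Chance of at least four high months out of five, using the post's
# rounded 10 percent monthly figure and then the unrounded one.
p_rounded = binom_tail(5, 4, 0.10)
p_unrounded = binom_tail(5, 4, p_month)

print(f"one high month: {p_month:.3f}")
print(f"4+ of 5 at 10%: 1 in {1 / p_rounded:,.0f}")
print(f"4+ of 5 unrounded: 1 in {1 / p_unrounded:,.0f}")
```

Because the four-of-five figure depends on the fourth power of the monthly rate, it is quite sensitive to rounding. The unrounded monthly probability comes out a little above 10 percent, which makes the five-month stretch somewhat more likely than the rounded figure suggests.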

Given the number of potential different five-month stretches we could look at across 350 pitchers using selective endpoints, it’s not a surprise at all that we can find a guy who has performed in a way that looks to be a rarity. The sheer quantity of players in the game, and the number of games they play, means that we will always see performances that had little chance of happening. On its own, such a performance is not evidence that randomness can be ruled out.
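To get a feel for how quickly a 1 in 2,200 event stops being rare, here is a hedged sketch of the "at least one pitcher" probability. The 350 pitchers and the 1 in 2,200 figure come from the text above; the number of five-month windows per pitcher is purely hypothetical, and treating the windows as independent is a simplification, since overlapping stretches share months.

```python
def prob_at_least_one(p_event: float, n_chances: int) -> float:
    """Chance that an event of probability p_event occurs at least once
    in n_chances independent opportunities."""
    return 1 - (1 - p_event) ** n_chances


P_STRETCH = 1 / 2200   # four-of-five high-BABIP months, from the text
PITCHERS = 350         # pitcher count used in the text

# Hypothetical counts of five-month windows examined per pitcher.
for windows_per_pitcher in (1, 5, 10, 20):
    chances = PITCHERS * windows_per_pitcher
    p_any = prob_at_least_one(P_STRETCH, chances)
    print(f"{windows_per_pitcher:>2} window(s) each: {p_any:.1%}")
```

Even with a single window per pitcher the chance that somebody in the league has such a stretch is far from negligible, and it climbs quickly as more candidate windows are searched.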

Maybe Haren is doing something wrong. Maybe there is a reason for all these no-hitters. Maybe there’s an explanation for Brady Anderson’s 1996 season. We don’t know enough to conclusively say in any of these cases, but neither can you rule out that it may just be randomness at work. If you’re not willing to accept that, you’re going to see a lot of patterns where they don’t exist, and create explanations for things where there are none.





Dave is the Managing Editor of FanGraphs.

Kevin
13 years ago

This could be a good introductory post for those who are just getting into sabermetrics, or those who choose to ignore them.