A Few Different Ways To Look at The Steroid Era, Graphically

August 6, 2013

Using data to point out where the steroid era begins and ends is not as easy as you might think. While there are substances you can take to increase muscle mass, and that has direct consequences for athleticism, there just isn’t a clear moment where the numbers pinpoint the beginnings of steroid usage in Major League Baseball. That might be because the pitchers were on them too, or because the steroid era reaches further back than we suppose, or continues further into today’s game than we prefer to think.

Let’s start with something simple. Home runs per balls in play back to 2000, noting that 2005 was when the CBA language introducing testing was agreed upon:

So the steroid agreement totally nailed it. Helped reduce home runs off balls in play by 0.5%. *Does the cleaning up the game dance.*

Less facetiously: this doesn’t seem like a big difference in homer rates. Physicist Alan Nathan penned a paper in 2009 that suggested that a 10% increase in muscle mass could beget a 30-70% increase in home runs per ball in play, by way of increased bat speed. But even if the increase had been on the smaller side, say 30%, we could have expected home runs per balls in play to reduce to about 3.1%. Which we didn’t see.

But not all of baseball was doing steroids. And there were pitchers doing it, too. And where on your body you put that 10% of muscle mass seems very important. And if you zoom out a bit, it gets problematic. Here’s the same graph back to 1994:

Oh. Yeah. Now it looks like there’s no real difference between the beginning of this era and the end. Fluctuations as much as .5% must be random, considering the ebb and flow of this graph.

Why cut it off in 1994? Ken Caminiti admitted to doing steroids, and his career started in 1987. Let’s pull this back to 1980:

What happened in 1993? Did they disperse needles after the season was over? More likely, it was baseball’s expansion that year (the Rockies and the Marlins) that diluted the pitching pool some. You see a little temporary boost in 1998 when Arizona and Tampa were added to the mix too. And add in the Coors Field effect.

But the Steelers were doing steroids in the 1970s… let’s just do the post-war-time HR/BiP graph, and annotate it with important dates, including rule changes and expansions:

If the zooming in and out didn’t do it for you, perhaps the annotated full version may convince you that this is a very complicated issue. Almost every big change in home run rate has a big change in baseball associated with it — if not two. In 1969, the mound was lowered (good for offense) and two teams were added (good for offense), and the strike zone was changed as well, to eliminate the high strike between the armpits and the shoulders. All of these changes pushed home runs on balls in play from 2.8% to 3.05%.

If there is a suspicious place on this graph, it’s in the mid-eighties. Here’s a table:

Year	HR/BiP
1982	2.68%
1983	2.65%
1984	2.63%
1985	2.92%
1986	3.15%
1987	3.67%
1988	2.60%
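
To put numbers on that lurch, here is a minimal sketch that computes the year-over-year change from the table above. The rates are copied straight from the table; the function name `year_over_year` is just an illustrative choice.

```python
# HR/BiP percentages copied from the table above.
hr_per_bip = {
    1982: 2.68, 1983: 2.65, 1984: 2.63, 1985: 2.92,
    1986: 3.15, 1987: 3.67, 1988: 2.60,
}

def year_over_year(rates):
    """Return {year: (point_change, relative_change_pct)} versus the prior year."""
    years = sorted(rates)
    changes = {}
    for prev, cur in zip(years, years[1:]):
        diff = rates[cur] - rates[prev]
        # Round for display: percentage points to 2 places, relative change to 1.
        changes[cur] = (round(diff, 2), round(diff / rates[prev] * 100, 1))
    return changes

changes = year_over_year(hr_per_bip)
for year, (points, relative) in changes.items():
    print(f"{year}: {points:+.2f} points ({relative:+.1f}% relative)")
```

By this arithmetic, 1987 is up 0.52 percentage points on 1986, or about 16.5% in relative terms, and 1988 gives back more than that in a single season.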

If we didn’t know anything about Jose Canseco, maybe we could call it random. But we know there were steroids in baseball in the mid-80s, we know that home runs per balls in play took a lurch forward then, too, and we know that this lurch forward was stymied for a while by a 1988 change in the strike zone, and we know that baseball has largely returned to the homer rate that we settled on in 1987.

But before we say that 1987 was when baseball revealed what it would look like in the Steroid Era, we have to at least consider the other possibility. 1987 was the year of the Rabbit Ball, as it was commonly called. Larry Granillo wrote a great piece chronicling the players’ comments about the ball, and linking to prominent writers of the day surmising that there were other reasons. Other reasons like pitchers of the late eighties being yuppies missing a work ethic. Seriously.

Jay Jaffe, in his book Extra Innings, pointed out that baseball switched ball manufacturers in 1977 and that balls of different eras haven’t appeared similar in constitution. You’ll find 1977’s power peak annotated in the graph above as an expansion year, but it seems we could add something about the seams there too.

Hitting a baseball is a complicated mix of athleticism and learned skills. While it might be folly to say that steroids have no effect on the results of that endeavor, it’s also nearly impossible to pinpoint what steroids have meant to baseball’s most hallowed number. The home run rate has changed, drastically at times, over the history of baseball. It’s just that the rules, the strike zone, and even the ball have changed during that time, too. Which effect is to blame for 1987’s surge? Maybe the yuppies.

[My definition of balls in play for the purposes of this article was: BiP = PA – (K+BB+HBP). I used this definition because it included HRs, and I used BiP to try and control for fluctuations in the number of balls put in play.]
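
For anyone who wants to reproduce the rate, the definition above can be sketched in a few lines. The function names and the season totals in the example are hypothetical placeholders, not the actual league numbers behind the graphs.

```python
def balls_in_play(pa, k, bb, hbp):
    """BiP = PA - (K + BB + HBP), the definition used in this article.

    Home runs are deliberately left in, so they count as balls in play.
    """
    return pa - (k + bb + hbp)

def hr_per_bip(hr, pa, k, bb, hbp):
    """Home runs divided by balls in play, as a fraction (0.03 means 3%)."""
    bip = balls_in_play(pa, k, bb, hbp)
    if bip <= 0:
        raise ValueError("no balls in play, so the rate is undefined")
    return hr / bip

# Hypothetical league-season totals, for illustration only.
rate = hr_per_bip(hr=4500, pa=180000, k=30000, bb=16000, hbp=1500)
print(f"HR/BiP = {rate:.2%}")
```

Dividing by balls in play rather than plate appearances keeps a season with more strikeouts or walks from dragging the homer rate around on its own.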

76 Comments

walt526

11 years ago

Please either set the vertical axis to 0% or provide confidence intervals. These “trends” are very likely just random chance.

Eno Sarris

Reply to walt526

I can’t tell if you are kidding, but I play with that very idea throughout the article. And there are no confidence intervals. These are observed phenomena, and there’s no way of knowing how many homers were not homers or vice versa. Nor would it matter on the final scale.

Clark Kent

Reply to Eno Sarris

You have a data set with a reported average, and a whole bunch of data which deviates from that reported average. It’s not a question of whether homers are homers or not, but a question of homoscedasticity.

Brian

Reply to Clark Kent

Allow me!

Homoscedasticity: the property of having equal statistical variances

http://www.merriam-webster.com/dictionary/homoscedasticity

Ah so not error bars or what I would think of as confidence intervals, but a better sense of the standard deviations within the data set. To perhaps see if it was driven by the top end talent or something. Interesting.

Dave Cameron's Puppy

Here you actually probably want to use standard error as you’re trying to measure the difference between the averages any given year. Standard deviation gives you an idea of the spread of the data, not how accurate your average is.
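
A rough sketch of the standard-error idea, assuming each ball in play is an independent trial with the same home run probability (a simplification, since real hitters differ). The function names are illustrative and the counts in the example are hypothetical.

```python
import math

def proportion_se(hr, bip):
    """Standard error of an observed HR/BiP rate under a binomial model."""
    p = hr / bip
    return math.sqrt(p * (1 - p) / bip)

def difference_z(hr_a, bip_a, hr_b, bip_b):
    """z-score for the difference between two seasons' HR/BiP rates."""
    p_a, p_b = hr_a / bip_a, hr_b / bip_b
    # Variances add for the difference of two independent estimates.
    se_diff = math.sqrt(proportion_se(hr_a, bip_a) ** 2
                        + proportion_se(hr_b, bip_b) ** 2)
    return (p_b - p_a) / se_diff

# Hypothetical season totals, for illustration only.
print(proportion_se(hr=4000, bip=130000))
print(difference_z(4000, 130000, 4500, 130000))
```

With league-wide samples this large the standard error on any single season is small, so the binomial assumption matters more than the arithmetic.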

Nate

Do you have HR/BiP for each individual player in these data? If so you could bootstrap the league wide HR/BiP to obtain confidence intervals. This could also have the interesting side effect of certain players showing up more often in the extremes of the distribution, possibly suggesting steroid use.
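
One way the bootstrap suggested here could look, as a sketch only: resample players with replacement, recompute the league-wide rate each time, and read a percentile interval off the results. The `(hr, bip)` tuple layout, the function name, and the sample data are all assumptions for illustration.

```python
import random

def bootstrap_league_rate(players, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for league-wide HR/BiP.

    players: list of (hr, bip) tuples, one per player (hypothetical layout).
    Returns (low, high) bounds of the (1 - alpha) interval.
    """
    rng = random.Random(seed)  # seeded so the result is repeatable
    rates = []
    for _ in range(n_resamples):
        # Draw as many players as the league has, with replacement.
        sample = rng.choices(players, k=len(players))
        total_hr = sum(hr for hr, _ in sample)
        total_bip = sum(bip for _, bip in sample)
        rates.append(total_hr / total_bip)
    rates.sort()
    low = rates[int((alpha / 2) * n_resamples)]
    high = rates[round((1 - alpha / 2) * n_resamples) - 1]
    return low, high

# Hypothetical player lines, for illustration only.
players = [(30, 450), (5, 400), (12, 380), (0, 150), (22, 420), (8, 300)]
low, high = bootstrap_league_rate(players)
print(f"95% interval: {low:.2%} to {high:.2%}")
```

Resampling whole players rather than individual balls in play keeps each hitter's own rate intact, which is what lets the interval reflect how much the league figure depends on who is in the sample.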

anothernate

Reply to Nate

I was thinking the same thing. There is also Deming’s famous paper on treating a census as a sample for theoretical motivation.

Anon

Reply to anothernate

Do you have a citation on the paper? A quick check of Wikipedia was fruitless without more detail

Even the most stat minded baseball writers have a long way to go in terms of reporting data. There are never any error bars on any plots here and it drives me nuts.
