There’s Something About This Year’s Hitters

July 27, 2017

Wednesday night, Justin Smoak hit another home run. It was a rather big home run, a dramatic ninth-inning home run, but as far as these purposes are concerned, all that matters is that a home run happened, never mind the context. A couple of months ago, it looked like Smoak could be breaking out, at last. He hasn’t slowed down even the tiniest bit. He’s been one of the more dangerous hitters in all of baseball, and for as much as the Blue Jays’ season has gone down the toilet, Smoak’s made for a great story. His career WAR is 3.4. His 2017 WAR is 3.1.

I’ve been thinking about Smoak a lot. But then, there are also other dots to connect. A story that’s similar to Smoak’s is that of Logan Morrison. In the AL West, Yonder Alonso has turned himself into an offensive weapon. There’s also been the unexpected breakout of Marwin Gonzalez, and while I don’t want to just go down a list name by name, there have been other big surges, and also a number of shocking collapses. Carlos Gonzalez has fallen apart. Jonathan Lucroy, too. Names and more names and more names.

It feels like hitters have been particularly unpredictable. But there could be a strong element of recency bias: I remember this year’s stories the best, and examinations tend to focus on the biggest surprises. So I tried to dig into the numbers. Turns out it’s more than just a hunch.

I don’t want to bore you with a whole bunch of methodological details. So I’ll try to be quick! I looked at how things have gone since 2002. I chose that season as the beginning because that’s as far back as I can access using our splits leaderboard. I focused my attention on year-to-year wRC+, to see how it’s held up. I set a minimum of 250 plate appearances, and for the second season in every sample, I included data only through July 25, to stay consistent with what we have for 2017.

That’s confusing. Let me try again. For every full season from 2002 to 2016, I gathered data for players with at least 250 PA. Then I looked at how those same players hit in the following season, from 2003 to 2017, with the same PA minimum, and using only numbers through July 25. Do you get it? I hope so. Here is the overall plot, with a sample of more than 2,800 player pairs:

Here now is the plot for just 2016 – 2017:

I don’t know what, if anything, might jump out. In the second plot, you get a smaller slope. That’s indicative of something, but to go in a slightly different direction, perhaps you’re more accustomed to looking at r-squared values. Let’s do that. For the sample from 2016 – 2017, you get an r-squared of 0.08. Seems pretty small, right? But we don’t have any frame of reference. Let’s establish that part. Here are all of the r-squared values from the past decade and a half:

For the entire sample, the r-squared value is 0.21. The very highest mark is 0.32, from the first pair of seasons. The previous low was 0.14, for 2010 – 2011. This time around, again, we’re at 0.08. I’m not good enough at this to tell you exactly how significant that is, but it’s the weakest relationship of any pair of seasons in the sample. It’s not just anecdotal. Offensive numbers in 2017 bear less resemblance to the previous year’s numbers than in any other season in the decade and a half under the microscope. It might not be anything earth-shattering, but it does look like something has been going on.
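For readers who want to try something similar, here is a minimal sketch of the year-to-year comparison described above. It assumes each season is available as a dictionary mapping a player ID to (plate appearances, wRC+), and that the second season of each pair has already been cut off at July 25 when the data was pulled. The function names and the sample numbers are hypothetical and invented for illustration; this is not the actual spreadsheet work behind the figures above. Pooling every season pair into one fit for the whole-sample figure is also an assumption about how such a number could be computed.

```python
# Minimum plate appearances required in BOTH seasons of a pair (from the text).
MIN_PA = 250


def paired_wrc(first, second, min_pa=MIN_PA):
    """Match players who cleared the PA minimum in both seasons.

    Each season is a dict: player_id -> (plate_appearances, wRC+).
    Returns a list of (first-season wRC+, second-season wRC+) tuples.
    """
    pairs = []
    for player, (pa1, wrc1) in first.items():
        if pa1 < min_pa or player not in second:
            continue
        pa2, wrc2 = second[player]
        if pa2 >= min_pa:
            pairs.append((wrc1, wrc2))
    return pairs


def fit_line(pairs):
    """Least-squares slope and r-squared of second-season wRC+ on first-season wRC+.

    Needs at least two pairs with some spread in both seasons' wRC+.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)  # square of the Pearson correlation
    return slope, r_squared


def r_squared_by_pair(seasons, min_pa=MIN_PA):
    """r-squared for each pair of consecutive seasons, plus one for all pairs pooled."""
    years = sorted(seasons)
    results = {}
    pooled = []
    for y1, y2 in zip(years, years[1:]):
        pairs = paired_wrc(seasons[y1], seasons[y2], min_pa)
        results[(y1, y2)] = fit_line(pairs)[1]
        pooled.extend(pairs)
    return results, fit_line(pooled)[1]


# Invented numbers for illustration only; not real player data.
season_1 = {
    "player_a": (600, 140),
    "player_b": (550, 120),
    "player_c": (480, 100),
    "player_d": (300, 85),
    "player_e": (200, 130),  # below the PA minimum in the first season
}
season_2 = {
    "player_a": (380, 125),
    "player_b": (350, 110),
    "player_c": (90, 95),    # below the PA minimum in the second season
    "player_d": (320, 105),
}

pairs = paired_wrc(season_1, season_2)
slope, r2 = fit_line(pairs)
print(f"{len(pairs)} players, slope {slope:.2f}, r-squared {r2:.2f}")
# prints: 3 players, slope 0.34, r-squared 0.82
```

The slope and the r-squared answer slightly different questions. The slope says how much of a hitter’s first-season wRC+ carries over on average, while the r-squared says how much of the spread in second-season wRC+ the first season accounts for. That is why a flatter line in one plot and a lower r-squared can show up together without being the same measurement.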

This is something that can be further explored when the season is over. Might as well let the last two months play out, in case the numbers feel like getting around to normalizing. And, ideally, at season’s end, this kind of examination can be performed comparing numbers against projections, instead of just numbers from the previous year. Projections and previous-year numbers will tend to look pretty similar, but the projections are always the recommended baseline. I didn’t use them here because I don’t have ready access to preseason projections from the past several years. I know they’re out there, though, and so this study could and should be repeated in November or December. It’ll be interesting to find out whether this season’s hitters have really been the least predictable. It’s something that could be a fluke, and it’s something that could be more than a fluke.

Take that drop in r-squared for 2010 – 2011. The next year, the relationship bounced back. It wasn’t the start of a trend. But that year was that year, and this year is this year. Every year is different, and as you attempt to imagine explanations for why this might be a real thing, you’re probably drawn to the idea that there’s so much information available out there. Hitters are changing in response to Statcast data. Pitchers are adjusting too, in response to similar information. It feels, at least in theory, like changes could be happening faster than ever, in either direction. Players can be more precise about identifying their strengths and weaknesses, so there’s less guesswork. Hitters have a better idea how they might improve. Pitchers have a better idea how hitters might be attacked. It’s informational warfare, to an extent never before experienced. It’s all kind of a grand theory, and nothing else, but it at least feels like it could be true. It’s a start.

Maybe it’s nothing. Maybe this won’t hold up at the end of the year; maybe this won’t hold up when the numbers are compared to the preseason projections. I don’t know, because the season is only two-thirds over, and I put this all together in just a few hours. This’ll eventually be examined in much greater depth, I assume. For now, just know, if you feel like it’s been a strange season for hitters, there’s a reason behind that feeling.


Jeff made Lookout Landing a thing, but he no longer writes there about the Mariners. He does write here, sometimes about the Mariners, but usually not.

29 Comments


OddBall Herrera

7 years ago

Re: Smoak and Alonso, I would say that the FanGraphs articles about their hard-hit % and FB rate from before the season started prove that this was, at least to some extent, seen coming 🙂

OddBall Herrera

7 years ago

Reply to OddBall Herrera

To make a more serious point along the same vein, it would seem logical that an influx of data would increase volatility in the short term, because adopters are going to use it to identify and address weaknesses, resulting in sudden changes in profile and performance. Also, adopters are going to start exploiting non-adopters. When pitchers start going up in the zone, all those guys stuck with uppercut swings will be bringing the proverbial knife to a gunfight.

One would expect this to flatten out as the best use of things like Statcast becomes baked in, and as people who are unable to, for example, make the mechanical changes needed to implement what the numbers are telling them get squeezed out.

LHPSU

7 years ago

Reply to OddBall Herrera

Perhaps patient hitters would fare better, since they have an easier time laying off the high pitch?

kenai kings

7 years ago

Reply to OddBall Herrera

Interesting.
Can I rephrase? Players adjust, and those that don’t are killed off.

RonnieDobbs

7 years ago

Reply to OddBall Herrera

I don’t really think it works like that. Baseball has always been, and will always be, 100% about making adjustments; it’s not like this is a new phenomenon. Most certainly, non-baseball people are learning about adjustments and angles, but I don’t think Statcast data is changing much for players. If making adjustments were simple, then everyone would be an All-Star, but of course it is a matter of who is better at making adjustments and who doesn’t have to make as many.

IMO, this data revolution is just the materialization of live baseballs. I am certain guys would have figured it out really quickly without reports. The idea that the game is evolving, driven by exciting new technology, is one take on it… it is certainly marketable. What a great way to draw in new fans: promote the narrative that fans are part of, or at least witnessing, a revolution. Doesn’t everyone want that? Personally, I think it is pretty arrogant and short-sighted to think that past players were not adjusting based on outcomes and data. Statcast just tells us what we already know, but under a different name, like sprint speed.

When we talk about adjusters, these are literally the same guys who were playing in the stone age, 3 or 4 years ago. They are likely doing exactly what they always did, but changing balls changes outcomes; you can spin it as an altered process, but I don’t really buy that. IMO, in case it wasn’t clear, the volatility comes from the changing baseballs. Some guys are taking advantage and some aren’t. When more bad swings go for the ultimate outcome, that is going to change the data in itself dramatically. I don’t know how much NFL crossover there is around here, but it is kind of like passing and receiving stats these days. It’s hardly the same game it used to be; so many rules changed so quickly. I have no idea what normal looks like anymore in that context. When the environment changes, everything changes with it. Of course, the way you sell it is bigger, faster, smarter, stronger…

kenai kings

7 years ago

Reply to RonnieDobbs

This is the truth of it.

Let MLB go down the road of technology with the NFL, NHL, etc., if we must. My only hope is that baseball beyond MLB doesn’t hitch a ride.

Aaron Judge's Gavel

7 years ago

Reply to RonnieDobbs

Yes. Although what OddBall hypothesized seems like a reasonable explanation, this seems, well, more reasonable. Plus, let’s not forget that teams have had access to HITf/x data since 2008: “HITf/x data measures the speed and direction of each batted ball throughout its trajectory in the PITCHf/x camera frames, which cover roughly the area between home plate and the pitcher’s mound. The reported speed is the average speed over this distance, which will be slightly lower than the initial speed off the bat due to the drag force.”

So if the ball changed in mid-2015, maybe we wouldn’t expect a rash of adjustments in preparation for the 2016 season, since nobody realized this was a ‘thing’ yet. One half-season of data could easily just be an outlier. Then 2016 happened, and now, after a year and a half, teams and players have figured out there was something there. Now, in 2017, we have Zack Cozart, Scooter Gennett, and Marwin Gonzalez all slugging comfortably above .500.

I think Ronnie has it right here: “When more bad swings go for the ultimate outcome, that is going to change the data in itself dramatically.” The question I’d ask is this: if the ball had changed in mid-2012 (when HITf/x was available), would we have seen the same volatility in the hitter data by 2014? Heck, even if HITf/x hadn’t been available, teams (and players) would have been able to see that more fly balls were leaving the yard, so we’d generally expect more guys to try to hit more balls in the air.
