I Voted for Justin Verlander

I submitted my American League Cy Young Award ballot at the very beginning of October. The results were just released yesterday, almost smack in the middle of November. A funny thing happens between the beginning of October and the middle of November: A lot of time passes, time that includes the entirety of the MLB playoffs. As I focused on other events, I mostly forgot about my selections. I was reminded yesterday that my own ballot read:

  1. Justin Verlander
  2. Blake Snell
  3. Gerrit Cole
  4. Blake Treinen
  5. Corey Kluber

I was one of 13 voters to put Verlander in first. The other 17 voters, though, put Snell in first, and as such, Snell won, and Verlander was, once again, the runner-up. Clearly, it was a close race, and I think it should have been a close race. I don’t think that Verlander got robbed, and I don’t think that Snell is an undeserving winner or anything. But one of the perks of being an award voter is that voting grants you automatic editorial content. So on the off chance you care about my own thought process, allow me to quickly explain why Verlander was my first-place pick.

For so many observers, this race was going to revolve around two statistics. Verlander threw 214 innings, or 33.1 more innings than Snell. Yet Snell finished with an impressive 1.89 ERA, more than a half-run better than Verlander. Both Snell and Verlander were charged with just three unearned runs apiece, so if you want to explain why Snell wound up the winner, you could say this: In those 33.1 extra innings, Verlander allowed 22 extra runs. That works out to an average of almost six runs per nine innings. Should that really be a point in Verlander’s favor? That’s replacement-level pitching. That might even be worse than replacement-level pitching.
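The "almost six runs per nine" figure can be checked with a short sketch. It assumes the usual box-score notation, in which the digit after the decimal point counts outs rather than tenths (so 180.2 means 180 and two-thirds innings). The helper name `innings_to_outs` is made up for illustration.

```python
def innings_to_outs(ip: str) -> int:
    """Convert box-score innings ('180.2' = 180 innings plus 2 outs) to total outs."""
    whole, _, extra_outs = ip.partition(".")
    return int(whole) * 3 + int(extra_outs or 0)


verlander_outs = innings_to_outs("214.0")
snell_outs = innings_to_outs("180.2")

# The gap in workload, measured in outs (100 outs = 33.1 innings).
extra_outs = verlander_outs - snell_outs

# The article's figure for the runs Verlander allowed beyond Snell's total.
extra_runs = 22

# Nine innings is 27 outs, so scale runs per out up to runs per nine.
runs_per_nine = extra_runs * 27 / extra_outs

print(f"{extra_outs} extra outs, {runs_per_nine:.2f} runs per nine innings")
```

Working in outs avoids treating 33.1 as a decimal, which would slightly overstate the rate.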

There’s no question that Snell finished with excellent results. He timed them well, too — his ERA in the second half was 1.17. Verlander’s was 2.95. As the award spotlight got brighter, Snell turned in quality start after quality start. But the first half matters, too. The first half is quite a bit longer than the second half. And I do give Verlander credit for simply pitching more often. Compared to Snell, Verlander pitched to 133 more batters. That’s good for the bullpen, and that’s good for roster management. Pitchers who eat innings can have a positive cascading effect. But I don’t want to go into every single detail. I doubt you want me to go into every single detail, either. I prefer to just present the core of my argument. To what extent was Snell responsible for his results?

This table provides a simple overview:

General Comparison
Pitcher            GS   IP      ERA-   FIP-   xFIP-
Justin Verlander   34   214.0   62     67     72
Blake Snell        31   180.2   46     72     75

Verlander started more games than Snell. He threw more innings than Snell. He finished with a better park-adjusted FIP than Snell, and he finished with a better park-adjusted xFIP than Snell. If the park-adjusted ERAs matched up with the peripherals, Verlander presumably would’ve won the Cy Young running away. But Snell’s lead in the ERA- column is enormous. Too enormous to ignore, it turns out.

How did Snell end up with such a sparkling ERA? He wound up with a BABIP of .241. But it goes even beyond that. For both pitchers in question, check out their wOBA splits:

wOBA Comparison
Pitcher            Overall   Bases Empty   Runner(s) On   RISP
Justin Verlander   0.259     0.273         0.230          0.246
Blake Snell        0.246     0.270         0.203          0.175

Essentially the same with the bases empty. But with the bases not empty, Snell got more outs. And with runners in scoring position, Snell blew Verlander away. That’s a margin, in the last column, of 71 points. Those are higher-risk plate appearances, so any pitcher would look better if he’s pitching his best with runners on second and/or third.

You know there’s a “but,” though. Snell would deserve credit, indeed, if he pitched better with runners on base. But with runners on, Snell’s strikeout rate dropped, and his walk rate got higher. Verlander struck out more batters when the bases weren’t empty. And for Snell, with the bases empty, he allowed a BABIP of .292. With runners on, that dropped to .161, and with runners in scoring position, it dropped to .118. If we’re dealing with anything here, we’re dealing with a question of contact quality. And that question always sends me to Baseball Savant, so I can check out what Statcast has to say.

The previous table showed you their wOBA splits. In this table, I want to show you their xwOBA splits. I know that xwOBA is still considered kind of experimental, and I wouldn’t want to base awards voting on xwOBA alone, but why shouldn’t we consider the actual batted balls these pitchers allowed? We’re always trying to strip away the effects of defense and luck. Let’s do some stripping:

xwOBA Comparison
Pitcher            Overall   Bases Empty   Runner(s) On   RISP
Justin Verlander   0.237     0.239         0.232          0.227
Blake Snell        0.272     0.273         0.269          0.248
SOURCE: Baseball Savant

Overall, Verlander had the lower xwOBA. With the bases empty, Verlander had the lower xwOBA. With the bases not empty, Verlander had the lower xwOBA. With runners in scoring position, Verlander had the lower xwOBA. And these margins aren’t even all that small. Again, I know xwOBA isn’t perfect, and I know pitchers sometimes try to pitch to their ballparks or defenses or whatnot, but I went looking for a reason to believe Snell really was blowing Verlander away under more pressure-packed situations. By looking at xwOBA, I could find zero evidence. xwOBA indicates that Verlander was, if anything, a better pitcher than Snell, and he threw a good deal more innings than Snell. I don’t worship at the altar of 34 starts, not in this era of baseball, but Verlander was durable and terrific. With Snell, I couldn’t get myself over the top.
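To put the two tables side by side, here is a small sketch that subtracts actual wOBA from xwOBA for each split. The numbers are copied from the tables above. The sign convention (positive means the results beat the expected contact quality) and the function name `gap_in_points` are assumptions made for this illustration, not anything taken from Baseball Savant.

```python
SPLITS = ("Overall", "Bases Empty", "Runner(s) On", "RISP")

# Actual wOBA allowed, from the wOBA Comparison table.
WOBA = {
    "Justin Verlander": (0.259, 0.273, 0.230, 0.246),
    "Blake Snell": (0.246, 0.270, 0.203, 0.175),
}

# Expected wOBA allowed, from the xwOBA Comparison table.
XWOBA = {
    "Justin Verlander": (0.237, 0.239, 0.232, 0.227),
    "Blake Snell": (0.272, 0.273, 0.269, 0.248),
}


def gap_in_points(pitcher: str) -> dict:
    """xwOBA minus wOBA per split, in points (thousandths).

    Positive means the pitcher's results were better than his
    batted-ball quality would suggest.
    """
    return {
        split: round((expected - actual) * 1000)
        for split, actual, expected in zip(SPLITS, WOBA[pitcher], XWOBA[pitcher])
    }


for name in WOBA:
    print(name, gap_in_points(name))
```

With these inputs, Snell's gap comes out at 3 points with the bases empty, 66 with runners on, and 73 with runners in scoring position, while Verlander's is negative everywhere except a 2-point edge with runners on.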

There were 121 pitchers who faced at least 500 batters. Snell had the fourth-most positive difference between actual wOBA and expected wOBA. There were 83 pitchers who faced at least 250 batters with runners on. Snell had by far the most positive difference between actual wOBA and expected wOBA. And there were 137 pitchers who faced at least 100 batters with runners in scoring position. Snell had the second-most positive difference between actual wOBA and expected wOBA. It’s very possible, if not probable, that’s not giving Snell enough credit. Maybe he was in some way able to pitch to his defenders. The name of the game, after all, is run prevention. It’s also very possible xwOBA underrates the extent to which Snell suppressed the best contact. I’m not upset that Snell wound up the winner. I just think you need convincing evidence if you’re going to vote for the guy who pitched less often. The evidence, as I see it, doesn’t convince me. Snell was outstanding, but his ERA had some help. Help that Verlander didn’t get.

Realistically, this was always going to be a two-pitcher race, so you’ll be less interested in how I filled out the rest of my ballot after the top two spots. I don’t want to go too deep on those choices, either, because ultimately they don’t matter very much. Gerrit Cole was incredible. Had I weighted xwOBA a little more heavily, relative to actual wOBA, I could’ve made an argument for moving Cole in front of Snell, too. But I didn’t want to ignore Snell’s actual results entirely. As good as Corey Kluber was, I docked him for pitching a bit worse with runners on, and with runners in scoring position. I included a closer in Blake Treinen. The league’s two outstanding relievers were Treinen and Edwin Diaz, but Treinen faced more batters. Also, Diaz never went more than 1.1 innings, while Treinen recorded at least five outs on 14 different occasions. On a per-batter basis, Diaz was probably better by a hair, but Treinen had more on his shoulders, especially over the first few months. Diaz probably would’ve wound up sixth or seventh on a deeper ballot.

Lastly, while it would’ve been nice to include Chris Sale (he was amazing), he threw 158 innings over 27 starts. He threw all of 17 innings in August and September combined. It’s one thing if a pitcher is expected to throw something like 158 innings, because of a preexisting plan. But the Red Sox weren’t prepared for Sale’s absence. He put the team in a bind (even though it was running away with the division). Treinen might’ve been a closer, but at least the A’s always had him available. Sale was basically missing for almost a third of the season. I couldn’t bring myself to write him in anywhere. Five spots isn’t that many.

That’s what I’ve got to tell you. If you were at all curious about my explanation, that sums it up. I voted for Justin Verlander over Blake Snell. The rest of the voters, collectively, voted for Blake Snell over Justin Verlander. It was pretty close, all things considered, and no other pitchers really factored into the race. I’m open to the suggestion that I might’ve made too much of xwOBA. It does still kind of remain in its infancy. But all I needed was a sign that Snell in some way deserved those minuscule BABIPs. The more I looked, the more I believed in Verlander’s case. It was a good case, but congratulations to Blake Snell, anyway, on a magnificent season.

3 years ago

Snell’s numbers are backed up by peripherals that are almost certainly not sustainable going forward. But the Cy Young award is for success in 2018, not for how sustainable that success is going to be for 2019 and beyond. Therefore, I think Snell was the right choice.

3 years ago
Reply to Dmjn53

I don’t think I buy the argument that peripherals are purely predictive. To me, they’re a clearer description of what a pitcher actually did, as opposed to what happened after he let go of the ball.

3 years ago
Reply to Jeremy

But the two are connected: The reason why it is predictive is *because* it is a much better descriptor of what the pitcher actually did.

If you have two measurements:
1) One of which is narrow but filters out all unwanted noise,
2) And a second one that is broad but lets in a ton of unwanted noise

The first one often tends to perform better in terms of, well, everything–and it’s because the first one usually captures the construct of interest better. This case is no exception, and it’s demonstrated by its predictive validity.

3 years ago
Reply to Dmjn53

All you are doing is giving Snell all of the credit for run prevention, when that credit should go to him, his defense, and in this case, a huge amount of luck.

3 years ago
Reply to paperlions

I think that’s a bit dismissive.

Would it be fair to say that “all you are doing” is giving Verlander all the credit for things that could have, might have, and perhaps even should have – but didn’t – happen?

3 years ago
Reply to DBA455

No, you’re giving credit to Verlander for things you can actually measure.

Snell had a good year, but Jeff presents a really compelling case why a lot of Snell’s credit should go to his defense while Verlander was just awesome and struck people out. paperlions’ comment is justified by Jeff’s work here.

Jetsy Extrano
3 years ago
Reply to paperlions

I’d love to know the luck vs. defense split. I’m actually happy to give the pitcher credit for the year’s luck, in an award like this. But defense belongs to other people.

My wild guess is this is more luck than defense. But we could calculate it for real, with some assumptions…

3 years ago
Reply to Dmjn53

Can we please retire this straw man? When people use stats like xwOBA or even FIP for Cy Young considerations, rarely are they looking ahead to the future or thinking about “sustainability.” The objective is simply to credit the pitcher for only his own contributions to run prevention, and not those of his teammates or any other outside influences.

3 years ago
Reply to Wilmerrr

Some day, people will be using nothing but spin rate and pitch location to vote for Cy Young. Shortly thereafter, the award will be retired.

3 years ago
Reply to evo34

The PitchFX award

3 years ago
Reply to evo34

Awards are so much better when we vote on them because of storylines and other nonsense. AMIRIGHT??

3 years ago
Reply to Dmjn53

This really misses the whole point of sabermetrics.

Peripherals are not supposed to just be predictive. The reason they happen to be more predictive is because they do a better job describing what already happened.