Can Matt Cain Sustain His Low HR/FB Rate?

Any time a general theory that applies to most people is advanced, people naturally begin to look for the outliers, and they often use the examples at the ends of the spectrum to cast doubt on the validity of the theory. Or they just dismiss the theory as not being applicable to that specific case, which may or may not be true. We see this quite a bit with metrics like xFIP and Matt Cain, who has become the poster child for the part of our readership that thinks the stat isn’t worth all that much. For years, Cain’s ERA has been better than his xFIP would suggest, largely because he has sustained one of the lowest HR/FB rates in all of baseball.

The low HR/FB rate was brought up again yesterday in a reasoned post over at PaapFly. As is often stated by the Cain-is-better-than-xFIP-says crowd, the author noted that Cain has thrown 1,100 innings in the big leagues now, and that should be a large enough sample to conclude that this is a legitimate skill that he can carry forward.

Just for fun, I decided to look back at the data that has been collected over the last nine years. We’re starting to get large enough samples now where we can find other pitchers who have had similar stretches of home run prevention for 1,000+ innings, and still have observed performance in seasons after their run of keeping the ball in the park.

Below are 10 pitchers who, from 2002 to 2007, had the lowest HR/FB rates in baseball, who have thrown a similar number of innings to Cain, and have thrown at least 100 total innings in the last three seasons. The first section is their 2002-2007 IP and HR/FB rate, with the second section being their 2008-2010 IP and HR/FB rate.

Pedro Martinez: 981 IP, 8.0% HR/FB – 154 IP, 14.2% HR/FB
Roy Oswalt: 1,272 IP, 8.3% HR/FB – 602 IP, 10.4% HR/FB
John Lackey: 1,162 IP, 8.5% HR/FB – 555 IP, 10.5% HR/FB
CC Sabathia: 1,226 IP, 8.5% HR/FB – 721 IP, 8.2% HR/FB
Brad Penny: 1,041 IP, 8.7% HR/FB – 324 IP, 10.5% HR/FB
Jarrod Washburn: 1,121 IP, 8.7% HR/FB – 330 IP, 9.3% HR/FB
Barry Zito: 1,320 IP, 8.8% HR/FB – 571 IP, 7.9% HR/FB
Miguel Batista: 1,051 IP, 8.8% HR/FB – 269 IP, 11.7% HR/FB
Dontrelle Willis: 1,022 IP, 8.9% HR/FB – 123 IP, 11.5% HR/FB
Kevin Millwood: 1,160 IP, 9.1% HR/FB – 558 IP, 10.6% HR/FB

Group: 11,351 IP, 8.6% HR/FB – 4,202 IP, 9.9% HR/FB

The league average HR/FB rate is usually around 10.6%. As a group, the ten best big time home run suppressors from 2002 to 2007 were only marginally better than average at that same skill from 2008 to 2010. Sabathia and Zito bucked the trend and actually lowered their HR/FB rates over the last three seasons, so it’s certainly possible that Cain could continue to post low HR/FB rates going forward. After all, he does pitch in a pretty good pitcher’s park and his career HR/FB rate is better than any of the pitchers in this sample, so maybe there is something to David Pinto’s theory about how his fastball moves.

You could have made a similar argument about almost everyone on the above list, though, and as a group, they didn’t demonstrate that there was really much of a sustainable skillset there. Just for fun, I also looked at the guys who had the highest HR/FB rates from 2002 to 2007 and had thrown similar numbers of innings in both samples. Their rate dropped from 12.2% in the first period to 11.4% in the second period – still higher than average, and higher than the low HR/FB group, but only by one percentage point, much smaller than the gap between their observed rates from 2002 to 2007.
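
One rough way to summarize both comparisons is to ask how much of each group's first-period gap from the league average survived into the second period. The sketch below uses only the group rates quoted above and the roughly 10.6% league average. The function name `share_of_gap_retained` is mine, and treating one league average as constant across both periods is a simplifying assumption.

```python
LEAGUE_AVG = 10.6  # approximate league HR/FB rate (%), as quoted in the article

def share_of_gap_retained(first_period, second_period, league=LEAGUE_AVG):
    """Fraction of the first-period gap from league average that persisted."""
    return (second_period - league) / (first_period - league)

# Group rates (%) for 2002-2007 and 2008-2010, taken from the text above
low_group = share_of_gap_retained(8.6, 9.9)
high_group = share_of_gap_retained(12.2, 11.4)

print(f"low HR/FB group kept {low_group:.0%} of its edge")
print(f"high HR/FB group kept {high_group:.0%} of its deficit")
```

On these numbers, somewhere between a third and a half of each group's distance from average carried over, which is consistent with some signal plus a lot of regression.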

Is there some skill to allowing long fly outs? Maybe. But if you can identify which pitchers are likely to keep their home run rates low while giving up a lot of fly balls before they actually do it, then you could make a lot of money in player forecasting. History suggests that we can’t simply look at guys with 1,000+ innings of home run prevention and assume they’ll keep chugging along. It just doesn’t work that way.





Dave is the Managing Editor of FanGraphs.

168 Comments
Dan
11 years ago

Good info. One issue I have with your group of pitchers is that they were probably all in their collective primes and then entered a typical decline phase after that. Miguel Batista? Dontrelle Willis? Millwood? Washburn…Martinez…even Oswalt to some degree. These guys are not the same pitchers they were in the 5 year stretch. So how much does their HR/FB rate have to do with their declines and not their law of averages regressing to the norm?

Zach Kolodin
11 years ago
Reply to  Dan

Word.

Kyle H
11 years ago
Reply to  Dan

Completely agree. To that extent, add Penny, Zito, and Lackey to the pitchers who haven’t been the same as they were from 2002-2007. Not to mention that you are taking data from a much smaller sample after dismissing the larger sample for not being big enough. Great article, though.

Brad Johnson
11 years ago
Reply to  Dave Cameron

A good point, but several of those regressions coincide nicely with injuries or changes in skill set (reduced velocity for example). That group as a whole isn’t passing the smell test for significance.

DrBGiantsfan
11 years ago
Reply to  Dave Cameron

It is certainly possible that the drops in HR/FB are the cause of the reduced overall effectiveness of the pitcher, but if the pitcher has sustained a certain rate for 5 years and suddenly “regresses”, it is much more likely that the “regression” is due to a change in the way he pitches than to chance, because the chances of sustaining “luck” through a 5 year sample size are extremely small.

DrBGiantsfan
11 years ago
Reply to  Dave Cameron

Give it up Dave. What you need to do is just accept the statistically obvious fact that a 5 year sample size for a starting pitcher is large enough that a large variation is highly unlikely to be due to chance.

DrBGiantsfan
11 years ago
Reply to  Dave Cameron

Nope Dave, I haven’t ever met a lottery winner, and it’s highly, highly unlikely that I ever will.

Brad Johnson
11 years ago
Reply to  Dave Cameron

Dave,

I think you identified the most reasonable sample. The problem is that it still doesn’t smell right. Either more advanced testing needs to be done with this data set or we need to adjust the specifications of the experiment.

Until something statistically significant is found, as analysts we should stick to the null hypothesis (Cain does not possess unique skills i.e. HR/FB). Perhaps he already has passed the level of significance, I haven’t been paying special attention to him so I could have easily missed that analysis. If I was to throw a wild guess out there, I’d expect him to be only 1.5-2.5 SD from the norm.

Obo
11 years ago
Reply to  Dave Cameron

Wrong, statistical significance gives ZERO information on the magnitude of significance. Suppose Matt Cain has a slight skill in reducing his HR/FB rate (slight as in he is 0.00001% better than average). Then a sample size of 1 billion FB will certainly show Matt Cain has a statistically significant ability to prevent HRs on FBs. His skill is statistically significant and does not play a role in his performance. Big Pharm pulls that crap all the time and it is bad math/bad science.
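
The general point here, that significance depends on sample size as well as on the size of the effect, can be illustrated with a one-sample z-test for a proportion. This is a minimal sketch, assuming the article's 10.6% league average and a hypothetical pitcher whose true HR/FB rate is 10.5%. The function name and the fly-ball counts are illustrative, not real data.

```python
import math

LEAGUE_RATE = 0.106   # league HR/FB rate quoted in the article
TRUE_RATE = 0.105     # hypothetical pitcher, 0.1 points better (assumption)

def z_score(n_fly_balls, p_obs=TRUE_RATE, p_null=LEAGUE_RATE):
    """z-statistic if the observed rate matched the true rate exactly."""
    se = math.sqrt(p_null * (1 - p_null) / n_fly_balls)
    return (p_null - p_obs) / se

# The edge never changes; only the number of fly balls does
for n in (1_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} fly balls: z = {z_score(n):.2f}")
```

The same small edge moves from undetectable to overwhelmingly "significant" purely because the sample grows, while its practical size stays fixed.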

bcp33bosox
11 years ago
Reply to  Dave Cameron

I have literally met 3 lottery winners (that I know of) and two of them are brother and sister and they both won over a million USD separately and I believe within a few weeks/months of each other.

Steven Ellingson
11 years ago
Reply to  Dave Cameron

Obo,

Agreed. A confidence interval would be a lot better in this case, and in most cases.

JP
11 years ago
Reply to  Dave Cameron

So, real quick regression results using data on all starting pitchers who qualified for the ERA title from 2007 to 2010. Looking at the relationship between HRFB this year and HRFB last year (so using one year of lagged data). This yields 153 usable observations (so not 1 billion, don’t worry).

lag hrfb +0.1692 significant at 0.05
controlling for ip, k9, bb9 (adding team effects to get at the home park just swamps the small sample with dummies)

This suggests that there is a year-to-year correlation between HRFB for pitchers, at least those good enough to qualify for the ERA title.

As a further check, I split the sample into those with above mean HRFB (> 9.567 in this sample) and those below. The relationship is there for those pitchers with very low HRFB percentages (coefficient is 0.1619 and still significant at 0.05) but not for pitchers closer to the MLB average (coefficient is effectively zero).

So, based on these data it looks like there is a year-to-year correlation for HRFB, but only for those with exceptionally low HRFB numbers. This is likely due either to (a) home park effects or (b) a skill. This sample is too small to figure that out, plus I have to get back to work.
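
To get a feel for what a lag coefficient of 0.1692 implies, here is a back-of-the-envelope projection. It assumes a simple linear form centered on the sample mean of 9.567, with the control variables (IP, K/9, BB/9) held at their averages. The actual specification is not given in the comment, so treat the function below as an illustration rather than the fitted model.

```python
LAG_COEF = 0.1692     # coefficient on last year's HR/FB, from the comment
SAMPLE_MEAN = 9.567   # mean HR/FB (%) in the same sample

def projected_hrfb(last_year):
    """Next-year HR/FB (%) implied by the lag coefficient alone (assumed form)."""
    return SAMPLE_MEAN + LAG_COEF * (last_year - SAMPLE_MEAN)

for rate in (7.0, 9.567, 12.0):
    print(f"last year {rate:.1f}% -> projected {projected_hrfb(rate):.2f}%")
```

Under that assumption, only about 17% of a pitcher's distance from the mean carries into the following season.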

TomG
11 years ago
Reply to  Dave Cameron

You make a good point about whether the fall-off is due to rising HR rate or vice versa. For me the larger problem is that these pitchers may have had low overall HR rates, but none of them have the consistently low rate that Cain does. If you plot their HR rates year to year you will see a lot of steep slopes up and down, compared to Cain’s flat line. In fact this group was destined for a random future hr% as their low previous hr% were all based on one or two fluke seasons in the period mentioned.

The other piece is that other than Dontrelle he is by far the youngest pitcher, still coming in to his “peak” years. Dontrelle’s rising HR rate does make an interesting point, do pitchers struggling to find the plate see a sharp increase in HR’s as they groove fastballs?

In either case I think we can enjoy seeing Matt keep on doing his thing, baseball is all about beating the odds. Between Cain and Torres the Giants have two of the most problematic players to project.

Jason B
11 years ago
Reply to  Dave Cameron

“plus I have to get back to work.”

Now why would you want to go and do a thing like that?!?

Patrick
11 years ago
Reply to  Dave Cameron

DrBGiantsfan,

No, you should give it up. You can’t back up what you’re saying with statistical evidence that’s as strong as what Dave is putting against it – Which isn’t ironclad by any means, but it’s REAL evidence – but you insist it must be true.

You have a sample size of one pitcher and no explanation for why he’s doing what he does. Yet you INSIST and INSIST it MUST BE THIS WAY.

Is there ANY statistical examination that, if the result were against, would persuade you that Matt Cain does not have this skill?
Or he is just unique and we don’t know why?

And this isn’t even REMOTELY CLOSE to as rare as winning the lottery. Dave found 10 other guys who did it almost as well for the same length of time… And they, on average, didn’t keep it up. Why should we ASSUME that Cain is different?

The burden of proof is on you to show why Matt Cain is different. Whether or not you realize it, that’s where it lies.

TomG
11 years ago
Reply to  Dave Cameron

Patrick, this group did not do it almost as well, and they did in fact continue to be better than average. Statistically Cain’s numbers are much further from the mean, and rather than show that it is chance this study shows there is some predictive value in this result.

In fact if Cain’s rate rises as much as this group’s did he will still be better than anyone in this group (Except for Pedro) was during the first 5 year period. And significantly better than the group as a whole.

obsessivegiantscompulsive
11 years ago
Reply to  Dave Cameron

Patrick, clearly you have not read Paapfly’s or Baseball Musings’ articles regarding Cain. Both make persuasive arguments for the existence of Cain’s skill and for how he is doing it; otherwise you would not have written what you wrote.

In addition, TangoTiger in his blog calculated that it would take roughly 6-7 seasons’ worth of results for a starting pitcher to show, with statistical significance, that his low BABIP is a skill. Cain is there now. His above-average pop-fly rate is clearly a strong explanation for his apparent ability to prevent home runs on flyballs. His rising fastball gives a baseball explanation for this ability.

From my perspective, just because Dave cannot find anyone in the past 8 years who matches Cain does not prove that Cain does not have this ability. 8 years is not really that long a time in baseball. As I pointed out above, it takes that long just for a pitcher to exhibit that ability at a statistically significant sample size. And as the other commenter noted, many of the increases were because of age and injury issues.

For a proper comparison, you need to find comparable pitchers who started their careers around age 20-21, like Cain. Comparing him with players who are in their 30’s for any part of that comparison period is, well, comparing apples with a PC, they are two different items altogether. I’m surprised nobody focused on this yet, though I haven’t read below yet, so maybe later…

Also, I think the most important point is whether Cain exhibits the ability to reduce his HR/FB, period. Consistency over X number of years is not important, any more than having his BABIP consistent over the same number of years; it is all fluctuations around his mean talent. What is most important is that he has enough “sampling” now to say that he has shown this ability over the years to keep his HR/FB lower than others. It is now up to people to disprove that by explaining away this anomaly.

Park factors have been disproven – see Paapfly for that and other disproven factors. When you look over the numbers Paapfly put forth, the fact is that Cain has not only kept his HR/FB down, he has pretty consistently been significantly under. And Musings’ rising fastball analysis provides a strong explanation why this has been happening. Perhaps people can start with showing why they don’t believe Baseball Musings’ analysis is correct.

And people are also forgetting that it is not just HR/FB that is abnormally low, his BABIP has been as well. Again, he has proven he can do this by doing it over enough years, it is now up to people to show us a perspective that explains that away. Nobody, in my opinion, has yet.

Jason
11 years ago
Reply to  Dave Cameron

Point is though that the conclusion you are drawing can’t be made from this data set without controlling for natural performance decline. Control for that and it would be a much more compelling article.

Jason
11 years ago
Reply to  Dave Cameron

What often surprises me about Fangraphs and the sabermetric community generally is how unscientific it really is. The scientific method starts with observation, then hypothesis, then measurement and testing. The true scientist tries to prove the null of their hypothesis to a standard of 99.5% confidence. A true scientist approaches their subject without bias.

So we have a hypothesis that HR/FB ratio isn’t a controllable skill, yet we have all these pitchers that are apparently controlling it. Rather than conclude that the hypothesis is wrong the conclusion is formed that these pitchers are just lucky?

This conclusion is all the more indefensible when contrasted with direct observation and qualitative data from experts in the industry that says these players are very good.

The longer the “matt cain is not that good” meme continues to be perpetuated on this site, the more it undermines the credibility of your methodology and approach.

DrBGiantsfan
11 years ago
Reply to  Dave Cameron

I agree with Obo above about Big Pharm, but we’re not talking about Big Pharm here. We’re talking about Matt Cain who has beaten the odds by a big margin for 5 years in a row now. A recent article on hardballtimes.com put the odds of Matt Cain’s performance being due to chance in any given year at 12%. That means that the probability of it being due to chance for 5 years in a row is 0.12 to the 5th power. That, my friends, is an extremely low number!

Of course it is possible that Matt Cain’s performance over the last 5 years is due to chance, yes there are lottery winners, but the probability that it is not due to chance is approximately 1,000,000 times greater.
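
The arithmetic here can be checked directly. This is a minimal sketch that takes the 12% figure and the assumption of five independent seasons from the comment as given; neither is verified here.

```python
p_single_season = 0.12   # per-season chance figure cited in the comment
seasons = 5

# Probability of five such seasons in a row, if the seasons are independent
p_all_five = p_single_season ** seasons

print(f"0.12^5 = {p_all_five:.7f}")
print(f"about 1 in {1 / p_all_five:,.0f}")
```

That works out to roughly 1 in 40,000, well short of the 1,000,000-to-1 figure above. It also describes one pitcher picked in advance and does not account for how many pitchers had the opportunity to produce a streak like this.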

Jason B
11 years ago
Reply to  Dave Cameron

“Of course it is possible that Matt Cain’s performance over the last 5 years is due to chance, … but the probability that it is not due to chance is approximately 1,000,000 times greater.”

Made up stat alert! (Although you did manage to work in an “approximately” to help cover a little.) =)

DrBGiantsfan
11 years ago
Reply to  Dave Cameron

Jason,

If you flip a coin once, and it comes up tails, it is likely due to chance because there is a 50% chance one coin flip will come up tails. If you flip it 5 times and it comes up tails 5 times, you start to think this is a bit strange, but yeah, it happens. At some point, if you come up tails every single time long enough, you conclude that you are probably flipping a loaded coin and it’s probably not due to chance at all.

Now, let’s say you roll an octagonal die and it comes up, say, #1. You don’t have to roll that octagon very many times with every result being #1 to conclude that you probably have a loaded die.

The point is that if the random probability of Matt Cain’s variance from the mean for a given season is 12% (about 1/8), then after 5 years of reproducing that result, it becomes highly probable that Matt Cain is rolling a loaded die.

Jason
11 years ago
Reply to  Dave Cameron

Dr. B,

Agree – but that conclusion can only be drawn if you already have settled laws of physics. I don’t accept the basic premise that pitchers don’t influence how hard contact is. BABIP is not just luck. HR/FB ratio is not luck. Matt Cain is good.

DrBGiantsfan
11 years ago
Reply to  Dave Cameron

Jason,

So we agree! Matt Cain is good! You had me confused there. Maybe I just misunderstood what you were trying to say. Sorry if that was the case.

Jason
11 years ago
Reply to  Dave Cameron

Dr. B,

We definitely agree. What I was trying to say is that the “science” on BABIP & HR/FB ratio is highly questionable, and it’s irresponsible to draw conclusions unfavorable to Matt Cain without questioning the science. Particularly when other data points like direct observation and actual historical outcomes support the opposite conclusion.

dutchbrowncoat
11 years ago
Reply to  Dave Cameron

there is a balance between “babip is not just luck” and “matt cain is good” and i think that people (especially the two energetic giants fans doing most of the posting) need to remember that.

also, dave never drew any “unfavorable conclusions”. he doesn’t seem to think cain can keep it up (at least at this level) but he dedicated a full paragraph to showing that cain could do it.

and jason, i would counter with the opposite. i think it is more irresponsible to use a clear outlier to call into question the integrity of the theories behind babip and hr/fb. it is likely that cain has had some control over these factors, but it is also very likely that he was additionally quite lucky.

Jason
11 years ago
Reply to  Dave Cameron

Dutch – I’m not using Matt Cain to call into question the theory. I’m using years of direct observation of the game. Have you ever watched BP at a game? There are a lot of hits. When Joe Blanton (insert any run of the mill 4th starter type) takes the mound there are still a lot of hits but fewer than during BP. When Matt Cain takes the mound there are fewer hits still. Matt Cain is not just luckier than the coach throwing BP or Joe Blanton – he is better.

It makes no sense to say that a pitcher can control swing & miss %, GB%, but not HR/FB%. They all measure the same thing – how hard is it to make solid contact against this pitcher. Just because Matt Cain induces his outs by getting hitters to make weak contact on the lower half of the ball he should be regarded as a lesser pitcher than the ones that get their outs with contact on the top half of the ball?

DrBGiantsfan
11 years ago
Reply to  Dave Cameron

Dutchbrowncoat,

Nobody here is questioning the integrity of the theory. All we are saying is that no theory explains absolutely everything and it is very likely that we have an outlier in Matt Cain. That’s being a lot more faithful to the integrity of sabermetric theory than stubbornly trying to say that statistical outcome with a large sample size that is so improbable it is starting to approach zero probability is, in fact, due to chance.

dutchbrowncoat
11 years ago
Reply to  Dave Cameron

@ jason –
i actually agree with what you just said. you can look for yourself, i have actually said similar elsewhere in the comments. i was commenting directly on your statement:
“What I was trying to say is that the “science” on BABIP & HR/FB ratio is highly questionable, and it’s irresponsible to draw conclusions unfavorable to Matt Cain without questioning the science.”
@dr b –
i did not (and won’t) write off cain as entirely chance. and maybe i am missing something, but i dont see anybody else who has. to me, dave’s conclusion was that cain could keep it up (sabathia, zito, pinto’s theory) but it was not likely he could maintain it. as i said elsewhere, i will freely concede that cain can limit hr/fb by means of location, movement, ‘stuff’, whatever. i would peg him at a true talent level of 8%-9%. but i still think he has gotten lucky – 4-5 seasons of varying 1-2% is far more probable than 4-5 seasons of 3-4% variance. should that luck falter some as dave suggested, that 7% career mark would rise a little.

DrBGiantsfan
11 years ago
Reply to  Dave Cameron

DBC,

You and I have a different interpretation of what Dave is saying in this article. I have yet to see Dave himself say that Cain’s performance over the last 5 years even MIGHT be due to skill.

I guess it doesn’t matter though because when Dave went back and corrected his math errors he found a very different result that he has not directly acknowledged yet.

dutchbrowncoat
11 years ago
Reply to  Dave Cameron

dr b –

that is very possible, and is likely because we are approaching this from different angles. you are in a pro cain camp and i am giving dave the benefit of the doubt on all this. but he does say:

“…it’s certainly possible that Cain could continue to post low HR/FB rates going forward. After all, he does pitch in a pretty good pitcher’s park and his career HR/FB rate is better than any of the pitchers in this sample, so maybe there is something to David Pinto’s theory about how his fastball moves.”
“Is there some skill to allowing long fly outs? Maybe… History suggests that we can’t simply look at guys with 1,000+ innings of home run prevention and assume they’ll keep chugging along. It just doesn’t work that way.”

maybe you think i am being too kind to dave, but i think he is just saying that he pegs cain for some regression in the hr/fb area. whether he is regressing towards a rate of 10% or 8.5% is a different question. but as all of the comments picking apart dave’s methods can show you, proving this solidly one way or another is very very difficult to do with the information available to us.

chuckb
11 years ago
Reply to  Dave Cameron

obsessivegiantscompulsive —

you said:

“just because Dave cannot find anyone in the past 8 years who matches Cain does not prove that Cain does not have this ability.”

and it’s absolutely right.

However, just because Dave can’t prove that Cain doesn’t have this ability does not prove that he does either. Dave’s point was that the evidence “proving” Cain’s ability to outpitch his xFIP is wholly insufficient in that it’s not really evidence at all. You cannot say that the lack of evidence disproving something is, in and of itself, proof of that thing. That’s simply faulty logic. His point is that we don’t know and he’s right. The jury’s still out.

Poor Nuno
11 years ago
Reply to  Dan

Something else worth noting: of the pitchers Dave listed in the article, I would only characterize 3 of them as flyball pitchers: Barry Zito, Jarrod Washburn, and Matt Cain. All of them have sustained low HR/FB rates without regression. The rest were groundballers or roughly even. Maybe their rising HR/FB rates were due to them getting slightly lucky earlier due to a smaller sample (fewer flyballs than Zito, Wash, and Cain).

AJS
11 years ago
Reply to  Poor Nuno

Great point.

Nathaniel Dawson
11 years ago
Reply to  Dan

Dan, within the parameters of the study, it’s going to be hard to find good comparables to Cain that aren’t in their decline years. Cain started pitching regularly in the big leagues at the age of 21. You’re just not going to find a whole lot of other pitchers that have pitched regularly for five seasons and aren’t in their 30’s.

I’d also wonder how much of a factor home park plays in this. If a pitcher plays in a park that suppresses HR/FB, that would tend to keep his rate lower in the future. If he switched parks, he wouldn’t get that same benefit. We do see what appears to be some regression to the mean with these pitchers, but they are still lower than league rates, so it suggests some degree of ability there. The park they’re in might have a hand in keeping their rates below league norms.