Why We Feel How We Feel About Clutch

September 15, 2015

Apologies for walking on trodden ground. None of what’s below is new. Many of you already know everything in here, but I feel like this is a good opportunity to review why our position is our position. I’ll do my best to keep this simple and short. Just like all the world’s best analysis!

Over the last little while, I’ve written a few things about Clutch. The specific stat might be difficult to explain to the average fan, but the idea is a basic one. Teams with high Clutch scores have had really good timing. Teams with low Clutch scores have had really bad timing. Timing is important! This explains a lot of the difference we see between actual wins and BaseRuns wins, which you can just think of as “expected wins.” This year, the five most clutch teams in baseball so far have beaten their BaseRuns win total by a combined 45. The five least clutch teams in baseball so far have fallen short of their BaseRuns win total by a combined 43. It’s hugely important, and this isn’t a one-year phenomenon.

I have BaseRuns information going back to 2002, so let’s plot team Clutch score against the difference between actual wins and expected wins. This’ll cover the completed seasons from 2002 through 2014.

It’s clearly a strong relationship. Clutch doesn’t explain everything, but it explains an awful lot, and that’s just intuitive. Of course the teams that do the best at the right times will be more successful. If they do better than they usually do at the right times, they’ll look like an over-achiever. That’s how you can get a team to win more games than you’d think just based on the overall statistics.

Clutch can turn a mediocre team into a playoff team. Clutch can also turn a would-be playoff team into a mediocre team. Because it’s so important, it stands to reason teams would try to emphasize clutch performance, if they could. They’d try to gather clutch performers. There’s not a single analyst in the world who doubts the significance of clutch events. But that isn’t the problem. Let me show you some more information. We’ve got team batting Clutch, team starter Clutch, and team reliever Clutch. I decided to look at the window from 2000 through 2014, splitting seasons by first and second halves. Here’s how batting Clutch has carried over, half to half:

An r-squared of literally 0.00. You can find numbers that aren’t 0 if you go to more decimals, but that doesn’t accomplish anything. It’s a nothing relationship. Here’s how starter Clutch has carried over, half to half:

An r-squared of literally 0.00. You get the point already, but let’s move on to reliever Clutch:

An r-squared of literally 0.00. So this won’t surprise you — putting it all together for team Clutch:

An r-squared of literally 0.00. No observed relationship. No observed hint of a relationship. The only relationship here is the one between Clutch and total randomness, and that’s not a relationship for anyone to rely upon. It would be one thing if there were no relationship between Clutch in Year 1 and Year 2 (and there isn’t). But this is looking at the same teams, within seasons. Even clutch teams are only temporarily clutch. Sometimes they remain clutch, but no less often do they do the opposite.
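To see why a 0.00 r-squared is what pure luck looks like, here is a minimal sketch in Python. It is not the analysis behind the plots above. It assumes each team's Clutch in a half is a hypothetical persistent "skill" plus independent noise, both normally distributed, with 450 simulated team-seasons standing in for 30 teams over 15 seasons. The function names and the standard deviations are made up for illustration.

```python
import random


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def half_to_half_r2(skill_sd, noise_sd, team_seasons=450, seed=0):
    """Simulate first-half and second-half Clutch and return their r-squared.

    Assumption: Clutch in each half = persistent skill + independent noise.
    skill_sd = 0 means there is no repeatable clutch ability at all.
    """
    rng = random.Random(seed)
    first_half, second_half = [], []
    for _ in range(team_seasons):
        skill = rng.gauss(0, skill_sd)  # same value in both halves
        first_half.append(skill + rng.gauss(0, noise_sd))
        second_half.append(skill + rng.gauss(0, noise_sd))
    return pearson_r(first_half, second_half) ** 2


if __name__ == "__main__":
    print("no repeatable skill:  r^2 =", round(half_to_half_r2(0.0, 1.0), 2))
    print("skill as big as noise: r^2 =", round(half_to_half_r2(1.0, 1.0), 2))
```

With `skill_sd` set to 0, the half-to-half r-squared should sit near zero. When skill and noise are the same size, it should land somewhere around 0.25. The real half-to-half data look like the first case.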

This is why I’m more down on, say, the Twins than other people might be. Not that it matters at this point, with the season almost over, but there’s trusting the Twins’ record, and there’s trusting the Twins’ other, underlying numbers. The underlying numbers have proven more trustworthy. If you want to argue a certain team is innately clutch or unclutch, that’s fine. Make the argument. You might very well be right. Just, understand what the argument is up against. Understand how hard it’ll be to convince someone of legitimate clutchness. The argument against this stuff is strong, and it’s tough to doubt a 0.00 r-squared. Factors you might think lend themselves to better or worse performance in clutch situations — there’s nothing convincing in the recent history. And teams would have a lot to gain from harnessing this.

The position is one against clutchness, because that’s what all the evidence points to. It’s not coming out of stubbornness. It’s not coming out of closed-mindedness. It’s just that nothing else has been sufficiently convincing. And, this is important — analysts would probably love to be wrong! It would be amazing if real clutchness could be proven. That would be a breakthrough, and if you’re just some person doing research, good research could get you hired by a club. It would be greatly significant if one could demonstrate reasons for under- or over-achieving, ahead of time. I think we’d all love to read that article. It would change the way we see the game.

That just isn’t where we are today. Today, it looks like near or total randomness. So it gets treated as such, and though the actual wins matter more than the expected wins do, as far as the World Series is concerned, one should understand why sometimes analysts think the expected wins are more meaningful, analytically. We’re always trying to drown out the noise.

Jeff made Lookout Landing a thing, but he does not still write there about the Mariners. He does write here, sometimes about the Mariners, but usually not.

101 Comments

Cowboy Sweet N' Nasty

9 years ago

I object to characterizing a team as “bad” or “lucky” because they are outperforming their BaseRuns. What you can say is that outperforming their BaseRuns in this way is not sustainable based on the historical data we have. But that’s not how it’s framed. It’s framed as “Team X is a bad or mediocre team that has gotten lucky and will suck going forward.” My favorite thing all the analysts here like to say is that the Dodgers are the best team in the NL and maybe baseball based on their BaseRuns.

Fangraphs loves to just cite BaseRuns and declare the conversation over.

-35

Jeff Sullivan

9 years ago

Reply to Cowboy Sweet N' Nasty

If that were true this post wouldn’t exist

Bip

9 years ago

Reply to Cowboy Sweet N' Nasty

I like the parallel to (a)theism: believing in clutch is like believing in a god. Not believing in it is like being an aclutchist. That can be interpreted one of two ways:

1. Clutch has never been demonstrated to be a repeatable skill.
2. Clutch is not real and teams that have a good clutch score are lucky.

1 is the equivalent of “I don’t believe in any gods.” 2 is “there are no gods.” Position number 1 is absolutely supported by stats. So if a team outperforms their Baseruns, you can say “there’s no reason to think they will continue to do this”. You don’t know they’re lucky, but right now luck is probably the best explanation.

Cowboy Sweet N' Nasty

9 years ago

Reply to Bip

“So if a team outperforms their Baseruns, you can say ‘there’s no reason to think they will continue to do this’”

Yes, I agree with this. That is not what I’m arguing. I’m arguing against framing how a team has performed in the past solely by what its BaseRuns is.

Costanza

9 years ago

Reply to Cowboy Sweet N' Nasty

Can you provide an example of what you’re arguing against? I think we are thinking of similar articles, but my understanding is that it’s usually used to evaluate a context-neutral previous performance of a team in order to look forward.

That is, it’s not used to denigrate a performance that happened, but to strip out the elements that aren’t shown to be repeatable in an effort to estimate true talent level. That estimate can then be used to project moving forward.

Rational Fan

9 years ago

Reply to Bip

“1. Clutch has never been demonstrated to be a repeatable skill.”

This just isn’t true; when it is repeated, we call them an outlier and lucky to be clutch for the duration of their career.

To use one baseball example – for his career, Joe Crede had a .748 OPS – he also had a .716 OPS with the bases empty.

With RISP, Crede had a career OPS of .792, with 2 outs and RISP .780, with the bases loaded .902. In the postseason he’s a career .949 OPS player. Crede has stated himself that he felt more zoned in and comfortable in big situations; this was supported by his actual production.

He was tied with David Ortiz in game-winning RBI over a 6 year span and clearly performed well beyond his career norms when the situation was escalated.

On the bell curve, we just consider Crede an outlier because we can’t explain his improvement in play when the situation was bigger. This to me has always been a cop out.

Josh

9 years ago

Reply to Rational Fan

Outliers are an expected and inherent aspect of statistics. While looking at vast amounts of data, each point containing information from a single player’s single season, you see no relationship at all in clutch over time.

However, you should expect to see individual outliers that DO seem to consistently outperform or underperform in clutch situations. Does that mean the statistical relationship is untrue? NO! It just means there’s a statistical anomaly. If you flip a coin ten times, and keep repeating that for long enough, you will eventually get heads all ten times. Because this happened in one instance out of many does not mean that the coin flips are no longer random.

Honestly, I think there’s a misunderstanding of how statistical analyses actually reach their conclusions, what outliers are, and how an outlier does nothing to disprove the underlying relationships.
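The coin-flip point can be put in numbers. This is a minimal sketch, assuming fair coins and independent flips; the pool of 1,000 flippers is a hypothetical figure chosen only for illustration, not a count of real players.

```python
def prob_all_heads(flips=10):
    """Chance that one fair coin comes up heads on every one of `flips` tosses."""
    return 0.5 ** flips


def prob_at_least_one_streak(flippers, flips=10):
    """Chance that at least one of `flippers` independent people gets all heads."""
    return 1 - (1 - prob_all_heads(flips)) ** flippers


if __name__ == "__main__":
    # One person: 1 in 1,024.
    print(prob_all_heads())
    # Hypothetical pool of 1,000 people each flipping ten times.
    print(round(prob_at_least_one_streak(1000), 2))
```

Any single flipper is very unlikely to get ten straight heads, but in a large enough pool somebody probably will. That one person looks special without the coins being any less random.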

Rational Fan

9 years ago

Reply to Bip

I agree that entire teams obviously can’t be “clutch” but to think the human aspect of sports doesn’t allow a player to be better or worse in big moments is naive. Regardless of profession, there will always be people who perform better and worse under pressure. To discredit it as an outlier is lazy in my opinion.

Costanza

9 years ago

Reply to Rational Fan

> Regardless of profession, there will always be people who perform better and worse under pressure.
That sentence is talking about a normal distribution of pressure performers in a population. An alternate explanation is players who cannot perform under pressure wash out before they hit MLB.

If you understand why your Joe Crede argument doesn’t make sense, you will have gone a long way toward a better understanding of the application of baseball statistics. Nowhere in this article will you find an argument that explicitly says Joe Crede didn’t perform better in “clutch” situations. Think of it this way: if the distribution of “clutch” ability were perfectly random, would you expect to see the same number of Joe Crede-level outliers as you actually saw?

If you want to prove Joe Crede was actually more clutch you have to use different evidence, because randomness explains the existence of Joe Crede really well.

This is why you cannot prove a case with a single data point!

To criticize something based on a lack of understanding of distributions and basic statistics is lazy in my opinion.

-1

Rational Fan

9 years ago

Reply to Rational Fan

It’s certainly not a lack of understanding. It’s differing views on the randomness of human nature.

You believe if you’re not clutch you’ll wash out before the big leagues but I disagree. There aren’t many high pressure situations that make or break your recruit status and your draft stock.

As I noted, I chose to not discard the fact that players like Crede state that they feel more comfortable and concentrate more in high pressure situations. Some pitchers perform better in high leverage situations and on and on. I understand that there will be outliers in statistics, I just am of the camp that thinks there are reasons for some of the randomness.

Pressure impacts people in different ways. All pressure is not created equal.

Just because it isn’t predictive based on a larger dataset does not mean there’s no reason for it.

-1

Neil

9 years ago

Reply to Rational Fan

“There aren’t many high pressure situations that make or break your recruit status and your draft stock.”

Well, this is demonstrably untrue. Plenty of players make or break their careers based on how they do during that small handful of games when pro scouts are present. And the pressure that they’re then under in the minors is immense. 90% wash-out somewhere in the minors.

“Just because it isn’t predictive based on a larger dataset does not mean there’s no reason for it.”

Sure. But it does mean that there’s no evidence of a reason OTHER than randomness. The default assumption should be that he’s right, not that you’re right.

Costanza

9 years ago

Reply to Rational Fan

>Just because it isn’t predictive based on a larger dataset does not mean there’s no reason for it.

No, but it does mean that you haven’t presented valid evidence to support your assertion.

You’re free to believe whatever you like. But do not state your opinion as fact and then attempt to back your opinion up with invalid statistical evidence.

Rational Fan

9 years ago

Reply to Rational Fan

There’s nothing invalid about the statistical evidence I provided.

Said player says early on in his career that he feels better when pressure is escalated – whether that be RISP, or just runners on base in general.

Said player goes on to greatly outperform his career averages in high pressure situations.

You view that as a singular sample that simply is an outlier, and not predictive – I disagree.

Someone who is clutch, and responds well to pressure, very well may be an outlier but it doesn’t mean it’s not predictive. You don’t need historical statistical evidence to make a claim or belief valid.

There are scouts who have shown an innate ability to read pressure ability – you would view them as “lucky,” simply picking the right outlier and getting it right… I would say they have an ability to determine the reaction of a player under pressure.

It’s no different than pitchers who struggle to close but can dominate the 8th inning – there is still pressure in the 8th inning, but the 8th is comforting for that pitcher. He may feel the game is not in his hands as it is in the 9th.

My entire problem with the pure-statistics community is if you can’t explain it, and can’t predict it, then it’s just an outlier which occurs in all factors of statistics. That’s a cop-out.

These are human beings; not computers. To think pressure and situations don’t get to them – for better or worse – is simply naive.

I just provided you with 10 years of data from one players career that shows he performed better when the pressure was heightened.

You feel the reason is randomness for Crede’s success; I feel you are wrong, based on evidence to the contrary that extends beyond the actual statistical output.

To the gentleman who said one or two games make or break a draft position – this isn’t 1980, pal. These guys are scouted hundreds of times; they’re the best players in their high school and the game comes easy to them. Pressure isn’t heightened until the situation becomes both difficult and important; the situations don’t become difficult until pro ball for many of these players. Many succeed more than they fail for the majority of their baseball-playing lives.

-1

RichW

9 years ago

Reply to Rational Fan

Define pressure. I’m confident that the made up definition of pressure (or leverage) in baseball (late innings, risp, 2 out etc.) does not guarantee that each player feels the same amount of pressure regardless of the base/out state and inning. Early in the season a marginal player starting because of injury may feel extreme pressure during every PA because he knows that he still has options left. An established player in a slump may stick with his proven approach no matter what the results because he is confident that things will even out over time. They don’t experience the same pressure.

Cowboy Sweet N' Nasty

9 years ago

Reply to Cowboy Sweet N' Nasty

Another way to look at it is that to be “legitimate” in Fangraphs’ writers’ eyes you need to be right at your BaseRuns or you are a fluke one way or another.

Right now, there are exactly two teams performing in line with their BaseRuns: Yankees and Tigers. There are 9 other teams with a differential between their BaseRuns and actual record of two wins or fewer.

But there are 14 teams with a differential of 5 or more wins between their BaseRuns and actual record. Three of those teams have the three best records in baseball. So what we have here is Fangraphs saying the three best teams in baseball, record-wise, are all flukes to some degree.

While these teams may be outperforming their expected W/L, I don’t think it’s fair to cast this cloud of gloom around them with BaseRuns.

That article on the Cardinals, for one, failed to mention that by BaseRuns they still have an expected .551 winning percentage, which still has them in the top echelon of teams. Instead, that article was framed to make the team look like a giant pile of luck that was not worthy of being called one of the best teams in baseball.

As everyone knows, framing is everything in an article. And the misuse of statistics or the omission of others annoys me.

-3

Jeff Sullivan

9 years ago

Reply to Cowboy Sweet N' Nasty

“Yet, interestingly, we can also consider OPS differential. By that measure, the Cardinals rank fifth in baseball, between the Pirates and the Yankees.”

baseballfan123

9 years ago

Reply to Jeff Sullivan

I bet if you did the clutch r-squared by team in the last 5 years the Cardinals would come out on top.

Brian L

9 years ago

Reply to Cowboy Sweet N' Nasty

You’re not wrong, framing is just subjective.

So since we’re airing our opinions, I think Fangraphs frames it appropriately and that Dave did so as well in the Cards article.

Bip

9 years ago

Reply to Cowboy Sweet N' Nasty

I mean, there is a distinct possibility that luck just has that great an effect on baseball games, and no one could ever possibly devise a measure of team performance which half the teams don’t deviate from by 5 games or more. So it’s not necessarily true that these teams are flukes, but at the same time, their record is impacted by something other than how good they are. What is the recommended way to say that?

Famous Mortimer

9 years ago

Reply to Cowboy Sweet N' Nasty

I would suggest the most troubling aspect of all this is “our estimates are fine, everything else is clutch and randomness”. How close were last season’s BaseRuns projections to what actually happened? If they’re close enough, I’d suggest the “and the rest is randomness” argument is strong; if not, it might be worth looking at how those projections are happened upon.

-2

Dan M

9 years ago

Reply to Cowboy Sweet N' Nasty

I think the problem here is all of the FG staff and most of the community have an understanding that everything is a bell curve. We don’t expect everyone will perform at base runs, that’s just the most likely outcome. Everything on either side is a bit less likely. Here’s the crux though: performing at expectation isn’t actually particularly likely, per se. Seeing only 2 teams of 30 exactly match that expected value is totally fine. Nobody here is doubting base runs because of that. We can see that a couple are bang on, most are close, and a few are way off on the sides of the curve. It’d be weird to expect it to look much different in a game where no matter the matchup, one team really never has any better than about a 70% chance of a win. Usually it’s damn close to 50/50. There’s going to be a lot of noise and uncertainty. That doesn’t render the predictions invalid – they’re based on stats that normalize much better than wins and losses.
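The bell-curve point can be checked with a rough calculation. This is only a sketch under loud assumptions: every team is treated as a true coin-flip (.500) team, each has played a hypothetical 144 games, and the expected win total is taken as known exactly. Real BaseRuns gaps are not generated this way, so read the numbers as a sanity check on how much pure noise to expect, not as a model of the standings.

```python
from math import comb


def win_pmf(wins, games, p):
    """Binomial probability of exactly `wins` wins in `games` games."""
    return comb(games, wins) * p**wins * (1 - p) ** (games - wins)


def prob_diff_between(lo, hi, games=144, p=0.5):
    """P(lo <= |actual wins - expected wins| <= hi) for one team."""
    expected = games * p
    return sum(
        win_pmf(w, games, p)
        for w in range(games + 1)
        if lo <= abs(w - expected) <= hi
    )


if __name__ == "__main__":
    TEAMS = 30  # assumption: all 30 teams behave identically
    exact = prob_diff_between(0, 0)    # lands exactly on expectation
    close = prob_diff_between(0, 2)    # within two wins either way
    far = prob_diff_between(5, 144)    # off by five or more wins
    print("teams exactly on expectation:", round(TEAMS * exact, 1))
    print("teams within two wins:       ", round(TEAMS * close, 1))
    print("teams off by five or more:   ", round(TEAMS * far, 1))
```

Under these assumptions, luck alone would put roughly two of 30 teams exactly on their expected win total and leave about 13 or 14 off by five wins or more, which is close to the split described earlier in this thread.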

9 years ago

Reply to Cowboy Sweet N' Nasty

…Did you not read the post?
