Strong Starts Don’t Mean That Much

by Dave Cameron

April 30, 2012

Last Friday, I focused my weekly ESPN Insider column (which can also be read here on the site if you are a FanGraphs Plus subscriber) on the predictive power of a team getting off to a strong start in April. We know that at the individual level one month doesn’t mean much, but I wondered whether a dominating start to the season for an entire team might be more predictive of future success.

To do this, we looked at every team since 1974 that won at least 70 percent of their games in April (minimum 15 games), which gave us a sample of 45 teams. We then looked at how these teams performed from May through September to find out how predictive a strong team start actually was. I was pretty surprised at just how little it mattered.
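The selection rule above amounts to a simple filter. This is a minimal sketch, not the process actually used for the column: the `AprilRecord` type, the `strong_starters` function, and the sample rows are all hypothetical, and the article doesn't say where the data came from.

```python
from dataclasses import dataclass


@dataclass
class AprilRecord:
    """Hypothetical record type: one team-season's April results."""
    team: str
    year: int
    wins: int
    losses: int


def strong_starters(records, min_games=15, min_pct=0.70, first_year=1974):
    """Keep team-seasons from first_year on with at least min_games April
    games and an April winning percentage of at least min_pct."""
    selected = []
    for r in records:
        games = r.wins + r.losses
        # The games check runs before the division, so a 0-0 record never divides by zero.
        if r.year >= first_year and games >= min_games and r.wins / games >= min_pct:
            selected.append(r)
    return selected


# Made-up rows, for illustration only.
sample = [
    AprilRecord("Team A", 2001, 16, 6),  # 22 games at .727: kept
    AprilRecord("Team B", 2001, 11, 3),  # only 14 games: dropped
    AprilRecord("Team C", 1972, 18, 4),  # before 1974: dropped
    AprilRecord("Team D", 2001, 14, 8),  # .636: dropped
]
print([r.team for r in strong_starters(sample)])  # ['Team A']
```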

To summarize the results, the 45 teams combined for a .743 winning percentage in April but just a .549 winning percentage from May through September. The correlation between April record and May-September record was just .24, and the r-squared was just .06, meaning that April records explain only about six percent of the variation in these teams’ records over the final five months.
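The team-by-team data behind these numbers isn't published here, so the sketch below only illustrates the arithmetic. `pearson_r` is a generic, hypothetical helper, not the tool used for the column; the key point is that with a single predictor, r-squared is simply the correlation squared.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# With one predictor, r-squared is the correlation squared.
r = 0.24
print(round(r ** 2, 4))  # 0.0576, which rounds to the .06 above
```

On Python 3.10 or later, the standard library's `statistics.correlation` computes the same Pearson coefficient.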

We even broke these 45 teams into quartiles based on the ratio of runs scored to runs allowed, to see if a Pythagorean method would have done any better, but the correlation was an even weaker .19. In fact, the 12 teams with the worst run differential among the .700+ April clubs performed nearly as well over the remainder of the season as the 11 teams with the best run differential. Even teams that started the year by mauling their opponents regressed heavily over the rest of the season, and knowing a team’s run differential didn’t help identify which teams would sustain more of their strong start than others.
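A sketch of the two pieces of that run-differential check, under stated assumptions. The article doesn't say which Pythagorean exponent was used, so `pythag_pct` assumes Bill James's original exponent of 2. `quartiles_by_run_ratio` is a hypothetical helper whose ceiling-based cut points happen to split 45 teams into groups of 12, 11, 11 and 11, consistent with the 12 worst and 11 best teams mentioned above. Neither function reproduces the .19 correlation.

```python
import math


def pythag_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expected winning percentage (exponent 2 is an assumption)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def quartiles_by_run_ratio(teams):
    """Sort (name, runs_scored, runs_allowed) tuples from worst to best
    runs-scored/runs-allowed ratio and cut them into four groups using
    ceiling-based cut points (12, 11, 11, 11 for 45 teams)."""
    ranked = sorted(teams, key=lambda t: t[1] / t[2])
    n = len(ranked)
    cuts = [math.ceil(i * n / 4) for i in range(5)]
    return [ranked[cuts[i]:cuts[i + 1]] for i in range(4)]


# Hypothetical 45-team sample, only to show the group sizes.
fake = [(f"T{i}", 100 + i, 100) for i in range(45)]
print([len(g) for g in quartiles_by_run_ratio(fake)])  # [12, 11, 11, 11]
print(round(pythag_pct(120, 90), 3))  # 0.64
```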

That doesn’t mean April performance is worthless, of course. The fact that these teams won nearly 55 percent of their May-September games shows that the sample was made up primarily of playoff contenders, so we shouldn’t pretend that a strong start to the season is meaningless. As a quick-and-dirty estimate of necessary regression, last week Tom Tango suggested adding 35 wins and 35 losses to a team’s record on any given day.

To test his method against the results of these early season barnstormers, we can add 1,575 wins and 1,575 losses to the April total for these 45 teams, which would bring the total number of adjusted wins and losses to 2,340-1,839, which works out to a .560 winning percentage. That’s just slightly higher than the .549 mark actually posted by these 45 teams over the rest of their season, so Tom’s shortcut seems to work pretty well on this sample of strong starting teams.
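Tom’s shortcut is easy to check directly. This sketch backs the pooled April record out of the figures above (2,340 minus 1,575 leaves 765 wins; 1,839 minus 1,575 leaves 264 losses) and applies the padding. `regressed_pct` is a hypothetical helper name; padding the pooled record by 35 × 45 is the same as adding 35 wins and 35 losses to each of the 45 teams.

```python
def regressed_pct(wins, losses, pad=35):
    """Tango-style regression: add `pad` wins and `pad` losses to a record
    before computing winning percentage."""
    return (wins + pad) / (wins + losses + 2 * pad)


teams = 45
april_wins, april_losses = 765, 264  # implied by the adjusted totals above

print(round(april_wins / (april_wins + april_losses), 3))                 # 0.743
print(round(regressed_pct(april_wins, april_losses, pad=35 * teams), 3))  # 0.56
```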

Applying that 35-35 regression to the Rangers and Dodgers, who both currently stand at 16-6 to begin the year, would leave you with an expected future winning percentage of .554. This method suggests that we haven’t actually learned all that much about the Rangers, as we were already pretty sure that they were good at baseball. Their first month confirms our preseason expectations, but shouldn’t change them all that much.
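Worked through explicitly for a 16-6 record, the 35-35 padding gives:

```latex
\hat{p} = \frac{W + 35}{W + L + 70} = \frac{16 + 35}{16 + 6 + 70} = \frac{51}{92} \approx .554
```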

For the Dodgers, it’s tempting to say that perhaps they entered the year a tad bit underrated. Rather than regressing to the mean, Matt Kemp has doubled down on his terrific 2011 season, and quality performances from Andre Ethier and their collection of high walk/low power role players (A.J. Ellis, Mark Ellis, and Jerry Hairston have all been particularly good) have pushed the Dodgers out to an early lead in the NL West. Kemp can’t keep this up all year, and the Dodgers pitchers are due for some significant BABIP regression, but the Dodgers may be a little better than they were given credit for.

We should be careful not to overreact to the results of April performances, but also understand that they do carry some meaning, especially when viewed in the right context. A great first month to the season is mostly useful for putting wins in the bank that count in the final standings, but April performance can also help us understand a small part of a team’s expected future performance. April performance isn’t gospel, nor is it worthless. It’s data, and properly regressed, it can have some predictive value.

Dave is the Managing Editor of FanGraphs.

35 Comments

Chicago Mark

14 years ago

Excellent as usual Dave. But would Kemp be doing as well if he were batting 6th or 7th? 😉
P.S. So go the next step. We probably gave the Dodgers a little less credit than they deserved. Do you think they can now make the playoffs in the NL West? I know that’s not exactly the object of the article, but your thoughts would be welcome. Take the next step!!! 🙂

batpig

14 years ago

Reply to Chicago Mark

of course they can! what a silly question! they are already 10 games above .500 and have a 4-game lead in the division, that means if they play .500 ball the rest of the way they will finish 86-76, and the regression “rule of thumb” above pegs them as a better than .500 team.

I think it’s obvious they’d have to be the front runners right now, this isn’t a division that had a clear-cut favorite coming into the season.

Chicago Mark

14 years ago

Reply to batpig

You’re a genius, batpig! I wasn’t asking your opinion, though. I wanted to hear from Dave. That being said, I’d still peg the DBacks as the favorites. So what does DAVE think?

Anon21

14 years ago

Reply to batpig

Chicago Mark, you don’t seem to “get” the Internet. You can ask specific people who write widely-read articles to answer your specific questions, but mostly they will ignore you. Then you can either scorn your fellow readers’ answers to those same questions and resign yourself to eternal monologue, or you can try not to be a dick and just engage in conversation.

Sam Samson

14 years ago

Reply to batpig

My guess is Dave thinks somebody already answered your question adequately.

Jake

14 years ago

Isn’t the point of getting off to a strong start to help your chances of making the playoffs? As a fan I’m not really too concerned with what the Dodgers’ or Rangers’ winning percentage might be for the rest of the year, but with whether or not they’ll use that advantage to get to the playoffs. This article doesn’t address that at all. I’d be interested to see how many of those 45 teams made the playoffs and how many would have had they played in the current format (three division winners plus two wild cards).

Nick Lindner

14 years ago

Reply to Jake

You’re mistaken. He did, in fact, address the importance of the first month in “banking wins” to help make the playoffs.

Jake

14 years ago

Reply to Nick Lindner

I’ve now read this article three times to make sure, and I still don’t see it. The only reference to the playoffs or the post-season comes in the fifth paragraph: “The fact that these teams won 54 percent of their May-September games shows that the sample was primarily made up of playoff contenders, so we shouldn’t pretend that a strong start to the season is meaningless.”

If you’re referring to the article he linked to, then I apologize. I don’t subscribe to either site, so I haven’t read that. In either case, I think it would be more useful to expand upon that snippet right there than to say that teams are unlikely to maintain a .700 winning percentage over the remainder of the year. It’s more of an impact over correlation thing.

vivalajeter

14 years ago

Reply to Nick Lindner

Jake, it’s in the last paragraph. Dave writes “A great first month to the season is mostly useful for putting wins in the bank that count in the final standings”

bstar

14 years ago

Reply to Nick Lindner

That’s not at all what Jake was talking about.

wahooo

14 years ago

Reply to Nick Lindner

I think the “banking wins” statement is mostly wrong.

Slartibartfast

14 years ago

Yup. 22 games out of a 162-game season shouldn’t get anyone too riled up.

Andrew

14 years ago

Reply to Slartibartfast

Make sure to tell David Schoenfield.

Colin

14 years ago

Well, yes, the correlation between hot starts and true talent is low. However, there is still a somewhat OK positive correlation, and the start affects what the team needs to do going forward to overcome it. If the team is a .554 WP true-talent team and it starts 16-6, that has a big impact compared with starting 6-16, because you can only assume true-talent performance going forward. They don’t compensate by overperforming later.

BX

14 years ago

This article should be mailed to Orioles fans everywhere.

Oliver

14 years ago

Reply to BX

Yeah! With a note that says, “Don’t enjoy the success your club is having after all those years in the wilderness.”

(Or we could just let them enjoy it for a bit)

abreutime

14 years ago

Reply to Oliver

No one uses mail anymore. Try faxing it.

Jason H

14 years ago

The real question is “is the first month of games more predictive than a similar sample of games from other times in the season?” Using a restricted sample, you’ve shown that a small sample of games is a weak predictor of a team’s ability to win games going forward. Honestly, everybody knew this. It’s a good part of the reason why the season is so long. The question is whether the first 22 games are more predictive than 22 games from elsewhere in the season.

To properly address this question you need to look at the record of all teams (not just ones arbitrarily selected because they won 70%) and calculate the difference between their win expectancy (linear extrapolation) based upon the first 22 games and the actual number of games they win. You then need to compare this to the difference in win expectancy from either samples of 22 games or windows of 22 games chosen throughout the rest of the season. Where do the first 22 games fall in this distribution? ….probably smack in the middle.

vivalajeter

14 years ago

Reply to Jason H

Jason, with all due respect, you can decide the *real* question when you start writing your own articles. It’s silly to tell Dave that he’s asking and answering the wrong question, just because you might want there to be a separate article. Dave wanted to see how strong starts translate over the course of the season, so he looked at teams with strong starts. Why would he include all teams in the data, rather than just teams that had strong starts?

Jason H

14 years ago

Reply to vivalajeter

Viva,

Ok that is fine. However, I don’t think Dave actually answered any question because he really didn’t compare his data to anything. He basically showed that small samples are not predictive.

Mike Green

14 years ago

There is one issue with the analysis. Among the clubs that go .700+ early are a disproportionate number that run away with their divisions and for whom the late games do not mean much. It can be a different game after September 1, and this is particularly so for teams that have a lead of 10 games or more. I suspect that the correlation would be a little tighter if you used May 1-August 31. But not much.

bstar

14 years ago

Reply to Mike Green

Excellent point here, Mike.

Los

14 years ago

Reply to bstar

Just like the Red Sox and Braves ran away with the Wild Card last year!

I’m joking but still…

philosofool (Member since 2016)

14 years ago

I’m curious whether we can improve this regression algorithm using RS and RA. What’s the correlation between April Pythag and May-Sept. Pythag?

Right now the Dodgers (Pythag = .60) and Mets (.40) are big overachievers, while the Cardinals (.78) and Rangers (.76) are underperforming.

vivalajeter

14 years ago

Reply to philosofool

I don’t put much faith in Pythag records at this point in the year. Over the course of the season it might work out well because things even out, but in small samples they can get out of whack. The Mets gave up 13 runs in one inning in Colorado last weekend. Fluky events like that will have too much of an impact over the course of ~20 games.

That’s not to say they’re not overachieving – they are, mainly because of their record in one-run games – but overall I don’t pay attention to Pythag until we’re further into the season.

philosofool (Member since 2016)

14 years ago

Reply to vivalajeter

The question is which is more reliable, not whether it is reliable in some (non existent) absolute sense.

Also, the Mets have a -20 run differential, so you can’t chalk it all up to one 13-run inning.

wahooo

14 years ago

I don’t get it: was the correlation only run with the teams with a .700+ winning %? If I understand correctly, then you are comparing the teams that won 80% to the ones that won 70% to see if the ones that won 80% fare better than the ones that won 70%. We’re talking about a 2-3 win difference from the high end to the low end, so we shouldn’t be surprised that there isn’t much correlation. If you want to see if there is a correlation between the first month and the other months, why not use all the teams? What am I missing?

Given that no team finishes the season with a .743 winning percentage, it is also unrealistic to expect the teams to continue this way–and the fact that they won .549 seems (probably only 30% of teams win this many over the year) to say that it is somewhat of a predictor of success.

JWTP

14 years ago

Did you do any work with slow starts? I only ask because the Angels are curious.

jim mcAulife

14 years ago

Brilliant! “We should be careful not to overreact to the results of April performances, but also understand that they do carry some meaning,”

Todd

14 years ago

I fail to understand how a ~.550 winning % going forward “doesn’t mean that much”. Seems to me that it means a lot. .550 is nearly a 90-win pace. Sure, if you expected a team that started hot to do well, you might think you hadn’t learned much. But take a team at random, then learn they did this well in April, and thus could be expected to finish out the year playing .550 ball. Surely that’s a lot of information? I was expecting you to say something like .510 or .520, in which case I’d have agreed with the article title.

wahooo

14 years ago

Reply to Todd

exactly what I was trying to say above, but better summarized by Todd. I’m not sure what these statistics really tell us. It is unrealistic to think that a team that is winning at a 70% clip will continue to win at that rate–the fact that the teams with that winning percentage do pretty well seems to say there is some correlation.

Also, I disagree with the premise that the games in the bank are more important. A team with a .743 winning percentage will win 6 more games than a .500 team over 25 games, whereas a .549 team will win 7 more games than a .500 team over 147 games. How is it that the banked games are more important?

Nick44

14 years ago

Strong teams regress. But by not including weak April records in your dataset, I’m not sure what is going on.

I can take a glance at the all-time record wins for a season and figure out that you are not going to get a strong correlation (.743*162 = 120.366).

If you did a Spearman’s Rank correlation of April Standings vs May vs September standings I think one might find a little more meaning in an early strong start.

Boomer

14 years ago

The 1984 Detroit Tigers started the season something like 25-5 (.833) and finished the season with a 79-53 (.598) record for a total of 104-58 (.642). This was a very good team that trailed off, but not that much to the point that they dogged it coming to the finish line. And to top it off, they won the WS vs. the Padres.

sportsczar (Member since 2020)

14 years ago

Does this mean I will stop hearing pundits tell me that the Rangers have the division sewn up? I am SO tired of hearing that the Angels have no chance at the division title now. Is it likely that they win it? Well, it’s less likely than it was on April 4th. I just need to turn the volume off when I have a game on. Former players are the worst. No, I have not contributed anything to this discussion. I’m okay with that.

bpdelia

14 years ago

Reply to sportsczar

what are you talking about? It is well known that no team that was ever ten games out at any point has ever won its division. also former players bring much needed insight as to how various clubhouses (both by sq. ft. and locker size) as well as a particular city’s restaurant and golf scene impact playoff races.
