Investigating the Yankees’ Dominance of the Twins

by Jack Moore

February 14, 2011

Over the last nine baseball seasons, at least one thing has been constant: the domination of Ron Gardenhire’s Minnesota Twins by the New York Yankees. Between the regular season and the postseason, the Twins have lost a staggering 57 of their last 75 contests against New York, including a sweep in last season’s ALDS that seemed all too familiar. These struggles have created a narrative: the little-guy Twins, despite all their regular-season success, crumble under the pressure of the big-city Yankees. Is this just the magnification of a small 75-game sample, or is there something substantive in the Twins’ 18-57 record against the Yankees under Gardenhire?

Using a Bill James method known as log5, we can find an expectation for one team’s performance against another. Log5 predicts Team A’s win percentage against Team B with the following formula:

A*(1-B)/(A*(1-B)+(1-A)*B)

The A in the above formula is Team A’s winning percentage, and B is Team B’s winning percentage. Since 2002, including the postseason, the Twins are 809-677, a .544 winning percentage. The Yankees are 914-612 over the same time frame, a .599 winning percentage. Plugging these into log5, we should expect the Twins to win 44.4% of the time against the Yankees, or 33.3 games out of 75.
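As a minimal sketch of the log5 arithmetic above, here is the formula in Python applied to the quoted records. The function name `log5` is illustrative, not taken from any library.

```python
def log5(a: float, b: float) -> float:
    """Log5 estimate of the chance that a team with winning percentage `a`
    beats a team with winning percentage `b`."""
    return a * (1 - b) / (a * (1 - b) + (1 - a) * b)


# Records since 2002, postseason included, as quoted in the text.
twins_pct = 809 / (809 + 677)      # roughly .544
yankees_pct = 914 / (914 + 612)    # roughly .599

p_twins = log5(twins_pct, yankees_pct)   # roughly .444
expected_wins = 75 * p_twins             # roughly 33.3

print(f"Twins win probability vs. Yankees: {p_twins:.3f}")
print(f"Expected Twins wins in 75 games: {expected_wins:.1f}")
```

Two quick sanity checks on the formula: two .500 teams give log5(0.5, 0.5) = 0.5, and swapping the arguments gives the complementary probability, so the two teams’ chances always add up to 1.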

So the Twins won 15.3 fewer games than we would expect, nearly as many games as they actually won (18), and even that may not sound as shocking as it should. According to the binomial distribution, the probability of 18 or fewer successes in 75 tries with a success probability of .444 is only 0.021%. For another measure of just how outlandish this run has been, I ran 2,500 simulations of the 75 games assuming a Twins win probability of 44.4%. In only one of the simulations did the Twins win as few as the 18 games we’ve observed over the past nine years. The actual result is a whopping 3.6 standard deviations below the mean expectation.
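The three figures in this paragraph (the binomial tail, the simulation count, and the z-score) can be approximately reproduced with the sketch below. The article does not describe how its simulations were built, so this assumes each of the 75 games is an independent trial the Twins win with probability .444, and counts simulated stretches with 18 or fewer Twins wins. The helper names and the seed are hypothetical; a different seed will give a different (but similarly tiny) simulated count.

```python
import math
import random


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed exactly term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def simulate(n_sims: int, n_games: int, p: float, max_wins: int, seed: int = 2011) -> int:
    """Count simulated n_games stretches in which the Twins win max_wins or fewer.
    Assumes games are independent with a constant win probability p."""
    rng = random.Random(seed)  # arbitrary fixed seed so runs are reproducible
    hits = 0
    for _ in range(n_sims):
        wins = sum(rng.random() < p for _ in range(n_games))
        if wins <= max_wins:
            hits += 1
    return hits


N_GAMES, P_TWINS, OBSERVED = 75, 0.444, 18

tail = binom_cdf(OBSERVED, N_GAMES, P_TWINS)         # exact binomial tail
mean = N_GAMES * P_TWINS                             # about 33.3 wins
sd = math.sqrt(N_GAMES * P_TWINS * (1 - P_TWINS))    # binomial standard deviation
z = (OBSERVED - mean) / sd                           # SDs below the mean
hits = simulate(2500, N_GAMES, P_TWINS, OBSERVED)

print(f"P(X <= {OBSERVED}) = {tail:.4%}")
print(f"z-score = {z:.2f}")
print(f"Simulated stretches with <= {OBSERVED} wins: {hits} of 2500")
```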

Running these numbers using Pythagorean records since 2002 yields nearly identical results, since over nearly nine years of data the Pythagorean records end up close to the actual records. It’s also worth noting that the Pythagorean record of the Twins-Yankees games themselves comes out to 52.5 wins for the Yankees and 22.5 for the Twins: slightly closer, but still only a 0.8% chance according to the binomial distribution, and still 2.5 standard deviations below the mean expectation of just over 33 wins. One more note: although the odds of the Yankees dominating this particular opponent at this level are tremendously low, the odds of some team dominating some other team like this over one stretch are much higher, simply because there are so many 75-game head-to-head samples.
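The same binomial arithmetic applies to the Pythagorean head-to-head record. Because 22.5 is not a whole number of wins, this sketch brackets it between the exact tails at 22 and 23 wins and computes the z-score directly at 22.5. This bracketing is an illustrative choice, not necessarily how the 0.8% figure was derived.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


N_GAMES, P_TWINS = 75, 0.444
PYTHAG_WINS = 22.5  # Twins' Pythagorean wins in the head-to-head games

mean = N_GAMES * P_TWINS
sd = math.sqrt(N_GAMES * P_TWINS * (1 - P_TWINS))
z = (PYTHAG_WINS - mean) / sd  # roughly -2.5

# 22.5 wins falls between two integer outcomes, so report both tails.
tail_22 = binom_cdf(22, N_GAMES, P_TWINS)
tail_23 = binom_cdf(23, N_GAMES, P_TWINS)

print(f"z-score at {PYTHAG_WINS} wins: {z:.2f}")
print(f"P(X <= 22) = {tail_22:.2%}, P(X <= 23) = {tail_23:.2%}")
```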

Using this data, we can reject the hypothesis that the Twins are actually a .444 team against the Yankees, as the log5 method would lead us to expect, with 99.9841% certainty, well above the typical 95% threshold for statistical significance. This strongly suggests that some factor other than each team’s talent relative to the league is at play. For some reason, the New York Yankees play much better or the Minnesota Twins play much worse (or both) when these two teams match up.
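The article does not say exactly how the 99.9841% figure was computed. It does match a one-sided normal-approximation tail at exactly z = 3.6, which this sketch reproduces using the standard library’s complementary error function; the exact binomial tail quoted earlier (0.021%) is slightly larger, and both are far below the 5% threshold. The function name is illustrative.

```python
import math


def upper_tail_normal(z: float) -> float:
    """One-sided standard normal tail probability P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


p_value = upper_tail_normal(3.6)
print(f"one-sided p = {p_value:.6f}, confidence = {1 - p_value:.4%}")
print("reject at the 5% level:", p_value < 0.05)
```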

The Twins would have to be roughly a .350 team against the Yankees for the actual 18-57 result to fall within two standard deviations of the mean expectation. That puts the Yankees’ “extra advantage” over the Twins at least in the realm of 100 points of winning percentage (.444 versus .350), if not more. Now the question is where this extra advantage comes from. I’d like to explore that in later posts, but for that I need some help. Is it something about the way each team distributes its talent? Are the teams built by the Yankees just better suited to play the Twins than the rest of the league (or vice versa)? Is there something about Ron Gardenhire’s tactics against the Yankees that puts his teams at a disadvantage? Are the Yankees great at advance scouting the Twins? Is there anything else I’m missing? Put your ideas in the comments, and I’ll try to investigate some of them in the coming week.
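One way to recover the roughly .350 figure is to solve for the win probability p at which 18 wins sits exactly two standard deviations below the binomial mean 75p. This sketch uses bisection; the function names and the search bracket [0.2, 0.444] are illustrative assumptions, chosen because the mean-minus-two-SD curve is increasing on that interval and crosses 18 inside it.

```python
import math

N_GAMES, OBSERVED = 75, 18


def two_sd_floor(p: float) -> float:
    """Mean minus two standard deviations of Twins wins at win probability p."""
    return N_GAMES * p - 2 * math.sqrt(N_GAMES * p * (1 - p))


def solve_p(lo: float = 0.2, hi: float = 0.444, tol: float = 1e-9) -> float:
    """Bisection for the p at which OBSERVED is exactly two SDs below the mean."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if two_sd_floor(mid) < OBSERVED:
            lo = mid  # floor still below 18: need a higher p
        else:
            hi = mid
    return (lo + hi) / 2


p_needed = solve_p()
print(f"Twins win probability needed: {p_needed:.3f}")
print(f"Gap from log5's .444: {0.444 - p_needed:.3f}")
```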

66 Comments

lex logan

“As one extra note, it is important to remember that although the odds of the Yankees dominating at this level are tremendously low, the odds of any one team dominating any one other team like this over one stretch are much higher simply due to the large amount of 75 game head-to-head samples.

“Using this data, we can reject the hypothesis that the Twins are actually a .444 team against the Yankees – as we would expect given the log5 method – with 99.9841 certainty, well over the typical 95% level of statistical significance.”

Wrong. You cannot hunt through vast numbers of head-to-head matchups, pick one that sticks out like a sore thumb, and then apply a statistical test to reject the null hypothesis. There are 91 head-to-head matchups in the American League and 120 in the National League; you basically need to multiply your p-value by something like 221.

The proper way to apply statistics is to form a hypothesis based on one set of data, then test it with fresh data. Cherry-picking yields zero statistical insight.
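The matchup counts in this comment are unordered pairs of teams within each league. A quick check, assuming the 14-team American League and 16-team National League of that era and, like the comment, ignoring interleague pairings:

```python
from math import comb

AL_TEAMS, NL_TEAMS = 14, 16  # league sizes from 1998 through 2012

al_pairs = comb(AL_TEAMS, 2)  # 14 * 13 / 2
nl_pairs = comb(NL_TEAMS, 2)  # 16 * 15 / 2
print(al_pairs, nl_pairs, al_pairs + nl_pairs)
```

Note that 91 + 120 sums to 211, a little under the 221 used in this thread; the Bonferroni arithmetic in the replies reaches the same conclusion with either number.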

Luke in MN

Reply to lex logan

My statistics knowledge is sort of amateurish, but I think 1 - .999841 is your p-value, and multiplied by 221 you get .035, which would still give you 96.5% confidence. Can I do that?

test

Reply to Luke in MN

Starting at a significance level of 0.05 (95%) and applying a conservative Bonferroni multiple-testing correction, your new critical value is 0.05/221 = 0.00023. The p-value for the Yankees/Twins matchup is .000159, so you still reject the null hypothesis. I would be interested to see whether any of the other matchups over the same time period are as far off the expected distribution…

As to why the Twins can’t beat the Yankees, one common theme I’ve read is that the Twins pitching staff’s approach of not walking anyone just doesn’t work against a stacked lineup, which makes intuitive sense to me (you can’t get them to chase borderline pitches, and then they feast on average stuff in the zone). But I’m curious whether the bad record is due to bad offense, bad pitching, or bad defense. At this extreme, it must be all three to some degree, but does one stand out?
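Both of the replies above apply the Bonferroni correction, just from opposite directions: multiplying the p-value by the number of tests m is equivalent to dividing the significance level by m, since p·m < α exactly when p < α/m. A small sketch, taking p = 1 - .999841 from the article and trying both m = 221 (as used in the thread) and m = 211 (what 91 + 120 sums to). The helper name is hypothetical.

```python
def bonferroni(p_value: float, m: int, alpha: float = 0.05):
    """Return (adjusted p-value, per-test critical value, reject?) for m tests."""
    adjusted_p = min(1.0, p_value * m)  # inflate the p-value by m ...
    critical = alpha / m                # ... or, equivalently, shrink the threshold
    return adjusted_p, critical, p_value < critical


P_VALUE = 1 - 0.999841  # implied by the article's 99.9841% figure

for m in (211, 221):
    adj, crit, reject = bonferroni(P_VALUE, m)
    print(f"m={m}: adjusted p={adj:.4f}, critical value={crit:.6f}, reject={reject}")
```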

Jack Moore

My “fresh data” was the data from my simulations, which is what I used to reject the hypothesis.

Barkey Walker

Reply to Jack Moore

While not polite, he is right that you need to do a Bonferroni correction. Simulations are not data, they are simulated data.

Say you take a person, give them 200 blood tests, and then check whether every result falls within its 95% confidence interval. Even when the person is perfectly normal, each test will be rejected 5% of the time, so with 200 tests about 10 of them will have results outside the 95% confidence interval. No amount of simulation will change the fact that 10 or so of them will be rejected.

Doctors know this and do a Bonferroni correction where they widen the acceptable level of each test. This is not the whole story for blood testing, but it is a reasonable real world example.
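The blood-test arithmetic can be made concrete. Assuming, for illustration only, that the 200 tests are independent, this sketch computes the expected number of out-of-range results, the chance of at least one, and how running each test at a Bonferroni level of 0.05/200 pulls that overall chance back under 5%.

```python
ALPHA, N_TESTS = 0.05, 200

# Without correction: each test flags a healthy person 5% of the time.
expected_flags = ALPHA * N_TESTS                 # about 10 flagged results
fwer_uncorrected = 1 - (1 - ALPHA) ** N_TESTS    # chance of at least one flag

# Bonferroni: run each test at ALPHA / N_TESTS instead.
per_test_alpha = ALPHA / N_TESTS
fwer_corrected = 1 - (1 - per_test_alpha) ** N_TESTS

print(f"expected flags: {expected_flags:.1f}")
print(f"P(at least one flag), uncorrected: {fwer_uncorrected:.5f}")
print(f"P(at least one flag), Bonferroni:  {fwer_corrected:.5f}")
```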
