The Messy Middle Part of the Season

by Kiri Oler

July 10, 2024

Remember back in 2021 when Gen Z tried to tell everyone to move their side parts to the middle and swap their skinny jeans for a looser variety? While most Millennials responded with outward indignation, offline they begrudgingly tried on high-waisted mom jeans and posted up in the bathroom blowing out their hair in a new direction. But before long they let their hair go back to lying in the manner to which it had become accustomed and eschewed jeans completely in favor of athleisure-wear. Even as many of us considered complying with the directive of our teenaged overlords, it felt absurd that people who haven’t even finished developing their prefrontal cortexes are left in charge of dictating what’s cool. As it turns out, though, that’s exactly why teenagers decide what’s cool. Teenagers are the only members of society with the time, energy, and lack of rationality to care so deeply about something that matters so very little.

Those who stuck to their dated stylings and weathered the petty hail storm of Zoomer mocking were vindicated a couple of months ago, when the celebrity and influencer cohort brought back the side part, declaring it on-trend once more. Around that same time another trend was taking hold among the baseball commentariat: Using strength of schedule to determine which teams had actually earned their W-L records. Mostly, this meant arguing that the Phillies weren’t a top team in the league because they’d played a soft schedule. The discourse eventually spawned multiple articles arguing that while yes, Philadelphia hadn’t exactly been slaying dragons while walking a tightrope, its act wasn’t entirely smoke (generated by the clubhouse fog machine) and mirrors either.

Strength of schedule is not typically a prominent talking point when comparing MLB teams. It might occasionally come up when comparing September schedules in a tight postseason race, but as a phrase uttered in May, it’s typically part of a college baseball discussion, or because you’ve wandered into a BCS-era college football forum. College sports need strength-of-schedule metrics because teams don’t all play one another and the variation in team quality spans the Big Ten’s new geographical footprint. But in the major professional leagues, the schedule is fairly balanced, and even though the White Sox and Rockies exist, dominating the worst teams in MLB presents a tougher task than rolling over the University of Maryland Baltimore County Retrievers.

But even though strength of schedule seemingly lacks utility in a professional baseball context, the amount of mud slung at seemingly good teams had me questioning my own assumptions. Maybe there is useful information to uncover in the muck. So with roughly 90 games on each team’s odometer, I decided to pump the brakes and figure out if a team’s early-July winning percentage, combined with its strength-of-schedule rating (SoS), could more accurately predict its final record than its midseason record alone. I gathered up each team’s strength of schedule and W-L record through a comparable point in mid-July for 2021, ’22, and ’23. Next, I calculated each club’s remaining strength of schedule as it would have stood at that point in the season, threw all three values (Win%, SoS so far, and remaining SoS) into a basic linear regression model and trained it to predict the team’s record at the end of the year. As a baseline comparison, I also trained a model that considered only the midseason win rates of teams.
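For anyone who wants to poke at the setup described above, here is a minimal sketch of that kind of regression in plain Python. It is not the code behind the article. The function names `fit_ols` and `predict` are made up for illustration, the six team-seasons are invented placeholder numbers rather than real data, and ordinary least squares via the normal equations is assumed as the "basic linear regression."

```python
def fit_ols(rows, targets):
    """Fit ordinary least squares with an intercept via the normal equations.

    rows: one list of features per team-season.
    targets: final winning percentage for each team-season.
    Returns [intercept, coef_1, coef_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented system [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Apply fitted coefficients to one row of features."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Hypothetical team-seasons: (midseason Win%, SoS so far, remaining SoS).
# These numbers are invented purely to show the shape of the inputs.
features = [
    (0.600, 0.495, 0.505),
    (0.450, 0.510, 0.490),
    (0.520, 0.500, 0.502),
    (0.380, 0.505, 0.498),
    (0.560, 0.492, 0.507),
    (0.490, 0.503, 0.495),
]
final_win_pct = [0.580, 0.470, 0.515, 0.400, 0.550, 0.500]

sos_model = fit_ols(features, final_win_pct)
baseline_model = fit_ols([(f[0],) for f in features], final_win_pct)
print(predict(sos_model, (0.550, 0.498, 0.502)))
print(predict(baseline_model, (0.550,)))
```

The baseline model is the same fit with the two SoS columns dropped, which keeps the comparison between the two as apples-to-apples as possible.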

Did the SoS model outperform the baseline model? No, it did not. Both models explained roughly 78% of the variation in the final winning percentages and made predictions with an average error of about 30 points of winning percentage (roughly .030). In the SoS model, neither of the SoS features was deemed statistically significant, though the remaining SoS metric came closer to providing some useful input.
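The two yardsticks in that comparison can be computed with a few lines of Python. This is a sketch under assumptions: "explained variation" is taken to mean the usual R-squared, and "average error" is taken to mean mean absolute error, though the article does not say which error measure was used. The function names are illustrative.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` accounted for by `predicted`."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_absolute_error(actual, predicted):
    """Average size of the miss, in the same units as the inputs."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Feeding both functions the final winning percentages and each model's predictions gives a side-by-side read on whether the extra SoS features earn their keep.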

But part of the pushback against the strength-of-schedule girlies concerned the context that gets tossed aside when you flatten a team into a single value. On the one hand, there’s the old Bill Parcells quote, “You are what your record says you are.” On the other hand, your record can only say so much. Standard SoS averages the winning percentages of a team’s opponents, with some debate over whether to use the team’s record from the time the game was played or update the calculation continuously throughout the season. Early season records are too wacky for me to take seriously, so I opted for the continuously updating version, but this aspect of the debate does raise a reasonable point. Teams can be streaky, and how well a team is playing at the time of a matchup, in addition to the health of the roster, factors into the difficulty of the matchup. Winning percentages based on larger samples are more likely to represent a team’s true talent, but they discard in-the-moment context. Fortunately, we’ve known for a while that winning percentage doesn’t tell a team’s whole story, leading to updated versions of the classic W-L record that capture at least some additional context.
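As a rough illustration of the standard calculation, the sketch below averages opponents' winning percentages with one entry per game, so a team faced six times counts six times. The function name, the team labels, and the records are all hypothetical, and passing in current records for every opponent corresponds to the continuously updating version.

```python
def strength_of_schedule(opponents, win_pct):
    """Average opponent winning percentage, weighted by games played.

    opponents: list with one opponent label per game.
    win_pct: dict mapping each opponent label to its winning percentage.
    """
    return sum(win_pct[opp] for opp in opponents) / len(opponents)


# Hypothetical records as of today (the continuously updating approach).
current_win_pct = {"Team A": 0.600, "Team B": 0.300, "Team C": 0.480}

games_played = ["Team A", "Team A", "Team B"]
games_remaining = ["Team B", "Team C", "Team C"]

sos_so_far = strength_of_schedule(games_played, current_win_pct)
sos_remaining = strength_of_schedule(games_remaining, current_win_pct)
```

The same function covers both the schedule played to date and the remaining slate. Only the list of opponents changes.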

Pythagorean W-L was developed by Bill James and uses a team’s run differential to determine its expected W-L record. Here run differential acts as a proxy for a team’s proclivity for both scoring and preventing runs, which tends to be more indicative of its actual ability to win games than its record might imply, since wins and losses are garnished with a larger dollop of randomness and luck. BaseRuns record goes a step further in cleansing the calculation of randomness by using the average run value associated with players’ actions on the field to determine the team’s expected run differential, rather than a run differential that may be inflated due to a fortuitous sequencing of hits.
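For reference, the Pythagorean calculation is short enough to sketch in a few lines. The version below uses Bill James's original exponent of 2. Later refinements often use an exponent closer to 1.83, and the article does not say which one was used here, so treat the default as an assumption.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# A hypothetical team that has scored 500 runs and allowed 400.
expected = pythagorean_win_pct(500, 400)
```

A BaseRuns record works the same way at this last step, except that the run totals plugged in are themselves estimates built from the run values of individual events rather than the runs that actually crossed the plate.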

Does calculating strength of schedule using win percentages based on Pythagorean W-L or BaseRuns W-L add enough context to create a metric that improves on the baseline model’s predictions? Answer: a little. Both the Pythagorean and BaseRuns versions of the model were able to explain 81% of the variation in teams’ 162-game win rates, up from 78%. The average error dropped a few points of winning percentage as well, from 30 down to 27. It’s a slight improvement, but still not enough to suddenly convince me that strength of schedule as a metric has any super dishy secrets to spill about the true talent of a team.

In one final attempt to make fetch happen, I figured since we’re already borrowing tactics from college sports analysis, we might as well really do the thing. Boyd’s World posts Iterative Strength Ratings (ISRs) for college baseball, which work similarly to Elo ratings in chess, to assign each team a score based on the quality of its opponents and its outcomes against said opponents. Which is to say, a team gets more credit for beating a good team than a bad one, and is docked more for losing to a bad team than a good one. Lastly, I added Relative Power Index (RPI) to the pile, which ESPN defines as “25% team winning percentage, 50% opponents’ average winning percentage, and 25% opponents’ opponents’ average winning percentage.”
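The RPI weights in that ESPN definition translate fairly directly into code. The sketch below is a simplified reading of the formula, not ESPN's implementation: it averages opponents' records one entry per game, and it does not strip out a team's own games when computing its opponents' winning percentages, which some RPI variants do. The function name and the three-team schedule are hypothetical.

```python
from collections import defaultdict


def rpi(games):
    """Compute a simplified RPI for every team.

    games: list of (winner, loser) tuples, one per game.
    Returns a dict mapping team to 0.25*WP + 0.50*OWP + 0.25*OOWP.
    """
    wins = defaultdict(int)
    played = defaultdict(int)
    opponents = defaultdict(list)  # one entry per game played
    for winner, loser in games:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1
        opponents[winner].append(loser)
        opponents[loser].append(winner)

    # Team winning percentage.
    wp = {t: wins[t] / played[t] for t in played}
    # Opponents' average winning percentage.
    owp = {t: sum(wp[o] for o in opps) / len(opps)
           for t, opps in opponents.items()}
    # Opponents' opponents' average winning percentage.
    oowp = {t: sum(owp[o] for o in opps) / len(opps)
            for t, opps in opponents.items()}
    return {t: 0.25 * wp[t] + 0.50 * owp[t] + 0.25 * oowp[t] for t in played}


# Hypothetical three-team round robin.
ratings = rpi([("A", "B"), ("A", "C"), ("B", "C")])
```

Because only a quarter of the rating comes from a team's own record, RPI leans much harder on schedule than the other SoS-adjusted measures do.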

The ISR version of the model performed comparably to the BaseRuns and Pythagorean models, with a slightly worse average error on the predictions. Meanwhile, the RPI model was worse than the baseline model across the board.

Despite the internet’s best efforts to shake me from what I thought was a fairly non-controversial belief that strength of schedule doesn’t matter all that much in a professional league with a 162-game season, I believe we have successfully touched grass and locked back in with reality and what actually matters, and it ain’t SoS. But with that said, we also learned that when calculated with a bit more context, SoS does matter a teeny, tiny bit. So as we enter trade deadline season, is there anything SoS can offer to sway our opinions on whether teams should buy or sell? If a team at the back of the pack in the wild card hunt has played a tough schedule so far, but has a relatively easier slate in the second half, is that a large enough factor to convince its front office to go for it? If a team is on the fringes of contention now, but cake-walked to this point and now stands on the precipice of a pit of quicksand, is that a strong enough argument to sell?

I used the Pythagorean, BaseRuns, and ISR models to predict the final standings for this season to see how much they differ from the current standings. The outputs are summarized below. By my interpretation, SoS changes the current outlook enough for only two teams to shift their assumed deadline strategies based on the current standings. The Pirates are presently “in the mix” for a wild card spot, and anyone north of the Rockies and Marlins in the NL standings could reasonably go for it, or at least stand pat and see what happens. But given that five other teams are in a similar position, a tough second half schedule and a seller’s market might tip the scales. The Rays have only two teams ahead of them in the AL Wild Card race, but the existing separation between Tampa Bay and its competitors, combined with a tough schedule, lowers the odds that it can make up that ground.

American League

	Current Standings			Projected Standings
Team	W%	Division GB	WC GB	Py W%	BR W%	ISR W%
CLE	.633	–	–	.596	.601	.600
BAL	.626	–	–	.624	.621	.645
SEA	.538	–	–	.540	.533	.541
NYY	.591	3.0	+3.5	.589	.579	.607
MIN	.571	5.5	+1.5	.556	.562	.541
BOS	.556	6.5	–	.534	.539	.535
KCR	.533	9.0	2.0	.526	.529	.548
HOU	.516	2.0	3.5	.509	.507	.498
TBR	.495	12.0	5.5	.458	.463	.451
TEX	.478	5.5	7.0	.490	.484	.472
DET	.467	15.0	8.0	.468	.475	.438
TOR	.451	16.0	9.5	.460	.448	.475
LAA	.407	12.0	13.5	.440	.433	.475
OAK	.366	16.0	17.5	.391	.386	.390
CWS	.280	32.5	25.5	.305	.306	.325

Standings as of start of play on 7/10.

National League

	Current Standings			Projected Standings
Team	W%	Division GB	WC GB	Py W%	BR W%	ISR W%
PHI	.648	–	–	.611	.618	.616
LAD	.598	–	–	.585	.589	.574
MIL	.576	–	–	.568	.572	.585
ATL	.567	7.5	+4.5	.551	.553	.544
STL	.533	4.0	+1.5	.518	.528	.493
SDP	.516	7.5	–	.528	.527	.526
NYM	.500	13.5	1.5	.504	.501	.523
ARI	.489	10.0	2.5	.485	.491	.460
SFG	.489	10.0	2.5	.486	.488	.490
PIT	.484	8.5	3.0	.467	.474	.461
CIN	.478	9.0	3.5	.488	.490	.495
CHC	.467	10.0	4.5	.466	.472	.450
WAS	.457	17.5	5.5	.461	.461	.460
MIA	.352	27.0	15.0	.364	.362	.366
COL	.348	23.0	15.5	.354	.356	.329

Standings as of start of play on 7/10.

A few other teams do experience notable changes to their winning percentages, but not in a way that meaningfully affects their positions in the standings. Banked wins are banked wins, and the same can be said for losses. All three models have the Phillies, Guardians, and Dodgers taking a hit, but not enough to knock them off their seats atop their divisions, while the White Sox and Angels get a nice bump, but not enough to suddenly make them contenders. Kansas City has the chance to take advantage of a remaining schedule that’s easier than the one Boston has, while the Reds and Mets have an easier path ahead of them than the Cardinals do. But given the current positions of those teams, they already have a strong enough claim to buy even before considering their remaining schedules.

When it comes to evaluating a team’s true talent and its season-long outlook, strength of schedule matters about as much as whether 17-year-olds think your Uggs are cheugy, even though they’re strutting around in Crocs adorned with Jibbitz. Which is not to say that it doesn’t matter at all. We all have pride and delicate egos, so it’s reasonable to fear a group of people known for their cutting remarks designed specifically to gut you from the inside out. But there’s one thing those same teens always need to be reminded of in their ever-brooding state: Certain things that feel like the end of the world in the moment won’t be remembered a few months from now and certainly not in a few years.

No matter how the 2024 season ends, when we look back on these Phillies, a cushy schedule over their first 50ish games won’t be a defining feature. Because if they make the postseason, it will most likely be as either a really good, properly rated team, one whose record leveled out over the course of a long season, or as a really, really good team who beat the SoS allegations. And if the Phillies don’t make the postseason, the narrative will revolve around their collapse, which won’t be explainable using strength of schedule alone (though some might try). Either way, more impactful factors will take over the story of their season and their early season opponents will go largely unremarked upon.

Though strength of schedule might tilt a team or two closer to selling at the deadline — and Zoomers might convince Millennials to donate their old jeans — in the long run no one is going to remember the shape of a team’s win distribution over the course of the season. And no one will care that you kept your swoopy side bangs for like three years after they were no longer in style.

27 Comments

Lanidrac

1 year ago

You forgot to put the + on the Wild Card spots of the Braves and Cardinals.

Cromulent (Member since 2017)

Reply to Lanidrac

This comment is kind of cheugy

Thomas (Member since 2017)

Reply to Cromulent

Well played.

Well, he’s fixed it now. It was still useful to point out his mistake so he could fix it.

formerly matt w (Member since 2025)

(Kiri Oler’s a woman and uses “she”: https://music.amazon.com/podcasts/39c6b52a-3581-4f64-bd56-6ecc2fcf3a85/episodes/4ed04e5a-82a8-402c-a793-021e76574abc/the-sis-baseball-podcast-women-in-baseball-analytics-spotlight-emily-curtis-and-kiri-oler)

Reply to formerly matt w

OK, that was my mistake.