Archive for Research

I Had an Idea About Bat Tracking Data

Sam Navarro-Imagn Images

I was in Hawaii this past weekend, taking a nice vacation to wind down from the end of the baseball season, when I found myself thinking about intercept points. Weird? Overly baseball obsessed? Maybe. But in my defense, a kid at the pool kept swinging at a Wiffle ball almost hilariously late, spraying it “foul” every time. “Oh look, the next Luis Arraez,” I thought, before going back to my umbrella-adorned drink. But that stuck with me, and when I got home, a database query leapt out of my head fully formed, like Athena after Zeus’ headache.

Where is the optimal place to make contact with the ball? It depends on who’s swinging. Statcast measures every single swing’s contact point relative to a hitter’s center of mass, and that data clearly shows that there are many ways to succeed. That’s always stymied me as I’ve looked into swing path data. But that small child gave me an idea when he got off the best swing I’d seen all day, a Wiffle ball line drive that would have been a screamer down the left field foul line (he was batting lefty). Because his normal swing was so late, his best contact was ever so slightly less late. What if I bucketed hitters based on their own swings to look for swing timing clues?

I took every batter who produced 300 or more batted balls (foul balls or balls in play) in 2025. For each of those hitters, I took aggregate statistics for all of their results, then also split their batted balls into three groups: deepest contact point, middle contact point, and farthest forward contact point. You can think of it as late, on time, and early, adjusted for that player’s swing. The later you start your swing, the more you “let it travel,” the deeper your contact point relative to your center of mass. The earlier you start, the more you “get out in front,” the farther forward you make contact. Read the rest of this entry »
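A minimal sketch of that per-hitter bucketing, in plain Python. The input shape, the sign convention (larger number means contact farther out in front), and the function name are illustrative assumptions rather than the actual query behind the article, and the 300-batted-ball minimum is left to the caller.

```python
from collections import defaultdict

LABELS = ("late", "on_time", "early")  # deepest, middle, farthest forward


def bucket_by_contact_depth(batted_balls):
    """Split each hitter's batted balls into three equal-sized groups.

    batted_balls: iterable of (batter, contact_point) pairs. A larger
    contact_point is assumed to mean contact farther in front of the
    hitter's center of mass (hypothetical units and sign convention).
    """
    by_batter = defaultdict(list)
    for batter, contact_point in batted_balls:
        by_batter[batter].append(contact_point)

    result = {}
    for batter, points in by_batter.items():
        ordered = sorted(points)
        n = len(ordered)
        groups = {label: [] for label in LABELS}
        for rank, point in enumerate(ordered):
            # rank * 3 // n maps a sorted position to tercile 0, 1, or 2,
            # so the cutoffs are relative to this hitter's own swings.
            groups[LABELS[rank * 3 // n]].append(point)
        result[batter] = groups
    return result


# Made-up contact points for two hypothetical hitters.
sample = [("A", x) for x in (5, 1, 3, 2, 6, 4)] + [("B", x) for x in (30, 10, 20)]
buckets = bucket_by_contact_depth(sample)
```

Because the cutoffs are computed within each hitter, a swing that counts as "late" for one batter could sit at the same absolute depth as an "early" swing for another, which is the point of the adjustment.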


What if Baserunning and Defense Were as Valuable as Hitting?

Jim Cowsert-Imagn Images

In just about any sport you can name, offense is king. If you’re the one who scores the goals, the points, the runs, the whatever they call it in polo – the biscuits, maybe? – you’re going to get the plaudits. Who’s the greatest defenseman in the history of hockey? It’s Bobby Orr, of course, because he was the first great offensive defenseman. This pattern very much holds when it comes to baseball.

Among other things, the sabermetric revolution helped us codify the value of hitting relative to the other facets of the game. To wit, according to weighted runs above average – and we’re using that particular stat because, like standard baserunning and defensive metrics, it’s a counting stat that compares a player to the performance of an average player – the most valuable hitter during the 2025 season was one Aaron Judge. Judge created 82.5 more runs than the average hitter. That’s 21 runs more than any other player, and an astonishing 36 more than any other player not named Shohei Ohtani. Judge was the best offensive performer in the game by a mile, which makes him the frontrunner for the American League MVP award, even though he put up negative value as a baserunner and, depending on which metric you trust, his defense graded out somewhere between pretty good (DRS, FRV) and really bad (DRP). The best defender was Patrick Bailey, who put up 30 fielding runs, and the best baserunner was Corbin Carroll, who finished with a measly 10.3 baserunning runs. Offense is just more valuable than defense and baserunning. Here’s the distribution of values for the three portions of the game:

Read the rest of this entry »


Checking in on Pythagoras

Kiyoshi Mio-Imagn Images

This June 25, the Dodgers and Tigers both played their 81st game of the season. Both teams finished the day 50-31, sharing the best winning percentage in baseball at .617. The Tigers got there with a slightly better run differential, though; their Pythagorean winning percentage was a cool .608, while the Dodgers checked in at .595. Pythagorean record is implied by runs scored and allowed, and broadly regarded as a more stable measure of talent than simple wins and losses. Since that day, though, the Tigers have gone 35-40 (.467 with a .483 Pythag), while the Dodgers have gone 38-37 (.507 with a .556 Pythag).
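For readers who want the arithmetic, here is a rough sketch of a Pythagorean winning percentage. The excerpt only says the figure is implied by runs scored and allowed; the exponent of 2 is the classic textbook choice and is an assumption here (other exponents are in common use), and the run totals below are made up for illustration.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage implied by runs scored and allowed.

    exponent=2.0 is an assumed default, not necessarily the value
    used for the numbers quoted in the article.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def record_minus_pythag(wins, losses, runs_scored, runs_allowed, exponent=2.0):
    """Actual winning percentage minus the Pythagorean one."""
    actual = wins / (wins + losses)
    return actual - pythagorean_win_pct(runs_scored, runs_allowed, exponent)


# Hypothetical team: 50-31 with 400 runs scored and 300 allowed.
gap = record_minus_pythag(50, 31, 400, 300)
```

A positive gap means the team has won more than its run differential implies, and a negative gap means it has won less.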

I’m bringing this up – last data project for a while, incidentally, I just had a bunch of things in my queue and couldn’t resist tackling them all – because “how good is that team, anyway?” has been a hot topic this year given the various surprising teams who have, at times, taken up the mantle of “hottest in baseball.” Versions of this question – “This team is doing well/poorly now, what does that mean for next month?” – have been both interesting and top of mind in 2025. The Tigers and Brewers played so well for so long that they each crashed the best-team-in-baseball debate. The Mets did their hot-and-cold thing. The Dodgers have endured multiple fallow stretches. Sometimes, teams felt like they were getting very lucky or unlucky relative to their run differential. But what does any of that even mean? Read the rest of this entry »


Fun With Playoff Odds Modeling

Gary A. Vasquez-Imagn Images

Author’s note: “Five Things I Liked (Or Didn’t Like) This Week” is taking a short break, but will return next Friday for the end of the regular season.

Earlier this week, I did the sabermetric equivalent of eating my vegetables by testing the accuracy of our playoff odds projections. I found that our odds do a pretty good job of beating season-to-date odds (particularly late) and pure randomness (particularly early; everything does pretty well late). It’s good to intermittently check in on the accuracy of our predictions. It’s also helpful to build a baseline as a benchmark to measure future changes or updates against.

Those are a bunch of solid, workmanlike reasons to write a measured, lengthy article. But boring! Who likes veggies? I want to beat the odds, and I want to flex a little mathematical muscle while doing it. So I goofed around with a computer program and tried to find ways to slice up and recombine our existing numbers into improved odds. It didn’t break the game wide open or anything, but I’m going to talk about my attempts anyway, because it’s September 19, there aren’t many playoff races going on, and you can only write so many articles about whether the Mets will collapse or if Cal Raleigh will hit 60 dingers.

What if you just penalized extreme values?
I first tried to correct for the fact that early-season projection-based odds (which I’m calling FanGraphs mode for the rest of the article) seem to be too confident and thus prone to large misses. I did so by applying a mean reversion factor that pulled every team’s values toward the league-wide average playoff chances (i.e. how many teams made the playoffs that year). This method varies based on the current playoff format; we have 16-team, 12-team, and 10-team samples in the data, and I adjusted each appropriately. I set the mean reversion factor so that it was strong early in the year and decayed to zero by the end of the season. Read the rest of this entry »
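A sketch of what a mean reversion factor like that could look like. The linear decay, the `max_weight` knob, and the assumption of 30 teams in the league are illustrative choices on my part; the excerpt only says the factor was strong early and decayed to zero by season's end.

```python
def league_base_rate(playoff_teams, total_teams=30):
    """Average playoff chance across the league for a given format."""
    return playoff_teams / total_teams


def shrink_toward_base_rate(odds, base_rate, season_fraction, max_weight=0.5):
    """Pull a team's playoff odds toward the league-wide base rate.

    season_fraction: 0.0 on Opening Day, 1.0 at the end of the season.
    max_weight: hypothetical strength of the pull on Opening Day.
    The pull fades linearly to zero (an assumed decay shape).
    """
    if not 0.0 <= season_fraction <= 1.0:
        raise ValueError("season_fraction must be between 0 and 1")
    weight = max_weight * (1.0 - season_fraction)
    return (1.0 - weight) * odds + weight * base_rate


# 12-team format: a 90% favorite on Opening Day versus on the final day.
base = league_base_rate(12)
early = shrink_toward_base_rate(0.90, base, season_fraction=0.0)
late = shrink_toward_base_rate(0.90, base, season_fraction=1.0)
```

Swapping in `league_base_rate(16)` or `league_base_rate(10)` handles the other playoff formats in the sample.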


MeatWaste Part 2: The Re-Meatening

Benny Sieu-Imagn Images

Last week, I dug into the data a little to see if there was any empirical basis to the suspicion that the Brewers lineup might not be cut out for October. The result was a new metric, if you want to call it that, called MeatWaste%. This number — the percentage of pitches that end up either in the dead center of the strike zone or out in Baseball Savant’s Waste region — I used as a proxy for pitcher quality. MeatWaste pitches are gifts to the batter, the kind of offering that produces an instant swing decision and either an easy take or a full-force swing.
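As a rough illustration of the calculation, here is MeatWaste% computed from a list of pitch-location labels. The label strings are hypothetical stand-ins; mapping real pitch coordinates onto the dead-center zone and Baseball Savant's Waste region is assumed to have happened upstream.

```python
# Hypothetical labels for the two regions that count toward MeatWaste%.
MEATWASTE_REGIONS = {"middle", "waste"}


def meatwaste_pct(pitch_locations):
    """Percentage of pitches in the dead center of the zone or the Waste region.

    pitch_locations: list of region labels, one per pitch.
    """
    if not pitch_locations:
        raise ValueError("need at least one pitch")
    gifts = sum(1 for loc in pitch_locations if loc in MEATWASTE_REGIONS)
    return 100.0 * gifts / len(pitch_locations)


# Four made-up pitches: one meatball, one waste pitch, two competitive ones.
example = meatwaste_pct(["middle", "waste", "shadow", "chase"])
```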

I found two things: First, that the Brewers are better, relative to the league, on these two pitch locations than they are on the whole. And second, that these easy opportunities come around often in the regular season, but disappear in close playoff games. Simple enough, though there are limits to what this finding allows us to infer about the Brewers’ future. It’s why they play the games, after all. Read the rest of this entry »


A FanGraphs Playoff Odds Performance Update

Gregory Fisher-Imagn Images

Look, I get it. You keep refreshing FanGraphs, and it keeps saying that the Mets are 99.9999% likely to make the playoffs (okay, fine, 79.4%). You’ve seen the Mets play, though. They stink! They’re 32-48 since June 13. The White Sox are better than that! We think they’re going to make the playoffs? These Mets?! What, do we not watch the games or something?

Well, to be fair, our models don’t actually watch the games. They’re just code snippets. But given how the Mets’ recent swoon has created the most interesting playoff race in baseball this year, and given that our odds keep favoring them to pull out of a tailspin, the time is ripe to re-evaluate how our playoff odds perform. When we say a team is 80% likely to make the playoffs, what does that mean? Read on to find out.

In 2021, I sliced the data up in two ways to get an idea of what was going on. My conclusions were twofold. First, our model does a good job of doing what it says on the tin: Teams that we give an 80% playoff chance make the playoffs about 80% of the time, and so on. Second, our model’s biggest edge comes from the extremes. It’s at its best determining that teams are very likely, or very unlikely, to make the playoffs. Our flagship model did better than a model that uses season-to-date statistics to estimate team strength in the aggregate, with that coverage of extreme teams doing a lot of the work. Read the rest of this entry »
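The "80% means about 80%" check is a calibration test, and a bare-bones version can be sketched as follows. The bin count and input format are assumptions for illustration, not a description of how the 2021 analysis was actually run, and the forecasts are made up.

```python
def calibration_table(forecasts, n_bins=10):
    """Compare forecast playoff odds to how often teams actually made it.

    forecasts: iterable of (predicted_probability, made_playoffs) pairs.
    Returns rows of (bin_index, count, mean_forecast, observed_rate)
    for every non-empty bin.
    """
    counts = [0] * n_bins
    forecast_sums = [0.0] * n_bins
    made_sums = [0] * n_bins
    for prob, made in forecasts:
        # A probability of exactly 1.0 is folded into the top bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        counts[idx] += 1
        forecast_sums[idx] += prob
        made_sums[idx] += 1 if made else 0

    rows = []
    for idx in range(n_bins):
        if counts[idx]:
            rows.append((idx, counts[idx],
                         forecast_sums[idx] / counts[idx],
                         made_sums[idx] / counts[idx]))
    return rows


# Made-up forecasts: five teams at 85% (four made it), ten at 15% (one made it).
fake = [(0.85, True)] * 4 + [(0.85, False)] + [(0.15, False)] * 9 + [(0.15, True)]
table = calibration_table(fake)
```

In a well-calibrated model, the mean forecast and the observed rate in each row land close together.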


Opportunity, Takeoff Rate, and Stolen Base Opportunism

Rafael Suanes-Imagn Images

David Hamilton doesn’t wait casually at first base. He lurks, waiting for the slightest opening to take off. Watch an at-bat where Hamilton is on the bases, and he’s often as much a point of discussion as the man at the plate. Take the game between the Red Sox and Guardians on September 1, for example. Hamilton pinch-ran for Carlos Narváez with Connor Wong at the plate. Wong fouled off a bunt for strike one with the entire defense focused on Hamilton at first base. Then Hamilton stole second on the next pitch even with the catcher, pitcher, and infielders all fixed on his every move.

Hamilton isn’t the most prolific basestealer in the majors. He isn’t the most successful. But he is the baserunner who tries to steal most frequently, after adjusting for opportunities, and so he’s a great poster boy for what I’d like to talk about today: stolen base opportunities and takeoff rate.

It doesn’t take much to make a stolen base possible, just a runner and an open base. You do need both of those, though. Draw a walk to load the bases, and you’re not attempting a steal without something very strange going on. Stolen base opportunities aren’t easy to find in a box score or a game recap. They’re the negative space of baseball – no one’s counting them, and it’s easier to see where they aren’t than where they are. So, uh, I counted them. Read the rest of this entry »
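One way to sketch that counting rule: a runner has a stolen base opportunity when the base directly ahead of him is empty. Treating steals of home as out of scope, and the exact representation of the base state, are simplifying assumptions of mine rather than the article's stated definition.

```python
def steal_opportunities(occupied_bases):
    """Return the runners (by base number) who have an open base ahead.

    occupied_bases: set of occupied bases, e.g. {1, 3}.
    Steals of home are ignored here (a simplifying assumption), so only
    runners on first or second can have an opportunity.
    """
    return sorted(
        base for base in occupied_bases
        if base in (1, 2) and (base + 1) not in occupied_bases
    )


def takeoff_rate(attempts, opportunities):
    """Steal attempts per opportunity."""
    return attempts / opportunities if opportunities else 0.0


runner_on_first = steal_opportunities({1})      # open second base
bases_loaded = steal_opportunities({1, 2, 3})   # nowhere to go
corners = steal_opportunities({1, 3})           # runner on first can go
```

Summing the opportunities across every pitch or plate appearance a runner is on base for (the granularity is another choice the excerpt leaves open) gives the denominator for takeoff rate.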


How Much Do Trail Runners Matter? An Investigation

Rick Scuteri-Imagn Images

Watch this play. What do you notice?

Here’s what I see: Brooks Lee lofts a soft fly ball 248 feet from home plate. Chandler Simpson circles it but loses a bit of momentum by the time it lands in his glove. Twins third base coach Tommy Watkins sends the not-particularly-fast Trevor Larnach (18th-percentile sprint speed). Shallow fly ball, slow runner, close play at the plate — Larnach slides in just ahead of the throw. It’s an exciting sequence, and I’ve missed an important part of it. Read the rest of this entry »


Welcome to Meatball Watch 2025

Charles LeClaire-Imagn Images

I’d like to present the meatball-iest pitch thrown so far in 2025:

I know, I know! I said that, but it’s just a foul ball. Hear me out, though, because I can put some data behind my claim. Here at FanGraphs, PitchingBot, our in-house pitch modeling system, looks at every single pitch thrown, regresses it against a huge database of past pitches, and uses some mathematical ingenuity to turn that into the expected outcomes of the pitch. That’s not the same as knowing which pitch is most likely to turn into a home run, but luckily, a good bit of mathematical wrangling can turn pitch grades into home run percentages.

Last year, I worked out the rough contours of converting PitchingBot grades into home run likelihood. This year, I’ve expanded that methodology to try to learn a little bit more about the pitchers doing the meatballing. If you’d like to skip through the how, you can head right down to the table labeled “Meatball Mongers.” If you’re here for the nitty gritty of turning pitch metrics into home run likelihood, though, here’s how I did it.

That Trent Thornton fastball had a lot of things working against it, and those things help explain how PitchingBot estimates the chances that a pitch will be hit for a home run. PitchingBot has a flowchart that explains how the model works. Here’s how the system assesses every pitch it grades:

Hey, a convenient “start here” label! How great! The “swing model” takes location, count, pitch type, movement, platoon matchups, and pretty much everything else you can imagine into account and guesses at the likelihood of a batter swinging at each pitch. That Thornton fastball was down the middle in an 0-1 count, and it’s not a particularly deceptive offering. In other words, hitters often swing at fastballs like that – 92.7% of the time, per PitchingBot’s model. Read the rest of this entry »


The Enigma: My Journey Through Statistical Artifacts in Pursuit of Hot Streaks

Brett Davis-Imagn Images

A warning up top: This article is about seeking and not finding, about the unique ways that data can mislead you. The hero doesn’t win in the end – unless the hero is stochastic randomness and I’m the villain, but I don’t like that telling of the tale. It all started with an innocuous question: Can we tell which types of hitters are streaky?

I approached this question in an article about Michael Harris II’s rampage through July and August. I took a cursory look at it and set it aside for future investigation after not finding any obvious effects right away. To delve more deeply, I had to come up with a definition of streakiness to test, and so I set about doing so.

My chosen method was to look at 20-game stretches to determine hot and cold streaks, then look at performance in the following 20 games to see which types of players were more prone to “stay hot” or “stay cold.” I started throwing out definitions and samples: 2021-2024, minimum 400 plate appearances on the season as a whole, overlapping sampling (so check games 1-20 vs. 21-40, 2-21 vs. 22-41, and so on), wOBA as my relevant offensive statistic, 50 points of wOBA deviation against seasonal average to convey hot or cold, 40-PA minimum per 20-game set to avoid weird pinch-hitting anomalies, throw out games with no plate appearances to skip defensive replacements — the list goes on and on. Read the rest of this entry »
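Those rules translate into code fairly directly. The sketch below is an illustration under several assumptions: each game is reduced to a hypothetical (plate appearances, wOBA numerator) pair, window wOBA is approximated as numerator over plate appearances, and the 40-PA minimum is applied to both windows.

```python
def window_woba(games):
    """Total PA and approximate wOBA for a run of games."""
    pa = sum(g[0] for g in games)
    numerator = sum(g[1] for g in games)
    return pa, (numerator / pa if pa else 0.0)


def classify(woba, season_woba, threshold):
    """Label a window hot, cold, or neutral relative to the season average."""
    if woba >= season_woba + threshold:
        return "hot"
    if woba <= season_woba - threshold:
        return "cold"
    return "neutral"


def streak_pairs(games, season_woba, window=20, min_pa=40, threshold=0.050):
    """Pair each 20-game window with the 20 games that follow it.

    games: chronological list of (plate_appearances, woba_numerator).
    Returns (label_of_first_window, woba_of_second_window) tuples.
    """
    # Drop games with no plate appearances (defensive replacements).
    played = [g for g in games if g[0] > 0]
    pairs = []
    # Overlapping sampling: games 1-20 vs. 21-40, 2-21 vs. 22-41, and so on.
    for start in range(len(played) - 2 * window + 1):
        first = played[start:start + window]
        second = played[start + window:start + 2 * window]
        pa_first, woba_first = window_woba(first)
        pa_second, woba_second = window_woba(second)
        if pa_first < min_pa or pa_second < min_pa:
            continue
        pairs.append((classify(woba_first, season_woba, threshold), woba_second))
    return pairs


# Made-up hitter: 20 scorching games, one game without a PA, then 20 poor games.
fake_games = [(4, 2.0)] * 20 + [(0, 0.0)] + [(4, 1.2)] * 20
pairs = streak_pairs(fake_games, season_woba=0.400)
```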