Archive for Research

The New Ball Is Confusing!

Last week, Justin Choi published an examination of the new ball. The results were — well, you should read it for yourself, but they were muddled, to say the least. Home runs are down! Exit velocity is up! Liners got better, fly balls got worse. It’s enough to make you wonder whether we’ll ever know the answer. It’s also catnip to analysts, and so today I’d like to present some supplemental evidence that only makes me more confused.

There were two key conflicting findings in Justin’s research. First: home runs are down, and fly balls aren’t carrying as far, on average, as they did last year. Second, overall exit velocity is up league-wide, whether you care about broad averages or the hardest-struck balls. The two effects — harder hits, less carry — benefit line drives over fly balls, because line drives both spend less time in the air and depend less on distance for their value.

I wasn’t really sure what to make of the fact that fly balls are carrying less. There are so many confounding factors — weather, new humidors, angle, stadiums, the list goes on and on — that I don’t think I’ll ever be able to disambiguate them all, but I took a crack at it. Read the rest of this entry »


April Hitting Stats Mean Nothing… Except When They Kinda Do

As part of my exhausting shtick, I like to respond “April!” to questions in my chats involving player performances in the season’s early going. This is effective shorthand when someone wants to know if, say, George Springer is a bust because he’s put up a .480 OPS in his first two weeks in the majors. It’s also dead wrong. April stats, in their proper context, are meaningful.

“But Dan, a few weeks of baseball is a tiny sample!” That’s correct, but you have to take into consideration the underlying reasons projections can prove to be inaccurate. It’s not just that things change, though they do — pitcher X learns a sweet knuckle-curve or batter Y realizes that not hitting everything into the ground might be good — it’s that it’s challenging to gauge where players stand in the first place. Players’ stats themselves aren’t even perfect at this. Tim Anderson hit .322 in 2020, but that doesn’t actually mean his mean batting average projection should have been .322. We don’t actually know if a theoretical player was “truly” a .322 hitter, a .312 hitter who got lucky, a very unlucky .342 hitter, or a .252 hitter who made a deal with a supernatural or extraterrestrial entity. A .300 hitter isn’t observed, they’re inferred.

The way most, if not all, in-season projections (or any projections, really) function is by applying what we call Bayesian inference. We won’t get into a full-blown math class, but in essence, it simply means that we update our hypotheses to take new data into account. And for players, data comes in all the time: every pitch or swing of the bat is new information about a player. It’s valuable information, too, as only the last handful of seasons have much predictive value and recent performance is the most useful. Read the rest of this entry »
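To make the idea of updating concrete, here is a minimal sketch using a Beta-binomial model, one of the simplest forms of Bayesian updating. It is not how ZiPS or any other projection system actually works; the function name, the prior weight of 500 at-bats, and the sample stat line are all hypothetical choices made for illustration.

```python
def update_batting_average(prior_mean, prior_weight, hits, at_bats):
    """Posterior mean batting average under a Beta prior (illustrative only)."""
    # Express the prior as Beta(alpha, beta), worth `prior_weight` pseudo at-bats.
    alpha = prior_mean * prior_weight
    beta = (1.0 - prior_mean) * prior_weight
    # Conjugate update: observed hits go to alpha, observed outs go to beta.
    alpha_post = alpha + hits
    beta_post = beta + (at_bats - hits)
    return alpha_post / (alpha_post + beta_post)


if __name__ == "__main__":
    # Hypothetical hitter: projected .270, then starts the year 8-for-50.
    estimate = update_batting_average(0.270, 500, 8, 50)
    print(f"updated estimate: {estimate:.3f}")
```

With these made-up inputs, an ugly .160 start pulls the estimate from .270 down to .260: the new data counts, but it does not overwhelm everything we knew beforehand.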


Playoff Formats and the Marginal Win

In the weirdest year of baseball history so far, 2020 featured a gigantic playoff field introduced right as the season began, turning a 10-team postseason into a 16-team format. Changing the basic structure of awarding the sport’s championship with no advance notice would have been an odd choice in a normal season. But given the 102-game reduction in the league’s schedule and its resulting small sample size season, it kind of made sense. When the decision was made, it wasn’t a surety that there would even be a season, to the point that people would have been happy if extra-inning games were decided by closers riding ostriches and jousting.

Before the World Series was even completed, commissioner Rob Manfred expressed the league’s desire to keep the new format in a normal season. The players need to agree to changes like this, of course, and that permission wasn’t granted when all MLB offered in exchange was a universal designated hitter. One of the concerns, not officially made public, is that a playoff system that is more of a crapshoot will further reduce the already eroded incentives for teams to spend money to improve their rosters. That’s hardly a shock; at the 2020 trade deadline, 10 teams already had projected playoff probabilities above 97%. Combine that with the absence of the normal advantages afforded to higher seeds, and you had a trade deadline that saw only a single team, the San Diego Padres, aggressively improve, and even their moves were almost certainly made with an eye toward 2021 and beyond.

So what is the ideal playoff system? That’s a difficult question, one that’s impossible to answer to everyone’s satisfaction. I can only answer for myself, and for me, there are a few requirements that are particularly important. Basically, I want a system in which regular-season performance matters, thus maintaining one of the core aspects of the game. I also want a playoff system that more heavily rewards quality over randomness without making the result a preordained one. The more a championship is decided by randomness, the less incentive there is for teams to innovate and invest. Read the rest of this entry »


Should Good Hitters Lead Off? FanGraphs Investigates

This story starts, as all good stories do, with me recounting the time one of my coworkers and I discussed something. Okay, fine, very few good stories start that way — almost none, in fact — but bear with me. This (non-baseball) coworker, someone who I consider very bright and very interested in baseball, told me he didn’t really believe in wRC+, even after I’d shown him some articles describing it.

Why, I wondered, didn’t he believe in it? It’s so elegant! The math is right there! How can you not like something that wraps up performance at the plate in a single number? No need to compare apples to oranges — you can juice everything to a pulp and simply count calories. His answer was simple: it doesn’t consider batting order.

“You’re telling me,” he said, “that you’d rather have Mitch Moreland as a leadoff hitter than Xander Bogaerts?” It was 2017, and we were working in the Northeast, which explains why both players were Red Sox and why this question was even close. “His wRC+ is higher, but he’d be worse at leadoff. He doesn’t get on base enough.”

To be honest, it’s a compelling argument. I didn’t really have the intellectual tools or the time to counter it. I went with the old tried and true method: I vaguely mentioned something about context-neutrality in the long run, said I had some bonds to arbitrage or whatnot, and went back to work, ending the conversation without conceding defeat.

Fast forward to today, and I still don’t have a wonderful answer to my former co-worker’s point. I do have a computer program that simulates games, though, so I decided to come up with a quick and dirty check. What if we plugged real hitters with similar one-number batting statistics but who get there in wildly different ways into the lineup? Would we learn anything? Would I be able to write 1,500 words about it and entertain the masses? I guess we’ll find out! Read the rest of this entry »


The Superlative Kyle Hendricks

You know it’s almost time for baseball season when all of the major projection systems forecast Kyle Hendricks’ ERA one run per nine innings too high.

As much as this sounds like a knock on those who develop projections, it’s not. What Jared Cross (Steamer), Dan Szymborski (ZiPS), Derek Carty (THE BAT), and the folks at Baseball Prospectus (PECOTA) do is no small feat. If I weren’t too cowardly to even try to create my own projection system, I would be too stupid to design one that is half as effective as theirs. Glass houses and all that.

That said, I am just smart enough to know that projected ERAs ranging from 3.84 to 4.42 for Hendricks, who boasts a career ERA of 3.12 and has never finished a season with an ERA above 3.46 (except that dastardly 3.95 ERA in 2015), are too high. It’s easy to poke holes in the obvious outliers, but projections succeed by describing and then predicting the talents of most pitchers, not the ones whose talents deviate dramatically from expectation. Hendricks is every projection system’s known blind spot.

It’s not just projections that struggle with Hendricks, either. We, the sabermetric community, frequently use ERA estimators as shorthand to characterize a pitcher’s talent level. If you frequent FanGraphs, you’re familiar with Fielding Independent Pitching (FIP), expected FIP (xFIP), and Skill-Interactive ERA (SIERA). By virtue of how they’re constructed, each metric makes assumptions about the skills a pitcher theoretically “owns”:

  • FIP: strikeouts, walks, and home runs allowed
  • xFIP: strikeouts, walks, and fly balls induced
  • SIERA: a complicated combination of strikeouts, walks, net groundballs (groundballs minus fly balls), and their squared terms and interactions with one another
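For reference, the simplest of the three, FIP, can be sketched in a few lines. The 13/3/2 weights are the standard published ones, and the standard formula also counts hit batters alongside walks even though the summary above omits them. The constant is recalculated for each league and season so that FIP lines up with league ERA; the default of 3.10 below is only an assumed placeholder, and the stat line is hypothetical.

```python
def fip(home_runs, walks, hit_by_pitch, strikeouts, innings, constant=3.10):
    """Fielding Independent Pitching from the three 'owned' outcomes.

    `constant` is an assumed placeholder; the real value changes
    every season and must be looked up for the year in question.
    """
    numerator = 13 * home_runs + 3 * (walks + hit_by_pitch) - 2 * strikeouts
    return numerator / innings + constant


if __name__ == "__main__":
    # Hypothetical season: 20 HR, 50 BB, 5 HBP, 200 K in 200 innings.
    print(f"FIP: {fip(20, 50, 5, 200, 200):.2f}")
```

Note that nothing about how hard the ball was hit appears anywhere in the inputs, which is exactly the gap a contact manager can slip through.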

While each estimator features a batted ball component, they all focus on trajectory (launch angle), not on authority (exit velocity). This is a fair assumption, frankly. I have illustrated how a pitcher can influence hitter launch angle, operating under the assumption that they bear little to no influence over hitter exit velocity. It’s not quite that bleak; certified baseball genius Rob Arthur found that a baseball’s exit velocity is roughly five parts hitter, one part pitcher. Read the rest of this entry »


Musings at the Intersection of Launch Angle Consistency and Hard-Hit Rate

If you follow the work of Alex Chamberlain at all, you’ve heard of the value of launch angle consistency. I’m not going to recapitulate his body of work on the subject, but briefly: hitters with tighter launch angle distributions routinely run higher BABIPs, and you can think of launch angle consistency as roughly a proxy for “hit tool.”

Most of this comes down to avoiding terrible batted ball outcomes. The two worst things you can do when you put the ball in play are to hit it straight down or straight up. Given that balls are, on average, hit mostly forward and with a tiny bit of loft — breaking news, I know — launch angle consistency is a great proxy for how often you avoid those, because the more -80s and +80s you put in your sample of mostly 10s and 20s, the higher the standard deviation gets.
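A quick sketch shows how little it takes to blow up the spread. Both samples below are invented for illustration, and the sample standard deviation from Python's standard library stands in for whatever exact consistency measure you prefer.

```python
from statistics import stdev


def launch_angle_spread(angles):
    """Sample standard deviation of launch angles, in degrees."""
    return stdev(angles)


# Hypothetical batted balls, all hit forward with a bit of loft.
tight = [10, 15, 20, 12, 18, 22, 14, 16]
# The same hitter, with two of those swapped for a chopper and a pop-up.
wild = tight[:-2] + [-80, 80]

if __name__ == "__main__":
    print(f"tight spread: {launch_angle_spread(tight):.1f} degrees")
    print(f"wild spread:  {launch_angle_spread(wild):.1f} degrees")
```

Only two of the eight batted balls changed, but because they sit so far from the rest, they dominate the standard deviation.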

One thing I’ve often wondered is whether this idea of consistency holds up for subsets of batted balls. Intuitively, it seems like it might. Take hard-hit balls, for example. If you’re hitting the ball 95 mph or harder, you really don’t want to squander it by hitting the ball on the ground or straight into the air. The distribution peaks at 30 degrees, but anything between 10 and 35 is a solid outcome.

With this in mind, I decided to look for batters who grouped their hard-hit balls most tightly. Having a narrow distribution seems like a great way to maximize good outcomes. Which player, you ask, has the tightest launch angle consistency (I’m just using standard deviation here) on hard-hit balls? I’m glad you asked — it’s Dee Strange-Gordon. Read the rest of this entry »


The Seam-Shifted Revolution Is Headed for the Mainstream

Hey there! I want to give you a heads up about this article, because it doesn’t fit into a normal genre I write. Today, I won’t be telling you some new insight about a player you like, or creating some new nonsense statistic that tries to pull meaning from noise. This is a story about how baseball analysis is changing right before our eyes. A group of scientists and baseball thinkers are redefining the way we think about pitch movement, and I think it’s worth highlighting even if I don’t have anything to add to the conversation yet, because this new avenue of research is going to be front and center in Statcast-based analysis over the next few years.

“Seam-shifted wake,” as Andrew Smith, a student of Dr. Barton Smith (no relation), coined it, is a source of pitch movement that the first attempts at understanding the physics of a pitched baseball overlooked. It has already changed the way that coaches and pitchers approach pitch design, and due to recent data advances, it’s about to be everywhere. So let’s go over how we got here, to this newly observable way that pitchers deceive hitters, by starting at the beginning and working forward.

At its core, baseball is a game about one person trying to throw a ball past another person. There are other trappings — bases and baserunners, umpires, a strike zone, the mythology of Babe Ruth, and a million other sundry things. At the end of the day, though, everything starts with the pitcher trying to throw a ball past the batter.

Accordingly, baseball analysis over the years has focused on describing the flight of that ball. For a time, that simply meant describing the shape of pitches — they don’t call them curveballs for nothing. The next step was velocity — radar guns let us appreciate fastballs numerically rather than merely aesthetically.

In the past 15 years, the amount and scope of pitch-level analytical data has exploded. First, PITCHf/x quantified pitch location and movement. When we report a pitcher’s chase rate or how often a batter swings at pitches in the strike zone, it’s because the location where each pitch crosses the plate is recorded and logged. When we say a pitcher has eight inches of horizontal break on their slider, it’s because new technology allows us to measure it.

When Statcast debuted in 2015, it added another wrinkle: radar tracked the spin rate of each pitch in flight, putting a numerical value on something that had previously been only qualitative: a pitcher’s ability to generate movement through spin. Doctor Alan Nathan has written several authoritative studies discussing the value of this spin data. Read the rest of this entry »


The Costs and Benefits of Six-Man Rotations

Planning a starting rotation for 2021 carries innumerable pitfalls. Nearly every pitcher in the league saw a reduced workload last year, and they did it in strange circumstances to boot. It’s not merely that the short season set everyone’s innings back — though that’s a huge component. A large number of cancelations and postponements also meant more doubleheaders and more cobbled-together games, another way to throw pitchers off their rhythm.

Put it all together, and protecting arms sounds like an appealing plan for 2021. The Mariners announced that they’ll use a six-man rotation next year, a continuation of the plan they leaned on for all of 2020. The Red Sox are talking workload management. Since this piece was initially published, Jeff Zimmerman has pointed out that the Tigers will use a six-man rotation as well. Is an embiggened rotation the solution to this universal problem? Let’s do the math.

It depends, first of all, on what you give up. The innings tradeoff of a six-man rotation is straightforward: giving your pitchers an extra day off limits their workloads, at least if you assume it doesn’t affect how deep they pitch into each game. Take a pitcher who averages six innings per start. In a five-man rotation, that’s 192 innings of work over 32 starts. Adding a sixth pitcher to the rotation cuts that down to 162 innings over 27 starts.
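The arithmetic is simple enough to sketch. The function below is a hypothetical helper that assumes a 162-game schedule, a strict turn-by-turn rotation, and whole starts rounded down, which is what reproduces the 192 and 162 figures.

```python
def season_innings(rotation_size, innings_per_start=6.0, team_games=162):
    """Innings for one starter who never misses a turn (simplified)."""
    # Assumption: starts are split evenly and rounded down to whole starts.
    starts = team_games // rotation_size
    return starts * innings_per_start


if __name__ == "__main__":
    five_man = season_innings(5)
    six_man = season_innings(6)
    print(f"five-man: {five_man:.0f} IP, six-man: {six_man:.0f} IP")
    print(f"innings saved: {five_man - six_man:.0f}")
```

Changing `innings_per_start` is an easy way to see how the gap shrinks or grows for pitchers who work shorter or deeper into games.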

How much do those 30 innings of work matter when it comes to health? I’ll level with you — I’m not sure. We simply don’t have the data to say with any amount of certainty, because the number of comparable situations is so small. Pitchers have light workloads all the time, but in most cases it’s due to age or injury. Looking at what a 21-year-old pitcher did in first building up stamina probably can’t tell us much about how many innings Jake Odorizzi, to pick a random example, should throw in 2021. Likewise, a pitcher’s workload in his first year back from Tommy John surgery can’t tell us how many innings Trevor Bauer can be effective for. Read the rest of this entry »


Do Successful Steals Apply Measurable Pressure?

Consider the plight of the base stealer. In the 1980s, their role was sacrosanct. Get on base first, then cause havoc. For fans of speed and baserunning, it was a veritable golden age. Rickey Henderson and Vince Coleman each stole 100 bases in three separate seasons. Since 1980, the 13 top seasons in terms of stolen bases per plate appearance were 1980 through 1992.

Alas, the scurrilous forces of math and efficiency conspired to dethrone the stolen base. As it turns out, advancing one base is less good than creating an out is bad. It’s bad enough, in fact, that you need about three successful stolen bases to make up for the downside of getting caught once. The very best thieves managed that level of efficiency, but in aggregate, the league only crested a 70% success rate once from 1980 to 1992. Steals simply weren’t advancing teams’ goal of scoring as many runs as possible.
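That three-to-one ratio pins down the break-even success rate. The sketch below works in relative units (a successful steal is worth 1, a caught stealing costs 3) rather than actual run values, which are an assumption left out here on purpose; the function names are hypothetical.

```python
def break_even_success_rate(gain_per_steal, cost_per_caught):
    """Success rate at which stolen base attempts net out to zero."""
    # Solve p * gain = (1 - p) * cost for p.
    return cost_per_caught / (gain_per_steal + cost_per_caught)


def net_value_per_attempt(success_rate, gain_per_steal, cost_per_caught):
    """Expected value of a single attempt, in the same relative units."""
    return success_rate * gain_per_steal - (1 - success_rate) * cost_per_caught


if __name__ == "__main__":
    print(f"break-even: {break_even_success_rate(1, 3):.0%}")
    print(f"net at 70%: {net_value_per_attempt(0.70, 1, 3):+.2f} per attempt")
```

Under that assumption the break-even point is 75%, so a league that rarely topped 70% was losing value on every attempt.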

For a time, there was a reasonable counter-argument: what if attempting a stolen base has positive value that isn’t solely contained in reaching second base? Perhaps the pitcher has steals on the brain, or the defense loses its cohesion while attempting to cover the base for a throw. It doesn’t need to add much edge to make the math add up.

In 2007, the authors of The Book took up this question. They found a large advantage to batters when a runner was on first — exactly what proponents of steals suggested. There was a big problem, however. That advantage was for all runners on first base. The faster the runner, the smaller the advantage. In addition, actually attempting a steal carried a huge hit to the batter, more than enough to offset the advantage of having a runner on base. Read the rest of this entry »


Where Vertical Approach Angle Seems to Matter Most

A couple of weeks ago, I was chatting with PitcherList’s Alex Fast about four-seam fastball swinging-strike rates (SwStr%) and their relationship to pitch height, or, perhaps more specifically, their lack of one. At the pitcher-season level (e.g., “2020 Clayton Kershaw”), the correlation between SwStr% and pitch height appeared weak at best. When you consider that no two fastballs are created equal and then introduce small-sample variance to the equation, the relationship could, understandably, become blurred at the pitcher level.

As a retort, I sent him the following graph, which shows SwStr% by pitch height for the three broad pitch classes as defined by Statcast, the source of the data. For reference, I’ve added black lines to indicate the average bottom, heart, and top of the strike zone:

If we zoom out and consider the question at the macro level, independent of context (what’s the average swinging strike rate for all fastballs by pitch height?), we can see that fastballs generate more swinging strikes up in the zone, a phenomenon our own Jeff Zimmerman touched upon here. This finding is mildly interesting in and of itself. But as I considered the matter further, the importance of swing frequency (Swing%) to SwStr% became clear (both use all pitches as a denominator). Regardless of efficacy, more swings will afford more chances for swinging strikes. As such, I anticipated that fastballs probably induce more swinging strikes up high than down low simply because hitters swing more frequently at high fastballs. Similarly (but inversely), non-fastballs would generate more swinging strikes down low instead of up high. The next graph all but affirmed my intuition:

Although the peaks of the bell curves cluster near the heart of the zone, we can see distinct differences in swing rate by pitch class at the thresholds of the strike zone. At its bottom edge, hitters are half as likely to swing at fastballs as they are at non-fastballs; at its top edge, twice as likely. Read the rest of this entry »
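The link between swing frequency and swinging strikes can be written as a simple decomposition: because both rates share all pitches as a denominator, SwStr% equals Swing% times whiffs per swing. The sketch below uses invented pitch counts, not Statcast data, and the function name is hypothetical.

```python
def swinging_strike_rate(pitches, swings, whiffs):
    """SwStr% expressed as Swing% multiplied by whiffs per swing."""
    swing_rate = swings / pitches      # Swing%: swings per pitch
    whiff_per_swing = whiffs / swings  # how often a swing misses
    return swing_rate * whiff_per_swing


if __name__ == "__main__":
    # Two hypothetical locations with the same 20% whiff-per-swing rate.
    low = swinging_strike_rate(pitches=1000, swings=300, whiffs=60)
    high = swinging_strike_rate(pitches=1000, swings=600, whiffs=120)
    print(f"low: {low:.1%}, high: {high:.1%}")
```

In this made-up case the pitch is no harder to hit up high, yet SwStr% doubles from 6% to 12% purely because hitters swing twice as often.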