Author Archive

Batted Balls: It’s All About Location, Location, Location

BABIP is a really hard thing to predict for pitchers. There have been plenty of attempts, sure, but nothing all that conclusive — probably because pitchers have a negligible amount of control over it. So naturally, when I found something that I thought might be able to model and estimate pitcher BABIP to a high degree of accuracy, I was very excited.

My original idea was to figure out the BABIP — as well as other batted ball stats — of individual pitches from details about the pitch itself. Velocity, movement, sequencing, and a multitude of other factors that are within the pitcher’s control play into the likelihood that a pitch will fall for a hit (even if to a very small degree). But much more than all of those, pitch location seems to be the most important factor (as well as one of the easiest to measure).

I got impressively meaningful results by plotting BABIP, GB%, FB%, wOBA on batted balls, and other stats based on horizontal and vertical location of the pitch. So I came up with models to find the probability that any batted ball would fall for a hit with the only inputs being the horizontal and vertical location (the models worked very well). I even gave different pitch types different models, since there were differences between, for example, fastballs and breaking balls. I found the “expected” BABIP of each of a pitcher’s pitches, and then I averaged all of those expected BABIPs. Theoretically, this is the BABIP that the pitcher should have allowed.
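To make the idea concrete, here is a minimal sketch of a location-only model. It is not the model from the full article: it simply bins league batted balls into a grid by pitch type and location, uses each cell's hit rate as the "expected" BABIP, and averages those values over one pitcher's pitches. The function names, the half-foot cell size, and the .300 fallback for empty cells are all assumptions made for illustration.

```python
from collections import defaultdict


def fit_location_model(batted_balls, cell=0.5):
    """Hit rate per (pitch type, grid cell).

    batted_balls: iterable of (pitch_type, x_ft, z_ft, is_hit) tuples.
    cell: grid cell size in feet (an assumed value, not the author's).
    """
    hits = defaultdict(int)
    total = defaultdict(int)
    for pitch_type, x, z, is_hit in batted_balls:
        key = (pitch_type, int(x // cell), int(z // cell))
        total[key] += 1
        hits[key] += int(is_hit)
    return {key: hits[key] / total[key] for key in total}


def expected_babip(model, pitches, cell=0.5, fallback=0.300):
    """Average the per-pitch hit probabilities for one pitcher.

    pitches: non-empty list of (pitch_type, x_ft, z_ft) tuples.
    fallback: assumed league-average BABIP for cells with no data.
    """
    probs = [
        model.get((pitch_type, int(x // cell), int(z // cell)), fallback)
        for pitch_type, x, z in pitches
    ]
    return sum(probs) / len(probs)
```

A smoother model (a regression on location, say) would likely behave better near the edges of the zone, where a grid cell may hold only a handful of batted balls.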

Read the rest of this entry »


On the Consistency of ERA

We know that ERA isn’t a perfect indicator of a pitcher’s talent level. It depends a lot on the defense behind the pitcher in question. It depends a lot on luck in getting balls in play to fall where the fielders are. It depends a lot on luck in getting fly balls to land in front of the fence. It depends a lot on luck in sequencing — getting hits and walks at times where it doesn’t hurt too much.

That’s why we have DIPS. Stats like FIP, xFIP, SIERA, my recent SERA, and Jonathan Judge’s even more recent cFIP all attempt to more accurately measure a pitcher’s talent by stripping those things out. But what if there was an easy way to figure out how much ERA actually can vary? How likely a pitcher’s ERA was? What the spread of possible outcomes is? The aforementioned ERA estimators do not address that issue. They can tell you what the pitcher’s ERA should have been with all the luck taken away (or at least what they think the ERA should have been), but they can’t answer any of the questions I just posed.

Read the rest of this entry »


Examining SERA’s Predictive Powers

SERA, my attempt to estimate ERA with simulation, started off as an estimator. Then, later, I laid out ways to make it more predictive. Well, here’s the new SERA: a more predictive, more accurate and better ERA estimator altogether.

First, a refresher: The first SERA worked by inputting a pitcher’s K%, BB%, HR% (or HR/TBF), GB%, FB%, LD% and IFFB%. Then, the simulator would simulate as many innings as specified, with each at bat having an outcome with a likelihood specified by the input. A strikeout, walk or home run was simple; a ground ball, fly ball, line drive or popup made the runners advance, score or get out with the same frequency as would happen in real life.
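For anyone who wants to see the shape of such a simulator, here is a stripped-down sketch. It is not the SERA code. It assumes the inputs have already been converted to per-plate-appearance probabilities that sum to 1, it uses placeholder hit probabilities for each batted-ball type, and it treats every hit on a ball in play as a single that moves each runner up one base. The real simulator advances runners at real-life frequencies, which this sketch does not attempt.

```python
import random

# ASSUMPTION: placeholder chances that each batted-ball type falls for a hit.
# These are illustrative numbers, not the values used by SERA.
HIT_PROB = {"GB": 0.24, "FB": 0.14, "LD": 0.68, "PU": 0.02}


def simulate_inning(rates, rng):
    """Simulate one inning and return the runs scored.

    rates: per-plate-appearance probabilities keyed by
           "K", "BB", "HR", "GB", "FB", "LD", "PU" (popup).
    """
    outs, runs = 0, 0
    bases = [False, False, False]  # first, second, third
    outcomes = list(rates)
    weights = [rates[o] for o in outcomes]
    while outs < 3:
        event = rng.choices(outcomes, weights=weights)[0]
        if event == "K":
            outs += 1
        elif event == "BB":
            # Only forced runners advance on a walk.
            if all(bases):
                runs += 1
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        elif event == "HR":
            runs += 1 + sum(bases)
            bases = [False, False, False]
        elif rng.random() < HIT_PROB[event]:
            # Simplification: every hit in play is a single and
            # every runner moves up exactly one base.
            runs += int(bases[2])
            bases = [True, bases[0], bases[1]]
        else:
            outs += 1
    return runs


def simulate_era(rates, innings=10000, seed=0):
    """Runs per nine innings over many simulated innings (all runs earned)."""
    rng = random.Random(seed)
    runs = sum(simulate_inning(rates, rng) for _ in range(innings))
    return 9 * runs / innings
```

Fixing the seed makes a run repeatable, which helps when comparing two pitchers' inputs against each other.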

To make SERA a better predictor of future ERA, I outlined a few major changes: excluding home runs as an input (since they are so dependent on HR/FB rate, over which pitchers have almost no control), excluding IFFB% for the same reason (it is extremely volatile and pitchers also have very little control over it), and regressing K%, BB%, GB%, FB% and LD% based on the last three years of available data (or two or one if the player hadn’t been playing for three years). There were some other minor things, too.
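One possible reading of that regression step, sketched below, is a recency-weighted average of up to three seasons that is then pulled toward the league rate. The excerpt does not give the actual weights or method, so the 3/2/1 weights, the 200-batter "ballast" of league-average performance, and the function name are all assumptions.

```python
def blended_rate(seasons, league_rate, weights=(3, 2, 1), ballast_tbf=200):
    """Blend up to three seasons of a rate stat and regress toward the league.

    seasons: list of (rate, batters_faced) tuples, most recent first.
             One, two, or three entries are all fine.
    weights: ASSUMED recency weights, most recent season first.
    ballast_tbf: ASSUMED number of league-average batters faced to mix in.
    """
    numerator = league_rate * ballast_tbf
    denominator = float(ballast_tbf)
    for (rate, tbf), weight in zip(seasons, weights):
        numerator += weight * rate * tbf
        denominator += weight * tbf
    return numerator / denominator
```

A pitcher with only one season of data ends up closer to the league rate than a three-year veteran with the same numbers, which is the behavior you would want from this kind of input.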

Read the rest of this entry »


Towards a Better and More Predictive SERA

My last article introduced the concept of estimating a pitcher’s ERA using a simulation called SERA. As I pointed out throughout the article, SERA was strictly an estimator, not a predictor. That is, a pitcher’s SERA in one season wouldn’t do a great job predicting that pitcher’s ERA the next season. It’s more similar to FIP than it is to xFIP; descriptive rather than predictive.

But what if we want to create a simulator that predicts ERA for the future instead of just estimating what the ERA should’ve been? Some things are going to need to be changed — not just the code for the simulation, but also the inputs.

Read the rest of this entry »


Estimating ERA: A Simulated Approach

ERA, probably the single most cited reference for evaluating the performance of a pitcher, comes with a lot of problems. Neil does a good job outlining why in this FanGraphs Library entry. Over the last decade, plenty of research has cast a light on the variables within ERA that often have very little to do with the pitcher himself.

But what is the best way to use fielding-independent stats to estimate ERA? FIP is probably the most popular metric of this ilk, using only strikeouts, walks, hit batters, and home runs to create a linear equation that can be scaled to look like an expected ERA. Then there’s xFIP, which is based on the idea that pitchers have very little control over their HR/FB rate; to account for this, it estimates the number of home runs that a pitcher should have allowed by multiplying their fly balls allowed by the league average HR/FB rate.
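As a quick reference, both estimators fit in a few lines. The formula is the standard one: 13 times home runs, plus 3 times walks and hit batters, minus 2 times strikeouts, all divided by innings pitched, plus a constant that puts the result on the ERA scale. The constant changes every season and so does the league HR/FB rate, so the defaults of 3.10 and 10% below are assumed round numbers, not official values.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching.

    ip: innings pitched as a true decimal (200 1/3 innings is 200.333, not 200.1).
    constant: ASSUMED league constant; the real value varies by season.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fb, bb, hbp, k, ip, lg_hr_per_fb=0.10, constant=3.10):
    """xFIP: FIP with home runs replaced by fly balls times league HR/FB.

    lg_hr_per_fb: ASSUMED league-average HR/FB rate; it varies by season.
    """
    expected_hr = fb * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)
```

Because every term is a straight multiple of a counting stat, the output has no floor or ceiling, which is where the trouble at the extremes comes from.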

For many people, however, these are too simple. FIP more or less ignores all balls in play completely; xFIP treats all fly balls equally. Neither one correctly accounts for the effects that any ball in play can have; we know that the wOBA on line drives is much higher than the wOBA on pop ups, but we don’t see that reflected in many ERA estimators. The estimators we use are also fully linear, and may break down at the extreme ends; FIP tells us that a pitcher who strikes out every batter should have an ERA around -5.70, which is, well, you know, not going to happen.

Read the rest of this entry »


Did Max Scherzer Really Have His Breakout in 2012?

Max Scherzer was on my fantasy baseball team in 2013. (Note: I recognize you don’t care about my fantasy team. This is in the service of a point, I promise.) My fantasy baseball team that year won the league championship, and Scherzer was a big reason why. I don’t remember if I thought to myself during the draft, “Hey, this guy is going to be really good because he had a 78 xFIP- last year,” or if I said, “Hey, whatever, it’s a late round, this pick won’t really matter. Why not take a flyer on this guy?” Scherzer wasn’t really much of anybody the year before, which is why I could get him late in my draft. Sure, he had a 3.74 ERA in 2012, and he won 16 games, but he certainly didn’t have the hype he does now.

Fast-forward to this offseason. Sooner or later, a real-life team will acquire Scherzer. He will be expensive, there’s no doubting that. And rightly so. Scherzer has established himself as one of the best pitchers in baseball. A true ace who has put up consecutive 5.5-win seasons, Scherzer now has a whole lot more value than pre-2013 Scherzer, who showed signs of promise but was just another pitcher who couldn’t put it together.

But how different is Scherzer now than he was two years ago? He’s two years older, of course. He’s a free agent, as opposed to having two more years of team control. And he’s had three consecutive good (or better) years, instead of just one. But when you look closely, Scherzer is a very similar pitcher to who he was even before his Cy Young-winning 2013 campaign. And that’s not a bad thing.

Read the rest of this entry »


Updating and Improving The Outcome Machine

A little while ago, I wrote an article for the Community Research blog about projecting plate appearances before they happen based on the batter and the pitcher. It was pretty well received (which was nice, because I put some serious work into that thing), and apparently it was good enough for Dave Cameron to kindly decide to call me up to the big leagues.

If you read through the comments there (or if you left a comment!) you probably realized that no, the Outcome Machine — as the tool was dubbed — was not perfect. There were flaws in the way I conducted my research, and some of the assertions I made probably weren’t 100% true. So in this article, I am going to follow up on that first one and hopefully remedy any errors. Those include:

Read the rest of this entry »