Towards a Better and More Predictive SERA

My last article introduced the concept of estimating a pitcher’s ERA using a simulation called SERA. As I pointed out throughout the article, SERA was strictly an estimator, not a predictor. That is, a pitcher’s SERA in one season wouldn’t do a great job predicting that pitcher’s ERA the next season. It’s more similar to FIP than it is to xFIP; descriptive rather than predictive.

But what if we want to create a simulator that predicts ERA for the future instead of just estimating what the ERA should’ve been? Some things are going to need to be changed — not just the code for the simulation, but also the inputs.

First, a correction can be made to the fundamental way the simulator works: in the previous iteration, the program wouldn’t care about how many outs there were when determining how the runners on base advanced. That was wrong — runners take extra bases at shockingly different rates depending on how many outs there are. So a new version of the simulator could account for that and look at the out state in deciding how runners should advance. This would not really help with the predictiveness of the simulator, but it would help with overall accuracy and is a necessary adjustment. (Thanks to Peter Jensen for the heads-up there.) Here is an Excel table and here is a .csv table with runner-advancing frequencies.
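A minimal sketch of what out-dependent advancement could look like, assuming the frequency table is loaded into a lookup keyed by the number of outs. The probabilities below are hypothetical placeholders, not values from the linked tables, and the function name is made up for illustration.

```python
import random

# Hypothetical placeholder probabilities (NOT real data): chance that a runner
# on first reaches third on a single, keyed by the number of outs.
FIRST_TO_THIRD_ON_SINGLE = {0: 0.25, 1: 0.28, 2: 0.35}


def advance_runner_on_single(outs: int, rng: random.Random) -> int:
    """Return the base (2 or 3) where a runner from first ends up after a single."""
    p_extra_base = FIRST_TO_THIRD_ON_SINGLE[outs]
    return 3 if rng.random() < p_extra_base else 2


rng = random.Random(42)
print([advance_runner_on_single(2, rng) for _ in range(10)])
```

The only change from an out-blind simulator is that the lookup takes the out count as part of its key; the real tables would also be keyed by event type and starting base.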

To make SERA more predictive, the number one thing that has to be changed is eliminating home runs as an input. Pitchers have very little control over their home run per fly ball rate, and multiplying fly balls by an average HR/FB% predicts future home runs much better than home runs do (this is why xFIP exists). So to make our simulator more predictive, we have to incorporate home runs just like we do for any other hit (for an explanation on how that’s done, go back to my first article on this topic).
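As a rough illustration of the xFIP-style substitution described here, expected home runs can be computed from fly balls and a league-average HR/FB rate. The function name and the sample inputs are assumptions for the sake of the example; the league rate in particular is a placeholder, not a measured value.

```python
def expected_home_runs(fly_balls: float, league_hr_per_fb: float) -> float:
    """Home runs expected from a fly ball count at a league-average HR/FB rate."""
    if fly_balls < 0:
        raise ValueError("fly_balls must be non-negative")
    if not 0.0 <= league_hr_per_fb <= 1.0:
        raise ValueError("league_hr_per_fb must be a fraction between 0 and 1")
    return fly_balls * league_hr_per_fb


# Placeholder inputs for illustration only: 180 fly balls, 10% league HR/FB.
print(round(expected_home_runs(180, 0.10), 1))
```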

Another input that’s also very unstable year-to-year, despite not getting as much attention as HR/FB%, is IFFB%. IFFB%, or infield fly ball percentage, is not the percentage of batted balls that are popups. Instead, it’s the percentage of fly balls that are popups. IFFB% has a year-to-year r^2 of less than 10%, at least until you get over 100 IP in both years; even using the past three years of data, the r^2 never gets higher than 10% until again you start setting high filters. But pitchers do have a decent amount of control over the percentage of all batted balls that are popups, probably because FB% is pretty stable. So to account for the instability in IFFB%, the league average will be used instead of the pitcher’s individual rate.
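The distinction between the two popup rates can be sketched in a few lines, assuming FB% counts popups among fly balls (as the definition of IFFB% above implies). The function name and the sample rates are hypothetical.

```python
def popup_share_of_batted_balls(fb_pct: float, iffb_pct: float) -> float:
    """Popups as a share of all batted balls: FB% multiplied by IFFB%."""
    return fb_pct * iffb_pct


# Hypothetical pitcher: 40% fly balls. Using a placeholder league-average
# IFFB% of 10% instead of the pitcher's own (unstable) IFFB%.
pitcher_fb_pct = 0.40
league_iffb_pct = 0.10  # placeholder, not a measured league value
print(round(popup_share_of_batted_balls(pitcher_fb_pct, league_iffb_pct), 3))
```

Because the pitcher's own FB% still enters the product, a fly ball pitcher is still projected for more popups per batted ball than a ground ball pitcher, even with the league IFFB% plugged in.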

Ground ball and fly ball percentages, too, should be regressed somewhat. I found that using the past 3 years of data (kind of like how Marcels work), weighted both by recency and by balls in play for each season, provides a much better estimate of the next year’s number. The best weights for the past 3 years for FB% and GB% are .24, .3, and .46, with the more recent years having the higher coefficients. But then each year should be adjusted for how many balls in play the pitcher allowed to get an even better estimate. Here is the full formula for projected GB% based on the past 3 years, where Year 1 is the oldest season and Year 3 is the most recent:

(Year 1 GB%*Year 1 BIP*.24 + Year 2 GB%*Year 2 BIP*.3 + Year 3 GB%*Year 3 BIP*.46) ÷ (Year 1 BIP*.24 + Year 2 BIP*.3 + Year 3 BIP*.46)

Fly ball percentage is the same thing just with fly balls, and then line drive percentage is just 100%-GB%-FB% — line drives are pretty unstable, so they don’t get the pleasure of being determined by a regression equation (the three all have to add up to 100% anyways, so line drives are just the filler).
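The formula above translates fairly directly into code. This is only a sketch: the function names are made up, the three seasons are ordered oldest to most recent (so the .46 weight lands on the latest year), and the sample pitcher is hypothetical.

```python
GB_FB_WEIGHTS = (0.24, 0.30, 0.46)  # oldest season first, from the text above


def project_rate(rates, volumes, weights):
    """Weighted average of seasonal rates, weighting by recency and by volume."""
    numerator = sum(r * v * w for r, v, w in zip(rates, volumes, weights))
    denominator = sum(v * w for v, w in zip(volumes, weights))
    if denominator == 0:
        raise ValueError("no balls in play in any weighted season")
    return numerator / denominator


def project_batted_ball_profile(gb_rates, fb_rates, bip):
    """Return projected (GB%, FB%, LD%), with LD% as the leftover share."""
    gb = project_rate(gb_rates, bip, GB_FB_WEIGHTS)
    fb = project_rate(fb_rates, bip, GB_FB_WEIGHTS)
    return gb, fb, 1.0 - gb - fb


# Hypothetical pitcher, seasons listed oldest to most recent.
gb, fb, ld = project_batted_ball_profile(
    gb_rates=(0.45, 0.47, 0.50),
    fb_rates=(0.35, 0.33, 0.30),
    bip=(500, 550, 600),
)
print(f"GB% {gb:.3f}  FB% {fb:.3f}  LD% {ld:.3f}")
```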

The same can be done for strikeout and walk percentages. The formula is a little bit different for those two. For strikeouts, the best weights were .085, .315, and .6; for walks, they were .17, .33, and .5 (both with the most recent season weighted the most, naturally). Again, weighting each year by how much the pitcher pitched will get more accurate results, so the above formula for ground balls applies again — just use the different weights, and use TBF instead of BIP. The weighted formula works better for K% (53% r^2) than for BB% (34%), and GB% and FB% have an r^2 closer to the K% than the BB%. For comparison, year-to-year K% has an r^2 of 36% and BB% has an r^2 of 12% (these are for all pitchers, without any filters; when you set minimums such as 200 TBF in each season, the r^2 does go up considerably for each).
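To make the arithmetic concrete, here is a small worked example using the K% and BB% weights quoted above with TBF as the volume. The helper name and the pitcher's numbers are hypothetical; seasons are listed oldest to most recent.

```python
K_WEIGHTS = (0.085, 0.315, 0.60)   # oldest season first, from the text above
BB_WEIGHTS = (0.17, 0.33, 0.50)


def project_rate_by_tbf(rates, tbf, weights):
    """Project a rate from three seasons, weighting by recency and by TBF."""
    numerator = sum(r * n * w for r, n, w in zip(rates, tbf, weights))
    denominator = sum(n * w for n, w in zip(tbf, weights))
    if denominator == 0:
        raise ValueError("no batters faced in any weighted season")
    return numerator / denominator


# Hypothetical pitcher: K% of 20%, 22%, 25% over 800, 600, and 700 TBF.
tbf = (800, 600, 700)
k_proj = project_rate_by_tbf((0.20, 0.22, 0.25), tbf, K_WEIGHTS)
bb_proj = project_rate_by_tbf((0.09, 0.08, 0.07), tbf, BB_WEIGHTS)
print(f"projected K% {k_proj:.4f}")   # (13.6 + 41.58 + 105) / 677, about 0.2366
print(f"projected BB% {bb_proj:.4f}")
```

Note how heavily the K% projection leans on the most recent season: the .6 weight pulls the estimate most of the way toward 25% even though the oldest season had the most batters faced.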

And then there’s the issue of intentional walks. There were some pretty convincing comments on the first article about why they should not be included in the pitchers’ rates. But the fact is, intentional walks do happen, and not including them would skew the simulator so that fewer runs were produced than happen in real life. So what I am going to do is not include intentional walks in the inputted BB% but instead make them happen sometimes based on the base state (and nothing else). I’m doing this for a few reasons: the base state has an enormous effect on intentional walk rate (see the table below this paragraph), pitchers have little control over their intentional walk rate (r^2 is under 10% for both year-to-year IBB% and year-to-year IBB/BB), and while score and inning do affect intentional walk rate, the simulator can’t incorporate them. This is again something that will not necessarily make SERA more predictive but will make it more accurate.

Base   NIBB%   BB%     IBB%   IBB/BB   Frequency
X23    9.7%    18.3%   8.7%   47.2%    2.2%
XX3    10.2%   13.4%   3.2%   23.7%    2.9%
X2X    10.0%   13.1%   3.1%   23.7%    8.5%
1X3    7.0%    7.5%    0.4%   5.9%     2.9%
12X    7.4%    7.4%    0.0%   0.5%     6.5%
XXX    7.1%    7.1%    0.0%   0.0%     57.1%
1XX    6.4%    6.4%    0.0%   0.1%     17.7%
123    6.1%    6.1%    0.0%   0.0%     2.3%
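One way the simulator might use the table: before each plate appearance, roll against the IBB% for the current base state, and only fall through to the pitcher's own (non-intentional) rates if no intentional walk is issued. The sketch below copies the IBB% column from the table; the function name and structure are assumptions, not the actual SERA code.

```python
import random

# IBB% by base state, copied from the table above and expressed as fractions.
IBB_RATE = {
    "X23": 0.087, "XX3": 0.032, "X2X": 0.031, "1X3": 0.004,
    "12X": 0.000, "XXX": 0.000, "1XX": 0.000, "123": 0.000,
}


def is_intentional_walk(base_state: str, rng: random.Random) -> bool:
    """Decide whether this plate appearance is an intentional walk."""
    return rng.random() < IBB_RATE[base_state]


rng = random.Random(7)
trials = 100_000
walks = sum(is_intentional_walk("X23", rng) for _ in range(trials))
print(f"simulated IBB rate with runners on second and third: {walks / trials:.3f}")
```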

Furthermore, the question of how much control pitchers have over balls in play (once they’re already in the field) is very important to this simulator. We know that pitchers have some authority over their ground ball and fly ball rates, but what can they control past that? Is there such a thing as a pitcher who is naturally good or bad at preventing balls in play from becoming hits? If there is, we need to figure out a way to account for that. But looking at the numbers, it is pretty clear to me that keeping balls in play from becoming hits is not within the pitcher’s power, at least other than influencing the type of batted ball.

To find this out, I made some new metrics for myself: wOBAFB, wOBAGB, wOBALD, and wOBABIP. They are exactly what they sound like: wOBA on fly balls, wOBA on ground balls, wOBA on line drives, and wOBA on balls in play. These statistics all have an extremely low year-to-year correlation, with wOBAFB having the highest with a 4.24% r^2 (in other words, only about 4% of a pitcher’s wOBAFB can be predicted by his previous year’s wOBAFB).
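For readers who want to run the same kind of stability check, a year-to-year r^2 can be computed as the squared Pearson correlation between each pitcher's value in one season and the next. That this matches the exact calculation used here is an assumption, and the sample values below are made up.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between paired year-1 and year-2 values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample with no variance has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Made-up wOBAFB values for five pitchers in consecutive seasons.
year_1 = [0.330, 0.365, 0.348, 0.372, 0.341]
year_2 = [0.358, 0.344, 0.361, 0.350, 0.339]
print(f"year-to-year r^2: {r_squared(year_1, year_2):.3f}")
```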

[Figures: wOBAFB; wOBAGB; wOBALD; untitled screenshot]

It was pretty surprising to me that wOBABIP had so little stability. If pitchers can control their GB/FB/LD splits, even just a little bit, shouldn’t they have more influence over their wOBABIP, seeing as grounders, fly balls, and line drives all have very different run values? I guess the answer is that the individual wOBAFB, wOBAGB, and wOBALD are so volatile that they cancel out any stability from the batted ball profile. (It probably doesn’t help that LD% is also not very stable.) What this does not mean, however, is that the batted ball profiles are unimportant. It just means that in a small sample (250 innings or fewer) the batted ball profiles aren’t enough to greatly affect a pitcher’s wOBA allowed. Jeff Zimmerman found the same in a recent article at The Hardball Times. So we can forget about needing to include pitcher-specific results on balls in play in SERA.

What Jeff also found there was that pitchers don’t have a lot of control over sequencing. They don’t have zero control, but for the purposes of our model, we’ll assume they have none and just admit it’s a potential source of weakness for the estimator.

What about holding runners? Max Weinstein has done some research indicating that pitchers have a lot more control over holding runners than catchers do. Do we need to account for the fact that some pitchers are better at holding runners than others? Let’s take a look at how stable [my crudely calculated version of] wSB is with each pitcher on the mound:

[Figure: graph of wSB stability by pitcher]

After looking at that graph, I’m not so sure that we do. Yes, pitchers of course have some ability to control the running game. But it doesn’t look like it’s enough (year-to-year r^2 is under 7%) to merit including a whole new input.

But that brings up a new question: how do we account for stolen bases in general? This was missing from the first version of SERA. To make up for that, I did the same thing that I did with intentional walks: made them happen every so often, with the frequency of attempts and successes varying by the base-out state. But there is no effect from the pitcher.
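A rough sketch of how pitcher-independent stolen bases could be wired in, assuming a lookup of attempt and success rates keyed by base-out state. Every number in the table below is a hypothetical placeholder, and the function name is invented for illustration.

```python
import random

# Hypothetical placeholders (NOT real data): (attempt probability, success
# probability) keyed by (base state, outs). States not listed never attempt.
STEAL_RATES = {
    ("1XX", 0): (0.06, 0.72),
    ("1XX", 1): (0.07, 0.72),
    ("1XX", 2): (0.08, 0.74),
    ("X2X", 1): (0.01, 0.78),
}


def simulate_steal(base_state: str, outs: int, rng: random.Random) -> str:
    """Return 'none', 'safe', or 'caught' for one stolen base opportunity."""
    attempt_p, success_p = STEAL_RATES.get((base_state, outs), (0.0, 0.0))
    if rng.random() >= attempt_p:
        return "none"
    return "safe" if rng.random() < success_p else "caught"


rng = random.Random(3)
outcomes = [simulate_steal("1XX", 2, rng) for _ in range(10_000)]
print({o: outcomes.count(o) for o in ("none", "safe", "caught")})
```

Nothing about the pitcher enters the lookup, which mirrors the decision above to leave holding runners out of the inputs.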

That, so far, is what I have for the new version of SERA; I’m sure there are little things that a pitcher can control or that happen in a game which are missing, but that’s what revisions are for. In my next post, I’ll once again post the source code, as well as examine how well it works.





Jonah is a baseball analyst and Red Sox fan. He would like it if you followed him on Twitter @japemstein, but can't really do anything about it if you don't.

1 Comment
Str8chaser
9 years ago

Were you thinking of implementing any defensive errors?