Examining SERA’s Predictive Powers

SERA, my attempt to estimate ERA with simulation, started off as an estimator. Then, later, I laid out ways to make it more predictive. Well, here's the new SERA: a more predictive, more accurate ERA estimator altogether.

First, a refresher: The first SERA worked by inputting a pitcher's K%, BB%, HR% (or HR/TBF), GB%, FB%, LD% and IFFB%. Then, the simulator would run as many innings as specified, with each at-bat's outcome drawn with the likelihood specified by the inputs. A strikeout, walk or home run was simple; a ground ball, fly ball, line drive or popup made the runners advance, score or get out with the same frequency as would happen in real life.
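As a rough illustration of that loop, here's a minimal sketch of one simulated inning. The rates and the base-advancement logic are simplified stand-ins, not SERA's actual code; in particular, batted balls are treated as outs here, whereas the real simulator draws hit types from the batted-ball table below.

```python
import random

# Hypothetical per-PA outcome probabilities; illustrative only, not SERA's defaults.
rates = {"K": 0.20, "BB": 0.08, "HR": 0.03,
         "GB": 0.31, "FB": 0.21, "LD": 0.14, "PU": 0.03}

def simulate_inning(rates):
    """Simulate one inning and return runs scored (simplified base advancement)."""
    outs, runs = 0, 0
    bases = [False, False, False]  # 1B, 2B, 3B occupied?
    while outs < 3:
        outcome = random.choices(list(rates), weights=list(rates.values()))[0]
        if outcome == "K":
            outs += 1
        elif outcome == "BB":
            # Walk: runners advance only when forced
            if all(bases):
                runs += 1
            elif bases[0] and bases[1]:
                bases[2] = True
            elif bases[0]:
                bases[1] = True
            bases[0] = True
        elif outcome == "HR":
            runs += 1 + sum(bases)
            bases = [False, False, False]
        else:
            # Crude stand-in: every ball in play is an out. The full simulator
            # instead samples singles, doubles, etc. by batted-ball type.
            outs += 1
    return runs
```

Running many innings and multiplying the average by nine gives the RA/9-style output the full script produces.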

To make SERA a better predictor of future ERA, I outlined a few major changes: don't include home runs as an input (since they are so dependent on HR/FB rate, over which pitchers have almost no control); don't include IFFB%, for the same reason (it is extremely volatile, and pitchers also have very little control over it); and regress K%, BB%, GB%, FB% and LD% based on the last three years of available data (or two or one, if the player hadn't been playing for three years). There were some other minor things, too.
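The regression step could look something like this sketch. The year weights and the amount of regression toward league average are illustrative guesses, since the exact scheme isn't specified here:

```python
def regressed_rate(events, tbf, league_rate, weights=(5, 4, 3), regression_tbf=300):
    """Blend up to three seasons of a rate stat (most recent first) and regress
    it toward league average.

    `events` and `tbf` are per-season counts; `weights` and `regression_tbf`
    are illustrative choices, not SERA's actual values."""
    w_events = sum(w * e for w, e in zip(weights, events))
    w_tbf = sum(w * t for w, t in zip(weights, tbf))
    # Adding `regression_tbf` batters of league-average performance pulls
    # small samples toward the league rate more strongly than large ones.
    return (w_events + league_rate * regression_tbf) / (w_tbf + regression_tbf)
```

Because `zip` truncates to the shorter sequence, a pitcher with only one or two seasons of data is handled automatically.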

One thing I forgot to include in the last post was reaching on errors. That does happen in real life, so it was included in the distribution of batted-ball outcomes. Here's an updated table of outcomes by batted-ball type:

Ball In Play    Out      Single   Double   Triple   Home Run   Error
OFFB            75.4%     3.7%     7.4%     1.2%     12.1%      0.2%
GB              74.6%    21.7%     1.7%     0.1%      0.0%      2.0%
LD              31.8%    51.3%    14.8%     1.3%      0.6%      0.2%
PU              98.5%     0.7%     0.4%     0.0%      0.0%      0.4%

The numbers might not add up to 100% because of rounding errors. The data is from the past three years.

Here’s the new script. A few things about it:

  • It now asks for a league average HR/FB% and IFFB%. Those are almost always around 10% each, so input that if you’re unsure.
  • The output it gives is now “RA/9” — ERA doesn’t include runs scored because of errors, but this does — and MLB’s scoring rules are too complicated to incorporate into this. As such, this will be slightly higher than the actual ERA. Divide by 1.09 and you should get something slightly closer to ERA — there’s a second line after the RA/9 output that does just that.
  • You’ll also see a few new output lines for “Observed K%,” etc. That’s the actual K%, BB%, etc. from the simulation. The more innings you simulate, the closer those will be to your inputs.
  • It’s slower now, probably running at half the speed that it did before. But I’ll take that if it’s more accurate.

The biggest question I wanted to answer was whether this is a better predictor than xFIP, right now the holy grail of short-term ERA prediction. (Not long-term; that gets fuzzier.) I didn't exactly get the answer I wanted (it's not really better), but the good news is that it's actually pretty close, and it also correlates just as well as xFIP with current-year ERA.

One of the benefits of SERA’s predictive version — which I’ll call pSERA — is that since data from the last few years is included, it correlates with itself very well year-to-year, much better than any other ERA stat does. In fact, pretty much any ERA estimator in one year correlates better to next year’s pSERA than that estimator does to itself.

Here's a table showing how well various ERA estimators (as well as the average of pSERA, FIP and xFIP, which I called BLEND) correlate to each other and to themselves year-to-year (the number shown is r^2). The rows are "Year 1"; the columns are "Year 2." So, for example, to figure out how well FIP in the first year correlates to pSERA in the second year, go to the intersection of row FIP and column pSERA.

Year 1 \ Year 2     ERA      pSERA    FIP      xFIP     BLEND
ERA                 0.0796   0.2532   0.0954   0.1157   0.1532
pSERA               0.1585   0.6233   0.2320   0.3194   0.3889
FIP                 0.1384   0.4328   0.2232   0.2495   0.3137
xFIP                0.1617   0.5525   0.2493   0.3461   0.3919
BLEND               0.1721   0.5957   0.2656   0.3400   0.4087

Minimum 200 TBF in both years, or roughly just under 50 innings pitched

One caveat of this table is that it only uses 2013 and 2014 data. Calculating pSERA is a pain — even more than calculating normal SERA — because not only do you have to simulate everything, you also have to calculate each pitcher's inputs based on past data. So the only past years that I calculated were the last two. I don't think that having much more data would change the numbers all that much, but the error margins are bigger than they would be with more years included.
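For what it's worth, a table like the one above is straightforward to generate once the per-pitcher values exist. A sketch with pandas, assuming a DataFrame with one row per pitcher and hypothetical column names like `era_y1` and `era_y2`:

```python
import pandas as pd

def year_to_year_r2(df, metrics):
    """r^2 of each Year-1 metric (rows) against each Year-2 metric (columns).

    Expects columns named f"{m}_y1" and f"{m}_y2" for every metric in `metrics`;
    those names are assumptions for this sketch."""
    table = pd.DataFrame(index=metrics, columns=metrics, dtype=float)
    for m1 in metrics:      # Year 1 (rows)
        for m2 in metrics:  # Year 2 (columns)
            r = df[f"{m1}_y1"].corr(df[f"{m2}_y2"])  # Pearson r
            table.loc[m1, m2] = r ** 2
    return table
```

Calling it with `metrics=["era", "psera", "fip", "xfip", "blend"]` would reproduce the full 5x5 grid.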

I also wanted to answer the question of how quickly this becomes predictive. At certain sample sizes, is pSERA better than xFIP? Maybe when the sample is really small, one is a better predictor than the other, but when the sample gets large, the two switch. My hope was that pSERA would be more accurate with a small sample size, which would be very useful.

[Figure: smoothed (LOESS) curves of year-to-year correlation for pSERA and xFIP by minimum TBF]

Kind of. xFIP doesn’t overtake pSERA in predictive ability until about 150 TBF in both years, which would indicate that for pitchers with fewer innings, pSERA is better. Once again, there’s not so much data to back this up — only one two-year set — so this might not be totally right. The general trend, though, seems hard to disprove. Both obviously are more accurate with more innings pitched. xFIP is a little better after lots of innings, and pSERA seems to be better after not so many innings. (Note, by the way, that the y-axis is r, not r^2.)

But with little data, the fluctuations in the data are very pronounced. The above graph uses smoothed LOESS curves, which eliminate shaky dips and rises. But look at how much the actual data varies:

[Figure: the same pSERA and xFIP correlations by minimum TBF, unsmoothed]

The table above used a cutoff of 200 TBF for the correlations, but if the cutoff had been 193, we actually would have seen a higher correlation from pSERA than from xFIP. This is why the LOESS curve is useful: it lets us see the general trend much more easily. With so little data (only 394 pitchers who pitched in both years, and only 186 who had 200 TBF in both years), removing a few pitchers by raising the minimum TBF a little can have drastic effects on the correlation.
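That cutoff sensitivity is easy to reproduce: just recompute the correlation at every minimum-TBF threshold. A sketch with NumPy, where the inputs are per-pitcher arrays (their names and structure are assumptions; `tbf` here should be each pitcher's smaller TBF of the two years):

```python
import numpy as np

def r_by_cutoff(tbf, x, y, cutoffs):
    """Correlation of x (a Year-1 metric) with y (Year-2 ERA) among pitchers
    whose TBF meets each cutoff. Returns a list of (cutoff, n, r) tuples."""
    tbf, x, y = map(np.asarray, (tbf, x, y))
    out = []
    for c in cutoffs:
        mask = tbf >= c
        n = int(mask.sum())
        # Need at least a few pitchers for a meaningful correlation
        r = float(np.corrcoef(x[mask], y[mask])[0, 1]) if n > 2 else float("nan")
        out.append((c, n, r))
    return out
```

A LOESS smoother (statsmodels provides one as `lowess`) can then be run over the resulting (cutoff, r) pairs to produce curves like the ones plotted above.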

Now, here in the same graph, are all of the metrics included in the correlations chart above:

all_era_estimators_smooth

I think we can point to pSERA as a good predictor of future ERA. It might not be quite as good as xFIP, but it works pretty well. The nature of it (how long it takes to calculate for hundreds of different pitchers) makes long-term analysis much harder, and it makes it harder to use lots of seasons for year-to-year analysis. But from this, it certainly seems to be better than FIP, and it has some advantages over xFIP as well. As of now, I would still use xFIP instead. But there's certainly room for improvement here, like using more rigorously projected inputs; with the best inputs available, I think this could become more effective than xFIP. The whole SERA concept also allows you to do some pretty cool things with exploring variability, which I'll go into later.

For now, I'll leave you with this chart of what pSERA says for every pitcher next season:

pSERA is the RA/9 equivalent; pSERA-adj is the ERA equivalent, equal to pSERA/1.09. The inputs are the data regressed using the past three years; the ERA, FIP, xFIP and TBF are all from 2014. You can Ctrl-F within the spreadsheet to search for pitchers.

Update 3/7/15: The Excel Web App chart has been updated to include pitchers traded during the 2014 season, who were previously missing. The pSERA values have also changed because they were originally (erroneously) calculated with intentional walks included and hit batsmen excluded. The BB% value is now (BB-IBB+HBP)/TBF, whereas before it was just BB/TBF.

Jonah is a baseball analyst and Red Sox fan. He would like it if you followed him on Twitter @japemstein, but can't really do anything about it if you don't.
