FIP vs. xwOBA for Assessing Pitcher Performance

At a basic level, nearly every piece at FanGraphs represents an attempt to answer a question. What is the value of an opt-out in a contract? Why do the Brewers continue to fare so poorly in the projected standings? How do people behave in the eighth inning of a spring-training game? Those were the questions asked, either explicitly or implicitly, by Jeff Sullivan, Jay Jaffe, and Meg Rowley just yesterday.

This piece also begins with a question, probably one that has occurred to a number of readers. It concerns how we evaluate pitchers now and how best to evaluate them. I’ll present the question momentarily. First, a bit of background.

Fielding Independent Pitching, or FIP, is a well-known tool for estimating ERA. FIP attempts to isolate a pitcher’s contribution to run prevention. It also serves as a better predictor of future ERA than ERA itself. The formula for FIP is elegant, built on just three outcomes: strikeouts, walks (with hit batters counted alongside them), and homers. It does not include balls in play. That said, one would be mistaken to assume that FIP excludes any kind of measurement of what happens when the bat hits the ball. Let this be a gentle reminder that home runs both (a) are a type of batted ball and (b) represent a major component of FIP. There is, in other words, some consideration of contact quality in FIP.
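For readers who want to see the arithmetic, here is a minimal Python sketch of the commonly published form of FIP. The function name, the example stat line, and the default constant of 3.10 are illustrative assumptions rather than anything taken from this piece; the constant is recalculated each season so that league FIP matches league ERA.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Commonly published FIP: (13*HR + 3*(BB + HBP) - 2*K) / IP + constant.

    ip is true fractional innings (200 1/3 innings is 200.333..., not 200.1).
    constant is an assumed, illustrative league constant; it varies by season.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical season line, not a real pitcher.
example_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=200.0)
print(round(example_fip, 3))  # 3.225
```

Note that the only batted-ball outcome in the numerator is the home run, which carries the largest weight of any term.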

Expected wOBA, or xwOBA, is a newer metric, the product of Statcast data. xwOBA is calculated with run-value estimates derived from exit velocity and launch angle. Basically, xwOBA calculates the average run value of every batted ball for a hitter (or allowed by a pitcher), adds in the defense-independent numbers (strikeouts and walks), and arrives at a wOBA-like figure. The advantage of xwOBA is that it removes the variance of batted-ball results and uses a “Platonic” value instead.
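The structure of that calculation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Statcast's actual implementation: the per-batted-ball expected values are assumed to have been looked up already from exit velocity and launch angle, the walk and hit-by-pitch weights are rough, season-dependent wOBA-style weights chosen for illustration, and the denominator is simplified to the plate appearances listed.

```python
def xwoba_sketch(batted_ball_values, walks, hit_by_pitch, strikeouts,
                 w_bb=0.69, w_hbp=0.72):
    """Average expected value per plate appearance (simplified).

    batted_ball_values: one expected wOBA-scale value per batted ball,
        assumed to come from an exit-velocity/launch-angle lookup.
    w_bb, w_hbp: assumed, illustrative weights; real weights vary by season.
    Strikeouts add nothing to the numerator but count as plate appearances.
    """
    plate_appearances = (len(batted_ball_values) + walks
                         + hit_by_pitch + strikeouts)
    if plate_appearances == 0:
        raise ValueError("need at least one plate appearance")
    numerator = sum(batted_ball_values) + w_bb * walks + w_hbp * hit_by_pitch
    return numerator / plate_appearances


# Hypothetical values: five batted balls, one walk, two strikeouts.
example_xwoba = xwoba_sketch([0.9, 0.1, 0.05, 1.8, 0.3],
                             walks=1, hit_by_pitch=0, strikeouts=2)
print(round(example_xwoba, 3))  # 0.48
```

The point of the sketch is the contrast with FIP: every batted ball contributes its own expected value, not just the ones that leave the park.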

The introduction of Statcast’s batted-ball data is exciting and seems like it might help to better isolate a pitcher’s contributions. But does it? This is where I was compelled to ask my own, relatively simple question — namely, is xwOBA better for assessing pitcher performance than the more traditional FIP? What I found, however, is that the answer isn’t so simple.

The differences between FIP and xwOBA, as well as the similarities, deserve some exploration.

Both xwOBA and FIP include strikeouts and walks in their formulas, and both use runs as the basis for determining value. The metrics differ in two very meaningful ways. One difference is in presentation. FIP is made to look like ERA, probably the pitching statistic most widely used by baseball fans. A good number for ERA is likewise a good number for FIP. On the other hand, xwOBA was created to look like wOBA, a popular sabermetric statistic that was itself made to look like on-base percentage. That difference doesn’t meaningfully change what the statistics do, but one is generally presented from a pitcher’s perspective (FIP) while the other is more easily associated with hitters (xwOBA).

The second difference comes in the batted balls measured. FIP uses only home runs. One could argue that FIP’s use of homers could be considered a proxy for all batted-ball contact, but I won’t be making that argument here. By contrast, xwOBA, as I’ve said, uses the launch angle and exit velocity of all batted balls to provide a run value for those events. Both metrics convert their inputs to runs, with FIP expressed on the scale of earned runs per nine innings and xwOBA on a per-plate-appearance scale.

The stats work very similarly. To demonstrate this, consider the graph below, a plot of the 72 pitchers who threw at least 2,000 pitches in both the 2016 and 2017 seasons. It shows xwOBA and FIP from last season, with all xwOBA data from Baseball Savant.

The two produced similar numbers, although the similarities do not end there. The table below shows the correlation of the two metrics with ERA during those seasons.

Correlation with ERA
Metric   2016   2017
FIP      0.56   0.64
xwOBA    0.63   0.66
Numbers above represent r-squared figures.
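For anyone who wants to reproduce this kind of comparison, the r-squared between two paired lists (say, each pitcher's FIP and ERA) is just the squared Pearson correlation. Below is a minimal sketch in plain Python; the function name and the sample numbers are hypothetical and are not the data behind the table.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r-squared is undefined when a sequence is constant")
    return (sxy * sxy) / (sxx * syy)


# Made-up FIP and ERA values for four pitchers, for illustration only.
fip_values = [3.1, 3.8, 4.2, 4.9]
era_values = [2.9, 4.1, 3.9, 5.2]
print(round(r_squared(fip_values, era_values), 2))
```

The same function works for every comparison in this piece: one metric against ERA in the same season, a metric against itself a year later, or a metric against the following season's ERA.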

The numbers were nearly identical last season, though FIP looks slightly off during the 2016 campaign. More on the 2016 data will come later, but first, the graph below shows those same 72 pitchers’ xwOBA in 2016 and 2017.

The relationship here is a pretty good one — much better than the r-squared for ERA among the same group, which comes to just .12.

Now compare those numbers to FIP from the last two seasons.

In terms of correlation, the r-squared is very similar to xwOBA’s, if a tiny bit behind. This, again, reinforces the similarities between the two statistics.

Now, let’s run another comparison testing their potential predictive capabilities, this time comparing 2016 xwOBA and FIP to 2017 ERA. First, xwOBA:

The relationship here isn’t the strongest one we’ve seen, since we are no longer comparing a metric to itself, but there is a correlation between the two. Also, keep in mind: the r-squared is much higher than the one ERA alone produces between the two seasons.

Now, let’s compare 2016 FIP to 2017 ERA.

Again, we get almost identical correlations for FIP and xwOBA when compared to the following season’s ERA.

I said earlier I would get back to the 2016 numbers showing that xwOBA was more closely correlated to ERA than FIP. I split that season into halves for the 72 pitchers and found something interesting: xwOBA had a stronger correlation between the first and second halves than FIP did, and first-half xwOBA had a slightly higher correlation with second-half ERA than first-half FIP did. The table below contains the results.

Relationship with ERA in 2016 by Half
2016             2nd Half xwOBA   2nd Half FIP   2nd Half ERA
1st Half ERA                                     0.03
1st Half xwOBA   0.26                            0.16
1st Half FIP                      0.17           0.12
Numbers above represent r-squared. Blanks above are on purpose to reduce confusion.

These numbers potentially indicate that, in smaller samples, xwOBA might stabilize more quickly and do a better job of predicting future ERA than FIP currently does. Unfortunately, this finding does not hold up to further scrutiny. Here are the same numbers as in the last table, except from 2017.

Relationship with ERA in 2017 by Half
2017             2nd Half xwOBA   2nd Half FIP   2nd Half ERA
1st Half ERA                                     0.09
1st Half xwOBA   0.24                            0.12
1st Half FIP                      0.25           0.21
Numbers above represent r-squared. Blanks above are on purpose to reduce confusion.

Compared to itself by halves, xwOBA did just as well in 2017 as it did in 2016, while FIP moved up to the same level after struggling to show a relationship in 2016. FIP came out better in its relationship to second-half ERA. Last year’s xwOBA relationship to second-half ERA didn’t quite measure up. It seems possible that 2016 was simply a very volatile year. It was the first full year with a juiced ball, and perhaps there was some added movement in the numbers. FIP is heavily reliant on homers as a measure of talent, so perhaps that helps explain the weaker results the season before.
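The half-season figures from the two tables can be set side by side in a few lines of Python. The r-squared values below are transcribed from those tables; the dictionary layout and the helper names are assumptions made for this sketch.

```python
# r-squared values transcribed from the two by-half tables in this piece.
# "self" = first-half metric vs. the same metric in the second half.
# "era"  = first-half metric vs. second-half ERA.
R2_BY_HALF = {
    2016: {
        "ERA": {"era": 0.03},
        "xwOBA": {"self": 0.26, "era": 0.16},
        "FIP": {"self": 0.17, "era": 0.12},
    },
    2017: {
        "ERA": {"era": 0.09},
        "xwOBA": {"self": 0.24, "era": 0.12},
        "FIP": {"self": 0.25, "era": 0.21},
    },
}


def best_predictor(year, target="era"):
    """Return the first-half metric with the highest r-squared for target."""
    candidates = {metric: values[target]
                  for metric, values in R2_BY_HALF[year].items()
                  if target in values}
    return max(candidates, key=candidates.get)


def xwoba_edge(year, target="era"):
    """xwOBA r-squared minus FIP r-squared; positive favors xwOBA."""
    row = R2_BY_HALF[year]
    return round(row["xwOBA"][target] - row["FIP"][target], 2)


for year in (2016, 2017):
    print(year, best_predictor(year), xwoba_edge(year), xwoba_edge(year, "self"))
```

Laid out this way, the flip is easy to see: the xwOBA edge in predicting second-half ERA is positive in 2016 and negative in 2017, while xwOBA's own half-to-half figure barely moves.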

We will need more years of data for further study, but thus far, I’m comfortable saying that xwOBA and FIP are pretty similar metrics. We have more data than we’ve ever had before about the quality of all batted balls, and there’s some hope this might lead us to a better metric of pitcher skill, one we can use with more confidence. From this data, that doesn’t appear to be the case. These results provide support for, or perhaps just mimic, a similar study I ran back in August, when I could not find a predictive skill in a pitcher’s quality of contact.

It’s certainly possible, maybe even likely, that a pitcher has control over the contact he gives up, but these numbers don’t support that finding with xwOBA. To be fair (if a metric needs to feel as though it is being treated fairly), FIP sets a fairly high bar when it comes to measuring pitchers, and being roughly as good as FIP is a victory in and of itself. So, good job, xwOBA.

Craig Edwards can be found on Twitter @craigjedwards.

6 years ago

while predictive value is important, there’s also explanatory power about past results.

in this case xwoba simply captures more information, namely the batted ball profile for a significant portion of balls in play. it’s useful in answering the question of “why is this pitcher over/under performing his FIP by so much?”

6 years ago
Reply to  awy

But the R^2 data shown above is explanatory value. Predictive value would compare, for example, 2017 projected ERA based on FIP with 2017 actual ERA, or 2017 projected FIP with 2017 actual FIP, etc.

xwOBA has no explanatory advantage over FIP from the data above.

In order to demonstrate that xwOBA explains why a pitcher over or under performed his FIP, a controllable skill for managing contact quality would have to prove statistically significant. I don’t believe such a showing has yet been made.

6 years ago
Reply to  ThomServo

uh this article is comparing 2016 with 2017. it’s not past-looking.

you’ve got this entirely backwards.