FIP vs. xwOBA for Assessing Pitcher Performance

At a basic level, nearly every piece at FanGraphs represents an attempt to answer a question. What is the value of an opt-out in a contract? Why do the Brewers continue to fare so poorly in the projected standings? How do people behave in the eighth inning of a spring-training game? Those were the questions asked, either explicitly or implicitly, by Jeff Sullivan, Jay Jaffe, and Meg Rowley just yesterday.

This piece also begins with a question, probably one that has occurred to a number of readers. It concerns how we evaluate pitchers and how best to do so. I’ll present the question momentarily. First, a bit of background.

Fielding Independent Pitching, or FIP, is a well-known tool for estimating ERA. FIP attempts to isolate a pitcher’s contribution to run prevention. It also serves as a better predictor of future ERA than ERA itself. The formula for FIP is elegant, including just three variables: strikeouts, walks, and homers. It does not include balls in play. That said, one would be mistaken to assume that FIP excludes any kind of measurement for what happens when the bat hits the ball. Let this be a gentle reminder that home runs both (a) are a type of batted ball and (b) represent a major component of FIP. There is, in other words, some consideration of contact quality in FIP.
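For readers who want to see the arithmetic, here is a minimal sketch of FIP in its commonly published form. Two caveats: the published formula folds hit batters in with walks, and the additive constant changes every season to put FIP on the ERA scale. The default of 3.10 below is only a placeholder, and the function and parameter names are mine.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching in its commonly published form.

    `constant` is a season-specific scaling term; 3.10 is a placeholder,
    not the value for any particular year.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    # Homers are weighted 13, walks and hit batters 3, strikeouts -2.
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# A made-up season line, purely for illustration.
print(fip(hr=20, bb=50, hbp=5, k=200, ip=200.0))
```

Note that nothing hit into the field of play appears anywhere in the inputs; the only batted ball that counts is the home run.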

Expected wOBA, or xwOBA, is a newer metric, the product of Statcast data. xwOBA is calculated with run-value estimates derived from exit velocity and launch angle. Basically, xwOBA calculates the average run value of every batted ball for a hitter (or allowed by a pitcher), adds in the defense-independent numbers, and arrives at a wOBA-like figure. The advantage of xwOBA is that it removes the variance of batted-ball results and uses a “Platonic” value instead.
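A rough sketch of that calculation is below. It assumes each batted ball already carries an expected value looked up from its exit velocity and launch angle; that lookup is Statcast's and is not reproduced here. The walk and hit-by-pitch weights are hypothetical placeholders (the real ones vary by season), and the denominator is simplified to ignore things like intentional walks and sacrifice bunts.

```python
def xwoba(batted_ball_values, strikeouts, walks, hbp, w_bb=0.69, w_hbp=0.72):
    """Approximate xwOBA for a hitter, or allowed by a pitcher.

    batted_ball_values: one expected wOBA value per batted ball, assumed to
        come from an exit-velocity/launch-angle lookup done elsewhere.
    w_bb, w_hbp: placeholder linear weights, not official values.
    """
    plate_appearances = len(batted_ball_values) + strikeouts + walks + hbp
    if plate_appearances == 0:
        raise ValueError("need at least one plate appearance")
    # Strikeouts add nothing to the numerator but still count as plate appearances.
    total = sum(batted_ball_values) + w_bb * walks + w_hbp * hbp
    return total / plate_appearances


# Four made-up batted balls plus three strikeouts, two walks, and a hit batter.
print(xwoba([0.9, 0.1, 0.5, 0.0], strikeouts=3, walks=2, hbp=1))
```

The point of the design is visible in the inputs: whether a batted ball actually fell for a hit never enters the calculation, only how hard and at what angle it was struck.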

The introduction of Statcast’s batted-ball data is exciting and seems like it might help to better isolate a pitcher’s contributions. But does it? This is where I was compelled to ask my own, relatively simple question — namely, is xwOBA better for assessing pitcher performance than the more traditional FIP? What I found, however, is that the answer isn’t so simple.

The differences between FIP and xwOBA, as well as the similarities, deserve some exploration.

Both xwOBA and FIP include strikeouts and walks in their formulas, and both use runs as the basis for determining value. The metrics differ in two very meaningful ways. One difference is in presentation. FIP is made to look like ERA, probably the pitching statistic most widely used by baseball fans. A good number for ERA is likewise a good number for FIP. On the other hand, xwOBA was created to look like wOBA, a popular sabermetric statistic that was itself made to look like on-base percentage. That difference doesn’t meaningfully change what the statistics do, but one is generally presented from a pitcher’s perspective (FIP) while the other is more easily associated with hitters (xwOBA).

The second difference comes in the batted balls measured. FIP uses only home runs. One could argue that FIP’s use of homers serves as a proxy for all batted-ball contact, but I won’t be making that argument here. By contrast, xwOBA, as I’ve said, uses the launch angle and exit velocity of all batted balls to provide a run value for those events. Both metrics convert the inputs to runs, with FIP expressed as earned runs per nine innings and xwOBA in runs per plate appearance.

The stats work very similarly. To demonstrate this, consider the graph below, which plots xwOBA against FIP from last season for the 72 pitchers who threw at least 2,000 pitches in both the 2016 and 2017 seasons. All xwOBA data comes from Baseball Savant.
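The sample selection is simple enough to sketch. The snippet below assumes pitch counts are available as a dictionary keyed by season and then by pitcher; that layout, the function name, and the toy numbers are mine, not anything taken from Baseball Savant.

```python
def qualifying_pitchers(pitches_by_season, seasons=(2016, 2017), min_pitches=2000):
    """Return the pitchers who meet the pitch minimum in every listed season."""
    qualified = None
    for season in seasons:
        counts = pitches_by_season.get(season, {})
        meets_minimum = {name for name, n in counts.items() if n >= min_pitches}
        # Keep only pitchers who also qualified in the seasons checked so far.
        qualified = meets_minimum if qualified is None else qualified & meets_minimum
    return sorted(qualified or [])


# Toy pitch counts, not real data.
pitch_counts = {
    2016: {"Pitcher A": 2500, "Pitcher B": 1900, "Pitcher C": 2100},
    2017: {"Pitcher A": 2200, "Pitcher B": 3000, "Pitcher C": 2000},
}
print(qualifying_pitchers(pitch_counts))
```

Requiring the minimum in both seasons keeps the same group of pitchers in every comparison, which is what makes the year-to-year numbers comparable.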

The two produced similar numbers, although the similarities do not end there. The table below shows the correlation of the two metrics with ERA during those seasons.

Correlation with ERA
| Metric | 2016 | 2017 |
| --- | --- | --- |
| FIP | 0.56 | 0.64 |
| xwOBA | 0.63 | 0.66 |
Numbers above represent r-squared figures.
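For anyone unfamiliar with the figure, r-squared for a pair of stats is just the square of the Pearson correlation between them across the pitchers in the sample. Here is a minimal sketch in plain Python; the FIP and ERA values are invented for illustration and are not drawn from the 72-pitcher sample.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("r-squared is undefined when one list is constant")
    return covariation ** 2 / (spread_x * spread_y)


# Invented values, one entry per pitcher.
fip_values = [3.10, 3.85, 4.40, 2.95, 4.90]
era_values = [3.40, 3.60, 4.75, 2.80, 4.50]
print(r_squared(fip_values, era_values))
```

A value of 1 would mean the metric lines up with ERA perfectly, while a value near 0 would mean it says almost nothing about ERA.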

The numbers were nearly identical last season, though FIP looks slightly off during the 2016 campaign. More on the 2016 data will come later, but first, the graph below shows those same 72 pitchers’ xwOBA in 2016 and 2017.

The relationship here is a pretty good one — much better than the r-squared for ERA among the same group, which comes to just .12.

Now compare those numbers to FIP from the last two seasons.

In terms of correlation, the r-squared is very similar to xwOBA’s, if a tiny bit behind. This, again, reinforces the similarities between the two statistics.

Now, let’s run another comparison testing their potential predictive capabilities, this time comparing 2016 xwOBA and FIP to 2017 ERA. First, xwOBA:

The relationship here isn’t the strongest one we’ve seen as we move further away from like groups, but there is a correlation between the two. Also, keep in mind: the r-squared is much higher than the one produced by simply comparing ERA between the two seasons.

Now, let’s compare 2016 FIP to 2017 ERA.

Again, we receive almost identical correlations for FIP and xwOBA when compared to the following season’s ERA.

I said earlier I would get back to the 2016 numbers showing that xwOBA was more closely correlated to ERA than FIP. I split that season in half for the 72 pitchers and found something interesting. xwOBA had a stronger correlation between the first and second halves than FIP did, and first-half xwOBA had a slightly higher correlation with second-half ERA than first-half FIP did. The table below contains the results.

Relationship with ERA in 2016 by Half
| 2016 | 2nd Half xwOBA | 2nd Half FIP | 2nd Half ERA |
| --- | --- | --- | --- |
| 1st Half ERA | | | 0.03 |
| 1st Half xwOBA | 0.26 | | 0.16 |
| 1st Half FIP | | 0.17 | 0.12 |
Numbers above represent r-squared. Blanks above are on purpose to reduce confusion.

These numbers potentially indicate that, in smaller samples, xwOBA might stabilize more quickly and do a better job of predicting future ERA than FIP currently does. Unfortunately, this finding does not hold up to further scrutiny. Here are the same numbers as in the last table, except from 2017.

Relationship with ERA in 2017 by Half
| 2017 | 2nd Half xwOBA | 2nd Half FIP | 2nd Half ERA |
| --- | --- | --- | --- |
| 1st Half ERA | | | 0.09 |
| 1st Half xwOBA | 0.24 | | 0.12 |
| 1st Half FIP | | 0.25 | 0.21 |
Numbers above represent r-squared. Blanks above are on purpose to reduce confusion.

Compared to itself by halves, xwOBA did just as well in 2017 as it did in 2016, while FIP moved up to the same level after struggling to show a relationship in 2016. FIP came out better in its relationship to second-half ERA. Last year’s xwOBA relationship to second-half ERA didn’t quite measure up. It seems possible that 2016 was simply a very volatile year. It was the first full year with a juiced ball, and perhaps there was some added movement in the numbers. FIP is heavily reliant on homers as a measure of talent, so perhaps that helps explain the weaker results the season before.
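The half-season comparison follows the same recipe in both years: match pitchers across the two halves, then compute r-squared for each first-half metric against its own second-half value and against second-half ERA. The sketch below shows one way to organize that. The nested-dictionary layout, the function names, and the toy numbers are all assumptions of mine.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation ** 2 / (spread_x * spread_y)


def half_split_table(first_half, second_half, metrics=("xwOBA", "FIP")):
    """Build r-squared values for each first-half metric.

    first_half / second_half: {pitcher: {"ERA": ..., "xwOBA": ..., "FIP": ...}}
    Only pitchers present in both halves are used.
    """
    common = sorted(set(first_half) & set(second_half))
    table = {}
    for metric in metrics:
        first = [first_half[p][metric] for p in common]
        table[metric] = {
            # How well the metric tracks itself from one half to the next.
            "self": r_squared(first, [second_half[p][metric] for p in common]),
            # How well the first-half metric tracks second-half ERA.
            "ERA": r_squared(first, [second_half[p]["ERA"] for p in common]),
        }
    return table


# Toy numbers, not real pitchers.
first = {
    "A": {"ERA": 3.0, "xwOBA": 0.300, "FIP": 3.2},
    "B": {"ERA": 4.0, "xwOBA": 0.320, "FIP": 4.1},
    "C": {"ERA": 5.0, "xwOBA": 0.340, "FIP": 4.6},
    "D": {"ERA": 2.5, "xwOBA": 0.280, "FIP": 2.9},  # no second half, so dropped
}
second = {
    "A": {"ERA": 3.5, "xwOBA": 0.310, "FIP": 3.4},
    "B": {"ERA": 3.9, "xwOBA": 0.315, "FIP": 4.3},
    "C": {"ERA": 4.8, "xwOBA": 0.335, "FIP": 4.2},
}
print(half_split_table(first, second))
```

With only a few dozen pitchers and half a season each, the resulting figures can swing noticeably from one year to the next, which is worth remembering when reading the two tables side by side.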

We will need more years of data for further study, but thus far, I’m comfortable saying that xwOBA and FIP are pretty similar metrics. We have more data than we’ve ever had before about the quality of all batted balls, and there’s some hope this might lead us to a better, more confident metric as it relates to pitcher skill. From this data, that doesn’t appear to be the case. These results support, or perhaps just mimic, a similar study I ran back in August, when I could not find a predictive skill in a pitcher’s quality of contact.

It’s certainly possible, maybe even likely, that a pitcher has control over the contact he gives up, but these numbers don’t support that finding with xwOBA. To be fair — if a metric needs to feel as though it is being treated fairly — FIP sets a fairly high bar when it comes to measuring pitchers, and being roughly as good as FIP is a victory in and of itself. So, good job xwOBA.





Craig Edwards can be found on twitter @craigjedwards.

41 Comments
awy
7 years ago

while predictive value is important, there’s also explanatory power about past results.

in this case xwoba simply captures more information, namely the batted ball profile for a significant portion of balls in play. it’s useful in answering the question of ‘why is this pitcher over/under performing his FIP by so much.’

ThomServo
7 years ago
Reply to  awy

But the R^2 data shown above is explanatory value. Predictive value would compare, for example, 2017 projected ERA based on FIP with 2017 actual ERA, or 2017 projected FIP with 2017 actual FIP, etc.

xwOBA has no explanatory advantage over FIP from the data above.

In order to demonstrate that xwOBA explains why a pitcher over or under performed his FIP, a controllable skill for managing contact quality would have to prove statistically significant. I don’t believe such a showing has yet been made.

awy
7 years ago
Reply to  ThomServo

uh this article is comparing 2016 with 2017. it’s not past-looking.

you’ve got this entirely backwards.