Let’s Play With New Defensive Data
by Jeff Sullivan
March 14, 2017

Here’s the thing about catch probability: It’s existed for close to two decades. It’s at the heart of what we refer to as the advanced defensive metrics. Defensive Runs Saved, Ultimate Zone Rating: they couldn’t exist and do anything without catch probabilities in some form. It’s just that, for the longest time, those probabilities were generalized, educated guesses.

You might’ve heard that baseball has entered the information era. Here’s a weekend tweet from Daren Willman:

"Here’s the new @statcast catch probability leaderboard we introduced @sabr https://t.co/I1NDOCRVFO"
Daren Willman (@darenw), March 12, 2017

If you missed the link in there somehow, here it is again: the Statcast Catch Probability Leaderboard. We have most of two years of Statcast information, and now we’re getting to see it applied to player defense. Specifically, in this case, outfielder defense.

If you don’t entirely understand what catch probability is, here’s the MLB.com glossary entry. Take a given fly ball or line drive to the outfield. What are the odds a given batted ball is caught? Statcast can tell us, by considering hang time and necessary distance to cover. This is the start of something beautiful. When you have one catch probability, that’s pretty cool. When you have a lot of them, you start to understand how good or bad certain players are at making catches. As with anything else, the data adds up over time.

And now I should say a few things before going any further. People like Daren Willman and Mike Petriello are going to treat this data most responsibly. They’re there on the inside, and they know what should or shouldn’t be done with the numbers. And, the numbers will probably be tweaked down the line, since for now they don’t consider directionality or proximity to the wall. This is a simple and first attempt. And for what’s about to follow, I’ve performed my analysis using buckets.
Not every player’s bucket is the same! Let me explain. Here are the existing buckets, and their actual overall conversion rates since 2015:

5 Star Plays: 8% caught
4 Star Plays: 42% caught
3 Star Plays: 68% caught
2 Star Plays: 84% caught
1 Star Plays: 93% caught

Let’s take…Charlie Blackmon and Gerardo Parra? Sure, why not. Over the last two years, Blackmon has made five 5 Star plays, out of 69 opportunities. Parra has made six 5 Star plays, out of 57 opportunities. Based on the overall average for that bucket, Blackmon “should have” made 5.9 plays, and Parra “should have” made 4.8. But maybe Blackmon’s actual opportunities averaged out to a 10% catch rate, or a 5% catch rate. Maybe the same applies to Parra instead. Not every player’s opportunity buckets are equal, so everything, everything here is just an estimate. Statcast hasn’t yet eliminated defensive error bars.

I just can’t help myself but get into the data. It’s presented in a way similar to the Inside Edge data we already have on the site, but the Statcast information should be better. So, even considering the caveats I’ve mentioned, how about we look at some of the best Statcast defensive outfielders? On the left side of this table, the top 10 players in plays made over expected. On the right side, the same, but per 1,200 innings.

Statcast Outfield Defense, 2015 – 2016

| Player | +/- Plays | Player | +/- Plays/1200 |
|---|---|---|---|
| Kevin Kiermaier | 46 | Kevin Kiermaier | 26.7 |
| Billy Hamilton | 40 | Billy Hamilton | 25.0 |
| Lorenzo Cain | 37 | Jake Marisnick | 24.0 |
| Jake Marisnick | 34 | Lorenzo Cain | 21.5 |
| Mookie Betts | 32 | Juan Lagares | 19.4 |
| Ender Inciarte | 32 | Byron Buxton | 19.2 |
| Adam Eaton | 27 | Ender Inciarte | 17.5 |
| Kevin Pillar | 24 | Travis Jankowski | 16.4 |
| Jason Heyward | 24 | Keon Broxton | 15.9 |
| Odubel Herrera | 23 | Peter Bourjos | 15.1 |

SOURCE: Baseball Savant

I’m not sure there’s a surprise in the bunch. Which is probably more of a good sign than a bad one; one wouldn’t think we’ve been completely wrong all this time.
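
The bucket arithmetic above (Blackmon and Parra against the 5 Star rate) and the per-1,200-innings column can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this example, and the rates are the rounded percentages printed above. With the rounded 8% figure the expected totals come out a little lower (5.52 and 4.56) than the 5.9 and 4.8 quoted in the text, presumably because the article worked from an unrounded rate. The rate example borrows the Matt Kemp and Mark Trumbo totals and innings quoted further down in the article.

```python
# Bucket conversion rates as printed in the article (rounded to whole percents).
BUCKET_RATE = {5: 0.08, 4: 0.42, 3: 0.68, 2: 0.84, 1: 0.93}


def expected_plays(opportunities, stars):
    """Plays an average outfielder would make on this many chances in a bucket."""
    return opportunities * BUCKET_RATE[stars]


def plays_over_expected(made, opportunities, stars):
    """Actual plays made minus the bucket-average expectation."""
    return made - expected_plays(opportunities, stars)


def per_1200(plays, innings):
    """Scale a plays total to a 1,200-inning rate."""
    return plays / innings * 1200


# 5 Star chances from the article: Blackmon 5 of 69, Parra 6 of 57.
blackmon = plays_over_expected(5, 69, 5)
parra = plays_over_expected(6, 57, 5)
print(f"Blackmon 5 Star +/-: {blackmon:+.2f}")
print(f"Parra 5 Star +/-: {parra:+.2f}")

# Totals and outfield innings quoted later in the article.
print(f"Kemp per 1,200: {per_1200(-39, 2575):+.1f}")
print(f"Trumbo per 1,200: {per_1200(-27, 1461):+.1f}")
```

Because the whole-number plays totals are themselves rounded, the computed rates land close to, but not exactly on, the table values. A full +/- would sum the same difference across all five buckets.
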
For as much as people have openly criticized the advanced defensive numbers, I think the bulk of the disagreement has centered on infield play, especially in the age of infielders moving around all over the place. We’ve long had a pretty good grasp on the outfield, I think. Statcast here mostly supports the information we already had. Kevin Kiermaier? Amazing! Billy Hamilton? Amazing! Keon Broxton? You better believe he’s amazing!

Maybe one way of interpreting this is as further evidence that Kiermaier has been better out there than Kevin Pillar. I know that’s been fiercely debated, but Statcast knows more than most of us do. There’s still room for these numbers to be adjusted, so Blue Jays fans can continue to take some heart. Travis Jankowski has apparently got it. Peter Bourjos has apparently still got it.

Moving on, we go to the other end.

Statcast Outfield Defense, 2015 – 2016

| Player | +/- Plays | Player | +/- Plays/1200 |
|---|---|---|---|
| Matt Kemp | -39 | Hanley Ramirez | -23.7 |
| Mark Trumbo | -27 | Mark Trumbo | -22.1 |
| Andrew McCutchen | -22 | Matt Kemp | -18.3 |
| Melky Cabrera | -21 | Robbie Grossman | -18.1 |
| Nick Markakis | -19 | Danny Valencia | -13.6 |
| Ryan Braun | -17 | Preston Tucker | -13.4 |
| Yasmany Tomas | -16 | Tyler Naquin | -13.3 |
| Hanley Ramirez | -15 | Daniel Nava | -13.1 |
| Jeff Francoeur | -13 | Jeff Francoeur | -13.0 |
| Jorge Soler | -13 | Jorge Soler | -13.0 |

SOURCE: Baseball Savant

Again, this largely supports what many already suspected. This is why the offensive bar for Matt Kemp is so high, at least for as long as he’s in the National League. I don’t know the exact run value of an outfield play not made, but my ballpark guess is right around one run. That, in turn, would put Kemp around -39 runs over two seasons.

Now, we can’t ignore Mark Trumbo, though. Kemp is at -39 plays in 2,575 outfield innings. Trumbo is at -27 plays in 1,461 outfield innings. Which means Trumbo comes out worse than Kemp on a rate basis, and better than only Hanley Ramirez, who sure as heck doesn’t play outfield anymore. The Orioles know that Trumbo is a bat-first player.
They might not fully appreciate the extent to which that’s been true.

Trumbo and Kemp are not so dissimilar. Trumbo is a league-worst -9 on 1 Star plays. Those are the easy ones, and Trumbo has converted nine fewer of them than you’d expect. Kemp has the worst mark on 5 Star plays and 3 Star plays, and he’s second-worst on 4 Star plays. Kiermaier is the easy leader in 4 Star plays; Hamilton leads everyone in the 5 Star category. Hamilton has made the greatest number of the most sensational catches, but Kiermaier has him beat elsewhere.

We’re always interested in the surprises. So I ran some quick and easy math to try to figure out who those might be. For every regular or semi-regular outfielder over the past two years, I calculated +/- from the Inside Edge data set. I also gathered the range component of DRS, and the range component of UZR (because catch probability has nothing to do with, say, throwing arm). I then ran a regression using these three advanced metrics against the Statcast numbers. From there I could calculate an “expected” Statcast +/-.

Important! UZR and DRS are adjusted for position. The Statcast information is not. So, in the following analysis, center fielders will look a little better, and corner outfielders will look a little worse. But anyway, here’s Statcast +/- against expected +/-:

[Chart: Statcast +/- against expected +/-]

There’s a good and pretty linear-looking relationship. That’s what you’d like to see, if the data we already had was worth anything. But you’re here for the outliers! Here are the five players with the biggest positive differences between Statcast +/- and expected +/-:

- Adam Eaton
- Adam Jones
- Leonys Martin
- Jake Marisnick
- Lorenzo Cain

These are players our numbers have probably underrated a little bit. Eaton, for example, might well be a defensive superstar.
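
For anyone who wants to try the same kind of comparison, the regression step described above can be sketched roughly as follows. To be clear about what is assumed: the article doesn’t say which tool or exact specification was used, so this is an ordinary least-squares fit with an intercept, written in plain Python. The player labels and every number in `SAMPLE` are made-up placeholders, not real Inside Edge, DRS, UZR, or Statcast values, and the function names are invented for this example.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring up the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coefs, row):
    """Expected Statcast +/- for one player's three existing metrics."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# PLACEHOLDER data: (Inside Edge +/-, DRS range, UZR range), Statcast +/-.
SAMPLE = {
    "OF-A": ((12, 15, 10), 20),
    "OF-B": ((8, 5, 9), 11),
    "OF-C": ((-3, 0, -4), -2),
    "OF-D": ((-10, -12, -7), -15),
    "OF-E": ((2, 6, 1), 9),
    "OF-F": ((-6, -2, -8), -12),
    "OF-G": ((15, 9, 13), 14),
    "OF-H": ((0, -4, 3), -1),
}

names = list(SAMPLE)
rows = [SAMPLE[n][0] for n in names]
statcast = [SAMPLE[n][1] for n in names]
coefs = fit_ols(rows, statcast)

# Positive difference: Statcast likes the player more than the older metrics do.
diffs = {n: s - predict(coefs, r) for n, r, s in zip(names, rows, statcast)}
for name in sorted(diffs, key=diffs.get, reverse=True):
    print(f"{name}: {diffs[name]:+.1f}")
```

Sorting the differences gives the two ends of the list: the largest positive values are the players the older metrics may have underrated, and the largest negative values are the ones they may have overrated. The position caveat still applies, since nothing in this sketch adjusts for center field versus the corners.
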
Meanwhile, here are the five players with the biggest negative differences between Statcast +/- and expected +/-:

- Nick Markakis
- Matt Kemp
- Kole Calhoun
- Bryce Harper
- Curtis Granderson

These are players our numbers have probably overrated a little bit. This is too simple an analysis to grant too much weight, especially given the issues with playing different positions, but there should be some substance at the extremes, and these last 10 players listed are extreme. It’s something to think about as we all move ahead, eager to learn more from the information system we’ve wanted to have forever.

Some of this, I’m sure, has been irresponsible, but who can think so clearly when given access to a new source of data? One of the major upsides of the Statcast system installation in the first place was that it seemed like it could give us unprecedented defensive clarity. We’re obviously not there yet, but every week brings us closer.