Thoughts on Carl Crawford and FIELDf/x

There’s been quite a lengthy discussion covering multiple angles at Dave’s post on the Carl Crawford signing with the Red Sox. From the unprecedented value of the contract for a player of that type to how the Boston lineup should look to what this means for the future of free agency, a variety of interesting debates stem from a transaction of this magnitude. The one thought that intrigued me most, however, was the argument that playing left field at Fenway Park could diminish the value of Crawford’s defensive range because of the Green Monster. The idea is that Crawford’s speed and range in the outfield may be better used in right field, since the left field wall limits his ability to chase down balls: many that would be outs in other ballparks bounce off the wall instead.

At the same time, Boston’s regular right fielder J.D. Drew is known to have a pretty good cannon of an arm. Even at the age of 34, Drew was rated a 62 in arm strength and a 69 in arm accuracy by Tom Tango’s Fans Scouting Report, the latter of which ranked tenth among rightfielders last season. FSR rated Drew a better thrower than the other Sox outfielders, be it Crawford, Jacoby Ellsbury, or Mike Cameron. Conventional thought also says to place Drew in right field in order to maximize the utility of his good arm, mostly in situations where runners try to go from 1st to 3rd.

So on the one hand, you have the possibility of the Crawford/Ellsbury tandem roaming the deepest parts of Fenway in center-right, maximizing Crawford’s fielding value. On the other hand, Drew’s plus arm is a good fit for right field as well. With expected values pulling against one another, how can the Red Sox outfield arrangement be optimized according to Fenway’s ballpark dimensions?

Without fielding metrics to accurately weigh the value of throwing ability against defensive range given Fenway’s dimensions, we may have to rely on scouts to make an assessment. However, I believe that proper analysis of FIELDf/x will answer such questions definitively. Several months ago, a group of highly intelligent and single-minded individuals gathered in a dimly lit room (as it appeared on webcam) in the home city of your 2010 World Series Champions to listen to presentations on FIELDf/x, which looks to be the future game-changer of baseball analysis.

FIELDf/x records high-resolution shots 15 times a second, identifying every human on the field, with each shot assigned a time stamp. It also records events, such as when the pitcher releases the ball, the batter hits the ball, the fielder gains possession of the ball, and the fielder throws the ball. Whereas PITCHf/x gives us about 250 pitches per game, there may be up to 1 million FIELDf/x data entries recorded per game. This comes out to over 2.4 billion lines of data each season describing the locations of fielders, baserunners, and umpires, sorted by game and time stamp, scaling to the petabyte level in memory.
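As a quick sanity check on that season total, here is the back-of-the-envelope arithmetic in Python. The only input not stated above is the standard regular-season schedule (30 teams playing 162 games each); postseason games are ignored, and the per-game figure is the "up to 1 million" upper bound, so treat the result as a rough ceiling rather than a measured count.

```python
# Rough season-level entry count implied by the figures in the text.
TEAMS = 30                    # assumption: standard MLB regular season
GAMES_PER_TEAM = 162          # assumption: standard MLB regular season
ENTRIES_PER_GAME = 1_000_000  # "up to 1 million" entries per game, per the text

# Every game involves two teams, so halve the team-game total.
games_per_season = TEAMS * GAMES_PER_TEAM // 2
entries_per_season = games_per_season * ENTRIES_PER_GAME

print(games_per_season)    # 2430
print(entries_per_season)  # 2430000000
```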

You can already start imagining how FIELDf/x can inform the Red Sox about their outfield. For instance, let’s look at Crawford’s case. With a full FIELDf/x database on a server dynamic enough to display timed animation of fielding routes, we can draw a baseline for True Defensive Range (coined by Greg Rybarczyk). Looking at away teams, we can estimate how a leftfielder’s TDR is affected when playing at Fenway versus when playing at other ballparks. Distribution curves of TDR values can be plotted for leftfielders at Fenway versus those for leftfielders at other ballparks, giving us an idea of how Crawford’s TDR should be affected by Fenway if we look at his TDR comparables.
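To make the Fenway-versus-elsewhere comparison concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `park_effect` function, the `(park, tdr)` input shape, and the idea that TDR is a single number per player-game are mine, not part of FIELDf/x or of Rybarczyk's definition. A real study would compare full distributions rather than just means.

```python
from statistics import mean

def park_effect(samples, park="Fenway"):
    """Compare TDR values at one park against all other parks.

    `samples` is a list of (park_name, tdr) pairs for visiting left fielders.
    The input shape and the TDR units are hypothetical.
    """
    at_park = [tdr for name, tdr in samples if name == park]
    elsewhere = [tdr for name, tdr in samples if name != park]
    # mean() raises StatisticsError if either group is empty.
    return {
        "n_park": len(at_park),
        "n_elsewhere": len(elsewhere),
        "mean_park": mean(at_park),
        "mean_elsewhere": mean(elsewhere),
        "mean_diff": mean(at_park) - mean(elsewhere),
    }

# Placeholder numbers only, not real measurements.
samples = [("Fenway", 80.0), ("Fenway", 90.0), ("Other", 100.0), ("Other", 110.0)]
print(park_effect(samples)["mean_diff"])  # -20.0
```

A negative `mean_diff` would suggest that left fielders cover less ground at Fenway than elsewhere, which is the effect the argument about the Green Monster predicts.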

For Drew, we can improve our assessment of arm strength by accounting for 1st-to-3rd baserunning situations and runners thrown out at the plate while accurately categorizing every throw a right fielder makes at any ballpark. Those situations can be compared with would-be 1st-to-3rd baserunning situations that are hit to left field instead (where the runner at 1st doesn’t think twice about passing 2nd). Whatever method is chosen, throwing ability can be more accurately assessed, whether by comparing its value in LF and RF at Fenway or by comparing rightfielders across baseball. The theory is then that Crawford’s plus-plus range and plus arm versus Drew’s plus range and plus-plus arm can be quantitatively assessed in order to figure out who is a better fit in LF or RF at Fenway.

This seems like a lot of work in order to make one decision: whether to switch two corner outfielders who are both already good at defense. However, what can be taken away from this FIELDf/x thought experiment is the breadth of questions that FIELDf/x can be used to answer, if assembled and analyzed correctly.

When FIELDf/x is fully operable in all 30 MLB ballparks, you can bet that some front offices will be all over this more readily than others. Some clubs will already have internal FIELDf/x-ready systems in place to filter out a large fraction of the massive dataset in order to store, read, mine, and analyze the meaningful bits (while other clubs will remain clueless). Whereas SQL databases are great for storing PITCHf/x data, they may not be enough to store FIELDf/x data. Clubs may have to find a more dynamic database system if they want to preview animation of fielding plays while sifting through terabytes of data.

You might see a few front offices hire a team of programmers and developers just for handling FIELDf/x data. It may also help them to create a team composed of baseball minds, maybe a few analytically minded scouts, just to figure out what they want from this data. Given all the busy work and limited time a front office has, focusing on an organization’s needs (baserunning ability, infield defense, or outfielder throwing ability) may be more efficient than laying the framework for an exhaustive study of every aspect of fielding. Organizations will tackle FIELDf/x at different magnitudes as well as from different angles, extracting needs-based information for team-specific analyses.

Decisions such as whether or not the Red Sox should move Crawford to right field will soon be more answerable. FIELDf/x should pave the way for better analysis (and it already has) of any fielder’s reaction time, true defensive range, fielding routes, decision making, and more. As Rob Neyer said, FIELDf/x is going to change everything.

How would you use the data?





Albert Lyu (@thinkbluecrew, LinkedIn) is a graduate student at the Georgia Institute of Technology, but will always root for his beloved Northwestern Wildcats. Feel free to email him with any comments or suggestions.

28 Comments
Joe R
13 years ago

Whereas SQL databases are great for storing PITCHf/x data, it may not be enough to store FIELDf/x data. Clubs may have to find a more dynamic database system if they want to preview animation of fielding plays while sifting through terabytes of data.

Hmmm I already work as a data analyst and use a massive hadoop cluster.

Too bad I’m not actually the systems admin on it.

Marver
13 years ago
Reply to  Joe R

Assuming the software design team for FIELDf/x creates a strong output form — as in, byte-by-byte positioning of each relevant piece of information — for each record, I don’t think it’d be very difficult to manipulate through PERL, C, and R to create whatever you want with the data. 2.4 billion lines of data is not daunting in the world of large-scale analytics, especially when real-time performance can be ignored, as it can here.

I am very intrigued to learn more about FIELDf/x.

Marver
13 years ago
Reply to  Marver

That was a retort about the SQL quote, not about anything in your post in particular. Just fyi.

Joe R
13 years ago
Reply to  Marver

Yes, I’m actually intrigued by what kind of data could be collected.
I’m sure math/stat geeks could have a field day w/ this stuff.

Pun not intended.

Marver
13 years ago
Reply to  Marver

Albert,

That’s what I was getting at when I said that it greatly depends on how well the software team implementing FIELDf/x writes its output.

If the output is in some sort of fixed format where each line indicates a moment in time, with bytes 1-4 representing the batter code, bytes 5-8 the pitcher code, bytes 9-12 the first baseman code, etc., up until you reach the actual play data where (for example) byte 100 represents the result of the play (an A here would represent a double play, B a strike, C a home run, etc.), then it wouldn’t be too difficult to parse the data effectively.
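A minimal Python sketch of that kind of fixed-width parsing, assuming the hypothetical layout described above. The byte positions and result codes come straight from the example in this comment; the field names, the `parse_record` helper, and the sample player codes are invented for illustration, and the real FIELDf/x output format is not known here.

```python
# Hypothetical layout from the comment: 1-based, inclusive byte positions.
FIELDS = {
    "batter": (1, 4),
    "pitcher": (5, 8),
    "first_baseman": (9, 12),
    "result": (100, 100),
}
RESULT_CODES = {"A": "double play", "B": "strike", "C": "home run"}

def parse_record(line):
    """Slice one fixed-width record into named fields."""
    record = {}
    for name, (start, end) in FIELDS.items():
        # Convert 1-based inclusive positions to Python's 0-based slices.
        record[name] = line[start - 1:end].strip()
    # Translate the one-character result code into a readable label.
    record["result"] = RESULT_CODES.get(record["result"], "unknown")
    return record

# Build a 100-byte sample line with made-up player codes.
sample = "B001P002F003".ljust(99) + "C"
print(parse_record(sample))
# {'batter': 'B001', 'pitcher': 'P002', 'first_baseman': 'F003', 'result': 'home run'}
```

Because every field sits at a known offset, each record parses with plain slicing and no delimiter handling, which is why a format like this stays manageable even at billions of lines.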