Will FIELDf/x Go Public? Should It?

If you read FanGraphs regularly, chances are you have a copy of the Hardball Times Baseball Annual 2011. If so, pace around your mother’s basement, take your dog-eared copy off the bookshelf, and flip to page eleventy-one with me (or do yourself a favor and purchase a copy here). Take a couple of minutes to (re)read Rob Neyer’s article documenting his giddiness over the potential of FIELDf/x, a new player-tracking system by Sportvision. Fully operational FIELDf/x camera systems will be installed in five stadiums by the end of this season and, hopefully, in all 30 by 2012. Here’s an excerpt from Rob’s article describing FIELDf/x:

FIELDf/x will manifestly and forever revolutionize the evaluation of defense. In fact, I will venture that the defensive metrics in use today, whether by John Dewan or Sean Smith or David Pinto or Mitchel Lichtman or anyone else, will in five years seem nearly as primitive as range factor does today. Because with FIELDf/x, we’ll know not just (approximately) where the baseball went and whether it was caught and who caught it (or didn’t). We’ll know exactly where the ball went and exactly how long it took a fielder to arrive and exactly how he got there. All the talk about range and getting a good jump and taking a good route: it won’t be just talk anymore. There will be cold, hard data for every bit of it.

Our very own FanGraphBot, Dave Allen, investigated what Rob touched on about taking a good route. Last summer, the Sideburn King gave a presentation at the annual PITCHf/x Summit on using FIELDf/x to assess fielders’ routes to fly balls. Sportvision had released sample data to several baseball analysts, and Dave used that data to determine the speed of outfielders as they pursued fly balls. He looked at the starting point of each fielder at the time of contact and measured how efficiently the fielder got to the ball by comparing the distance the player traveled against the shortest distance possible. You can view this presentation over at Sportvision’s PITCHf/x Summit website. You’ll also get a great view of Dave Allen’s sideburns and crazy hairdo at work.
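To make the route comparison concrete, here is a minimal Python sketch. It is not Dave’s actual method or Sportvision’s data format: the function names, the (x, y) coordinates in feet, and the choice to express efficiency as shortest distance divided by distance traveled are all assumptions for illustration. The 15 frames per second figure is the sampling rate quoted for FIELDf/x later in this post.

```python
import math

FRAME_RATE_HZ = 15  # assumed sampling rate, taken from the figure quoted in this post


def path_length(points):
    """Sum of straight-line distances between consecutive (x, y) positions."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def route_efficiency(points):
    """Shortest possible distance divided by the distance actually traveled.

    Returns 1.0 for a perfectly straight route and smaller values for
    more roundabout ones.
    """
    traveled = path_length(points)
    if traveled == 0:
        return 1.0  # the fielder never moved, so no distance was wasted
    return math.dist(points[0], points[-1]) / traveled


def average_speed(points, frame_rate_hz=FRAME_RATE_HZ):
    """Average speed in distance units per second across the tracked frames."""
    elapsed = (len(points) - 1) / frame_rate_hz
    return path_length(points) / elapsed if elapsed > 0 else 0.0


# Hypothetical fielder positions in feet, one per frame (made-up numbers).
straight = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
curved = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]

print(route_efficiency(straight))
print(route_efficiency(curved))
```

The straight route covers exactly the shortest distance, while the curved route travels 7 feet to end up 5 feet from where it started, so its efficiency is 5/7. Real tracking data would need smoothing before a frame-by-frame sum like this could be trusted, since measurement jitter inflates the distance traveled.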

I’ve become increasingly interested in FIELDf/x myself, dreaming of potential analysis of the data. Several months ago, I wrote an article brainstorming ways we could potentially use FIELDf/x to analyze how Carl Crawford would fit in left field versus right field at Fenway Park.

The problem is, all I (and those of you who commented) could do was speculate. The initial samples of data from FIELDf/x’s first home, AT&T Park, were largely unavailable to the public, save for a few analysts who were invited to explore its potential. MLB Advanced Media has released raw data from Sportvision’s PITCHf/x system to the public for free, so that teams, FanGraphs, The Hardball Times, Brooks Baseball, TexasLeaguers.com, you, me, and anyone with a computer and an Internet connection can use it. But it’s looking more and more like that won’t be the case for FIELDf/x.

Yes, the potential of the data is massive. Yes, the Dave Allens, Mike Fasts, Harry Pavlidises, Jeremy Greenhouses, and other big-time PITCHf/x analysts of our world could unlock the location of the Holy Grail with full access to that data. Yes, as members of the public and as fans of the game of baseball, we would love to see state-of-the-art analysis and research done in the blogosphere.

But We The Fans will not decide the fate of FIELDf/x and its transparency. Other stakeholders in the data have their reasons not to make FIELDf/x data available to the public.

1) Competitive advantage vs. free labor
The release of PITCHf/x to the public allowed statheads to develop cutting-edge baseball research, analyzing hitters’ and pitchers’ tendencies and weaknesses. In a few cases, careers were made out of the pool of PITCHf/x analysts who have popped up since PITCHf/x first came on the public scene in 2007. Josh Kalk of the Tampa Bay Rays is one publicized example, whom you can read about in Jonah Keri’s The Extra 2%. PITCHf/x analysts on sites such as this one, Baseball Prospectus, The Hardball Times, etc. are widely read by executives and analysts within baseball, some of whom scan the blogosphere in order to obtain “free labor” and analyses. For such teams, maybe it makes sense to advocate for public release of FIELDf/x data.

At the same time, the several teams known to invest heavily in their analytics departments do so in order to gain a competitive advantage over other teams, especially over those that ignore cutting-edge PITCHf/x analysis or settle for free analysis. The teams that invest in analytics (probably slightly more than half at this point) hire programmers and analysts who develop proprietary software, databases, and system applications to varying degrees.

For these teams, it’s not in their best interest for FIELDf/x data to be public. The goal of an organization is not to develop state-of-the-art analytics systems; the goal is to win games and championships. Yes, free release of the data opens the doors for dozens, even hundreds, of freelance analysts, who collectively would make far more progress than a handful of analysts working for teams. But the more freelance analysts out there (or here, i.e., you and me) get a hold of freely available real-time data, the less of an advantage the most analytically minded organizations have over all the others.

2) Data deluge
I explained the sheer massiveness of the data before in my Crawford/FIELDf/x article:

FIELDf/x records high resolution shots 15 times a second, identifying every human on the field with each shot assigned to a time stamp. It also records events, such as when the pitcher releases the ball, the batter hits the ball, the fielder gains possession of a ball, and the fielder throws the ball. Whereas PITCHf/x gives us about 250 pitches per game, there may be up to 1 million FIELDf/x data entries recorded a game. This comes out to over 2.4 billion lines of data for each season describing the locations of fielders, baserunners, and umpires sorted by game and time stamp, scaling the petabyte level in memory.
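The season total in that passage is easy to sanity-check. Here is a rough Python sketch, assuming a 30-team, 162-game regular season with no postseason games, and taking the quoted “up to 1 million” entries per game at face value as an upper bound. The variable names are purely illustrative.

```python
# Back-of-the-envelope check of the counts quoted above.
TEAMS = 30
GAMES_PER_TEAM = 162
ENTRIES_PER_GAME = 1_000_000  # "up to 1 million" FIELDf/x entries per game, as quoted
PITCHES_PER_GAME = 250        # approximate PITCHf/x pitches per game, as quoted

# Each game involves two teams, so halve the product.
games_per_season = TEAMS * GAMES_PER_TEAM // 2
entries_per_season = games_per_season * ENTRIES_PER_GAME
entries_per_pitch = ENTRIES_PER_GAME // PITCHES_PER_GAME

print(games_per_season)    # 2430
print(entries_per_season)  # 2430000000
print(entries_per_pitch)   # 4000
```

So 2,430 games at up to a million entries apiece gives the “over 2.4 billion lines” figure, or roughly 4,000 FIELDf/x entries for every pitch PITCHf/x records. How much storage that takes depends on the size of each entry and on whether the underlying images are kept, which the passage does not specify.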

I mentioned later in the post that much of the data will need to be filtered out: we don’t need to track a player when he’s warming up or jogging to and from the dugout between innings. Initially, the most useful parts of player tracking are when the ball is actually in play, which only occurs on a fraction of all pitches. The raw FIELDf/x data needs a lot of cleaning up, and it’s going to take more than a personal laptop computer with an Internet connection to handle all of that data.
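As a rough illustration of that kind of filtering, the Python sketch below keeps only the tracking frames that fall between a batted ball and the moment a fielder gains possession. The `Frame` and `Event` record layouts, their field names, and the event labels are hypothetical; the actual FIELDf/x schema has not been made public, so treat this as a sketch of the idea rather than a working loader.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """One tracked position (hypothetical layout, not the real schema)."""
    timestamp: float  # seconds since the start of the game
    player_id: str
    x: float
    y: float


@dataclass
class Event:
    """One recorded event (the kind labels are hypothetical)."""
    timestamp: float
    kind: str  # e.g. "pitch_release", "ball_hit", "fielder_possession"


def in_play_windows(events):
    """Pair each 'ball_hit' with the next 'fielder_possession' event."""
    windows = []
    start = None
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind == "ball_hit":
            start = event.timestamp  # a later hit replaces an unmatched one
        elif event.kind == "fielder_possession" and start is not None:
            windows.append((start, event.timestamp))
            start = None
    return windows


def filter_in_play(frames, events):
    """Keep only frames whose timestamp falls inside an in-play window."""
    windows = in_play_windows(events)
    return [
        frame for frame in frames
        if any(lo <= frame.timestamp <= hi for lo, hi in windows)
    ]


# Made-up example: one batted ball fielded 3.5 seconds after contact.
events = [
    Event(10.0, "pitch_release"),
    Event(10.5, "ball_hit"),
    Event(14.0, "fielder_possession"),
    Event(15.0, "fielder_throw"),
]
frames = [Frame(t, "CF", 0.0, 300.0) for t in (5.0, 11.0, 13.0, 20.0)]
kept = filter_in_play(frames, events)
print([frame.timestamp for frame in kept])  # [11.0, 13.0]
```

Scanning every window for every frame is fine for a toy example, but at the data volumes described above a real pipeline would more likely do this filtering inside a database or with a sorted merge.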

Sportvision and MLBAM will have the necessary resources to clean up the data, but even then, the release of cleaned and massaged FIELDf/x data may not bring the same analyses that PITCHf/x produced because of the size of the data sets. What’s more likely is that current PITCHf/x analysts or independent baseball analytics consultants will be allowed to get a peek at the data like they have been, if those analysts aren’t already hired away by teams. I’m not sure the general public would be able to do much with the data, as few will have the database skills and computing power to handle such large data sets.

3) We The Fans
As hard as it is for me to say this and to stomach it, I am not sure that We The Fans need this data, at least for the purposes of entertainment enhancement. ESPN, FOX, and local broadcasts can buy sets of summarized data (as opposed to bulk data dumps more useful to teams) from Major League Baseball to feed the public’s insatiable desire for baseball information. MLB Gameday can replay every fielding play in baseball, showing fielders’ positions, speeds, and movements during a play. We The Fans will still be able to enjoy the fruits of FIELDf/x through enhanced broadcasting experiences.

One idea is that MLBAM could assemble a team of programmers in order to centralize all of the FIELDf/x data mish-mashing and adjusting. Unlike the Major League Scouting Bureau (which centralizes scouting reports and rankings), these programmers would manipulate the data so that it is presentable and more useful to teams, so that teams can spend most of the analyst work hours on analyzing data and producing their own proprietary rankings rather than making the data workable.

That still raises the question of where teams get their FIELDf/x analysts if they don’t know who out there is able to analyze the data. Well, we already have PITCHf/x, do we not? Of course, FIELDf/x analysis will be very different from PITCHf/x analysis, but I would argue that most PITCHf/x analysts have the technical skills and the baseball knowledge to transition well into FIELDf/x analysis. That’s why Sportvision went to independent PITCHf/x analysts like Allen, Fast, Pavlidis, Greenhouse, etc. with sample FIELDf/x data, right?

Overall, I am very excited about the potential of FIELDf/x. According to this Bloomberg article, FIELDf/x is now installed at Yankee Stadium and PETCO Park in addition to AT&T Park, with Kauffman Stadium and Tropicana Field next in line. By 2012, the hope is that all 30 stadiums have fully operational FIELDf/x systems, with the fruits of the data enhancing what is seen on broadcasts and web applications by 2013. I may never get the opportunity to see the raw data firsthand, but I sure am thankful that motion capture technology has evolved so that a significantly heightened baseball fan experience is just around the corner.

For more thoughts and ideas on FIELDf/x by much smarter people than me, see Tango’s blog post about it here.

——————————————————————————————————————————————————————————————————————

It’s been a wild ride this past senior year for me at Northwestern. And while I feel like I wouldn’t be writing for the best and certainly the coolest baseball blog on the Internet without my impassioned love for baseball and PITCHf/x, I most definitely would not have received this opportunity if not for the people in my life who have advised me and led me along the way. I feel so blessed to have been given this opportunity back in September by David Appelman and Dave Cameron, who have been, up to this point in my young life, the best bosses I have ever worked for. I am forever indebted to both of you for giving me the opportunity to write in these spaces, for dealing with my incessant emails, and for organizing quite possibly the most awesome spring training trip ever.

In particular, I’d also like to thank Carson, Niv, Eno, Dave (Allen), Eric, Jack, Chris, Paul, Robert, and all the other FanGraphs authors, who made me feel at home and have become more friends than co-workers during my time here. I can’t thank you guys enough for editing and advising my work and for all that I learned from you. I would also like to thank the readers and commenters who have kept me accountable for my writing and driven me to be at the top of my game.

In the meantime, I’ll be pursuing other opportunities in my life, and I believe my experiences at FanGraphs have done their part in preparing me for the real world. To quote Admiral Adama from Battlestar Galactica, “Gentlemen, it’s been an honor.”





Albert Lyu (@thinkbluecrew, LinkedIn) is a graduate student at the Georgia Institute of Technology, but will always root for his beloved Northwestern Wildcats. Feel free to email him with any comments or suggestions.

36 Comments
Eddie
12 years ago

As a fan of both the Cubs and fangraphs, I hope this info is released publicly.