A Q&A with Tom Tango: MLB’s New Data Guru
by Dave Cameron
June 8, 2016

Today, MLB Advanced Media is making a pretty exciting announcement, as they’ve brought in Tom Tango — who created many of the metrics we use here on FanGraphs, and is probably the closest thing this generation has had to Bill James in terms of advancing the understanding of the game — to serve as their Senior Database Architect of Stats. In other words, he’ll help facilitate the development and deployment of Statcast data. While the league has previously been somewhat reserved in discussions about which direction they would take this technology, the fact that MLB has brought in one of the game’s most respected public analysts, and is putting him in a position to develop tools for the public, seems like a great sign for the future of the data.

To help put some context behind this announcement, I conducted a Q&A with Tango via email last week, and I think his answers were quite encouraging. Our conversation is below, and if you’re interested in hearing more, Tango also recorded an episode of the Statcast Podcast with Mike Petriello, which will be up shortly. On to the Q&A.

DC: First off, congratulations on the new gig.

TT: Thank you, I’ve been waiting for this day to officially arrive.

DC: As someone who has spent a significant amount of time pushing forward the knowledge level of the baseball community, but who has also had his work removed from the public sphere at times due to jobs with the Mariners and Cubs, can you give us some idea of what went into the decision to join MLBAM?

TT: I’ve worked all my life for various corporate America-type companies, and for the last 10 years, I’ve had a secondary job as a sports consultant, on nights and weekends. My day job is “what I did,” but my night job is “who I am.” While in the past I’ve had the possibilities to make sports my full-time job, relocation was a constraint, so I never made that jump. With MLBAM, all the stars aligned.
First, it was baseball. Secondly, it was with Cory and his team, with whom I’ve always had a great relationship. Finally, it’s a train ride away.

DC: When you and BAM started talking about this possibility, what did they say that made you think this was the right fit for you? And is this your sole full-time job now?

TT: When I proposed to my wife, she said yes. We figured out how to have a great marriage just by being together. That’s what love lets you do. MLBAM proposed a job. A full-time job that allowed me to walk away from the “what I do” job, and have a sole “who I am” job. That’s all they needed to say.

DC: What do you imagine your primary role will be at BAM?

TT: The primary role would be to do as much as I can in data analytics, data management, metric construction, and just general saber-creativity. Basically try to parse through the mountains of data, and come up with an organization of the data that will allow for compelling stories. We have a thousand questions to ask, so I will come up with ways to present that data so that, when someone asks a question, that person will be able to come up with their own answer.

DC: You’ve worn a lot of hats in the public community, ranging from developing the foundation of many of the metrics found here on FanGraphs to being more of an evangelist for other people’s work, encouraging younger writers to take up torches you weren’t allowed to publicly carry anymore. MLB has previously brought in Mike Petriello and Daren Willman to help the public learn about what Statcast is and how to use the information. Do you expect to fill that kind of role as well, or will you be more of a behind-the-scenes player, developing models and metrics for Mike and Daren to use?

TT: This is very much a team-based approach. I don’t think it’s about some specific roles. Everyone has their strengths, and everyone is going to pitch in wherever and whenever they can.
I might ultimately be the one who owns the data process, but the organization of that data is going to be influenced by a large group of people. I’m sure I’ll have some public work products.

DC: And of course, the primary question the public community has had about Statcast involves the plans for the release of information. We’re self-interested beings, and we want to know how this affects us. You’ve straddled the line between being a notable member of the public community and having proprietary information that you had to keep to yourself for years. Did you get any kind of assurances from MLB about the public availability of the work you’re going to do for them?

TT: The commissioner’s office drives the availability of the data. And MLBAM has been at the forefront of making data available to the public in such an easy to parse manner. MLB’s ultimate client base is the baseball fan, whether directly or through its various partners. If we can make a compelling story out of the data, something that leads to the direct enjoyment of baseball for the fan, then that’s going to drive the decision.

DC: Can we expect to see more of your work than we have during your time working for teams, or will you still be limited in what you can say and what kinds of information you can provide?

TT: Since my client base is now the baseball fan, whatever limitations I’ll have would be quite limited.

DC: In regards to the public-versus-private nature of the data, what do you see as the best role for the community in regards to Statcast?

TT: The more the community can do with the data, the better off everyone will be. Again, this is just about increasing the enjoyment of baseball for whatever segment of fandom enjoys that aspect.
DC: Do you think your hiring shows that BAM is looking to develop metrics and models that have, to this point, been developed by third-party sites like ours, or is there a place for the public to attempt to take things like exit velocity and launch angle and build our own metrics based on that information?

TT: Yes, I think it’s fair to say that I’ll be able to influence the construction of metrics that will ultimately make their way to the public. There are many different ways to take exit velocity and launch angle and create a metric around it. And even if it’s rolled out at MLBAM and then someone else in the public offers their insight, that can make its way into the metric, as well. Again, I see the team-based approach as not only everyone over at MLBAM, but also the saber community. We’re all on the same team, and we all just want to enjoy baseball, with whatever angle we can find.

DC: Have you been given any insight into whether your role will be to help build the official suite of Statcast metrics, and those will be the ones the public is expected to use, or is your role going to include facilitating the use and analysis of data as it’s released?

TT: I’m not sure what will be “official,” as opposed to trying different metrics and seeing what catches on with the community. But more generally, yes, I’d say those two things — the metric construction and the processing of the data for consumption — will be at the forefront.

DC: With PITCHf/x, there was a lot of seminal work being done in the open, because writers and the teams were all getting the information at the same time, but teams — and people employed by teams, including yourself — have had access to TrackMan data and the f/x products for years now, and Statcast is only now becoming used regularly in public. Do you think the public community can still add significant value to pushing the game’s knowledge base forward with Statcast data in the same way it did with PITCHf/x data?
TT: There’s no doubt that having 10,000 people looking at the data is better than 10 or 100. Open Source is the best thing about the internet. The sheer speed at which things get caught and resolved is astounding. If you only have 10 people looking at something, but that 11th person would have saved you from doing something embarrassing, then it benefits everyone to have that 11th person there. And 20th and 100th, and 1000th.

DC: Does the lag in time between what teams have been doing and what is now available to the public make it less likely that the public will be able to discover something that team employees haven’t already known for years?

TT: We’ve probably learned 1% of what we need to learn. And in 10 years, we’ll be at 10%. And that’s if everything is public and everyone works together. There’s really no end in sight here. It’s all about leverage. As much as you think you may have that “final” piece, all you have is the first of a hundred pieces. And everyone needs help finding that next piece.

DC: Do you hope, or have you gathered from BAM during your discussions with them during the process of taking this job, that your hiring is part of a centralization process of data-related knowledge by the league, in an attempt to stop the fracturing of information across 30 separate entities?

TT: MLBAM provides the same data to all the teams, so I don’t know that the information is fractured. However, the analysis of that data is unique for each team, as it should be. I think there are economies of scale that can be leveraged and minimize duplication of effort. I think the 30 teams are the ones most incentivized to do that, and so, the teams should be driving any decision on that level.

DC: With teams rapidly building their analytical departments, the public research community has lost a lot of talent over the last few years.
Do you see the department of which you’re part expanding and attempting to try and keep some of these people working in the public sphere, or is the continued splintering of smart researchers into positions where they can’t collaborate with each other simply an inevitable part of the market’s demand for teams to gain a competitive advantage?

TT: I don’t see any change in direction or magnitude on that front. MLBAM has a different client base than the 30 teams. MLBAM is about what’s ultimately good for the fan, while the 30 teams are about winning the World Series.

DC: To this point, the Statcast numbers that have received the most attention are exit velocity and launch angle, but it seems like results-based hitting metrics were generally seen as the most reliable, and in need of the least amount of upgrading. Are there base-running and fielding metrics that you’re hoping to push forward in your new role?

TT: Yes, exit velocity primarily, and launch angle secondarily, have received the most attention, as it is warranted. In terms of base-running, whether for runners or catchers/pitchers, it really comes down to organizing the data. Whereas Mitchel Lichtman merged all those various components into one base-running figure which you can see at FanGraphs, Baseball Reference kept all the various components separate and distinct. So, I’m sure once I sit down and work with the data, I’ll try to come up with something that might be intriguing.

DC: And we should probably just talk about fielding in general. When Statcast was introduced, defense was the first place everyone went when talking about how this could revolutionize the game and our understanding of it. What do you think would be the best use of Statcast data to advance the current state of fielding metrics? Do you think stats like UZR and DRS can be improved enough through the availability of better inputs, or is a better path forward to just build a new set of metrics based solely on Statcast data?
Have you asked your new bosses yet whether you’re going to be building a new fielding metric at some point?

TT: For fielding, that one has great potential. Mitchel Lichtman has also been the leader with regards to fielding, but he will be the first to tell you the major problem he’s got is that he doesn’t know the starting position of each fielder. Statcast gives us that. I don’t know how many terabytes of data we get out of Statcast, but 50% of the value will simply be knowing where the feet of all the fielders are, when the pitcher releases the ball. Just by doing that, it would essentially make all other fielding metrics obsolete. Then we can spend the other 99% of the time focusing on all the nuances that we can get out of Statcast that will end up giving us some interesting angles. So, the value is in knowing the starting position, but the fun is knowing all the little things that we watch baseball for.

DC: Park factors seem to be a very interesting application of more granular data. The current public park factors rely on heavy regression to the mean over very long samples, because there’s so much noise in single-stadium results data. Additionally, PITCHf/x research found that there is even some variance in tracking data depending on where cameras are placed and how the systems are calibrated, and some research has shown this may be true of Statcast data, as well. How big of a priority will it be for you to provide park adjustments for the various numbers being put out publicly?

TT: Establishing the park-impact numbers would be one of the first things I’d want to do. There’s nothing worse for data than a systematic bias that is unknown.

DC: One of the other areas where the public has done some work on PITCHf/x data has been related to pitch classifications. Between Pitch Info, Baseball Info Solutions, and Inside Edge, there are a number of companies providing adjusted pitch-type information to sites like FanGraphs, Brooks Baseball, and to teams themselves.
Do you see upgrading the pitch-classification abilities of BAM’s algorithm as an area where you could help add value, or is the difficulty of accurately identifying pitch types in real time just a difficult problem to solve?

TT: Right, I think the real-time classification is a huge burden. That, by itself, makes the comparisons not on an equal footing. The issue would therefore be post-game processing, and if/how that needs to get addressed. What I’d like to see is a comparison of MLBAM’s real-time classification to each of the other solutions post-game, and see what kind of differences of opinion we have. Is there some sort of systematic bias? Is it limited to guys like Felix where it’s easier to figure out what he’s throwing after he has thrown 100 pitches in a game or 3000 pitches in a season? For example, just as MLBAM is constrained to real-time classification, some of these providers are constrained to single post-game classification. Imagine you have a public researcher that then provided classifications only after every team has played 162 games. That researcher would make the claim that his classification is better than the providers who provide classification after each game. This might be one of those areas where leveraging knowledge is one where we can get advances that would benefit everyone.

DC: What is one part of Statcast that you’re most excited about that you don’t think has gotten enough attention? Is there some part of this kind of tracking data that you’ve found particularly useful that you think has a lot of information still to be mined?

TT: The starting position of each fielder is huge. And coming up with a method or metric to show exit velocity and launch angle as a “pair” would be quite a big deal as well.

DC: Finally, what will this job mean for your blog? Are you going to continue to maintain that corner of the internet as a place for high-level discussion, or are the straight arrows going to have to find a new home?
TT: MLBAM recognizes that my blog draws the saber followers, and they want that to continue. I know I need to engage with the Straight Arrow readers of the saber community. Half my inspiration comes from those engagements. I think we’re pretty much on the same page on just about everything.

*****

Many thanks to Tom for taking the time to have this discussion. Having Tango working directly with the league, in addition to the way the league has developed Baseball Savant since bringing Daren Willman into the fold last winter, gives me the most confidence I’ve ever had that the league is serious about using this technology to further our understanding of the game. From my perspective, this is a pretty exciting day for baseball fans.