Why the Astros Didn’t Catch Chris Correa

The St. Louis Cardinals’ former director of amateur scouting, Chris Correa, is serving 46 months in jail for gaining unauthorized access to the Astros’ player information/evaluation database, codenamed Ground Control. A few days ago, MLB announced St. Louis’s penalty: they’d have to send $2 million and their top two draft picks to Houston.

From a network-security perspective, the case is interesting. It illustrates how difficult true network security really is, which raises the strong possibility that another team will attempt this in the future (if indeed one isn’t doing it right now).

Here’s a timeline of the incident up until it was made public:

  • March 2013 – April 2014: Correa accesses Ground Control using passwords of various Astros staff. (Source: David Barron and Jake Kaplan of the Houston Chronicle.)
  • June 2014: Deadspin posts leaked documents that were retrieved from Ground Control, mostly regarding trades or potential trades during the 2013 season. This action causes the Astros to contact MLB, which contacts the FBI to begin an investigation into the breach. (Source: Derrick Goold and Robert Patrick of the St. Louis Post-Dispatch.)
  • June 2015: Michael S. Schmidt of the New York Times reports that the Cardinals are the prime suspects in this investigation.

Why didn’t the Astros detect the unauthorized access themselves? I don’t know anything about how they ran their security team, so I can only speculate. But I do have several years of experience in the network-security industry. I’ll draw on that experience to offer a perspective.

First, Correa masqueraded as Astros personnel. According to the article by Barron and Kaplan cited above, Correa was able to guess or otherwise obtain the passwords of accounts belonging to general manager Jeff Luhnow, analyst Colin Wyers, and three minor-league players. The article contains details on what Correa did while logged in as these people, implying Ground Control keeps a record of what users do while they’re logged in.

Impersonating several different people was smart. If anyone looked at Ground Control’s logs, they wouldn’t see Correa snooping around. They’d see Luhnow, Wyers, or a player accessing information. In this case the threshold for “something fishy is going on here, I’d better report it” is very, very high. These people are all expected to access Ground Control as part of their jobs. If Correa had gained access another way, perhaps via an account he was able to create for himself or by exploiting a security vulnerability in a web page, that might’ve set off more (figurative) alarms.

In addition to Ground Control logs, there are logs from network devices that I’m betting Correa had to use in order to get to Ground Control. These device logs provided another way for the Astros to detect Correa. The team didn’t catch him that way, though, probably because network security is hard. Really hard.

Some Obstacles to Effective Network Security

The following problems are just a few I’ve noticed among customers in my own experience.

Misconfigured Devices
If you’re not collecting a log, you can’t analyze it for evidence of malfeasance. IT staff, often overworked to begin with, may install a device like a router, change the default password, make sure it works, and leave it be. These devices collect logs by default, but many have optional and even custom fields on which they can report. The trade-off is storage: the more logs you collect, the more disk space you need. IT is a cost center for many companies, so the focus is on minimizing money spent, not on creating a rich data source for security analysis.

Products like Splunk and LogRhythm exist to centralize, and provide reports on, logs from all manner of devices, including network devices but also applications like Ground Control itself. You can search for anything you want and call up trend reports and pie charts at will. But these products are only as good as the people who use them, which leads to the next problem…

Lack of Staff
Maintaining a Security Operations Center (SOC) is expensive. If you want your network monitored 24 hours a day, seven days a week, and 365 days a year, you need to hire nine people to cover shift changes, sick time, vacations, and so on. Then you have to train them and retain them just like any other employee. If you don’t have a 24 x 7 x 365 SOC, you risk getting breached when no one’s monitoring the network. You may never find out about it.
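Some back-of-the-envelope arithmetic shows how a headcount in that neighborhood can arise. The sketch below is only an illustration: the function name, the two analysts on duty per shift, the 40-hour work week, and the three weeks of leave per analyst per year are all assumptions of mine, not figures from any particular SOC.

```python
import math

HOURS_PER_YEAR = 24 * 365  # round-the-clock coverage: 8,760 hours


def analysts_needed(seats_per_shift, weekly_hours=40, weeks_of_leave=3):
    """Estimate the headcount needed to keep a SOC staffed all year.

    seats_per_shift: analysts on duty at any given moment (assumed).
    weekly_hours: hours one analyst works per week (assumed).
    weeks_of_leave: vacation, sick time, and training per analyst
        per year, in weeks (assumed).
    """
    hours_per_analyst = weekly_hours * (52 - weeks_of_leave)
    hours_to_cover = seats_per_shift * HOURS_PER_YEAR
    # Round up: you can't hire a fraction of a person.
    return math.ceil(hours_to_cover / hours_per_analyst)


print(analysts_needed(seats_per_shift=1))
print(analysts_needed(seats_per_shift=2))
```

Under these assumptions a single always-staffed seat takes five hires, and two seats take nine. Change any input and the answer moves, which is part of why SOC budgets are hard to defend.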

Employees are often the most expensive part of any organization, but proving a return on this particular investment is difficult. If the SOC team catches something, there’s no way to quantify the value of their work. Who can say where the attackers would’ve gone and what they would’ve gotten? But the longer this team goes without catching anything, the more management looks at the team’s budget and starts wondering what value those nerds in the SOC are really providing.

For these reasons I’ve seen many companies with a “Security Director” who is simply the IT person. This person may not have any network-security training whatsoever. They’re told they have to handle network security because they handle all the other computer stuff. CEO can’t log into her email? Printer’s acting up? Phone system’s down? Got hacked? These tasks often fall on the same person.

False Positives
A false positive is an event that raises an alarm but turns out to be expected behavior. Tackling this problem is difficult because each network has its own layout, its own set of users, and its own set of shared assumptions about how and when it’ll be used. The same is true of each application, especially custom-built ones like Ground Control.

Here’s an oversimplified but illustrative example. Let’s say the Astros do have people monitoring not only their Ground Control logs, but also their network logs, and a security analyst notices Luhnow logging in from the Dominican Republic at 3:51 AM on a Sunday. The analyst thinks “That’s a weird time for Jeff to be logging in, and I don’t think he’s in the DR.” They tell their boss, who tells their boss, and pretty soon Luhnow gets a phone call: “Did you log in from the DR at 3:51 AM?”

He replies yes, he took an unannounced scouting trip. This fact gets back to the analyst, who then removes “Jeff logging in from the Dominican Republic” from their mental checklist of “suspicious things I should report.” After all, no one wants to look incompetent in front of the guy who runs the team.

Correa is now free to log in from the Dominican Republic, or pretend he’s in the Dominican Republic, with Luhnow’s account whenever he wants. After enough of these false positives, Correa has a lot of latitude for when he can log in as Luhnow, Wyers, or any staffer.
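The same blind spot is easy to reproduce in code. Here is a minimal sketch of the scenario above, with the analyst’s mental checklist replaced by a suppression list. Everything in it is hypothetical: the `Login` record, the `is_suspicious` rule, the assumed home country, and the assumed business hours are mine, not anything the Astros ran.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Login:
    user: str
    country: str  # two-letter country code
    hour: int     # local hour of day, 0-23


HOME_COUNTRY = "US"            # assumed normal location for every user
BUSINESS_HOURS = range(7, 22)  # assumed normal hours, 7:00 to 21:59

# (user, country) pairs an analyst has decided to stop reporting.
suppressed = set()


def is_suspicious(login):
    """Flag foreign or off-hours logins unless the pair was suppressed."""
    if (login.user, login.country) in suppressed:
        return False
    return login.country != HOME_COUNTRY or login.hour not in BUSINESS_HOURS


real_trip = Login("luhnow", "DO", 3)
print(is_suspicious(real_trip))  # True: the analyst raises the alarm

# The alarm turns out to be a false positive, so the analyst suppresses it.
suppressed.add(("luhnow", "DO"))

impostor = Login("luhnow", "DO", 4)
print(is_suspicious(impostor))   # False: the rule now ignores an intruder too
```

The suppression is keyed only on who and where, so it can’t tell the real trip from a later impostor. A narrower exception, say one that expires when the trip ends, would shrink the blind spot, but someone has to remember to write it that way.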

Rise of the Analytics

The trend in the network-security industry that attempts to overcome these problems is the same one that’s been roiling the baseball world for the past decade: analytics. Specifically: algorithms that run in real-time or on a schedule, monitor log files and other data sources on the network, apply statistical techniques, and alert humans when the “probability that something bad happened” exceeds a confidence threshold.

The newest class of analytics would be perfect for the situation the Astros faced. User and Entity Behavioral Analytics (UEBA) claim to use machine-learning techniques along with network logs and other contextual data to establish behavioral patterns of all users on your network. When someone deviates from the pattern, an alert fires.
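To make “establish a pattern, alert on deviation” concrete, here is a toy sketch. It is not how any UEBA product works: a simple frequency count stands in for the machine-learning models vendors describe, and the `LoginBaseline` class, its 5% threshold, and the sample data are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


class LoginBaseline:
    """Toy per-user baseline built from (country, hour) login history."""

    def __init__(self, min_share=0.05):
        # Assumed threshold: behavior seen in fewer than 5% of a user's
        # past logins counts as a deviation.
        self.min_share = min_share
        self.history = defaultdict(Counter)

    def observe(self, user, country, hour):
        """Record one known-good login for this user."""
        self.history[user][(country, hour)] += 1

    def is_deviation(self, user, country, hour):
        """Return True when this login is rare or unseen for the user."""
        seen = self.history[user]
        total = sum(seen.values())
        if total == 0:
            return True  # no baseline yet, so everything looks unusual
        return seen[(country, hour)] / total < self.min_share


baseline = LoginBaseline()
for hour in (9, 10, 14, 16):
    for _ in range(10):
        baseline.observe("luhnow", "US", hour)

print(baseline.is_deviation("luhnow", "US", 10))  # False: matches the pattern
print(baseline.is_deviation("luhnow", "DO", 3))   # True: never seen before
```

Even this toy version shows the catch: an intruder who logs in at the same hours and from the same place as the real user never deviates from the baseline.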

Theoretically, UEBA would’ve caught Correa earlier. But these products didn’t exist in 2013, and we don’t know how closely Correa’s activity matched Luhnow’s. Additionally, security analytics aren’t a panacea. Many analytics systems suffer from the same problems as manual log analysis.

Consider:

False Positives
How do you write an algorithm that catches Chris Correa masquerading as Jeff Luhnow but lets the real Luhnow do his job without interruption? Over time, humans can learn the idiosyncrasies of the network or application they’re analyzing and adjust. In the example above, our human analyst did this. But teaching a computer is more difficult, and machine-learning techniques are in their infancy.

In the security industry we talk about needing to “tune” security analytics systems. By tuning, we mean giving feedback to the detection algorithms (or whoever writes them) to suppress alerts about which we don’t care, reserving our inboxes for the ones that are relevant. The easier a system is to tune, and the more feedback an analyst can give it directly — instead of having to file a bug, send an email, or some other long-running activity — the more useful it is.
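One simple form that direct feedback could take is sketched below: analysts mark each alert as real or not, and a rule that keeps producing noise gets muted. The `RuleTuner` class, the rule names, and both thresholds are hypothetical choices of mine, not taken from any product.

```python
from collections import defaultdict


class RuleTuner:
    """Track analyst verdicts per detection rule and mute noisy rules."""

    def __init__(self, min_precision=0.2, min_verdicts=5):
        # Both thresholds are assumed values chosen for illustration.
        self.min_precision = min_precision
        self.min_verdicts = min_verdicts
        # rule name -> [real-threat count, false-positive count]
        self.verdicts = defaultdict(lambda: [0, 0])

    def record(self, rule, was_real_threat):
        """Store one analyst verdict on an alert raised by `rule`."""
        index = 0 if was_real_threat else 1
        self.verdicts[rule][index] += 1

    def should_alert(self, rule):
        """Keep alerting until enough verdicts show the rule is mostly noise."""
        real, false_alarms = self.verdicts[rule]
        total = real + false_alarms
        if total < self.min_verdicts:
            return True  # not enough feedback yet to judge the rule
        return real / total >= self.min_precision


tuner = RuleTuner()
for _ in range(6):
    tuner.record("off_hours_login", was_real_threat=False)
tuner.record("new_country_login", was_real_threat=True)

print(tuner.should_alert("off_hours_login"))    # False: six straight false alarms
print(tuner.should_alert("new_country_login"))  # True: too few verdicts to mute
```

Muting a whole rule is a blunt instrument, and it carries the same risk as the analyst’s mental checklist: the quieter the inbox, the more room an intruder has.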

Still, tuning can take weeks to months — and that’s if there’s a team dedicated to it, which brings us back to the problem of…

Lack of Staff
Analytics reduce, but don’t obviate, the need to retain staff. Someone must receive the alert, prioritize it along with the other things to which they must respond (the CEO still can’t log into her email, remember?), find the issue, remediate it, and (maybe) provide feedback to the system.

A human being can fail at any one of these phases. Consider the hack of Target stores in 2013: analysts saw alerts but decided they “did not warrant immediate follow up.” The analytics did their job; the humans, as they do from time to time, erred in theirs.

Additionally, if you depend on security analytics, you’re depending on a company’s ability to hire people to create useful ones. The intersection between “people who know network security well enough to create useful algorithms for detecting threats” and “people who know how to express these algorithms in production-ready code” is smaller than you might think. Several tools exist that purport to make writing analytics easier, but so far none have risen to the top.

It’s similar to baseball operations. You have people who know baseball well enough to come up with useful ways of analyzing data, and you have people who can write releasable code. There’s a Venn diagram here. The folks in only one circle can provide value, but the people in the overlap are in the highest demand.

Running Algorithms on Big Data
Large computer networks truly are big data. For large companies, analytics may need to process and store millions of log messages per day, everything from “Fred logged in to gc-db-03 at 12:35:34 GMT” to “Mary transferred 545667 bytes of data to the IP address 1.2.3.4 at 14:01:22 GMT.” These analytics may also need access to historical datasets going back a year or more. Advanced systems store not only the log, but the actual packets that crossed the wire. All of this data has to be available to multiple security analysts at once, as quickly as possible.
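Before any analytic can run, free-text messages like these have to be turned into structured records. Here is a minimal sketch that parses the two example messages above. The patterns match only those two made-up formats; real devices each have their own, and the function and field names are my own invention.

```python
import re
from typing import Optional

LOGIN_RE = re.compile(
    r"^(?P<user>\w+) logged in to (?P<host>[\w.-]+) "
    r"at (?P<time>\d{2}:\d{2}:\d{2}) GMT$"
)
TRANSFER_RE = re.compile(
    r"^(?P<user>\w+) transferred (?P<bytes>\d+) bytes of data "
    r"to the IP address (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"at (?P<time>\d{2}:\d{2}:\d{2}) GMT$"
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one log message into a dict, or None if no pattern matches."""
    text = line.strip()
    match = LOGIN_RE.match(text)
    if match:
        return {"event": "login", **match.groupdict()}
    match = TRANSFER_RE.match(text)
    if match:
        event = {"event": "transfer", **match.groupdict()}
        event["bytes"] = int(event["bytes"])  # numeric, so it can be summed
        return event
    return None  # unrecognized format; a real pipeline would count these


print(parse_line("Fred logged in to gc-db-03 at 12:35:34 GMT"))
print(parse_line(
    "Mary transferred 545667 bytes of data to the IP address 1.2.3.4 "
    "at 14:01:22 GMT"
))
```

Multiply this by every device type on the network and millions of lines per day, and the storage and processing problem comes into focus.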

This Will Happen Again

I’ve left out a lot about running an effective network-security operation here, because there isn’t room. Network security is a complex problem that sits right at the boundary between humans and computers. It’s a discipline unto itself, replete with dense textbooks, advanced degrees, and industry certifications. Even though we’re 15-plus years into the internet-connected age, no one has quite solved these problems yet.

The following is pure speculation: I suspect that, prior to 2014, the Astros had a typical network setup that probably included some log-management devices. I suspect that very competent people installed and maintained this equipment and set up the network so that (a) staffers could do their jobs but (b) the millions of people who visit Minute Maid Park every summer couldn’t access areas of the network they shouldn’t.

But I suspect the team didn’t prioritize log reviews/monitoring as highly as they could have. In 2013, very few companies were doing security analytics, but many were doing log management. Splunk and LogRhythm both existed. Regular, deep audits of network access or flow logs could have helped the team catch Correa before the leaked documents hit the web. Regular audits of Ground Control logs would also have helped.

I’m not knocking the Astros. The above two paragraphs are true of many companies — far more than you’d guess.

Like a homeowner who installs a burglar alarm after getting robbed, I bet the Astros and other MLB teams doubled down on network security after Deadspin posted the leak. The fallout within MLB circles was apparently huge. According to Dave Cameron:

The Astros didn’t benefit [from this situation]; they get a couple of lower-value picks and some mostly meaningless cash in exchange for some pretty seriously negative PR. Given the amount teams spend on their image, Houston came out in the red here. They got crushed when the trade transcripts were leaked to Deadspin. Crushed.

I’m sure the Astros pay much more attention to their network and application security now. But this kind of thing will happen again — if not to the Astros, then some other team. Millions of dollars are at stake, not to mention the bright sheen of a World Series championship. The only question is: who is the next Chris Correa?

We hoped you liked reading Why the Astros Didn’t Catch Chris Correa by Ryan Pollack!

Ryan enjoys characterizing that elusive line between luck and skill in baseball. For more, subscribe to his articles and follow him on Twitter.

lesmash
Member

Fantastic article, Ryan. I learned a great deal from reading your piece.

Question about your claim that ‘this kind of thing will happen again’:

Do you suspect that Correa’s lengthy jail sentence, coupled with his complete blackballing from the sport, is not a significant disincentive for someone contemplating this behavior? Given how little money these guys make, isn’t the risk vs reward balance heavily skewed on the risk side?

cmarts cups
Member

Yeah, this is some great insight into security, but I just don’t get the premise that MLB team employees are chomping at the bit to hack into another team’s system and face serious prison time.

I am the Rockies fan
Member

I wouldn’t say chomping at the bit, but with that logic there would be very few people in prison, and no thrill junkies or thieves who steal though they need nothing.

People often commit crimes knowing full well the potential consequences; it’s not like Correa or any other serious hacker/criminal had no idea of what would happen if he was caught. Criminals are very often opportunists, so if you give them an opening (like, say, the password to another team’s database) and a half-decent motive, whether a personal grudge or something else, this could easily happen again.

Rollie's Mustache
Member

I agree. Punishment is rarely a deterrent for breaking a set of rules. Look at Jenrry Mejia. He knew he’d be kicked out of MLB forever if he got caught a 3rd time for PEDs and he did it anyway.

jdbolick
Member

There was a massive incentive for Mejia to take that risk in the form of a major league baseball salary. Lower level team employees aren’t getting paid anywhere near what the players do, and there wouldn’t appear to be any opportunity for them to turn performance into a massive raise. Basically, while there may be significant benefits to an organization hacking another if they remain undetected, there wouldn’t appear to be much benefit for any particular employee of that organization to do so. That means they’re exposing themselves to extremely serious risk for a minimal personal gain. Maybe someone desperate to remain in baseball would try it, but I have to think that anyone who would have the tools to make a reasonable attempt wouldn’t be in a position of such desperation.

victorvran
Member

Wasn’t a lot of the prison time for violating HIPAA? If someone were to access data that wasn’t medical records, they’d likely see a far shorter prison sentence too (from my understanding of what happened).

Paul G.
Member

Punishments are a disincentive to performing certain behaviors. However, it’s one factor among many. There are people who would never have done this even if guaranteed they would not get caught. There are people who desperately want to be successful as a baseball executive, do not have the requisite skills, and are willing to do pretty much anything. There are people who are quite willing to hack but are not willing to risk 4 years in jail for it. And there are people who don’t need to hack but will do it anyway because they are ruthless and arrogant.

If the punishment were the death penalty, there would still be people willing to do it. It would be far fewer people than with a 4-year penalty, which in turn is far fewer than if there were no punishment at all.

jianadaren
Member

It’s exactly zero disincentive for a team to hire somebody confident in their ability to cover their tracks.