Let’s Fix MLB’s Salary Arbitration System: Evidence and Admissibility

Perhaps the most commonly discussed issue with the current arbitration system is the pervasiveness of traditional metrics, like home runs and runs batted in, over more advanced metrics like WAR and wRC+. Last time, we talked about how arbitrators use those metrics, and how they have slowly begun to garner greater acceptance as part of arbitration decisions, despite misgivings from some agents about whether they are properly understood or used by arbitrators. This time, we’re going to explore the metrics and evidence themselves in greater detail, and see where there might be room for improvement.

The Collective Bargaining Agreement provides a fairly straightforward list of criteria arbitrators are allowed to consider when ruling on a player’s salary.

The criteria will be the quality of the Player’s contribution to his Club during the past season (including but not limited to his overall performance, special qualities of leadership and public appeal), the length and consistency of his career contribution, the record of the Player’s past compensation, comparative baseball salaries . . ., the existence of any physical or mental defects on the part of the Player, and the recent performance record of the Club including but not limited to its League standing and attendance as an indication of public acceptance . . . . Except as set forth in subsections 10(b) and 10(c) below, any evidence may be submitted which is relevant to the above criteria, and the arbitration panel shall assign such weight to the evidence as shall appear appropriate under the circumstances. The arbitration panel shall, except for a Player with five or more years of Major League service, give particular attention, for comparative salary purposes, to the contracts of Players with Major League service not exceeding one annual service group above the Player’s annual service group. This shall not limit the ability of a Player or his representative, because of special accomplishment, to argue the equal relevance of salaries of Players without regard to service, and the arbitration panel shall give whatever weight to such argument as is deemed appropriate.

Helpfully, the CBA also provides evidentiary rules outlining what evidence is not admissible:

(i) The financial position of the Player and the Club;

(ii) Press comments, testimonials or similar material bearing on the performance of either the Player or the Club, except that recognized annual Player awards for playing excellence shall not be excluded;

(iii) Offers made by either Player or Club prior to arbitration;

(iv) The cost to the parties of their representatives, attorneys, etc.;

(v) Salaries in other sports or occupations.

Here’s further detail on what can be used:

Only publicly available statistics shall be admissible. For purposes of this provision, publicly available statistics shall include data available through subscription-only websites (e.g., Baseball Prospectus). Statistics and data generated through the use of performance technology, wearable technology, or “STATCAST”, whether publicly available or not, shall not be admissible.

Now that we know what can and can’t be used, let’s see if that list is actually optimized for the file-and-trial era. You might be surprised to see that Statcast data can’t be used. After all, don’t we want modern metrics to be part of this process? But there are a few reasons limiting its usage makes sense from the players’ side.

First, it’s important to know that Statcast has limitations; not only is the data primarily gathered and sorted by the league, but the precision of the data, and the metrics built on it, is still being refined. R.J. Anderson of CBS Sports talked to Rob Arthur about this two years ago.

“It is incredibly powerful, when it works and is released to the public,” FiveThirtyEight baseball columnist Rob Arthur said to CBS Sports. “The trouble is that it doesn’t always work, and often isn’t released to the public.”…

Arthur has firsthand knowledge of Statcast’s shortcomings. Last August, he uncovered Statcast’s difficulty with tracking batted balls that had “atypical trajectories,” a group comprising more than 10 percent of all balls put into play. Perhaps another equally concerning controversy surfaced this April when a league-wide velocity bump was revealed to be a byproduct of MLB changing its measurement preferences from PITCHf/x to Statcast without notice. There’s also the matter of ballpark bias as it relates to exit-velocity recordings.

And FanGraphs alum Mike Petriello told Anderson that “[w]e know there are certain things the system does very well and certain things the system doesn’t do as well. The goal is to be continuously improving on that.”

But there’s a second reason: the MLBPA doesn’t currently have the ability to analyze Statcast data to determine, on its own, what it means. As we discussed last April, Statcast is, to a large extent, a black box built on new technology. Including WAR in an arbitration makes sense because anyone can calculate WAR. Similarly, anyone can calculate wRC+, peruse FanGraphs, or purchase a Baseball Prospectus membership. But Statcast data is different. It relies on technology that is controlled by the league. Teams have much more complete access to its raw data, and possess the processing capacity and statistical acumen to analyze these raw outputs, potentially creating their own metrics and measures of value. As it stands, the union and most agencies are not similarly staffed or situated.

The MLBPA represents players’ interests, and those interests aren’t helped by including metrics in arbitration that the players’ side can’t assess or challenge effectively. That’s all the more true given how those metrics could potentially negatively impact players. As just one example, it could work against players like Didi Gregorius, who performs like a top-tier offensive shortstop despite not being a Statcast darling. That isn’t to say that players’ representatives will be forever in the dark on Statcast; data could be shared with both parties, and agencies could hire their own analysts to assess it. But the union was prudent to build time into the CBA to allow for some degree of catch-up.

And what of other advanced metrics, like WAR and wRC+? We discussed this, at least in part, last time. But it’s worth noting that the current rules aren’t well suited to sabermetrics. Why? Because, generally, players are only allowed to introduce performance evidence for their most recent season except to explain their prior salary awards. Per MLB Trade Rumors (emphasis below mine):

Beyond the first year, players receive raises based more heavily on the most recent season’s performance. Historical performance is only factored in to the extent that it affected a player’s most recent salary. While that may seem counter-intuitive, those familiar with the process have confirmed that this is usually the case in actual arbitration hearings.

If you’re surprised, you’re not alone. The plain language of the Rule (in both the 2012 and 2016 Collective Bargaining Agreements) allowed for at least some career performance to be admissible, even if weighted less heavily than the most recent season. But as you can see, that’s not necessarily how it actually works in practice, and that’s an issue for players. If you read FanGraphs, you know that most statistics are subject to sample-size noise: the less data you have, the less reliable your result, and vice versa.

Essentially, we want to make sure we have enough observations that the random noise cancels out. Don Kelly hit a home run against Yu Darvish one time, but how many Kelly-versus-Darvish at-bats do we need before we can accurately assess their abilities? It’s more than one, for sure, but the actual number depends on the skill you’re trying to analyze.

For many statistics, the answer is more than one season. For hitters, things like batting average, batting average on balls in play, and rate of extra-base hits all take 800 plate appearances or more to stabilize. As a result, hitters can be penalized in arbitration for fluky numbers. For pitchers, home run and extra-base-hit rate each take well over a thousand batters faced to stabilize (for reference, the most batters Mike Mussina ever faced in one season was 1,039). So if you’re a reliever, even your full-season statistics can be subject to small sample noise. Public defensive metrics are best consumed with the context of multiple seasons. (I am not a statistician, nor do I play one on television, so I refer you to this piece by Jonah Pemstein about how this all works in practice.)
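The stabilization point above can be illustrated with a quick simulation. This is a minimal sketch, not a rigorous reliability analysis: the .300 true-talent figure and sample sizes are arbitrary, and `observed_avg_spread` is a made-up helper, not any real library function. The idea is simply that a hitter with fixed underlying talent will show widely scattered observed averages over 100 at-bats and much steadier ones over 800.

```python
import random
import statistics

def observed_avg_spread(true_avg, at_bats, trials, seed=0):
    """Simulate many seasons of a hitter with a fixed true talent level
    and return the standard deviation of the observed batting averages."""
    rng = random.Random(seed)
    averages = [
        sum(rng.random() < true_avg for _ in range(at_bats)) / at_bats
        for _ in range(trials)
    ]
    return statistics.pstdev(averages)

# The spread of observed averages shrinks roughly with 1/sqrt(at_bats):
spread_100 = observed_avg_spread(0.300, at_bats=100, trials=2000)
spread_800 = observed_avg_spread(0.300, at_bats=800, trials=2000)
print(f"spread over 100 AB: {spread_100:.3f}")  # roughly .046
print(f"spread over 800 AB: {spread_800:.3f}")  # roughly .016
```

In other words, over a 100 at-bat sample, our .300 hitter will routinely post observed averages anywhere from the .250s to the .350s through luck alone, which is exactly the problem with basing a salary on one partial season.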

On the one hand, there’s a certain logic to the current system: the most reliable indicator of future performance is the most recent past performance. But if you’re a player, having your salary based, in essence, on one season’s worth of performance creates myriad problems: small-sample issues and random fluctuations (especially for bench players and relievers), injuries masking actual talent, and improvements that may not yet have shown up in the underlying numbers.

Last time, we talked about how an arbitration panel delivers its decision:

The next day [after the hearing], the panel chairperson will call the designated Union and MLB representative and report that the panel ruled either for the player or the club. That’s it. No explanation. No rationale. Nothing. Just a stated winner of the case. The Union and MLB then call the player and club, respectively, and report the result. If an outside advocate is used, he or she will also get a call.

That’s what we call in the law and most other fields a “black box” – inputs go in, an outcome is generated, and we don’t really know why. Some lawyers prefer the more tongue-in-cheek term “sausage” to refer to verdicts, arbitration decisions, and other case outcomes like this; as the line goes, you don’t want to know how it gets made. Except in this case, we do! It would be to players’ benefit to understand what information is the most persuasive for arbitrators. The trouble with this kind of system is that when the inputs – baseball statistics, say – don’t follow the agreed-upon rules (such as the allowance for career stats) and are themselves subject to long-term variation, you remove the consistency on which a precedential system relies, and create some strange arguments in the process. Per Anthony Castrovince:

Mike Scioscia has told the story of his arbitration victory in which his .407 on-base percentage from 1985 was actually used against him by the Dodgers. They argued that the slow-footed catcher was clogging up the basepaths and getting on base too much.

Scioscia won his hearing – but the type of evidence introduced hasn’t necessarily become more sensible — by modern baseball standards — since then. Per Jeff Passan:

[L]ast January, as they tried to convince a three-person arbitration panel that [Mookie] Betts deserved the $7.5 million salary they were offering and not the $10.5 million he requested, the Red Sox fashioned a novel approach in the typically staid, lawyerly arbitration room: They played a video talking about how good Kris Bryant was.

The purpose, multiple sources in the room told ESPN, was not simply to lavish praise on the Chicago Cubs’ third baseman but to make their case: As great as Mookie Betts may be, he isn’t Kris Bryant.

Is that really the type of evidence we want to be using? Not advanced metrics, but video tributes to other players? Now, to be fair, that sort of thing cut both ways for a while. This, my personal favorite, is about Sean Casey. Once again, per Castrovince:

The previous fall, the Caseys had a cat that wasn’t housebroken. Casey had come home from the Reds’ last road trip of the season and tossed his best — well, at that time, only — suit jacket on the ground. The cat peed on it. Casey had the jacket dry cleaned, but didn’t wear it again until the hearing several months later. The stench had stuck.

Casey’s coat might have emitted an offensive odor, but he had a sweet-smelling arbitration case. He had hit .315 with 20 homers, 33 doubles and 85 RBIs the previous season. All the statistical comparables insisted his asking price was accurate, and his agent, Ron Shapiro, amplified his argument by showing the panel a quote from Reds manager Jack McKeon about how Casey was so nice and so popular that he could be the mayor of Cincinnati (“The Mayor” nickname still sticks). Casey had even proved the point before the hearing started, taking the rare step of shaking hands and eagerly introducing himself to each of the arbitrators. The Reds didn’t stand a chance.

So that’s the story of how a man reeking of cat urine got a $2.6 million raise.

But the newest CBA limited players’ ability to introduce testimonials like that – so while the Red Sox can make their case-in-chief a paean to the Cubs’ third sacker, Casey likely couldn’t get away with his argument today.

There’s one last point to be made about the admissible criteria and why some of them don’t work. Under the current rules, teams can introduce the “recent performance record of the Club” over multiple seasons, “including but not limited to its League standing and attendance as an indication of public acceptance.” That is, at least in part, the brainchild of current Dodgers President of Baseball Operations Andrew Friedman, who developed the “file-and-trial” strategy back when he was General Manager of the Tampa Bay Rays (Friedman called it “file-and-go”). Why is this important? Because in baseball, one player just can’t take a team to the playoffs. We like to talk about a player “putting a team on his back,” but in reality, baseball needs a lot more than one superstar. The 2018 Angels had Mike Trout and Shohei Ohtani, and they didn’t contend. The Colorado Rockies, as close to a stars-and-scrubs roster as you can get, had an MVP candidate in Nolan Arenado and two All-Star caliber players in Charlie Blackmon and Trevor Story, and that’s just on the position player side. One player just can’t lift a team in baseball the way he can in basketball or even football.

So what are the fixes here? Actually, they’re pretty straightforward. First, the rule on career contributions needs to be amended to expressly allow both sides to present career statistics at arbitration for the purpose of demonstrating a performance pattern, or at least establishing outliers. The rules can weight the most recent year more heavily, but they should still expressly permit career data, especially where necessary to show statistical stabilization. Evidence is meaningless without context, and context is something the current system does a poor job of providing.

Second, arbitration submissions should include the player’s statistical performance across a range of agreed-upon statistics, both traditional and sabermetric, covering the most recent season and the player’s career, presented with percentile, rank, and raw performance figures for each season. That would give the arbitrators a frame of reference going into the hearing, regarding not only how good each individual player’s season was, but also what his baseline is. The arbitrators should be given a full picture of the data ahead of time, allowing them to see why each side’s arguments make sense.
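As a sketch of what one line of that kind of standardized submission might look like, here’s how a percentile rank could be computed for a single metric. The wRC+ figures below are hypothetical, invented purely for illustration, and `percentile_rank` is an assumed helper rather than any existing library function.

```python
def percentile_rank(value, population):
    """Percentage of the population that the value meets or exceeds."""
    at_or_below = sum(v <= value for v in population)
    return 100.0 * at_or_below / len(population)

# Hypothetical wRC+ figures for a player's service class -- illustrative only.
service_class_wrc = [85, 92, 99, 104, 110, 118, 126, 133, 141]
player_wrc = 126

pct = percentile_rank(player_wrc, service_class_wrc)
print(f"wRC+ {player_wrc} -> {pct:.0f}th percentile of service class")  # -> 78th percentile
```

A real submission would repeat this for each agreed-upon statistic and each season, so the panel walks in already knowing where the player sits relative to his comparables.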

Third, this is yet another plea for the requirement of written arbitration memoranda decisions. In any precedential proceeding, it is essential that there be a certain degree of predictability in the process. Casey’s story is humorous, yes, but the idea that a player’s salary could come down to whether or not he has a manager who recognizes his contributions in the form of a nickname is concerning. Advocates for both sides need to know what the arbitrators will expect, and what the arbitrators will find most convincing.

Finally, there needs to be an agreement to remove from consideration all factors outside of the player’s control. That includes team record and attendance, yes, but it should also include things like “public appeal” and statistics like runs batted in and won-lost record. Admissibility should be limited to evidence of the player’s actual talent level, and only that evidence available to both sides. Statcast data would be excellent for this purpose were the MLBPA ever granted equal ownership or insight into its functionality. Until then, the MLBPA should consider hiring its own statistical experts to analyze, and propose for use in arbitration, a set of indicators that are most likely to reflect what a player’s performance actually was, independent of other factors.

Sheryl Ring is a litigation attorney and General Counsel at Open Communities, a non-profit legal aid agency in the Chicago suburbs. You can reach her on Twitter at @Ring_Sheryl. The opinions expressed here are solely the author's. This post is intended for informational purposes only and is not intended as legal advice.
