Here’s How I’m Planning on Evaluating Free Agency Predictions

Every year, FanGraphs (in this case, I am FanGraph) releases contract predictions for our top 50 free agents. We also run a contract crowdsourcing project for those players, and I have to say, the crowd is spectacularly good at this. Last year, for example, I looked through all of the various predictions across the internet and awarded the crowd the title of best overall prognosticator.
But honestly, the winner of that award was hard to determine because I didn’t have a great way to evaluate the various predictions. Why so difficult? Because not every deal ended up being for the length we all predicted. As an example, I predicted 12 years and $48 million per year for Juan Soto, while the crowd predicted 13 years and $45 million per year. Soto signed a deal that was for 15 years and $51 million per year. Who came closer to the mark? It’s not immediately clear. I did better on the AAV, but the crowd did better on the number of years. There’s no obvious way to weigh the two against each other. Even worse, the two are inversely correlated; more years generally means a lower AAV. The two predictions seem pretty similar to me, but I had to grade AAVs and total guarantees separately, and that just felt clunky and confusing.
After some time bouncing ideas off my friends and colleagues, and plenty of time in the FanGraphs Idea Generation Lab (not real, but man, it should be), I think I have a solution. It’s simple, really. Evaluating contract predictions would be much easier if the predictions and the actual contract were for the same length, so I made them all the same length.
You might have some objections to that. “Hey, that makes no sense,” or “that’s not how math works,” something along those lines. But hear me out: Even if we can’t go back and retroactively change contracts or predictions to make the years match up, we can try to figure out what a deal would have looked like if it were longer.
I wanted a simple rule that wouldn’t require me to exercise any judgment at all. Working through case-by-case decisions is for making predictions, not evaluating them. I wanted something with few moving parts, and I definitely didn’t want to need to lean on projections or complicated aging curves in my analysis. My goal was to make it so that anyone could perform this analysis with nothing more than a list of our predictions and a list of the actual contracts. That was limiting, but also freeing. If you can’t use much, you rarely struggle to figure out what to use.
After a bit of experimentation, I settled on a rule. To compare two contracts of different lengths, I extended the term of the shorter contract to match the term of the longer one. To determine the salary for those additional years, I valued each added year at two-thirds of the previous year’s salary, compounding downward from the deal’s AAV. Take Soto, for example. Since I predicted 12 years for his deal, I had to add three years to match the one he signed. I valued those years at $32 million, $21.3 million, and $14.2 million. The crowdsourced prediction was for 13 years, so I had to add two years, at $30 million and $20 million respectively.
With both predictions now for 15 years, I could compare them directly. Take my predicted contract plus the three added years, and I had Soto down for $643.5 million over 15 years, $121.5 million short of the deal he actually signed. The crowdsourced prediction came out to $635 million over 15 years, $130 million below his actual deal. My prediction was slightly better – but the two were exceedingly close to each other.
That’s the outcome I wanted. Those two predictions really do sound similar. It’s not clear whether 12/48 or 13/45 is the bigger offer; the latter is only $9 million more in total salary despite tacking on a whole extra year. On the other hand, that extra guaranteed year really does matter. I think that calling the two roughly equivalent is the right way to go, and I like that my prediction came out slightly higher after the adjustment; as I mentioned above, a 13th year that adds only $9 million to the guarantee says a lot about how I’m thinking about this.
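If you want to see the arithmetic in one place, here’s a minimal Python sketch of the rule. The function name `extend_contract` and the `decay` parameter are my labels for illustration, not anything official; all the function does is compound the final-year salary by two-thirds for each year added.

```python
def extend_contract(years: int, aav: float, target_years: int, decay: float = 2 / 3) -> float:
    """Total value (in $M) of a contract stretched to target_years.

    Each added year is worth `decay` times the previous year's salary,
    so the added years compound downward from the original AAV.
    """
    total = years * aav
    salary = aav
    for _ in range(target_years - years):
        salary *= decay
        total += salary
    return total

# Soto: both predictions stretched to the actual 15-year term
actual = 15 * 51                      # $765M
mine = extend_contract(12, 48, 15)    # 576 + 32 + 21.3 + 14.2 ≈ $643.5M
crowd = extend_contract(13, 45, 15)   # 585 + 30 + 20 = $635M
print(actual - mine, actual - crowd)  # misses of ≈$121.5M and $130M
```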
So what happens when a prediction is longer than the contract a player actually signs? You can just do the same thing in reverse. This is great for pillow contracts in particular. Take Gleyber Torres, who last offseason signed a one-year, $15 million deal. I had him down for five years at $18 million per, while the crowdsourced median was three years at $18 million a pop. Those were both bad predictions, and mine was clearly worse, but how bad? Well, extending that pillow deal with our two-thirds formula works out to $39 million over five years (a miss of $51 million) or $31.7 million over three years (a miss of $22 million). Those equivalent contracts seem reasonable to me; if you’d offer someone $15 million for one year, you’d probably offer them roughly $30 million for three. That’s not true in every case, obviously, but for our purposes and using limited, objective inputs, I think it’s close enough.
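The reverse case uses the same sketch; you just stretch the actual contract to the prediction’s length instead (reusing the hypothetical `extend_contract` from above):

```python
# Torres: stretch the actual one-year, $15M deal to each prediction's length
actual_5yr = extend_contract(1, 15, 5)  # 15 + 10 + 6.7 + 4.4 + 3.0 ≈ $39M
actual_3yr = extend_contract(1, 15, 3)  # 15 + 10 + 6.7 ≈ $31.7M
print(5 * 18 - actual_5yr)              # my miss: ≈ $51M
print(3 * 18 - actual_3yr)              # crowd miss: ≈ $22M
```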
I experimented with a few different ratios for this, and I don’t have any conclusive evidence that the two-thirds formula is the perfect solution. Halving the salary each year was my first guess, in fact, but I changed my mind after looking at some long deals like Soto’s. I’m definitely not convinced that this is set in stone, but it did look closest in various spot checks of past contracts. I’d love some feedback here, truthfully; let me know if you think this method has merit, because while I think it does, it’s hard to feel confident in that view when there’s no way to verify it perfectly against historical data.
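For what it’s worth, the `decay` parameter in the sketch above makes that kind of experimentation easy. Here’s how the two ratios treat my Soto prediction; halving shaves a lot more off long extensions:

```python
# My 12-year Soto prediction stretched to 15 years under each ratio
print(extend_contract(12, 48, 15, decay=0.5))    # ≈ $618M: added years of 24, 12, 6
print(extend_contract(12, 48, 15, decay=2 / 3))  # ≈ $643.5M: added years of 32, 21.3, 14.2
```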
Somehow, though, this still isn’t enough to conclusively say that one set of predictions is better than another. For one thing, contract length matters. It’s all well and good to equate two projections for the sake of evaluation, but if I predict a three-year deal and the player gets a seven-year pact, I was wrong. Those years tell us something about how teams view that player. A full evaluation of predictions still has to look at that, even after doing this math to transform the dollar side of the equation into a single metric.
But that’s only half of the problem, because there are two ways to evaluate the results my method spits out. First, you could evaluate a set of predictions by their average miss, with under-predictions and over-predictions offsetting. Second, you could use the absolute value of the misses, so that those two do not offset; over-predict one contract by $10 million and under-predict another by $10 million, and your average miss is still $10 million. There are arguments for both; the average miss version does a good job of explaining which set got the overall market right, while the absolute miss version does a good job of sussing out who got the closest on individual players. I think the second is more important, but the first is clearly useful as well.
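Here’s one sketch of those two scorecards, again just for illustration; it assumes each prediction is a (predicted years, predicted AAV, actual years, actual AAV) tuple and reuses `extend_contract` from above:

```python
from statistics import mean

def misses(contracts: list[tuple[int, float, int, float]]) -> list[float]:
    """Signed miss (prediction minus actual, in $M) for each contract,
    after stretching the shorter side to the longer term."""
    out = []
    for pred_years, pred_aav, act_years, act_aav in contracts:
        term = max(pred_years, act_years)
        pred_total = extend_contract(pred_years, pred_aav, term)
        act_total = extend_contract(act_years, act_aav, term)
        out.append(pred_total - act_total)
    return out

errs = misses([(12, 48, 15, 51), (5, 18, 1, 15)])  # Soto, Torres
print(mean(errs))                                  # average miss: over- and under-predictions offset
print(mean(abs(e) for e in errs))                  # average absolute miss: they don't
```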
To give you an idea of how this new evaluation method works, I’m going to compare the entire set of predictions from the 2024-25 offseason, both mine and the crowdsourced estimates. Here’s how our average predictions did, both using my old AAV-and-total-separately method and the new blended one (negative numbers mean the predictions came in below the actual contracts):
| Group | Ben – AAV | Ben – Total | Ben – New | Crowd – AAV | Crowd – Total | Crowd – New |
|---|---|---|---|---|---|---|
| SP | -$1.04M | -$7.77M | -$6.58M | -$1.1M | -$7.34M | -$5.95M |
| RP | -$0.17M | -$3.44M | -$1.36M | -$2M | -$8.72M | -$6.49M |
| Hitter | -$0.54M | $3.46M | $2.21M | -$0.7M | -$2.21M | -$4.48M |
| Overall | -$0.69M | -$2.77M | -$2.45M | -$1.13M | -$5.71M | -$5.64M |
It’s gratifying to see that my new numbers generally resemble the old “total guarantee” metric. That’s good; if they were way off, it would suggest that my normalize-and-compare method was distorting the picture of total guaranteed money.
On the other hand, the absolute value prediction errors look pretty different with the new methodology:
| Group | Ben – AAV | Ben – Total | Ben – New | Crowd – AAV | Crowd – Total | Crowd – New |
|---|---|---|---|---|---|---|
| SP | $4.14M | $19.33M | $14.19M | $4.05M | $18.97M | $13.85M |
| RP | $1.80M | $7.22M | $4.48M | $2.45M | $9.16M | $6.94M |
| Hitter | $3.15M | $35.55M | $21.44M | $2.72M | $32.03M | $21.91M |
| Overall | $3.32M | $22.95M | $14.34M | $3.25M | $21.87M | $14.81M |
This is basically the whole point of the formula: by putting contracts on an equal footing in terms of length, this method strips out the false “errors” that arise simply because a five-year deal is inherently for more money than a three-year deal. In the first table, those errors largely offset. In absolute value terms, they don’t, which is why we see more of an improvement here.
I’m fairly confident that this new formula comes closer to my goal of evaluating which predictions were best. I’d be doing this even if I couldn’t publish it, in fact; every year, I do an extensive post-mortem evaluation to try to improve my system for the next year. I used to hate pillow contracts and qualifying offers for that reason; they messed everything up so much by virtue of their length that it was hard to interpret the results. By controlling for that at least somewhat, I think I’ll be able to refine my methods even further; the more accurately you can measure your shortcomings, the easier it is to address them.
Why publish this now, instead of in February when I’m doing that post-mortem? So that it doesn’t look like I have my thumb on the scales. I truly have no idea whether this method will “help” my predictions. How could I? Much of this year’s free agent class remains unsigned. No one likes a judge with a preconceived agenda to help one contestant or another. Proposing a new methodology, one that is self-professedly a little hand-wavy, and simultaneously evaluating myself with it just doesn’t sound right. I’d like to think it wouldn’t affect my judgment, but why take the chance?
Anyway, this is how I’m planning on evaluating predictions this year. I’d sincerely love to hear what you think. I want a simple, foolproof method for this project. If you have a better one than mine, I haven’t liked this idea long enough to be tied to it. Let’s figure something out together – though again, I really do like this one at the moment.
Ben is a writer at FanGraphs. He can be found on Bluesky @benclemens.