Handicapping the 2021 MVP and Cy Young Races
It may seem a little bit early to start talking about postseason hardware, but what’s the fun of a projection system if you’re not looking at every way to separate the best of the best? In any case, it’s not as interesting an exercise when the season ends, given that we already know what happened (though it’s way more accurate).
Naturally, voting is not going to be a simple ranking of WAR. Each award has 30 different voters, all with differing priorities and philosophies about what constitutes excellence. Since kidnapping my colleagues and subjecting them to a battery of lab tests on their voting habits is off the table, our best solution is to use past votes to infer how they'll vote going forward.
While using a neural network is always tempting, we're handicapped by a real scarcity of data; 30 ballots per award is not a lot to work with. The other issue, which has become apparent as I've worked with these models over the years, is that how voters vote has been changing, enough to affect who wins the awards and by how much. I've found that chucking out everything before 2000 improves every model and every approach I've tried. By and large, voters aren't simply ranking players by WAR, but it and other analytics have affected the results both directly (more sabermetric-friendly writers joining the BBWAA) and indirectly (influencing existing voters). I could probably make a very accurate model of how I vote, but we'd be treading far too deep into meta territory at that point.
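To make that concrete, here's a minimal sketch of the general idea, not the actual model: fit a simple regression of historical vote share on a couple of performance features, using only post-2000 ballots, and apply it to this year's candidates. The file names and columns (mvp_vote_history.csv, war, team_wins, vote_share) are hypothetical stand-ins.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical historical ballots: one row per player-season, with the share
# of maximum possible vote points that player received. Names are placeholders.
ballots = pd.read_csv("mvp_vote_history.csv")  # season, war, team_wins, vote_share

# Drop pre-2000 seasons; the electorate's behavior has changed enough that
# older ballots hurt the fit more than they help.
modern = ballots[ballots["season"] >= 2000]

features = ["war", "team_wins"]
model = LinearRegression().fit(modern[features], modern["vote_share"])

# Project vote shares for this year's candidates from the same features.
candidates = pd.read_csv("mvp_candidates_2021.csv")
candidates["projected_share"] = model.predict(candidates[features])
print(candidates.sort_values("projected_share", ascending=False).head(10))
```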
So, what’s new this year? One variable I’ve added to the mix is past award performance, something I wish I had checked before now, but better late than never. Essentially, players who have received votes recently tend to do slightly better than equally excellent players who have not.
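As a hypothetical illustration of that kind of variable, the snippet below computes, for each player-season, the vote share earned over the previous three seasons; the file and column names are again placeholders, not the actual inputs.

```python
import pandas as pd

# Hypothetical ballot history: one row per player-season, with the vote share
# that player received (0 if they appeared on no ballots that year).
history = pd.read_csv("mvp_vote_history.csv")  # player_id, season, vote_share

# For each player, sum the vote share earned over the three prior seasons.
# This "recent hardware" feature captures the small bump for familiar names.
history = history.sort_values(["player_id", "season"])
history["recent_vote_share"] = (
    history.groupby("player_id")["vote_share"]
    .transform(lambda s: s.shift(1).rolling(3, min_periods=1).sum())
    .fillna(0.0)
)
```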
Let’s jump right in.
