Player Evaluation on the Moon

May 16, 2022

A quick word of warning: this one is pretty abstract. If you like baseball math, it’s definitely got that. If you like analysis of the 2022 major league season, it absolutely does not have that. I think it’s pretty fun, but if that’s not your cup of tea, this one might not be for you. Anyway: on to the nonsense!

I’m the kind of maniac who likes to play baseball video games when I’m not writing about baseball. Right now, that’s Out of the Park Baseball 23, specifically the Perfect Team mode. It’s a baseball simulation where you collect cards representing current and historical players, build teams, and then play simulated games against other players’ teams.

The headline mode of the game lets you collect whoever you want and battle against your opponents’ best shot – peak Mickey Mantle against peak Tex Hughson, say. That’s fun in its own way (for what it’s worth, Mantle strikes out more than you’d like when facing top-tier competition), but I’m more interested in another mode the game offers: tournaments where you match a limited pool of your players against a limited pool of opponents.

More specifically, I’m talking about the “Silver” mode, where you can use some interesting but mostly not overpowering versions of historical hitters. If you’re looking for an equivalent in terms of 2022 players, think Adam Frazier or Gleyber Torres; pretty good, but not great.

Mostly, it’s fun because it’s cool to learn about new baseball players. Bill Bruton, arguably the fastest man in baseball in the 1950s, is my center fielder. John Beckwith, a Negro League great, catches. It’s also an excuse to remember some players fondly – peak Dan Haren and a card representing Bob Feller’s rookie year are both in my rotation.

More importantly for this article, though, these squads of acceptable hitters and pitchers are putting up absolutely absurd offensive numbers. Based on a set of tournament data I acquired for this article, Silver hitters in aggregate are batting .277/.364/.497. They’re clubbing 1.9 homers per game and scoring 6.3 runs per game. The league as a whole has a .338 BABIP. It’s an offensive environment the likes of which we’ve never seen in the real-life majors.

That leads to some weird roster construction – bullpens are facing a ton of extra batters, so they get tired more frequently, and even great starters are prone to being knocked out of the game early. Should you load up on relievers? Keep your hitters fresh and try to pound the other team’s pitching into submission? Invest in two-way players? It’s a fun puzzle. It also creates an interesting evaluation question: how do you measure a hitter’s contribution in such a crazy run environment?

You could just use OPS, of course. Sure, all the numbers will be inflated – again, the league-average OPS is .861 – but there’s no question of what scale it’s on. But OPS has its own issues. Which hitter would you rather have – 2005 Pat Burrell, who’s hitting .287/.431/.594, or 1988 Darryl Strawberry, hitting .269/.380/.656? OPS is always confusing when one player has a significantly higher OBP than the other, but that problem is exacerbated when the run environment is so strange.

One other thing you could do is just not care so much. It’s a computer game that simulates baseball between players who never actually played against each other! Just let them play against each other and enjoy the results. But that’s not how I think about baseball – I want to know what’s valuable in this fake league just as much as I do in real life.

The obvious solution, then, is wOBA. With everyone playing in an identical stadium, there’s no need for park adjustments – whoever has the highest wOBA is the best hitter. There’s just one problem – the game doesn’t provide wOBA, and it also doesn’t provide bulk-downloadable game logs that you could use to work out wOBA weights.

That’s okay. We’ll create our own wOBA! You could try to derive it from first principles – wOBA is the average change in run expectancy per plate appearance – but luckily, there’s a guide to deriving wOBA on this very site, produced by Neil Weinberg six years ago. The steps aren’t exactly easy, but they’re pretty straightforward, and I think it’s kind of fun to do. Ever wanted to build wOBA at home? You haven’t? Then you definitely don’t need to read this part. But silly little math tricks are one of the things that I enjoy about baseball, so let’s give it a shot.

First things first: we’ll need a run expectancy matrix. A run expectancy matrix is a pretty simple thing. It’s asking one question: for a given base/out state, how many runs score in that half-inning, on average? That’s the run expectancy for that base/out state. Repeat it for all 24 possible states, and you have your matrix.

In the real world, we’d simply look at every instance of, say, a runner on second with nobody out, then figure out how many runs scored in each of those innings. Then we’d take the average of those run totals and bam, there’s your run expectancy for a runner on second and no one out. Of course, we don’t have those logs: as I mentioned, OOTP doesn’t offer bulk-downloadable game logs. Instead, we’ll have to make them up ourselves out of whole cloth.
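
For anyone who wants to see that "real world" averaging spelled out, here’s a minimal sketch. It assumes a hypothetical log format, one record per plate appearance holding an inning ID, the outs and baserunners before the play, and the runs that scored on it, and it uses a few made-up toy innings rather than any real or OOTP data. For each plate appearance, it counts the runs that score from that point to the end of the half-inning, then averages those totals by base/out state.

```python
from collections import defaultdict

# Each record is one plate appearance: (inning_id, outs, bases, runs_on_play).
# "bases" is a 3-character string such as "_2_" (runner on second) or "123".
# These records are hypothetical toy data, not real or OOTP logs.
log = [
    ("inn1", 0, "___", 0),  # leadoff double
    ("inn1", 0, "_2_", 0),  # groundout moves the runner to third
    ("inn1", 1, "__3", 1),  # sacrifice fly scores him
    ("inn1", 2, "___", 0),  # strikeout ends the inning
    ("inn2", 0, "___", 0),  # three straight outs
    ("inn2", 1, "___", 0),
    ("inn2", 2, "___", 0),
]


def run_expectancy_from_log(log):
    """Average runs scored from each base/out state to the end of its half-inning."""
    # Group plate appearances by inning, preserving their order.
    innings = defaultdict(list)
    for inning_id, outs, bases, runs in log:
        innings[inning_id].append(((outs, bases), runs))

    totals = defaultdict(float)
    counts = defaultdict(int)
    for pas in innings.values():
        remaining = sum(runs for _, runs in pas)  # runs still to come before the first PA
        for state, runs in pas:
            totals[state] += remaining
            counts[state] += 1
            remaining -= runs  # runs on this play are behind the next batter
    return {state: totals[state] / counts[state] for state in totals}


print(run_expectancy_from_log(log))
```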

Luckily, we can build our own game logs. We have all of the rate stats for our hitters – how often they end a plate appearance with a given outcome. Here’s what that looks like for our league in aggregate:

Outcome Frequency

Result	Frequency
Single/Error	14.9%
Double	4.2%
Triple	0.5%
Home Run	4.7%
Walk/HBP	12.4%
Strikeout	25.6%
Other Out	37.7%

All we have to do is program a dice-rolling engine to turn this into our own set of game logs. Start with the bases empty and no one out, pick an outcome at random using the probabilities above, and change the game state to match. Let’s say we select a double: now we have a man on second and no one out. Then we pick another random outcome, say a groundout, and now we have a runner on third with one out. Continue until the end of the inning, repeat it a million times, and we have our own set of game logs.

There’s an even easier way to figure out how many runs score from each base/out state – we can simply start each of our million simulations from that exact combination of runners and outs. How many runs score in an inning when you start with runners on second and third with no one out? All you have to do is simulate a million innings that each start with runners on second and third and no one out. (For the record, in our high-offense environment, second and third with no one out is worth 2.38 runs on average.)
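
Here’s a rough sketch of that fixed-start approach, using the outcome frequencies from the table above. The baserunning rules are my own simplified assumptions, not OOTP’s engine or the program used for this article: on a hit, every runner moves as many bases as the batter; walks only force runners along; and outs (including in-play outs, unlike the groundout in the example above) advance no one. Because of those shortcuts, it won’t reproduce the 2.38 figure exactly.

```python
import random

# League-wide outcome frequencies from the table above.
OUTCOMES = ["1B", "2B", "3B", "HR", "BB", "K", "OUT"]
WEIGHTS = [0.149, 0.042, 0.005, 0.047, 0.124, 0.256, 0.377]
HIT_BASES = {"1B": 1, "2B": 2, "3B": 3, "HR": 4}


def advance(bases, outs, event):
    """Apply one plate appearance; return (new_bases, new_outs, runs_scored).

    bases is a (first, second, third) tuple of booleans. The baserunning
    rules are simplified assumptions: hits move every runner as far as the
    batter, walks only force runners, and outs advance no one.
    """
    first, second, third = bases
    if event in ("K", "OUT"):
        return bases, outs + 1, 0
    if event == "BB":
        runs = 1 if (first and second and third) else 0
        return (True, first or second, third or (first and second)), outs, runs
    n = HIT_BASES[event]
    occupied = [base for base, on in zip((1, 2, 3), bases) if on]
    runs = 0
    new_bases = set()
    for base in occupied + [0]:  # base 0 is the batter
        if base + n >= 4:
            runs += 1
        else:
            new_bases.add(base + n)
    return (1 in new_bases, 2 in new_bases, 3 in new_bases), outs, runs


def simulate_inning(rng, bases, outs):
    """Roll outcomes from the given state until three outs; return runs scored."""
    runs = 0
    while outs < 3:
        event = rng.choices(OUTCOMES, weights=WEIGHTS)[0]
        bases, outs, scored = advance(bases, outs, event)
        runs += scored
    return runs


def simulate_run_expectancy(bases, outs, n=100_000, seed=0):
    """Average runs over n simulated innings that all start from one base/out state."""
    rng = random.Random(seed)
    return sum(simulate_inning(rng, bases, outs) for _ in range(n)) / n


# Second and third, nobody out:
print(simulate_run_expectancy((False, True, True), 0))
```

Looping this over all 24 combinations of baserunners and outs fills in a full run expectancy matrix. The fixed seed just makes reruns reproducible.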

Next, following this guide, we need to turn our run expectancies into linear weights. In real life, this is easy to do. Take every walk that occurred in the year and work out the change in run expectancy from before the walk to after the walk – easy to do thanks to the fact that we have a run value for each base/out state. Next, add all those up and divide them by the number of walks. Voila! The run value of the average walk.
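
That averaging step looks something like this in code. It’s a minimal sketch that assumes each logged event records its before and after base/out state plus the runs that scored on the play, and it treats run expectancy after the third out as zero. The `RE` dictionary is deliberately partial: a full matrix has 24 entries, and the only two here are the second-and-third and bases-loaded values (nobody out) used in the walk example below.

```python
# Run expectancy keyed by (outs, bases). Only two of the 24 states are filled in.
RE = {
    (0, "_23"): 2.38,
    (0, "123"): 2.80,
}


def event_run_value(events, re_matrix):
    """Average of (RE after - RE before + runs scored) over a list of events.

    Each event is (outs_before, bases_before, outs_after, bases_after, runs).
    The third out ends the inning, so RE after it is zero.
    """
    total = 0.0
    for outs0, bases0, outs1, bases1, runs in events:
        re_after = 0.0 if outs1 == 3 else re_matrix[(outs1, bases1)]
        total += re_after - re_matrix[(outs0, bases0)] + runs
    return total / len(events)


# One walk with runners on second and third and nobody out.
walks = [(0, "_23", 0, "123", 0)]
print(round(event_run_value(walks, RE), 2))  # 0.42
```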

I did a slightly lazier version of the same thing. Again, we don’t have game logs. We do, however, have the ability to run a bunch of simulations. I asked my same dice-rolling program to note how often the game reached each base/out state. You can access that here if you want to play around with it. Since our model gives each outcome the same probability in every base/out state, we’re mostly home.

Consider a walk again. With runners on second and third and no one out, 2.38 runs score in the inning on average, as we covered up above. A walk would move that to bases loaded with nobody out, worth 2.80 runs on average. That makes the walk worth 0.42 runs in this scenario. How often does second and third with no one out come up? 0.4% of the time, according to our program. We can repeat this for every base/out state: bases empty and no one out, for example, occurs on 23.4% of all plate appearances, and a walk there increases run expectancy by 0.44 runs. Weight each state’s value by how often that state occurs, add them all up, and we get the average run value of a walk: 0.406 runs.
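
The weighting step can be sketched as a short function. It only has the two states quoted above to work with, so the result is a partial illustration, not the 0.406 figure, which uses all 24 states and the full frequency output.

```python
def weighted_run_value(state_freq, run_value):
    """Frequency-weighted average of one event's run value across base/out states.

    state_freq: share of plate appearances that begin in each state.
    run_value: the event's RE change plus runs scored, starting from each state.
    Dividing by the total frequency keeps a partial set of states a proper
    average; with all 24 states the frequencies already sum to 1.
    """
    total_freq = sum(state_freq.values())
    return sum(freq * run_value[state] for state, freq in state_freq.items()) / total_freq


# Only the two states quoted in the text, so this is NOT the full 0.406 figure.
freq = {(0, "___"): 0.234, (0, "_23"): 0.004}
walk_value = {(0, "___"): 0.44, (0, "_23"): 0.42}
print(round(weighted_run_value(freq, walk_value), 3))
```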

I repeated this process for every possible outcome – each type of base hit (I lumped errors in with singles), walks, hit by pitches, strikeouts, and other outs. Here’s the linear run value of each event:

Run Value Above Average by Event

Result	RV
Single/Error	0.509
Double	0.815
Triple	1.102
Home Run	1.436
Walk/HBP	0.406
Strikeout	-0.377
Other Out	-0.386

Hey, we’re almost home! All we have to do is convert from linear weights to wOBA weights by introducing scale. Per Neil’s guide, we next re-center everything so that an in-play out is equal to zero runs. (Neil made all outs worth the same, but we can do a little bit better by making in-play outs slightly more costly than strikeouts: double plays are killers in our league, with so many runners on base and outs so unlikely.) Here are our re-cast linear weights:

Run Value by Event (Centered on Zero)

Result	Recast RV
Single/Error	0.895
Double	1.201
Triple	1.488
Home Run	1.822
Walk/HBP	0.792
Strikeout	0.009
Other Out	0.000
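
The re-centering is a single shift: add the in-play out’s penalty back to every event. A quick check that reproduces the table above from the run values above average:

```python
# Run values above average, from the earlier table.
rv = {
    "Single/Error": 0.509, "Double": 0.815, "Triple": 1.102, "Home Run": 1.436,
    "Walk/HBP": 0.406, "Strikeout": -0.377, "Other Out": -0.386,
}

# Shift every event so an in-play out is worth exactly zero.
shift = -rv["Other Out"]
recast = {event: round(value + shift, 3) for event, value in rv.items()}
print(recast)
```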

Finally, we just work out league-wide wOBA given those weights. That’s not too hard – we already know the frequency of each event happening from up above. Do a giant multiply-and-add, and we get a value of 0.377. We want to scale that to league on-base percentage – that’s the way wOBA works – so we create a constant called “wOBA scale.” That’s .364/.377, or .966. Multiply each of those linear weights up above by wOBA scale, and we’ve made our own wOBA weights for this wild run-scoring environment.
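
The multiply-and-add and the scale step fit in a few lines. This sketch uses the rounded frequencies and re-cast weights from the tables above, so its outputs land within a few thousandths of the figures in the text rather than matching them exactly (unrounded, the raw value comes out near 0.3774, which puts the scale closer to .965).

```python
# League outcome frequencies and re-cast (centered) run values from the tables above.
freq = {
    "Single/Error": 0.149, "Double": 0.042, "Triple": 0.005, "Home Run": 0.047,
    "Walk/HBP": 0.124, "Strikeout": 0.256, "Other Out": 0.377,
}
recast = {
    "Single/Error": 0.895, "Double": 1.201, "Triple": 1.488, "Home Run": 1.822,
    "Walk/HBP": 0.792, "Strikeout": 0.009, "Other Out": 0.000,
}
league_obp = 0.364

raw_woba = sum(freq[event] * recast[event] for event in freq)  # the multiply-and-add
woba_scale = league_obp / raw_woba                          # scale raw wOBA to league OBP
woba_weights = {event: round(w * woba_scale, 3) for event, w in recast.items()}

print(round(raw_woba, 3), round(woba_scale, 3))
print(woba_weights)
```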

These weights come from a game that looks quite unlike the baseball we know. Here, making outs hurts far more than it does in real-life baseball. Home runs are less valuable: cashing in all the runners on base is a big part of a home run’s value, and those runners are more likely to score some other way in a world with a .364 OBP. With the caveat that wOBA weights are hard to compare directly across environments because of the scale constant, here are the weights for OOTP Silver, with its 6.3 runs per game, and 2022 MLB:

wOBA Weights, Real and Imaginary

Result	OOTP wOBA	2022 wOBA
Single/Error	0.864	0.895
Double	1.159	1.286
Triple	1.436	1.637
Home Run	1.759	2.129
Walk/HBP	0.765	0.693
Strikeout	0.009	0
Other Out	0.000	0
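
With these weights, a hitter’s wOBA is the weighted sum of his outcomes divided by his plate appearances. This is a minimal sketch with a simplified denominator: it counts every plate appearance, while the standard formula leaves out intentional walks and sacrifice bunts. As a sanity check, feeding in the league’s own outcome mix (the frequency table expressed per 1,000 plate appearances) should land on roughly the league’s .364 OBP, since that is what the wOBA scale is built to do.

```python
# OOTP Silver wOBA weights from the table above.
OOTP_WEIGHTS = {
    "Single/Error": 0.864, "Double": 1.159, "Triple": 1.436, "Home Run": 1.759,
    "Walk/HBP": 0.765, "Strikeout": 0.009, "Other Out": 0.000,
}


def woba(counts, weights=OOTP_WEIGHTS):
    """Weighted sum of outcomes divided by plate appearances (simplified denominator)."""
    plate_appearances = sum(counts.values())
    return sum(weights[event] * n for event, n in counts.items()) / plate_appearances


# League-average outcome mix per 1,000 plate appearances, from the frequency table.
league = {
    "Single/Error": 149, "Double": 42, "Triple": 5, "Home Run": 47,
    "Walk/HBP": 124, "Strikeout": 256, "Other Out": 377,
}
print(round(woba(league), 3))  # 0.364, about the league OBP
```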

Did I need to do all this? Absolutely not. But wOBA can be a scary, unknown thing, numbers handed down from on high that tell you how good everyone is. By showing my work, and calculating wOBA for a run environment unlike our own, I hope that I’m showing how well it works. By the way, that Burrell vs. Strawberry comparison from up above? The answer is what you’d intuitively expect. Homers, and extra bases in general, are less valuable, as we already covered. Avoiding outs is more important. Thus, Burrell has a meaningfully higher wOBA, .427 as compared to .412 for Strawberry, despite a worse OPS.

In a pinch, you could still just use OPS. OPS explains 92% of the variation in wOBA – lower than the 98% mark for the real-life majors in 2021, but still not too shabby. But if you can use something better, why not use it? There’s never any harm in learning how some of our tools of baseball analysis work – and if I can brag about my team of historical baseball players in the bargain, even better.
