Is Run Estimation Relevant to Free Agency?

Sometimes there seem to be two separate branches of saber-oriented blogging: one that uses sabermetric tools to analyze current events (player transactions, in-game strategic choices, etc.), and another which focuses on more theoretical issues (e.g., specific hitting and pitching metrics). Obviously, the latter is supposed to ground the former, but there still seems to be something of a disconnect between the two levels in popular perception. I say this because I was recently part of a discussion in which some were pointing out the superiority of linear weights run estimators for individual hitters to the approach of Bill James’ Runs Created. Someone then made a comment to the effect that this was simply a nit-picking preference for a “pet metric” that really did not make that much of a practical difference.

Sabermetrics is far from being a “complete” science in any area. Debates about how best to measure pitching and fielding are obvious examples of this. With respect to run estimators, there is a greater level of consensus, and because of that progress (at least relative to pitching and fielding), there is less of a difference between the metrics. It does make a difference, though. Rather than arguing for one approach to run estimation over another, I want to simply look at a few different free agents from the current off-season to see what sort of difference using one simple run estimator rather than another would make on a practical level.

My impression is that, at least in “public” sabermetrics, there is a broad consensus that for individual hitters, linear weights is the best approach. Even more recent formulations of Runs Created retain some of the same problems as the original (over-valuing singles, for example) without retaining the original’s charming simplicity. Moreover, James’ version of Runs Created has the logical problem of using a dynamic run estimator to value individual hitters’ contributions. There are different implementations of offensive linear weights (Batting Runs, Extrapolated Runs, Equivalent Runs, True Runs, wRAA, among others), but they all stem from the same basic idea. You can read more about the advantages and disadvantages of the various approaches elsewhere.

Let me be clear: I am not breaking any new ground here. I am not even going to make an argument for which approach is better. I am simply going to look at what difference using Runs Created or linear weights makes on a practical level.

[Note that throughout this post I have consistently tried to use the phrase “Runs Created” to refer only to James’ formula. In other contexts others (myself included) use “runs created” to mean whatever formula they are using at the time. The point being that one should not automatically assume that someone is a “Jamesian” if he or she is discussing how many runs a player has created.]

To keep this as simple as possible, I just use very basic versions of each run estimator. For Bill James’ Runs Created, I use the original version: ((Hits + Walks) x Total Bases)/(At-Bats + Walks). Perhaps the simplest linear weights formula is Paul Johnson’s Estimated Runs Produced (which has the added bonus of an ironic historical connection to Bill James). For the sake of sticking to the same events as James’ Basic RC, I will use Patriot’s simplification of ERP: (Total Bases + 0.5 x Hits + Walks – 0.3 x (At-Bats – Hits)), all times a constant to fit it to the run environment, usually around .32. I would be surprised if “professional statheads” employed by teams used either Basic RC or ERP. Each can be (and has been) improved by adding in more events, for example.
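To make the formulas concrete, here is a minimal Python sketch of both estimators as used in this post (the .32 ERP multiplier is the rough default mentioned above; the calibrated values appear in the postscript):

def basic_rc(h, bb, tb, ab):
    """Bill James' original Runs Created: (H + BB) x TB / (AB + BB)."""
    return (h + bb) * tb / (ab + bb)

def erp(h, bb, tb, ab, mult=0.32):
    """Patriot's simplified Estimated Runs Produced, times a constant
    that fits it to the run environment (around .32)."""
    return mult * (tb + 0.5 * h + bb - 0.3 * (ab - h))

As a rough check against the table below: back-solving walks from Pujols’ projected OBP (an approximation on my part, since Oliver’s walk totals are not shown here) gives him roughly 80 walks and 331 total bases, so basic_rc(170, 80, 331, 546) times the .993 multiplier from the postscript comes out to about 131, and erp(170, 80, 331, 546, 0.318) to about 122, in line with the table.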

To make the point concrete, I will apply each run estimator to some of the offensive lines projected by Brian Cartwright’s Oliver forecasting system. To put them on an equal footing for the sake of comparison, I have calibrated each to the 2011 run environment. Let’s look at some players who are (or were) free agents during this off-season and their estimated 2012 runs created according to each run estimator, based on Oliver’s projections.

[RC = Basic Runs Created, RCAA = that figure converted to runs above average, ERP = Estimated Runs Produced, ERPAA = that figure converted to runs above average. All figures rounded to the nearest run. More details for the truly nerdy can be found at the end of this post.]

Player            AB    H   2B  3B  HR  AVG/OBP/SLG      RC  RCAA  ERP  ERPAA
Jimmy Rollins    513  128   24   3  13  .264/.325/.384   61    -4   61     -4
Aramis Ramirez   477  132   24   1  21  .277/.333/.462   72    14   70     12
Jose Reyes       466  140   25   9   8  .300/.344/.444   72    17   68     13
Prince Fielder   541  156   30   1  34  .288/.402/.536  113    48  110     45
Albert Pujols    546  170   33   1  42  .311/.399/.606  131    67  122     58

Do not get too hung up on the individual projections (although Oliver is a good projection system and, as a customer, I can say that THT offers good value) or on what the simple formulas “leave out”; look instead at the differences in estimated offensive production. In the case of Rollins, there is no difference. In the case of the recently signed Aramis Ramirez, it is only about two runs. Fielder’s three runs is not much to worry about. Maybe the error bars on the projections are bigger than the difference between the run estimators in these cases, but once you get into the area of half a win, the gap starts to matter as the size and length of the deal grow (and you probably do not want your team compounding the error of the projection by using the wrong run estimator, anyway).

Take the case of Jose Reyes, where there is a four-run difference. For the sake of the example, leave everything else constant except his batting events as evaluated by RC and ERP. Assume that Reyes is about a 4.5-win player using ERP, which would imply about a $98 million contract over five years. If one uses RC and has him close to a five-win player, over five years one would expect to pay him something like $112 million. How much of a difference that makes probably depends on the franchise’s budgetary margin for mistakes, but I would not say it is necessarily insignificant.

The one that grabs the attention, of course, is Albert Pujols: Basic RC sees his projected production as worth nine runs more than ERP does (a manifestation of Runs Created’s bias towards high-average hitters). That is worth about one marginal win. Let’s say an interested team uses a WAR-like framework and, for the sake of argument, with everything else the same, using ERP and the Oliver projection, has Pujols as a 6.5-win player in 2012. At $5 million per win, with a five percent annual increase in the average cost of a marginal win and an expected decline of half a win per season, signing Pujols for ten years would take around $260 million. However, if everything else is the same and the team uses RC instead of ERP, then he is a 7.5-win player, and over ten years that would be about $320 million. I think that is a pretty big difference on a practical level.
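For readers who want to check that arithmetic, here is a minimal sketch of the dollars-per-win calculation under the stated assumptions ($5 million per win in year one, five percent annual inflation in the cost of a win, a half-win decline per season); the function name and defaults are mine:

def contract_value(war_year1, years, dollars_per_win=5.0,
                   inflation=0.05, decline=0.5):
    """Rough contract value in millions of dollars: each season's
    projected WAR times that season's market price of a win."""
    total = 0.0
    for t in range(years):
        war = war_year1 - decline * t
        price = dollars_per_win * (1 + inflation) ** t
        total += war * price
    return total

print(round(contract_value(6.5, 10)))   # ~255, i.e., roughly $260 million
print(round(contract_value(7.5, 10)))   # ~318, i.e., roughly $320 million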

This is not about what metrics the Angels, Cardinals, and Marlins did or did not use (I would be shocked to find out that they used either Basic RC or ERP); it is simply meant to address whether the difference between the metrics is just nerds arguing about stuff or whether it really matters. I think it is apparent that it can make a very big practical difference in some cases, beyond simple one-upnerdship.

Somewhat Technical Postscript in Which I Probably Reveal My Incompetence: I am not an expert on this stuff, and I am sure there are better ways to calibrate the formulas and whatnot, but I think I am close enough for the purposes of this post. To calibrate each formula, solve for a multiplier such that the formula’s league runs created/produced equals actual league runs. Using the formulas above and rounding to three decimal places, the multiplier for RC is .993 and for ERP is .318. Both RC and ERP are scaled to “absolute” runs (like wRC) as opposed to runs above average as you would see in the wRAA or Batting fields on FanGraphs’ player pages, but that is easy enough to re-scale. The short version of how I did this: since each formula already includes an “absolute” out value, once each formula is scaled to the 2011 run environment, one can subtract league runs per out (outs defined here as at-bats minus hits) times outs made from the absolute figure to get runs above average.
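A minimal sketch of both steps, assuming you already have league totals handy (the function and argument names are mine, not from any particular source):

def calibrate(raw_league_estimate, actual_league_runs):
    """Multiplier that forces the raw formula's league total to match
    actual league runs (e.g., .993 for Basic RC, .318 for ERP here)."""
    return actual_league_runs / raw_league_estimate

def runs_above_average(absolute_runs, outs, lg_runs, lg_outs):
    """Convert 'absolute' runs to runs above average by charging the
    league rate of runs per out (outs = AB - H here) for each out made."""
    return absolute_runs - (lg_runs / lg_outs) * outs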

We can also see the weights implied by each run estimator by calibrating it to actual runs for 2011 and then adding one of each event separately to see how many more runs are produced. This is called the “+1 method.” Remember that I have used the 2011 run environment, and that in both formulas outs are simply AB-H. Obviously, the weights are not going to be exactly right, because we have left out many events that these formulas are “making up for,” but they are still pretty close to what you get with each sort of estimator, generally speaking. Draw your own conclusions. (A code sketch of the procedure follows the weights below.)

RC: .248 BB, .563 1B, .878 2B, 1.192 3B, 1.507 HR

ERP: .319 BB, .478 1B, .797 2B, 1.116 3B, 1.435 HR
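
A minimal sketch of the +1 method, reusing the estimator signatures from the first code block (the event-to-count mapping is my own bookkeeping: a single adds one hit, one at-bat, and one total base, and so on):

def plus_one_weights(estimator, lg_h, lg_bb, lg_tb, lg_ab):
    """Implied linear weights: add one of each event to the league
    line and record the change in estimated runs."""
    base = estimator(lg_h, lg_bb, lg_tb, lg_ab)
    # each event as (delta H, delta BB, delta TB, delta AB)
    events = {"BB": (0, 1, 0, 0), "1B": (1, 0, 1, 1),
              "2B": (1, 0, 2, 1), "3B": (1, 0, 3, 1),
              "HR": (1, 0, 4, 1)}
    return {name: estimator(lg_h + dh, lg_bb + dbb,
                            lg_tb + dtb, lg_ab + dab) - base
            for name, (dh, dbb, dtb, dab) in events.items()}

The estimator passed in should have the calibration baked in (e.g., lambda h, bb, tb, ab: 0.993 * basic_rc(h, bb, tb, ab)) and should be fed the 2011 league totals; that should reproduce weights close to those above.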

Matt Klaassen reads and writes obituaries in the Greater Toronto Area. If you can't get enough of him, follow him on Twitter.

5 Comments
Anon
12 years ago

“Maybe the error bars on the projections are bigger than the amount of difference between the run estimators in these cases”

How can you talk about error without knowing the margin of error for a specific statistic? This is my problem with many stats; I know they have flaws but don’t know the magnitude of those flaws.

2 runs difference could be huge if the margin is .2 but tiny if the margin is 10.

guesswork
12 years ago
Reply to  Anon

In this case, the standard error is relatively easy to calculate. Under linear weights, each plate appearance is a multinomial event, and I presume the events to be independent and identically distributed. After n events, we weight each event and find the sum. The variance of an iid weighted sum is a weighted sum of the variances, where the weights are squared.

For example, let us assume a batter has only two possibilities: get on base or produce an out. The former is good for +0.5 runs, the latter for -0.25 runs. Now pretend a player gets on base 35 percent of the time. Over 500 plate appearances, we expect this player to accrue
175*.5 – 325*.25 = 6.25 runs.
The variance of a binomial count is n*p*(1-p), so weighting each outcome’s count gives
175*.650*.25 + 325*.350*.0625 = 35.5
Take the square root to get a standard error of 5.96.

Note that the variance will increase as n increases, since we are finding the variance of a sum rather than a mean. Furthermore, power hitters will have considerably more variance, as their more probable outcomes are associated with large weights. A small change in HRs will have a much bigger effect than a small change in singles, leading to more variance.

Barkey Walker
12 years ago
Reply to  guesswork

The multinomial actually has a non-trivial variance-covariance (VC) matrix, so you have to take that into account when calculating the variance of a projection of the vector of outcomes onto the reals (such as wOBA).

guesswork
12 years ago
Reply to  guesswork

Good point. That actually simplifies things a lot now that I think about it. The variance of the multinomial count vector is n(diag(p) - ppT), where diag(p) is the diagonal matrix with the probabilities on its diagonal, p is a column vector of probabilities, and T just means transpose. Then let w be our column vector of weights, so the variance of linear weights would be

wT(n(diag(p) - ppT))w

For my example, that comes out as about 63.98, so our standard error is about 8.0.
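
A quick numpy check of the same calculation:

import numpy as np

p = np.array([0.35, 0.65])   # P(on base), P(out)
w = np.array([0.5, -0.25])   # run value of each outcome
n = 500

cov = n * (np.diag(p) - np.outer(p, p))   # multinomial covariance matrix
var = w @ cov @ w
print(var, np.sqrt(var))                  # ~63.98 and ~8.0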

Thanks!

Barkey Walker
12 years ago
Reply to  guesswork

guesswork, you used a binomial example, where the VC matrix is rank 1 (i.e., you don’t need a matrix).