Game Score (and Crowd-Sourcing)

by Tangotiger

July 29, 2011

Bill James created a metric called Game Score that looks at a pitcher’s standard pitching line, to come up with an overall score, centered at 50, with most scores in the 0 to 100 range. I am trying to take that concept, deconstruct Game Score, and reconstruct it by forcing it to follow some set rules.

The scale of Game Score will mimic win percentage. So, a Game Score of 50 means you will win 50% of the time. A Game Score of 70 means you will win 70% of the time, and so on.

We’re not going to be total maniacs about it, and force a Game Score to stop at 99 or 1. Think of the relationship between Game Score and win percentage more as a useful guideline than a hard constraint. That is especially so because the relationship between wins and runs is not linear, so it’s going to be impossible to create a linear metric that works at the extremes.

We also know that ten marginal runs equal one marginal win, which means that one marginal run equals 0.10 marginal wins. Since the Game Score scale mimics win percentage, one marginal run is worth 10 Game Score points. Keep this in mind as you read the various versions of Game Score.

In addition, we’re going to set the starting point of Game Score to 40, rather than following the Bill James lead of starting at 50. The idea here is to think in terms of replacement level, and if you pitch to one batter and are out of the game, we’d hardly call that an “average” game. Indeed, I would even consider starting the Game Score at 35 or even 30. For the moment, we’ll start at 40, and let’s see where this takes us.

Version 1: Runs

This version focuses only on runs allowed. The basic equation to solve is this:

Game Score = a * IP + b * R + 40

We’ve already established that one run is 10 Game Score points, which forces b = -10. To solve for “a”, we need to know the average innings per start (around 6.1 in 2011) and the average runs allowed per start (around 2.9 these days). By definition, Game Score equals 50 for this average start, so 50 = (a * 6.1) – (10 * 2.9) + 40. Solving for “a” gives us 6.4. I’m not too crazy about the decimal, but it gives us:

Game Score = (6.4 * IP) – (10 * R) + 40

A nine-inning shutout gives us a Game Score of 98. On the flip-side, to get a Game Score close to 0 would mean 3 innings and 6 runs allowed (Game Score of minus 1), or 5 innings and 7 runs allowed (Game Score of 2).
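As a rough check of the arithmetic above, here is a minimal Python sketch. The function names (`solve_ip_weight`, `game_score_runs`) are made up for illustration, and innings are assumed to be plain decimals (6.1 is treated as 6.1 innings, not as six and a third).

```python
def solve_ip_weight(avg_ip, avg_core, start=40.0, target=50.0):
    """Solve target = a * avg_ip + avg_core + start for the innings weight a."""
    return (target - start - avg_core) / avg_ip


def game_score_runs(ip, runs, a=6.4):
    """Version 1: innings and runs allowed only."""
    return a * ip - 10 * runs + 40


# Average start: about 6.1 innings and 2.9 runs, so the run term is -29.
a = solve_ip_weight(avg_ip=6.1, avg_core=-10 * 2.9)
print(round(a, 1))  # 6.4

# The three pitching lines quoted in the text: (innings, runs allowed).
for ip, runs in [(9, 0), (3, 6), (5, 7)]:
    print(ip, runs, round(game_score_runs(ip, runs)))
```

The same `solve_ip_weight` helper works for any version whose core is already in Game Score points: pass the average core for a start and it returns the innings weight that puts an average start at 50.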

If you believe that pitching is only about innings and runs, then stop here! This is the Game Score for you. Everything else I am about to say is going to be useless to you. But, if you are interested in different facets of pitching, then read on.

Version 2: Strikeouts and walks

This version concerns itself with the two outcomes that are most closely linked to the pitcher himself. While batted balls can be influenced by the park or the fielders, strikeout and walk rates stabilize very quickly. If there are only two stats to know about a pitcher, it’s these two. In addition, we know that the difference between the two is what correlates with runs the most. Finally, it takes about three walks to add one run (and 10 Game Score points), so each walk is worth roughly 3 points.

When I say “walk”, I really mean unintentional walks plus hit batters. That makes more sense than the alternative, so that’s what I’m doing. So, we need to solve for:

Game Score = a * IP + b * (SO – BB) + 40

We’ve established that b = 3. Now we just need to solve for “a”. An average start has a strikeout minus walk differential of 2.5, so 50 = (a * 6.1) + (3 * 2.5) + 40. Solving the equation gives us a = 0.4. Again, I’m not a huge fan of the decimal, but we’ll have to live with it. So we have:

Game Score = 0.4 * IP + 3 * (SO – BB) + 40

A Clemens game (20 K, 0 walks, 9 IP) would give us a Game Score of 104. Again, while we’d like to not necessarily exceed the 100 scale (since we were hoping to imply that to mean 100% winning), we’re not going to go crazy with that rule. On the flip side, a 0 strikeout, 14 walk, 5 inning game would be a Game Score of 0. We don’t see games like this because a pitcher will get pulled very fast. A pitcher can get runs to pile up because of bad breaks, but he will never be allowed to pile up walks.
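A small Python sketch of this version, including the “walk” bookkeeping described above (intentional walks out, hit batters in). The names `adjusted_walks` and `game_score_k_bb` are hypothetical, and the `ibb` and `hbp` arguments assume you have those columns available for the start.

```python
def adjusted_walks(bb, ibb=0, hbp=0):
    """Walks as defined in the text: drop intentional walks, add hit batters."""
    return bb - ibb + hbp


def game_score_k_bb(ip, so, bb, ibb=0, hbp=0, a=0.4):
    """Version 2: innings plus the strikeout-minus-walk differential."""
    return a * ip + 3 * (so - adjusted_walks(bb, ibb, hbp)) + 40


# The two extreme lines from the text.
clemens = game_score_k_bb(ip=9, so=20, bb=0)   # about 103.6, i.e. 104
wild = game_score_k_bb(ip=5, so=0, bb=14)      # 0
print(round(clemens), round(wild))
```

Because the innings weight is only 0.4, almost all of the movement in this version comes from the differential itself.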

Version 3: FIP

FIP is the shortcut version of DIPS, a sabermetric breakthrough credited mostly to Voros McCracken, and possibly inspired by Bill James’ DER work. Since the core of FIP is already scaled to runs, it’s a straightforward calculation:

FIPcore = 2 * SO – 3 * BB – 13 * HR

Game Score = a * IP + FIPcore + 40

An average FIPcore in a start is -5.2. So, solving for “a” gives us 2.5. Therefore, we have:

Game Score = 2.5 * IP + FIPcore + 40

A shoutout to Kevin Harlow is in order here.

Again applying a Clemens game here (with 0 HR), we get a Game Score of 103. On the flip side, a 3 HR, 5 BB, 0 K 5-inning start gives us a Game Score of minus 2.
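Here is the FIP version as a short Python sketch, with hypothetical function names. It reproduces the two examples above without rounding, because Python’s built-in `round()` rounds halves to the nearest even number and would turn 102.5 into 102 rather than the 103 quoted in the text.

```python
def fip_core(so, bb, hr):
    """FIPcore as given in the text: strikeouts add, walks and home runs subtract."""
    return 2 * so - 3 * bb - 13 * hr


def game_score_fip(ip, so, bb, hr, a=2.5):
    """Version 3: innings plus FIPcore."""
    return a * ip + fip_core(so, bb, hr) + 40


# Innings weight implied by an average start (6.1 IP, FIPcore of -5.2, score of 50).
implied_a = (50 - 40 - (-5.2)) / 6.1
print(round(implied_a, 2))  # 2.49, which the text rounds to 2.5

print(game_score_fip(ip=9, so=20, bb=0, hr=0))  # 102.5
print(game_score_fip(ip=5, so=0, bb=5, hr=3))   # -1.5
```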

Version 4: Linear Weights

Linear Weights, or component runs, is the inspiration here. Our focus is only on the inputs (hits, walks, home runs), while ignoring any sequencing. If you scatter the hits and walks, you can get a shutout, but if you bunch up the same number of hits and walks, you might get a 4-run inning. So, in this version, we’re explicitly ignoring the runs (the output) and only considering the components (the input).

As with the other metrics, since Linear Weights is already grounded in runs, the conversion to Game Score points is easy enough. A “hit” is not only singles, but all extra base hits as well. This is what the core looks like:

LWTScore = -(3*BB + 5*H + 8*HR)

So, a HR gets minus 13 (minus 5 as a hit, plus another minus 8), while all other hits get minus 5.

Game Score = a * IP + LWTScore + 40

A standard start has a LWTScore of -41. Solving for “a” gives us 8.4. Our fourth version is:

Game Score = 8.4 * IP + LWTScore + 40

A perfect game gives us a whopping Game Score of 116. To get a Game Score close to 100 in nine innings, you’d need to allow only one HR (Game Score of 103), or only 3 non-HR hits (Game Score of 101), or 5 walks (Game Score of 101). This version, more than the others, shows where the system breaks down. This is because there are many more components being considered. The other versions basically presume “average” results for the parameters they leave out. In this case, since we considered the most important parameters, the linear approach breaks at the extremes. But, as noted, we’re going to live with it. (Or not. Your choice.)

To get a Game Score close to 0 while pitching 5 innings, you need to allow 3 HR, 5 walks, and 5 other hits (Game Score of 3).
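A Python sketch of the Linear Weights version follows. The function names are illustrative only. Note that `hits` counts every hit, home runs included, which is why a home run ends up costing 13 points in total.

```python
def lwts_core(bb, hits, hr):
    """LWTScore from the text; hits includes home runs, so a HR costs 5 + 8 = 13."""
    return -(3 * bb + 5 * hits + 8 * hr)


def game_score_lwts(ip, bb, hits, hr, a=8.4):
    """Version 4: innings plus LWTScore. Runs allowed are deliberately not an input."""
    return a * ip + lwts_core(bb, hits, hr) + 40


lines = {
    "perfect game": dict(ip=9, bb=0, hits=0, hr=0),
    "one HR": dict(ip=9, bb=0, hits=1, hr=1),
    "three hits": dict(ip=9, bb=0, hits=3, hr=0),
    "five walks": dict(ip=9, bb=5, hits=0, hr=0),
    # 3 HR plus 5 other hits is 8 hits in total.
    "rough 5 innings": dict(ip=5, bb=5, hits=8, hr=3),
}
for label, line in lines.items():
    print(label, round(game_score_lwts(**line)))
```

Since runs never enter the function, two starts with the same innings, walks, hits, and home runs get the same score whether the pitcher scattered the baserunners or bunched them.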

Version X: Bringing them together

When you look at Bill James’ Game Score, he includes hits and walks and strikeouts and runs. In none of the above four versions do I have all of these together. This is because I was splitting up the components in a more focused manner. If all you cared about was runs, then you get that. If all you cared about was hits and walks, then you get that. Bill basically (and implicitly) amalgamates these four versions into one.

Therefore, how best to do that? That’s where YOU come in. I don’t know. I can decide to weight the Runs version at 40% and the Linear Weights version at 30% and the FIP version at 20% and the Strikeout/Walk version at 10%. Or, I can come up with a different scheme. What is the best one? Well, it’s the one that best describes what we want: how well did the pitcher pitch?
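To make the weighting idea concrete, here is a minimal Python sketch of a blended score. The 40/30/20/10 split is only the example scheme floated above, not a recommendation, and the `blend` function and dictionary keys are invented for illustration. The sample scores are what the four formulas in this article give for a hypothetical nine-inning shutout with 10 hits, no walks, no strikeouts, and no home runs.

```python
import math


def blend(scores, weights):
    """Weighted average of the four version scores; weights must sum to 1."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same versions")
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in scores)


# Example scheme from the text (an illustration, not a settled answer).
EXAMPLE_WEIGHTS = {"runs": 0.40, "lwts": 0.30, "fip": 0.20, "k_bb": 0.10}

# Hypothetical scattered shutout: 9 IP, 0 R, 10 H, 0 HR, 0 BB, 0 SO.
scattered_shutout = {"runs": 97.6, "lwts": 65.6, "fip": 62.5, "k_bb": 43.6}

print(round(blend(scattered_shutout, EXAMPLE_WEIGHTS), 2))  # 75.58
```

Keeping the weights in one dictionary makes it easy to swap in whatever scheme the consensus lands on and rescore the same starts.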

Do we count all shutouts the same, regardless of whether it was a perfect game or you had 10 hits scattered throughout? Maybe you do, if all you care about is runs. And you wouldn’t if you think these two games are different. But, how different are they?

What if you had 10 hits + walks scattered throughout the game for no runs, and in another game the very same pitcher bunched them up for 4 runs? Are those two games identical? Well, if you don’t care about sequencing, then, yes, they are the same. If sequencing counts for how well a pitcher pitched, then, no they are not the same.

Crowd-sourcing

So, that’s where the exercise lies: what is the balancing point? How do you weight each of the 4 versions so that it tells the fair story? In order to answer that question, what you the reader need to do is… watch baseball games. You have to look at “weird” games (basically those where the Game Scores of the four versions are all over the place for a particular start), and compare them to each other. Francisco Liriano threw a gem that Twins fans overwhelmingly agreed was a better-pitched game than his no-hitter. So, that’s what you have to do.
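One possible way to find those “weird” games is to compute all four versions for each start and flag the ones where the highest and lowest scores are far apart. The sketch below is an assumption about how you might do that: the function names, the 30-point threshold, and the two sample pitching lines are all hypothetical, and walks are assumed to be already adjusted for intentional walks and hit batters.

```python
def four_versions(ip, r, h, hr, so, bb):
    """All four Game Score versions for one start, using the article's formulas."""
    return {
        "runs": 40 + 6.4 * ip - 10 * r,
        "k_bb": 40 + 0.4 * ip + 3 * (so - bb),
        "fip": 40 + 2.5 * ip + (2 * so - 3 * bb - 13 * hr),
        "lwts": 40 + 8.4 * ip - (3 * bb + 5 * h + 8 * hr),
    }


def spread(scores):
    """Gap between the most and least generous version."""
    return max(scores.values()) - min(scores.values())


def weird_starts(starts, threshold=30.0):
    """Return (label, spread) pairs above the threshold, widest disagreement first."""
    flagged = [(label, spread(four_versions(**line))) for label, line in starts.items()]
    flagged = [item for item in flagged if item[1] > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Two made-up pitching lines for illustration.
starts = {
    "scattered shutout": dict(ip=9, r=0, h=10, hr=0, so=0, bb=0),
    "typical start": dict(ip=6, r=3, h=6, hr=1, so=5, bb=2),
}
for label, gap in weird_starts(starts):
    print(label, round(gap, 1))
```

The starts this surfaces are the ones worth arguing about, since any weighting scheme will score the ordinary starts about the same.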

I know, I know, you need to work. But, if I give you my answer on how to weight the four components, someone is going to say “no way”. That’s because that person is going to have his own balancing scheme. And that’s why we need a consensus.

You can still take advantage of the fact that the four Game Score numbers are going to be calculated. But, it would be helpful to have a consensus view as to how those four numbers can be collapsed into a “final” version as well.

David Appelman took the initiative here to run a quick poll a few days ago, comparing the no-hit games of Verlander, Santana, and Liriano. The Verlander and Santana games were in the same ballpark, with the Verlander game slightly ahead, and Liriano’s nowhere in sight.

So, David or I, or perhaps you the reader, will be on top of this. And, we’ll try to figure out the answer. I’m guessing that the upcoming playoffs will provide some great examples (like Doc and Lincecum last year). The advantage here is that we are all going to watch those games, so it becomes a great crowd-sourcing event. My plan is to have the Version X of the Game Score decided by the end of the World Series.

Recap

Version 1 = 40 + (6.4 * IP) – (10 * R)
Version 2 = 40 + (0.4 * IP) + 3 * (SO – BB)
Version 3 = 40 + (2.5 * IP) + FIPcore
Version 4 = 40 + (8.4 * IP) + LWTScore
