Updating and Improving The Outcome Machine

A little while ago, I wrote an article for the Community Research blog about projecting plate appearances before they happen based on the batter and the pitcher. It was pretty well received (which was nice, because I put some serious work into that thing), and apparently it was good enough for Dave Cameron to foolishly kindly decide to call me up to the big leagues.

If you read through the comments there (or if you left a comment!) you probably realized that no, the Outcome Machine — as the tool was dubbed — was not perfect. There were flaws in the way I conducted my research, and some of the assertions I made probably weren’t 100% true. So in this article, I am going to follow up on that first one and hopefully remedy any errors. Those include:

A) Not including league average as a third input (in addition to the batter and pitcher)

B) Declaring that the formulas which I calculated were “x” percent accurate. Whatever x was, it was probably wrong

C) Testing the data on the same dataset that I trained it on

D) Using pretty inconsistent methodology (which none of you could see… but it probably led to some incorrect results)

With all that in mind, let’s take a look at a new, updated, better, right-er version of the Outcome Machine. I’ll give it to you here so you don’t have to scroll down all the way should you not want to read my scintillatingly engaging writing about the way I did this and all the math behind it (I’ll forgive you if you don’t, I wouldn’t want to either).

[The Outcome Machine tool is embedded here.]

There you have it. The way I went about doing this research was considerably different from last time, although the results are fairly similar. The first time around, I used every player’s total 2003-2013 numbers as a measure of their talent. That’s obviously wrong; talent levels change constantly. This time, I instead used each player’s numbers from the season in which the event occurred. So, for example, when Albert Pujols walked against Cliff Lee in a game in Cleveland in 2006, Pujols’s BB% against lefties was 14.9% and Lee’s against righties was 7.4%.

But then, in 2009, when Lee walked Pujols again, Pujols’s BB% against lefties was 19.5% and Lee’s against righties was 5.1%. In the first iteration of the Outcome Machine, Pujols’s BB% would have been inputted as 10.1% for both matchups, since that was his total BB% against lefties from 2003-2013. Similarly, Lee’s would have been inputted as 6.0% both times, since from 2003-2013, he walked righties 6 percent of the time. Of course, that’s not right, so that was fixed this time through.

Along the same lines, the yearly league average was also included as an input in this version. As noted by Mitchel Lichtman, using league average is important because the environment needs to be accounted for. Going back to our Pujols/Lee example, the 2006 at bat would have had a league average input of an 8.4% walk rate — the league average for that year. 2009 was 8.9%. It’s a subtle change, but it makes a difference, especially when comparing two years like 2003 and 2013, which had tremendously different run environments.
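To make that concrete, here’s how those two Pujols/Lee matchups would look as inputs this time around, with each row carrying the season-specific splits plus that year’s league average (a sketch for illustration; the field names are placeholders, not anything from my actual dataset):

```python
# The two Pujols vs. Lee walks described above, written out as input rows.
# Each rate is the season-specific figure quoted in the text.
matchups = [
    {"season": 2006, "batter_bb_vs_lhp": 0.149, "pitcher_bb_vs_rhb": 0.074, "league_bb": 0.084},
    {"season": 2009, "batter_bb_vs_lhp": 0.195, "pitcher_bb_vs_rhb": 0.051, "league_bb": 0.089},
]
```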

What was really interesting about including league average as an input, though, was that for every single statistic, the coefficient in the regression equation was negative. To put it another way — if the batter and pitcher’s skills remained the same, a higher league average would lead to a lower expected outcome. Here’s a concrete example: If a batter with a true talent level of a .320 wOBA faced off against a pitcher with a .320 wOBA against, and .320 was the league average, the expected outcome would of course be .320. But if you up the league average to .330, the expected outcome becomes lower, not higher.

That seems counterintuitive at first — why would the same situation have a lower average wOBA if the league average went up? But if you think about it more, a higher league average means that the batter is below average and the pitcher is above average, leading to a more pitcher-favorable outcome. (This works the same in reverse — lower the league average, and the expected outcome increases.) It seems a little bit strange to me, but that’s what Mr. Machine says. Since that’s the case for every single statistic for which I ran a regression, I’m inclined to trust it.
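If you want to see that play out in the numbers, here’s a quick sketch that plugs the .320/.320 example into the wOBA equation, using the coefficients from the table further down (the ln-ln form is explained below the table):

```python
import math

# wOBA coefficients from the table below (ln-ln form): Bat, Pit, Year, Int
BAT, PIT, YEAR, INT = 0.9778, 0.8832, -0.5283, 0.3883

def expected_woba(batter, pitcher, league):
    return math.exp(BAT * math.log(batter)
                    + PIT * math.log(pitcher)
                    + YEAR * math.log(league)
                    + INT)

print(expected_woba(0.320, 0.320, 0.320))  # comes out close to .320
print(expected_woba(0.320, 0.320, 0.330))  # comes out lower, not higher
```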

Another detail: in my first article, I noted that some of the batter-pitcher matchups had to be thrown out because there were too few instances to make meaningful judgments. I included this picture to show what I meant (this is for K%):

[Screenshot: a table of K% matchup groupings, including groupings with far too few plate appearances]

To avoid that this time, I set a minimum of 150 plate appearances in the year in question for both the batter and the pitcher for every event. This way, I didn’t get groupings like the one above; no pitcher actually has a 60% strikeout rate, and that pitcher in the last row was undoubtedly somebody who faced very few batters. Doing this shrank the number of plate appearances in my dataset from about 2 million to about 850,000, but that was still enough. And this time, while running the regression, I weighted each grouping by the number of times it occurred, something that I did not do last time. All of that got rid of any bias that may have been introduced by me picking and choosing which groupings I thought were fit to be included in my regression.
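For anyone curious what that weighting looks like mechanically, here’s a minimal sketch in numpy; all of the numbers and variable names below are made up for illustration, not pulled from my actual dataset:

```python
import numpy as np

# Hypothetical grouped K% data: one row per (batter rate, pitcher rate, league rate)
# grouping. `counts` is how many PA landed in each grouping and `observed` is the
# strikeout rate actually seen for that grouping.
bat_rates = np.array([0.15, 0.18, 0.22, 0.25, 0.30])
pit_rates = np.array([0.16, 0.19, 0.20, 0.23, 0.26])
lg_rates  = np.array([0.186, 0.196, 0.204, 0.204, 0.204])
counts    = np.array([5000, 4000, 2500, 1200, 300])
observed  = np.array([0.140, 0.175, 0.215, 0.255, 0.330])

# Design matrix with an intercept column (shown in the linear form; the ln-ln
# form is the same idea after taking logs of the rates).
X = np.column_stack([bat_rates, pit_rates, lg_rates, np.ones_like(bat_rates)])

# Weight each grouping by how many PA it represents: scaling both sides of the
# least-squares problem by sqrt(count) is equivalent to weighted least squares.
w = np.sqrt(counts)
coeffs, *_ = np.linalg.lstsq(X * w[:, None], observed * w, rcond=None)
bat_coef, pit_coef, year_coef, intercept = coeffs
```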

And there was also the issue of how accurate the regressions were. In my first article, a number of comments pointed out that I would of course get very high r^2 values for my regressions because I was testing on the same dataset that I trained on; essentially, I determined the accuracy of the regression equations using the same data I derived them from, which leads to an artificially high r^2. So in reality, the correlations weren’t quite as high as I had originally determined.

So this time, I tested the regressions on 2014 play-by-play data. That yielded lower, but more accurate, correlation coefficients. The exact numbers, along with the coefficients for the regression equation and the type of equation, are shown in the table below. Note that the r^2 number you want to be looking at is the one in the last column, “2014 r^2.”
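For reference, this is the basic calculation behind an out-of-sample r^2 like that, once you have predictions for the 2014 groupings in hand (a simple unweighted sketch, just to show the idea):

```python
import numpy as np

def r_squared(actual, predicted):
    """1 minus the residual sum of squares over the total sum of squares."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return 1 - ss_res / ss_tot

# e.g. r_squared(k_rates_2014, k_rates_predicted) would give something like the
# "2014 r^2" column below (hypothetical array names).
```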

| Statistic | Linear/ln-ln | Bat | Pit | Year | Int | r^2 | 2014 r^2 |
|---|---|---|---|---|---|---|---|
| K% | ln-ln | 0.9808 | 0.979 | -0.8415 | 0.2058 | 0.8873 | 0.7791 |
| BB% | ln-ln | 0.8994 | 0.8031 | -0.3338 | 0.9031 | 0.9654 | 0.8829 |
| 1B% | linear | 0.99143 | 1.00323 | -0.35379 | -0.09317 | 0.9367 | 0.8047 |
| 2B% | ln-ln | 0.894 | 0.9012 | -0.2125 | 1.8366 | 0.9709 | 0.9003 |
| 3B% | ln-ln | 0.8781 | 0.8462 | -0.8884 | -0.7635 | 0.8889 | 0.5208 |
| HR% | ln-ln | 0.9947 | 0.885 | -0.2801 | 2.2078 | 0.9906 | 0.949 |
| HBP% | ln-ln | 0.8855 | 0.8297 | -0.3389 | 1.7543 | 0.7489 | 0.4702 |
| wOBA | ln-ln | 0.9778 | 0.8832 | -0.5283 | 0.3883 | 0.8522 | 0.6364 |
| BABIP | linear | 0.95507 | 0.95932 | -0.57244 | -0.09551 | 0.8788 | 0.6677 |
| wOBA-FIP | ln-ln | 0.9838 | 0.3832 | -0.6391 | -1.2788 | 0.7135 | 0.4635 |

Key: Bat=coefficient for the batter’s numbers, Pit=coefficient for the pitcher’s numbers, Year=coefficient for the yearly league average, Int=intercept of the equation. The linear/ln-ln column tells you what form the equation follows. A linear equation follows the form:

expected outcome = Bat*(batter’s numbers) + Pit*(pitcher’s numbers) + Year*(league average for that year) + Int

A ln-ln equation follows the form:

ln(expected outcome) = Bat*ln(batter’s numbers) + Pit*ln(pitcher’s numbers) + Year*ln(league average for that year) + Int

Where ln() is the natural log. That can be rewritten as:

expected outcome = e^(Bat*ln(batter’s numbers) + Pit*ln(pitcher’s numbers) + Year*ln(league average for that year) + Int)

The reason for making some equations follow the linear form and some follow the logarithmic form is explained in more detail in the first post about the Outcome Machine, but it is important to realize that a statistic that was linear in the first article may not be linear this time, and vice versa.
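To make the two forms concrete, here’s a short sketch that plugs coefficients straight from the table above into each equation (K% for the ln-ln form, 1B% for the linear form); the example inputs at the end are made-up rates:

```python
import math

# Coefficients copied from the table above: (Bat, Pit, Year, Int)
LN_LN_K   = (0.9808, 0.979, -0.8415, 0.2058)        # K%, ln-ln form
LINEAR_1B = (0.99143, 1.00323, -0.35379, -0.09317)  # 1B%, linear form

def expected_linear(batter, pitcher, league, coeffs):
    bat, pit, year, intercept = coeffs
    return bat * batter + pit * pitcher + year * league + intercept

def expected_ln_ln(batter, pitcher, league, coeffs):
    bat, pit, year, intercept = coeffs
    return math.exp(bat * math.log(batter)
                    + pit * math.log(pitcher)
                    + year * math.log(league)
                    + intercept)

# A 25% strikeout batter vs. a 22% strikeout pitcher in a league that strikes out 20% of the time
print(expected_ln_ln(0.25, 0.22, 0.20, LN_LN_K))
# A 16% singles batter vs. a 15% singles pitcher in a 15.5% singles league
print(expected_linear(0.16, 0.15, 0.155, LINEAR_1B))
```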

Something that I found to be very curious was the vaulting of 2B% (doubles/PA) from the bottom of the correlation standings to near the top. The first time training the data, 2B% had the lowest correlation out of any metric. This time, it’s the second-highest, trailing only HR%. I can provide no logical explanation for that.

Another somewhat perplexing find was the change in the BABIP r^2 from .96 the first time to .8788 this time, and an even further drop, all the way down to .6677, in the 2014 r^2. The explanation I think is most likely is that in the first version, inputs were included where either the batter or the pitcher had very few plate appearances/batters faced; those pitchers and batters had their BABIP heavily affected by each batted ball.

That means that the “output” BABIP from my table (the “total” column in the screengrab above) was determining the inputs in some cases, which created a very high correlation. I will aim to explore this more in the future, and I think that there are some very interesting conclusions to be gleaned from this.

You also may notice two new metrics in that table: wOBA and wOBA-FIP. Those weren’t part of the original Outcome Machine, but have been added here. If you played around with the tool before, you might have seen that you can find them under the “Simpler version” tab of the spreadsheet. They are probably what you’d expect them to be.

wOBA is the expected wOBA calculated from the batter’s wOBA, pitcher’s wOBA, and league wOBA; wOBA-FIP is the expected wOBA calculated from the batter’s wOBA and pitcher’s FIP (you could also use xFIP if you are so inclined, or maybe ERA; but I’d be careful about ERA just because it’s not as reflective of a pitcher’s skill).

That is pretty much it for the description of the tool. Here are some tips on how to use it:

You want to be editing the bolded cells with lighter shades and borders around them and nothing else, not even the BABIP cells. The BABIP values are calculated by adding singles, doubles, and triples and dividing that by the sum of singles, doubles, triples, and outs in play (which are determined as 1 minus the sum of everything else). If you want to change BABIP, change the 1B%, 2B%, and 3B%.
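In formula terms, here’s roughly what the sheet is doing there (a sketch of the logic as just described, not the actual cell formulas; the example rates are made-up round numbers):

```python
def outs_in_play_pct(k, bb, hbp, b1, b2, b3, hr):
    # Whatever share of plate appearances isn't accounted for by the other outcomes
    return 1 - (k + bb + hbp + b1 + b2 + b3 + hr)

def sheet_babip(b1, b2, b3, oip):
    hits_in_play = b1 + b2 + b3
    return hits_in_play / (hits_in_play + oip)

oip = outs_in_play_pct(k=0.204, bb=0.077, hbp=0.009, b1=0.155, b2=0.044, b3=0.005, hr=0.023)
print(sheet_babip(0.155, 0.044, 0.005, oip))
```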

The cells that are meant to be adjusted are all preset to roughly league average for 2014.

Technically, you can overwrite cells that aren’t meant to be written in (like BABIP) if you want to have more control over the sheet. It will revert back to normal if you reload the page.

BABIP doesn’t affect any of the five stats in the bottom row, and there is a difference between BABIP and xBABIP in the green “Expected” part. BABIP is (expected 1B + expected 2B + expected 3B) / (expected 1B + expected 2B + expected 3B + expected outs in play). xBABIP is the expected BABIP based on the regression equation. Use whichever you want; they will usually be pretty similar.

Below the colored areas there is a section for wOBA weights that you can adjust to change the wOBA value. You can also update the FIP constant.

There are three more sheets in this workbook — the “batters” one gives all the 2014 statistics for every batter, and the “pitchers” one does the same for pitchers. This is for you to be able to look up any player you want to input into the main sheet. Finding 1B%, 2B%, and 3B% can be a real pain. The “Batters Projected” tab is the 2015 Steamer projections for batters, complete with righty/lefty splits (courtesy of Jared Cross).
The pitcher handedness projections aren’t available yet, so for now we’ll have to stick with the 2014 stats. Be aware that the numbers in the two 2014 sheets are total numbers, not separated into vs. righties and vs. lefties, which makes the results less accurate. Using handedness-based inputs will give you a better estimate, and using the Steamer projections (which are probably a better measure of true skill) will make the estimates more accurate as well.

If the sum of all of your inputs is less than 100%, OIP% will be adjusted so that everything adds up to 100%. However, if the inputs add up to more than 100%, OIP% will go negative and throw everything off. Of course, if you just want to know the expected HR% (as an example) and nothing else, it doesn’t matter if OIP% is negative, but then you will have wacky AVG, OBP, etc.
If the cells aren’t updating after you adjust one, just give it a few seconds.

These are based on models, so if you put in crazy values, you will get crazy outputs. Obviously a negative OBP isn’t possible in real life, but inputting negative values will give you one. Putting in 1 for each player’s BABIP will actually give you a 1.649 xBABIP, which of course is also impossible.

The pitcher’s FIP and ERA are also there. FIP is calculated as (3*(BB+HBP) - 2*K + 13*HR) / ((OIP+K)/3) + cFIP. This is the standard FIP equation, except that I had to calculate innings pitched from strikeouts and outs in play. ERA is a rough estimate based on a regression equation using BB%, 1B%, 2B%, 3B%, HR%, and HBP% as inputs. It may not always be accurate, so I’d use the FIP box there instead. The ERA is really just there for added reference if you need it.
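Written out as code, that FIP calculation looks roughly like this (a sketch of the formula as described, with the FIP constant left as an input the way it is in the sheet; the example rates are made up):

```python
def fip_from_rates(k, bb, hbp, hr, oip, c_fip):
    # Innings pitched implied by the per-PA rates: three outs per inning,
    # where the outs are strikeouts plus outs in play.
    ip = (oip + k) / 3
    return (3 * (bb + hbp) - 2 * k + 13 * hr) / ip + c_fip

# e.g. fip_from_rates(k=0.204, bb=0.077, hbp=0.009, hr=0.023, oip=0.483, c_fip=3.13)
```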





Jonah is a baseball analyst and Red Sox fan. He would like it if you followed him on Twitter @japemstein, but can't really do anything about it if you don't.

11 Comments
steex
9 years ago

Perhaps I’m alone on this, but I think one possible route to improvement would be making the Outcome Machine look more like the George Michael Sports Machine. This whole “Excel is a powerful tool” thing will never catch on.

Carson Cyst-Stooly
9 years ago
Reply to  steex

Make one that runs using Fortran commands. I couldn’t find the punch-hole card input slot on this “excel web-appliance” doohickey so I wasn’t sure how to enter values 🙁