Tool Version 2: Pitching Correlations with Improved Filtering

Now kids, I’m not a user of the Twitter, but I did follow a link to it in one of Eno’s articles last week, whereupon I came across an interesting question posed in a tweet by one @b_g_h: “Are hitters with low BB% also more volatile [because] of BABIP variance?”  There are different ways to address the question statistically, but it seemed to me that with an altered version of my pitching correlation tool, one could provide insight into issues like this, at least regarding pitchers.  A hitting version is waiting in the wings, don’t worry.  So, what I bring to you today is an interactive and downloadable spreadsheet that allows you not only to analyze the relationship between any two pitching statistics, but also to filter your data by any three statistics of your choosing.

Tool: Basically Every Pitching Stat Correlation

In doing my research, I often like to take a look at correlations to get an idea about whether factors might be connected.  At the end of this season, I put together a spreadsheet to help me with that.  Well, I haven’t finished the research yet (FG+ subscribers will probably soon find out what’s been keeping me from it), but in the meantime, I thought I’d share what I hope will be a pretty handy tool for whoever out there might be interested in what lies a little beneath the surface of all these stats on FanGraphs.  And I do mean all of them.  Any pitching-related stat on FanGraphs should be represented in this tool.  You can compare one stat to another, or to itself in a different year.  Or, what the heck, you can even compare a stat to a different stat in a different year.  And, for you sticklers out there, it will even give you a confidence interval on these correlations (by default, it gives you the range within which the true correlation has a 95% chance of falling).

What can you do with this?  Well, let’s say you want to see whether a stat is predictive of the next year’s ERA.  You could, for example, set Stat 1 to K% (after selecting the correct white box, type it in, or select from the drop-down list via the arrow to the right of the box), with the year set to 0 (meaning the present year), then set Stat 2 to ERA, with the year set to 1 (meaning the next year).  If you don’t change the IP or Season filters, you should see a correlation of -0.375.  That shows there’s a pretty decent connection between the two stats, in that if a pitcher has a high strikeout percentage in one season, he’ll likely have a low ERA the next (relative to the rest of the pitchers in the comparison).  If you change the year under ERA to 0, you’ll see the correlation gets stronger, whereas if you change it to 2 or 3, you’ll see it gets weaker.  That has a lot to do with the unpredictability of K%, and especially of ERA.  You’ll notice if you compare year 0 K% to year 1 K%, the correlation is a very strong 0.702, whereas if you do the same for ERA, it’s a moderate-to-weak 0.311.  Hopefully the graph will give you an idea of how strong those connections really are.
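
If you'd like to see roughly what goes on under the hood of a correlation with a confidence interval, here is a minimal Python sketch. It assumes the interval comes from the standard Fisher z-transformation, which may or may not be what the spreadsheet does, and the K% and next-year ERA figures are made-up illustrations rather than FanGraphs data. The function names `pearson_r` and `fisher_interval` are my own.

```python
import math
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def fisher_interval(r, n, level=0.95):
    """Approximate confidence interval for a correlation (Fisher z method).

    Assumption: this is the textbook approach, not necessarily the tool's.
    Requires n > 3.
    """
    z = math.atanh(r)                      # transform r to the z scale
    se = 1 / math.sqrt(n - 3)              # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical pitchers: year-0 K% and year-1 ERA (invented numbers).
k_pct = [0.15, 0.18, 0.20, 0.22, 0.25, 0.28, 0.30]
next_era = [4.60, 4.10, 4.35, 3.90, 3.70, 3.95, 3.20]

r = pearson_r(k_pct, next_era)
low, high = fisher_interval(r, len(k_pct))
print(f"r = {r:.3f}, 95% interval = ({low:.3f}, {high:.3f})")
```

With only seven made-up pitchers the interval is very wide; the more pitcher-seasons that survive your filters, the tighter it gets.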

More Fun with Markov: Custom Run Expectancies

Before the season, I put up a three-part series (1, 2, and 3) that explained how linearly-weighted stats like wOBA, while useful for comparing players to each other, don’t necessarily reflect each player’s true contribution to their team’s run scoring.  You see, the weights used to calculate wOBA are based on league averages.  So, for a team with league average breakdowns in walk rate, singles rate, home run rate, etc., wOBA (and its offspring, wRC+) ought to work very well in figuring out how valuable a player is (or would be) to an offense.  However, when it comes to particularly bad or good offenses, or to those with unusual breakdowns, wOBA will lose some of its efficacy.

Why?  There are synergistic effects in offenses to consider.  First of all, if a team gets on base a lot, there will be more team plate appearances to go around, which of course gives its batters more chances to contribute.  Second of all, if the team gets on base a lot, a batter’s hits are generally worth more, because they’ll tend to drive in more runs.  And, of course, once the batter gets on base in such a team, it will be likelier that there will be a hit (or series of hits) to drive him in.  The reverse of all three points is true in a team that rarely gets on base.

But it goes even beyond that.  Let’s say Team A gets on base 40% of the time, and Team B gets on only 20%, but their balances of the ways they get on base are equal (e.g. each hits 7x as many singles as they do HRs).  A home run is going to be worth something like 14% more to Team A, due to more runners being on base.  However, to Team B, a home run is worth over ten times as much as a walk, whereas to Team A, it’s worth only about five times as much.  That’s because Team A has a much better chance of sustaining a rally that will eventually drive in that walked batter.  Team B will be much more reliant on home runs for scoring runs.
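
To get a feel for the synergy, here is a stripped-down Markov-style sketch in Python. It is not the model from the series: it assumes only walks, singles, and home runs exist, that every runner moves up exactly one base on a single, and a hypothetical 2:7:1 split of walks to singles to homers for both teams. It won't reproduce the figures quoted above, but it does show that doubling the on-base rate more than doubles the expected runs per inning.

```python
from itertools import product


def advance(bases, event):
    """Return (new_bases, runs) under deliberately simple baserunning rules."""
    first, second, third = bases
    if event == "HR":
        return (0, 0, 0), 1 + first + second + third
    if event == "1B":
        # Assumption: every runner advances exactly one base on a single.
        return (1, first, second), third
    if event == "BB":
        # Runners move only when forced.
        if first and second and third:
            return (1, 1, 1), 1
        if first and second:
            return (1, 1, 1), 0
        if first:
            return (1, 1, third), 0
        return (1, second, third), 0
    raise ValueError(f"unknown event: {event}")


def runs_per_inning(p_bb, p_1b, p_hr, sweeps=500):
    """Expected runs from no outs, bases empty, found by repeated sweeps."""
    p_out = 1 - p_bb - p_1b - p_hr
    states = [(outs, b) for outs in range(3) for b in product((0, 1), repeat=3)]
    value = {state: 0.0 for state in states}
    for _ in range(sweeps):
        for outs, bases in states:
            # An out moves to the next out count; the third out ends the inning.
            total = p_out * (value[(outs + 1, bases)] if outs < 2 else 0.0)
            for event, p in (("BB", p_bb), ("1B", p_1b), ("HR", p_hr)):
                new_bases, runs = advance(bases, event)
                total += p * (runs + value[(outs, new_bases)])
            value[(outs, bases)] = total
    return value[(0, (0, 0, 0))]


# Hypothetical 2:7:1 mix of walks, singles, and homers for both teams.
team_a = runs_per_inning(p_bb=0.08, p_1b=0.28, p_hr=0.04)  # 40% on-base
team_b = runs_per_inning(p_bb=0.04, p_1b=0.14, p_hr=0.02)  # 20% on-base
print(f"Team A: {team_a:.3f} runs/inning, Team B: {team_b:.3f} runs/inning")
```

The same value table could be used to price individual events (the run value of a walk versus a homer from each base-out state), which is where a fuller model with doubles, triples, and realistic baserunning would come in.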

Why Strikeouts Secretly Matter for Batters

I got my start at FanGraphs by writing Community Research articles. As you may have noticed, community authors have been very busy this season, cranking out a lot of interesting articles. One that caught my eye the other day was triple_r’s piece on the importance of strikeouts for hitters. The piece correctly pointed out, as other studies have, that there’s basically no correlation between a hitter’s strikeout rate and his overall offensive production. Strikeouts don’t matter; case closed, right? Well, not exactly.

Let me present a hypothetical situation. Say there’s a group of players who go to an “anti-aging” clinic in Florida and pick up some anabolic steroids. Let’s say these hypothetical players are named Bryan Raun, Ralex Odriguez, Tiguel Mejada, Phonny Jeralta, Celson Nruz, and Barry Bon… never mind. Yet, after using the steroids, it appears that the group of them, on average, has not improved. The steroids didn’t improve their performance, right? But wait: let’s also say that while visiting Florida, some of them contracted syphilis, which spread to their brains, causing delusions and severely impacting their judgment, strike-zone and otherwise. The players whose brains aren’t syphilis-addled have actually improved quite a bit, but their gains are completely offset by the losses suffered by those whose central nervous systems are raging with syphilis. So, the fact that the steroids actually do improve performance has been completely obscured by another factor that is somewhat, but not necessarily, associated with the steroids.

My Simple(ish) Playoff Chances Simulator

A month ago, I submitted an article with something I came up with that I thought was pretty cool.  It was a simulator similar to the Coolstandings sim, except that it would use Steamer and ZiPS rest-of-season (RoS) projections instead of year-to-date statistics as the measure of each team’s true talent.  Well, as you may have noticed, the boss, David Appelman, must have thought it was a pretty cool idea too, as unbeknownst to me, he had been working on the same sort of thing since long before the idea popped into my head.  But my duplication of effort will hopefully not go entirely to waste, as I’ll be sharing and explaining the simulator I created.  You’ll be able to use it to analyze your own “what if” scenarios, if that’s your sort of thing.  Think ZiPS and/or Steamer is overly optimistic or pessimistic about some teams?  You can fix that by running your own simulations with this.  Or you can apply it to past or completely hypothetical teams.  Go nuts.
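
For the curious, the basic idea can be sketched in a few lines of Python. This is not the simulator itself: it assumes every remaining game is an independent coin flip weighted by a team's projected true-talent winning percentage, ignores the actual schedule and head-to-head games, and breaks ties at random. The three teams and their numbers are entirely hypothetical, as is the function name `simulate_division`.

```python
import random


def simulate_division(teams, sims=2000, seed=0):
    """Estimate each team's chance of finishing first in its division.

    teams maps name -> (current_wins, games_left, true_talent_win_pct).
    """
    rng = random.Random(seed)
    titles = {name: 0 for name in teams}
    for _ in range(sims):
        finals = {}
        for name, (wins, games_left, talent) in teams.items():
            # Each remaining game is an independent weighted coin flip.
            finals[name] = wins + sum(
                rng.random() < talent for _ in range(games_left)
            )
        best = max(finals.values())
        leaders = [name for name, total in finals.items() if total == best]
        titles[rng.choice(leaders)] += 1  # random tiebreak (an assumption)
    return {name: count / sims for name, count in titles.items()}


# Hypothetical division: (wins so far, games left, rest-of-season talent).
division = {
    "Team A": (60, 52, 0.55),
    "Team B": (55, 52, 0.50),
    "Team C": (50, 52, 0.45),
}
odds = simulate_division(division)
for name, chance in odds.items():
    print(f"{name}: {chance:.1%}")
```

Swapping in your own talent estimates for a team is just a matter of changing the third number in its tuple and rerunning.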

Simulating the Impact of Pitcher Inconsistency

I thought Matt Hunter’s FanGraphs debut article last week was really interesting.  So interesting, in fact, that I’m going to rip it off right now.  The difference is I’ll be using a Monte Carlo simulator I made for this sort of situation, which I’ll let you play with after you’re done reading (it’s at the bottom).

Matt posed the question of whether inconsistency could be a good thing for a pitcher.  He brought up the example of Jered Weaver vs. Matt Cain in 2012 — two pitchers with nearly identical overall stats, except that Weaver was a lot less consistent.  However, Weaver had a bit of an advantage in Win Probability Added (WPA), Matt points out.  WPA factors in a bunch of things, e.g. how close the game is and how many outs are left in the game when events occur.  Because of that, it’s a pretty noisy stat, heavily influenced by factors the pitcher doesn’t control much.  It’s not a predictive stat.  For that reason, I figured simulations might be fun and enlightening on the subject.  They sort of accomplish the same thing that WPA does, except that they allow you to base conclusions off of a lot more possible conditions and outcomes than you’d see in a handful of starts (i.e., they can help de-noise the situation).

Reviewing the Preseason Standings Projections

The FanGraphs staff made its obligatory preseason picks before the season (naturally), and I think it’s safe to say that none of us have psychic powers. My picks of the Angels and Blue Jays to win their divisions? They’re not looking so hot right now. In my defense, I was just blindly going along with what our preseason WAR estimates told me. OK, not the greatest defense, but I figured Steamer + ZiPS + FG-created depth charts could produce better guesses than I could on my own. Especially with the roster changes that have happened lately, I thought it would be a good time to revisit our projections. The Angels came out the series victors against the Blue Jays in their recent four-game Battle of the Disappointments, but both teams are still far below the expectations put on them.  However, let’s examine: could they actually be good teams who have just been unlucky?

Most teams have played somewhere around 110 games this season. That leaves plenty of room for unpredictability. If you flipped a coin 110 times, you’d expect to get about 55 heads, right? Well, the binomial distribution says there’s only about a 49.5% chance of the heads total being within even three of that (somewhere between 52 and 58 times). MLB teams are pretty different from coins (they’re a lot more expensive), but I think you can apply the same principle to them. The above calculation for the coin assumes the “true” rate of heads is 50%. What would we see if we were to presume our projections’ estimated preseason win totals are actually representative of the “true” win rates for each team? The following table will show you:
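
If you want to check the coin-flip figure yourself, or run the same calculation for a team, here is a minimal Python sketch using the exact binomial formula. The .550 team at the end is a hypothetical example, not one of our projections, and treating every game as an independent flip with the same win probability is an assumption.

```python
from math import comb


def prob_between(trials, rate, low, high):
    """Exact binomial probability of between low and high successes, inclusive."""
    return sum(
        comb(trials, k) * rate**k * (1 - rate) ** (trials - k)
        for k in range(low, high + 1)
    )


# The coin: 110 flips, chance of landing within three of 55 heads.
# This should come out close to the 49.5% quoted in the text.
coin = prob_between(110, 0.5, 52, 58)
print(f"Coin within 3 of 55 heads: {coin:.1%}")

# Hypothetical team with a "true" .550 winning percentage after 110 games:
# how often would it be at .500 or worse (55 wins or fewer)?
unlucky = prob_between(110, 0.55, 0, 55)
print(f"True .550 team at 55 wins or fewer: {unlucky:.1%}")
```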

Batted Ball Types and Handedness Matchups, in General

Last month, I did a two-part analysis that showed what happens, strikeout-wise, when, say, a pitcher who strikes out 15% of batters faces a batter who strikes out 20% of the time. As a special bonus for you all, I included a few hundred other K%-matchup types too. I made handedness matchups central to the study, as I think it’s pretty well-established that you can expect a hitter to strike out more often against same-handed pitchers. That is, if I was trying to give an expected result for a righty batter against a lefty pitcher, I looked only at the hitter’s past performance rates against lefties and the pitcher’s history against righties. Before I moved on to performing a similar analysis on batted ball types (grounders, liners, outfield fly balls, and infield popups), I wanted to see whether handedness matchups mattered to these as well.

For this study, my sample was all non-switch-hitting batters from 2002-2012 with at least 300 PA against lefty pitchers plus at least 300 PA against righties. I’d have gone by number of batted balls, except I’m throwing some non-batted ball stats into the analysis.

Let’s get right to it — the following table shows the chances that handedness really makes no difference to each stat, according to paired t-tests:
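
For anyone unfamiliar with the test: a paired t-test looks at each batter's difference between his two splits and asks whether the average difference is distinguishable from zero. Here is a minimal Python sketch of the statistic itself. The ground-ball rates are invented for illustration, and `paired_t` is my own helper; turning the statistic into a p-value needs the t distribution, which a library routine such as `scipy.stats.ttest_rel` handles.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired samples."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    std_error = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / std_error, n - 1


# Hypothetical ground-ball rates for six batters (invented numbers).
gb_vs_same_hand = [0.46, 0.41, 0.50, 0.44, 0.39, 0.48]
gb_vs_opposite_hand = [0.43, 0.40, 0.47, 0.45, 0.36, 0.44]

t_stat, dof = paired_t(gb_vs_same_hand, gb_vs_opposite_hand)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

The larger the t statistic in absolute value, the smaller the chance that handedness makes no difference to that stat.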

Could Chris Davis Match Roger Maris?

Chris Davis, with 37 home runs so far this season, has been generating a lot of buzz lately — both on the field and more recently with some comments he made during the All-Star break. When he was asked about the all-time home run record, Davis said:

“In my opinion, 61 is the record, and I think most fans agree with me on that.”

I have no idea if most fans agree with him, but it probably shouldn’t be surprising that a guy within spitting distance of a 61 home run season would view that as the mark to beat, rather than 73 home runs, which is essentially out of range. So, just for fun, let’s figure out what Davis’ chances are of matching Roger Maris’ mark.

At Tom Tango’s website, there was a discussion that tried to put a number on Davis’ chances of reaching that mark. Tango performed a “quick back-of-envelope calculation” to do so, but today, I’ll be providing you with an interactive tool that might make it easy for you to perform a more sophisticated calculation for situations like this (and many other types of situations).
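
As a warm-up, the simplest version of the calculation treats each remaining plate appearance as an independent trial with a fixed home run probability. The Python sketch below does just that. It is neither Tango's calculation nor the tool described here, and the remaining plate appearances and per-PA home run rate are placeholder guesses you should replace with your own. The only figures taken from the article are the 37 home runs so far and the target of 61.

```python
from math import comb


def prob_at_least(trials, rate, needed):
    """Binomial probability of at least `needed` successes in `trials` tries."""
    start = max(needed, 0)
    return sum(
        comb(trials, k) * rate**k * (1 - rate) ** (trials - k)
        for k in range(start, trials + 1)
    )


home_runs_so_far = 37
target = 61
needed = target - home_runs_so_far  # 24 more to match Maris

# Hypothetical inputs: both numbers below are assumptions, not projections.
remaining_pa = 280
hr_per_pa = 0.07

chance = prob_at_least(remaining_pa, hr_per_pa, needed)
print(f"Chance of at least {needed} more HR: {chance:.1%}")
```

A fixed rate ignores uncertainty about Davis' true talent, which is exactly the kind of thing a more sophisticated calculation would account for.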

Batter-Pitcher Matchups Part 2: Expected Matchup K%

In last episode’s thrilling cliffhanger, I left you with a formula that I brashly proclaimed “does a great job of explaining the trends” in strikeout rates for meetings between specific groups of batters and pitchers.  Coming up with a formula to explain what was going on wasn’t pure nerdiness — making formulas to predict these results is the point of this research project.  You see, the goal of my FanGraphs masters is to come up with a system by which we can look at a batter and a pitcher, and tell you, our loyal followers, some educated guesses of the chances of pretty much every conceivable outcome that could result from these two facing off against each other.  Getting a sense of the expected strikeout rate is merely the first step in what will likely be a long process of continuous improvement.

The idea of this matchup system is to not only give you estimates that are more free from the whims of randomness than “Batter A is 8-for-20 with 5 Ks and 1 HR in his career against Pitcher B,” but also to provide some evidence-based projections for matchups that have never even happened.  How do we propose this can be done?  By looking at the overall trends and seeing how players fit within them.  Can it really be done?  It definitely looks that way to me.  Today’s installment will be about attempting to convince you of that.