Archive for Research

Little League Home Runs in MLB History, Part I

This article was originally developed as an oral presentation given by the author to the Society for American Baseball Research at their SABR 45 Convention in Chicago on June 27, 2015. The presentation, which featured the innovative use of video, audio and transitional animation embedded within a PowerPoint deck, was awarded the annual Doug Pappas Research Award as the best of the 32 oral presentations made during the convention that weekend.

This article has been repurposed from that deck. Since the Retrosheet play-by-play data on which this study was predicated were updated just days before the original presentation, all the data provided during the oral presentation have been updated for this article.

Let’s start off this article the way I started off my presentation to SABR: with a quick poll. And you might as well be honest, now, because otherwise you’re just bullshitting yourself, and that would just be pathetic.

  • How many of you played Little League when you were a kid? Hands up, please. OK… keep them up. Now:
  • How many of you ever hit a home run in a Little League game? If you did, keep your hands up. OK… now, finally:
  • How many of you hit an actual home run clear over the outfielders’ heads and were able to trot all the way around the bases in a Little League game?

Not so many of you, right? Only the very best players on any given Little League team ever hit that kind of home run. If you’re like me, and like most Little Leaguers, if you ever hit a home run in Little League, this is what it probably looked like:

Read the rest of this entry »


The 2015 Strike Zone, Through July

With strikeout rates soaring and run scoring dipping to generational lows in recent seasons, word came in the offseason that the Competition Committee would be monitoring the expanding strike zone in 2015. Given the scrutiny it is receiving at the league level, I have been tracking the strike zone over the course of the season, with updates at the end of each month. At the following links you can find the updates from the end of April, May and June.
Read the rest of this entry »


A New Way to Look at Sample Size: Math Supplement

This article is co-authored by Jonah Pemstein and Sean Dolinar.

For the introductory, less math-y post that explains more about what this project is, click here.

The concept of reliability comes from the classical test theory designed for psychological, research, and educational tests. The classical test theory uses the model of a true score, error (or noise) and observed score. [2]

[Diagram: Observed score = True score + Error]

To adapt this to baseball, the true “score” would be the true talent level we are seeking to find, and observed “score” is the actual production of a player. Unfortunately, the true talent level can’t be directly measured. There are several methods to estimate true talent by accounting for different factors. This is, to an extent, what projection systems try to do. For our purposes we are defining the true talent level as the actual talent level and not the value the player provides adjusted to park, competition, etc. The observed score is easy to measure, of course — it’s the recorded outcomes from the games the player in question has played. It’s the stat you see on our leaderboards.

The error term contains everything that can cause a discrepancy between the true score and the observed score. It contains almost everything that affects the observed outcome in the stat: weather, pitcher, defenses, park factors, injuries, and so on. This analysis isn’t interested in accounting for those factors individually but rather in measuring the noise those factors, in aggregate, impart to our observed stat.
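To make the true score / error / observed score model concrete, here is a minimal simulation sketch. It is not the method used in this research: the talent distribution, the 200-PA sample and the function name `simulate_reliability` are all assumptions made up for illustration. It relies on the textbook classical test theory definition of reliability as the share of observed-score variance that comes from true-score variance.

```python
import random
import statistics

def simulate_reliability(n_players=1000, n_pa=200, seed=0):
    """Simulate observed = true + error for an on-base-like rate stat.

    Every number here (talent mean and spread, PA count) is invented
    for illustration and is not taken from the article's data.
    """
    rng = random.Random(seed)
    true_scores = []
    observed_scores = []
    for _ in range(n_players):
        # True talent: unobservable in real life, but known in a simulation.
        talent = min(max(rng.gauss(0.320, 0.030), 0.0), 1.0)
        # Observed score: share of successes in n_pa independent trials.
        successes = sum(rng.random() < talent for _ in range(n_pa))
        true_scores.append(talent)
        observed_scores.append(successes / n_pa)
    # Error is whatever separates the observed score from the true score.
    errors = [o - t for o, t in zip(observed_scores, true_scores)]
    var_true = statistics.pvariance(true_scores)
    var_err = statistics.pvariance(errors)
    var_obs = statistics.pvariance(observed_scores)
    return var_true, var_err, var_obs

var_true, var_err, var_obs = simulate_reliability()
# Classical test theory reliability: true-score variance / observed variance.
reliability = var_true / var_obs
print(f"true var {var_true:.5f} + error var {var_err:.5f} vs observed var {var_obs:.5f}")
print(f"reliability at 200 PA: {reliability:.2f}")
```

Rerunning with a larger `n_pa` shrinks the error variance while the talent variance stays put, so reliability climbs toward 1. That is the sample-size effect in miniature.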

Read the rest of this entry »


A New Way to Look at Sample Size

Jonah Pemstein and Sean Dolinar co-authored this article.

Due to the math-intensive nature of this research, we have included a supplemental post focused entirely on the math. It will be referenced throughout this post; detailed information and discussion about the research can be found there.

INTRODUCTION

“Small sample size” is a phrase often used throughout the baseball season when analysts and fans alike discuss players’ statistics. Every fan, to some extent, has an idea of what a small sample size is, even if they don’t know it by name: a player who goes 2-for-4 in a game is not a .500 hitter; a reliever who hasn’t allowed a run by April 10 is not a zero-ERA pitcher. Knowing what small sample size means is easy. The question is, though, when do samples stop being small and start being useful and meaningful?

Read the rest of this entry »


On Rotation, Part 2: The Effects of Spin on Pitch Outcomes

On Monday, I looked at how different spin rates for different pitches affect the way those pitches move through the air towards a batter. That post was useful for understanding the relationship between spin and velocity and movement. What it didn’t tell us much about, however, is what the spin actually does for the pitcher: does more spin make pitches harder or easier to make contact with? Does more spin induce weaker contact? To answer those questions (as well as others), we can look at the actual production from hitters on these pitches. That’s the goal of this post.

The first such stat we’ll consider is contact rate (Contact%), or times made contact (balls in play or foul balls) per swing.
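As a quick sketch of that definition, contact rate can be computed from a list of swing outcomes like this. The outcome labels (`"in_play"`, `"foul"`, `"whiff"`) are hypothetical names chosen for the example, not actual PITCHf/x codes.

```python
def contact_rate(swing_results):
    """Contact% = swings with contact (ball in play or foul) / total swings.

    The outcome labels are made-up names for illustration only.
    """
    contact_outcomes = {"in_play", "foul"}
    swings = len(swing_results)
    if swings == 0:
        return None  # the rate is undefined when there are no swings
    made_contact = sum(1 for result in swing_results if result in contact_outcomes)
    return made_contact / swings

sample = ["foul", "whiff", "in_play", "whiff", "foul"]
print(contact_rate(sample))  # 3 contacts on 5 swings -> 0.6
```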

Read the rest of this entry »


What Hard-Hit Rate Means for Batters

Recently, one of the hot topics in baseball statistics has been the appearance of a measurement for hard-hit balls: here at FanGraphs, we added hard-hit rate to our leaderboards before this season, adding along with it a wealth of opportunities for analysis. An issue with any new statistic is that it can be cited before its true use or impact is fully understood, and so hard-hit rate has been making the rounds in player analysis, generally cited in reference to how well or how poorly a player has been performing.

For hitters, it might go without saying that hitting the ball harder is generally a good thing: the aim of hitting, in a certain sense, would seem to be to hit the ball as hard as possible as often as you can (except in the cases of bunting or other situational circumstances). However, it hasn’t been clear yet how hitting the ball hard impacts other rate and counting statistics, and that seems to be a hole in our understanding of a statistic that is undergoing a moment in the spotlight.

The aim today is, at the very least, to explore how hard-hit rate impacts a few of those stats, as well as to begin a conversation that more astute statistical minds may be able to take to deeper and exciting places. There are a couple levels to this piece today, but there are surely many more that I have not reached: I don’t intend to make hard conclusions, but rather to explore and provide a well-intentioned foray into the data. With that said, onward.

Read the rest of this entry »


On Rotation, Part 1: The Effects of Spin on the Flight of a Pitch

My last article was a look at the effects of pitch location on batted balls. While it ended on a somewhat disappointing note, showing that the results couldn’t really be applied to individual pitchers, it did make me think more about which components of a pitch affect the pitch, and in which ways.

So I decided to examine spin. Spin is captured by PITCHf/x in two measurements: rate (in revolutions per minute) and direction (the angle in degrees). As it turns out, the spin of a pitch has quite the effect on its outcome, much like location. Different spin rates make the pitch move differently (obviously) and get hit differently. (For a look at this topic from a physics standpoint, check out this infographic and this much more complicated article, both from the excellent Alan Nathan. And, to make sure everybody knows: I know little about the actual physics of this past what I can infer from my baseball playing and watching experience. I am just looking at the PITCHf/x data.)

Before we get right to the graphs, a quick note about my methodology. I grouped each pitch from 2009 onward — which is the year PITCHf/x started to record spin rate consistently — into buckets based on spin rate (pitches were rounded to the nearest 50 RPM) and pitch type (I included four-seam fastballs, curveballs, changeups, two-seam fastballs, cutters, knuckleballs, and sliders). I then found a multitude of stats for each bucket: contact rate, average speed, average movement, ground ball rate, and many more. I also did the same with spin angle, grouping pitches into buckets by rounding to the nearest 20 degrees, but the results weren’t particularly meaningful.

I also combined two-seam fastballs and sinkers when I was doing this. There has been some discussion in the past about whether there is a difference between those two pitches. While PITCHf/x classifies them separately, they are more or less indistinguishable, and when I first did this without combining them, they overlapped on nearly all of the various graphs.
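The grouping described above can be sketched roughly as follows. This is not the actual code behind the article: the field names (`type`, `spin_rpm`, `mph`) and the sample pitches are invented, and average speed stands in for the many stats computed per bucket.

```python
import math
from collections import defaultdict

BUCKET_RPM = 50
# Sinkers are folded into two-seam fastballs, as described in the text.
MERGED_TYPES = {"sinker": "two-seam"}

def spin_bucket(spin_rpm, width=BUCKET_RPM):
    """Round a non-negative spin rate to the nearest multiple of `width`."""
    # floor(x + 0.5) rounds halves up; the built-in round() rounds them to even.
    return math.floor(spin_rpm / width + 0.5) * width

def bucket_pitches(pitches):
    """Group pitches by (pitch type, spin-rate bucket)."""
    groups = defaultdict(list)
    for pitch in pitches:
        pitch_type = MERGED_TYPES.get(pitch["type"], pitch["type"])
        groups[(pitch_type, spin_bucket(pitch["spin_rpm"]))].append(pitch)
    return groups

def average_speed(groups):
    """One example of a per-bucket stat: mean velocity in mph."""
    return {key: sum(p["mph"] for p in ps) / len(ps) for key, ps in groups.items()}

# Hypothetical pitches, for illustration only.
pitches = [
    {"type": "two-seam", "spin_rpm": 2110, "mph": 92.0},
    {"type": "sinker", "spin_rpm": 2090, "mph": 91.0},
    {"type": "curveball", "spin_rpm": 1480, "mph": 78.0},
]
groups = bucket_pitches(pitches)
speeds = average_speed(groups)
print(speeds)
```

In this toy data the sinker and the two-seamer land in the same ("two-seam", 2100) bucket, which is the effect of combining the two pitch types.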

Read the rest of this entry »


Batted-Ball Rates vs. Velocity Changes

Last year, I revisited Mike Fast’s “Lose a Tick, Gain a Tick” article and found how much a pitcher should expect to see his ERA, FIP and xFIP change with a velocity decline. Additionally, I found the rate of decline of strikeouts and walks. An interesting finding from the work was that FIP and ERA change by the same amount with a velocity decline while xFIP doesn’t follow the other two. I decided to examine some batted-ball stats to see which ones change when a pitcher’s velocity changes.
Read the rest of this entry »


Modeling Salary Arbitration: Stat Components

This post is part of an ongoing arbitration research project and is coauthored by Alex Chamberlain and Sean Dolinar.

April 24: Modeling Salary Arbitration: Introduction

Feb. 25: 2015 MLB Arbitration Visualized

* * *

A couple of weeks ago, we introduced a couple of regressions that modeled arbitration results using basic formulae predicated on wins above replacement (WAR). Ultimately, the models estimated that an arbitration-eligible pitcher could expect his salary to increase by 14 percent, and his raise in salary to increase by 56 percent, for each additional WAR. A hitter could expect increases of 13 percent and 46 percent, respectively.

The models, however, were incomplete: they did not incorporate any other stats aside from WAR. This was by design, as we wanted to introduce simple one-variable equations for the sake of demonstration. WAR is, conveniently, a comprehensive variable that attempts to summarize a player’s worth in one easily digestible number. But what about the effects of a player’s age or arbitration year?

Moreover, the r-squared statistic — a quick-and-easy check of a model’s validity — for each specification is not especially strong, clocking in anywhere between .30 and .56. This is partly a result of specifying only one explanatory variable, so including more variables — which we have done in this post — should improve the goodness of fit of the models, assuming the variables are relevant.

With that said, we have new-and-improved models to share with you: one composed of composite statistics and another composed of traditional statistics. They are all vanilla, linear ordinary least squares (OLS) regression models, and it is important to remember that the values for each stat can only be used in the context of that specific model.
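For readers who want to see the mechanics, here is a bare-bones sketch of an OLS fit and its r-squared, comparing a one-variable model with a two-variable one. The data are invented, the helper names `ols_fit` and `r_squared` are ours, and this is not the specification or the software used for the actual models.

```python
def ols_fit(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    `rows` is a list of predictor lists; an intercept column is added here.
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(rows, y, coefs):
    """Share of the variance in y explained by the fitted model."""
    preds = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up players: WAR, plate appearances (in hundreds), raise in $ millions.
war = [0.5, 1.0, 2.0, 3.0, 4.0, 1.5]
pa = [2.0, 5.5, 3.0, 6.0, 4.5, 3.5]
salary_raise = [0.4, 1.3, 1.4, 2.9, 3.1, 1.2]

one_var = [[w] for w in war]
two_var = [[w, p] for w, p in zip(war, pa)]
r2_one = r_squared(one_var, salary_raise, ols_fit(one_var, salary_raise))
r2_two = r_squared(two_var, salary_raise, ols_fit(two_var, salary_raise))
print(f"r-squared, WAR only: {r2_one:.3f}; WAR + PA: {r2_two:.3f}")
```

Adding a variable can never lower the in-sample r-squared of an OLS model with an intercept, which is why the improvement only means something if the added variables are relevant.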

Non-Traditional Statistics

For each player, we specify…

  • a composite statistic, such as wins above replacement (WAR) for batters and RA9-WAR for pitchers, to measure overall performance (RA9-WAR uses runs allowed per nine innings rather than FIP);
  • a service statistic, such as plate appearances (PA) and innings pitched (IP), to measure playing time;
  • a “glory” statistic, such as home runs (HR) and saves (SV), to account for baseball’s affinity for traditional statistics and social constructs;
  • arbitration year (for pitchers*), indicating a player’s total service time;
  • and his age (for hitters*), to measure as best we can the number of years for which he has inhabited the earth.

We identify these particular stats not only to cover as much analytical ground as possible but also to minimize the use of stats that are highly correlated with one another (multicollinearity). We want to isolate different aspects of player performance or value as best we can.
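One simple way to screen for that kind of overlap is to check pairwise correlations among candidate variables before putting them in the same model. The sketch below is only an illustration: the stat lines are invented, the 0.8 cutoff is an arbitrary assumption, and it is not necessarily how the variables here were chosen.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def flag_collinear(columns, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| at or above the threshold.

    The default threshold is an arbitrary choice for this example.
    """
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged

# Hypothetical hitters, for illustration only.
columns = {
    "HR": [10, 20, 30, 15, 25],
    "RBI": [40, 75, 100, 55, 90],
    "Age": [27, 25, 29, 26, 28],
}
flagged = flag_collinear(columns)
for name_a, name_b, r in flagged:
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.2f})")
```

In this toy data, HR and RBI move almost in lockstep, so a model would gain little from including both; age is only moderately related to either.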

Read the rest of this entry »


A Look at Quality of Contact Profiles

It seems like it should matter how hard you hit the baseball. That statement probably seems self-evident, but until this year we haven’t really had a whole lot of evidence to demonstrate whether that’s true. We have an old month of HITf/x data from 2009 and there’s non-public data about exit velocity, but until StatCast data arrived this year, we didn’t really have the tools to determine how much quality of contact matters.

Last week, FanGraphs launched quality of contact statistics courtesy of Baseball Info Solutions to add to this effort. The methodology isn’t based solely on raw exit velocity, but the data stretches back to 2002 and it’s publicly available now and easy to manage. As soon as people realized the data was available, the sabermetric masses went to work to run preliminary tests on the data. One of the interesting things that showed up right away was that the data didn’t do a great job predicting itself in the future and things like Hard% didn’t correlate with stats like BABIP or LD% as well as we might have otherwise thought.

Read the rest of this entry »