Archive for Research

Fielding Independent Offense, Part 2


Dare to dream.

On Thursday, we looked at Fielding Independent Offense (FIO), as well as the Should Hit formula, and decided to toss stolen bases into the equation. The results were, let’s say, brow-elevating.

Today, we are going to put that result — the FIO formula — into action.

In the timeless words of Sir Samuel Leroy Jackson: “Hold onto your butts!”


Fielding Independent Offense: Part 1


IT’S SO *** **** HARD TO THINK
WITH ALL THESE DUCKS EVERYWHERE!

In August of 2011, I introduced Should Hit (in three iterations: ShH, SHAP!, and Complete SHAP!). Should Hit is essentially a straightforward multiple linear regression of weighted runs created plus (wRC+) on walk rate, strikeout rate, home run rate, and BABIP. In both its calculation and its simplicity, it is very similar to FIP, but its uses and impact are quite unlike FIP’s.

Like FIP with groundball pitchers, the formula has some biases: known, accepted (by me, at least) biases. For instance, because it ignores doubles and triples completely, Should Hit naturally undervalues players who excel at taking the extra bags and overvalues sluggers who get stuck at first. It presumes a certain number of doubles and triples for every player based on their home run rate and other peripherals, all of which are poor proxies for what is, in many players, a verifiable skill or weakness.

Ultimately, though, the tools (ShH and its brethren) work rather well. For the curious thinker, ShH can admirably predict what a player might hit with a normal/career BABIP, or if their BB%, K%, or HR% changes. However, when I first uncovered it, I was wrongly under the impression that the current FanGraphs iteration of the wRC+ formula did not include stolen bases. It mattered little to me at the time; the only reason I found the discovery so interesting to begin with was that just four peripherals could explain almost 93% of the variation in wRC+ (and that is still amazing to me!).
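That "93% of the variation" figure is the R² of the fit. For readers who want to reproduce something like it, here is a minimal sketch in plain Python, not the original ShH code: it regresses wRC+ on BB%, K%, HR%, and BABIP by ordinary least squares (solving the normal equations with Gaussian elimination) and returns the coefficients plus R². The helper names `solve` and `fit_ols` are hypothetical, and the sample rows in the demo are made up purely to show the call.

```python
# Minimal sketch (not the original ShH code): ordinary least squares of wRC+
# on BB%, K%, HR%, and BABIP, solved via the normal equations (X'X) b = X'y.
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Return (coefficients, R^2); coefficients[0] is the intercept."""
    X = [[1.0, *r] for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(xi[i] * xi[j] for xi in X) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    preds = [sum(b * v for b, v in zip(beta, xi)) for xi in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, preds))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up player seasons: (BB%, K%, HR%, BABIP) as fractions, plus wRC+.
    rows = [
        (0.08, 0.20, 0.030, 0.300),
        (0.10, 0.15, 0.040, 0.290),
        (0.06, 0.25, 0.020, 0.310),
        (0.12, 0.18, 0.050, 0.280),
        (0.09, 0.22, 0.010, 0.320),
        (0.07, 0.16, 0.035, 0.305),
        (0.11, 0.24, 0.045, 0.295),
    ]
    wrc_plus = [105.0, 118.0, 92.0, 140.0, 101.0, 99.0, 130.0]
    coefs, r2 = fit_ols(rows, wrc_plus)
    print("intercept and slopes:", [round(c, 1) for c in coefs])
    print("R^2:", round(r2, 3))
```

Once the coefficients are in hand, the "what if this player had a normal BABIP" question is just a matter of plugging a career BABIP into the fitted line in place of the season value.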

But today, we are going to add in SBs and stand back with a decanter of thought and ask ourselves: “What the hell did we just make here?”
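One caution when bolting stolen bases onto the formula as a fifth predictor: in-sample R² for a least-squares fit can never drop when a term is added, so a higher R² on its own does not prove SB carries real signal. A common check, offered here as an assumption rather than as the method these posts used, is adjusted R², which charges a penalty for each predictor. `adjusted_r2` and `sb_term_helps` are hypothetical helpers, and the numbers in the demo are illustrative only.

```python
# Hypothetical helpers: compare a four-rate model with a model that adds SB,
# using adjusted R^2 so the extra predictor has to earn its keep.


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def sb_term_helps(r2_base: float, r2_with_sb: float, n: int, p_base: int = 4) -> bool:
    """True if adding one SB predictor raises adjusted R^2 over the base model."""
    return adjusted_r2(r2_with_sb, n, p_base + 1) > adjusted_r2(r2_base, n, p_base)


if __name__ == "__main__":
    # Illustrative numbers only: a tiny R^2 bump does not survive the penalty.
    print(sb_term_helps(0.930, 0.9301, n=50))  # False
    print(sb_term_helps(0.930, 0.950, n=50))   # True
```

With a full league's worth of player seasons the penalty per predictor is small, so this check mostly matters for small samples.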


Park Factors and ERA Estimators: Part III

When we last left the question of Park Factors’ effect on ERA estimators, we found that the estimators performed best in hitters’ parks when looking at starting pitchers. FIP and xFIP performed better than tERA or SIERA when predicting the next year’s ERA for this group of pitchers. For the other park types, the pattern looked similar to what we generally see: SIERA generally performs best, and all estimators predict a pitcher’s YR2_ERA better than YR1_ERA does.
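One way to run this kind of comparison is to score each estimator by its root-mean-square error (RMSE) against the following season's ERA, separately for each park type, with YR1_ERA as the naive baseline. The sketch below assumes RMSE as the yardstick (the series may have used correlations or another measure) and uses hypothetical field names such as `park_type` and `YR2_ERA`; the two season pairs in the demo are made up.

```python
# Sketch: RMSE of each ERA estimator against next-season ERA, split by park type.
# RMSE and the field names are assumptions, not the series' documented method.
from collections import defaultdict
from math import sqrt
from typing import Any, Dict, Iterable, Mapping

# YR1_ERA is the naive "same ERA again" baseline the estimators should beat.
PREDICTORS = ("YR1_ERA", "FIP", "xFIP", "tERA", "SIERA")


def rmse_by_park_type(pairs: Iterable[Mapping[str, Any]]) -> Dict[str, Dict[str, float]]:
    """Map park_type -> {predictor: RMSE vs. YR2_ERA} over the given season pairs."""
    sq_err: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    counts: Dict[str, int] = defaultdict(int)
    for pair in pairs:
        park = pair["park_type"]
        counts[park] += 1
        for name in PREDICTORS:
            sq_err[park][name] += (pair[name] - pair["YR2_ERA"]) ** 2
    return {
        park: {name: sqrt(sq_err[park][name] / n) for name in PREDICTORS}
        for park, n in counts.items()
    }


def best_predictor(errors: Mapping[str, float]) -> str:
    """Name of the predictor with the lowest RMSE."""
    return min(errors, key=lambda name: errors[name])


if __name__ == "__main__":
    # Two made-up starter season pairs in a hitters' park.
    pairs = [
        {"park_type": "hitter", "YR1_ERA": 5.0, "FIP": 4.5, "xFIP": 4.0,
         "tERA": 3.0, "SIERA": 4.0, "YR2_ERA": 4.0},
        {"park_type": "hitter", "YR1_ERA": 3.0, "FIP": 3.5, "xFIP": 4.0,
         "tERA": 3.0, "SIERA": 3.0, "YR2_ERA": 3.0},
    ]
    table = rmse_by_park_type(pairs)
    print(table["hitter"])
    print(best_predictor(table["hitter"]))  # SIERA
```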

But what if we want to predict how pitchers with certain batted-ball profiles (fly ball vs. ground ball) will perform in different parks? If we’re trying to predict how C.J. Wilson (lifetime 1.68 GB/FB ratio) will perform after moving from Texas to Anaheim, or how Michael Pineda (0.81 GB/FB ratio) will fare after moving from pitcher-friendly Safeco to hitter-friendly Yankee Stadium, which estimator(s) should we have more faith in? That is the focus of Part III.

I used the same methodology as in Part II to determine park type. I then coded each pitcher as ground ball or fly ball based on their GB/FB ratio. A pitcher’s GB/FB is one of the most consistent metrics (for starting pitchers, the year-over-year correlation is 0.87, the highest of all the outcome metrics), so there was little concern about pitchers changing their batted-ball profiles between seasons. A GB/FB greater than 1 was coded as ground ball; less than 1 was coded as fly ball. In the end, 1,387 season pairs were included in the analysis.
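The coding rule and the stability check can be sketched as follows. The text does not say how a GB/FB of exactly 1 was handled, so this version labels it "neutral" as an assumption, and it assumes the year-over-year figure is a Pearson correlation between pitchers' year-one and year-two GB/FB, written out by hand to stay dependency-free. Function names are hypothetical.

```python
# Sketch of the GB/FB coding rule plus a Pearson correlation for year-over-year stability.
from math import sqrt
from typing import Sequence


def batted_ball_type(gb_fb: float) -> str:
    """Code a pitcher by GB/FB; exactly 1.0 is unspecified in the text (assumed 'neutral')."""
    if gb_fb > 1.0:
        return "ground ball"
    if gb_fb < 1.0:
        return "fly ball"
    return "neutral"


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation, e.g. of year-1 GB/FB against year-2 GB/FB for the same pitchers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    print(batted_ball_type(1.68))  # C.J. Wilson's lifetime ratio -> ground ball
    print(batted_ball_type(0.81))  # Michael Pineda's ratio -> fly ball
```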


The First Overall by Actual Record

Before 2005, the Major League draft alternated picks between the American and National Leagues, just as home-field advantage in the World Series used to alternate before Bud Selig had to dip his meddlesome fingers in. There are some well-known cases in which the first overall pick did not go to the team with the worst record in the previous season.

The most recent was when the Padres got to select ahead of the Tigers in 2004 despite the 2003 Tigers having happened. Luckily for the Tigers, the Padres picked Matt Bush and the Tigers landed Justin Verlander. I don’t think they’re crying foul over that missed opportunity.



Park Factors and ERA Estimators: Part II

In the first part of this series, I looked at the effect that Park Factors have on various ERA estimators. The original question I attempted to answer was whether certain estimators were better suited for predicting performance depending on whether a park is hitter-friendly or pitcher-friendly. The short answer was that, relative to YR1_ERA, ERA estimators did a much better job in hitter-friendly parks than in pitcher-friendly parks.

One question I didn’t answer was whether the effectiveness of estimators in various types of parks also varied by pitcher role (i.e., starters versus relievers). Generally speaking, ERA estimators perform better when you restrict the analysis to starters, since relievers tend to be more volatile year-over-year. The question is whether this pattern still holds once park factors are taken into account. As predicted, ERA estimators do a better job predicting performance for starters than for relievers.

The current data set includes 533 pairs of starter seasons and reliever seasons in which the pitchers threw in the same park in both the first and second years, and did so in the same role (starter or reliever) both years. Before segmenting by park type, we see results that are consistent with previous analyses of ERA estimators and their predictive power for starters and relievers.
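Assembling those season pairs amounts to a filter: keep a pitcher's season only when the next season exists for the same pitcher, in the same park, in the same role. A minimal sketch, with a hypothetical `PitcherSeason` record standing in for whatever data source was actually used, and made-up park names and ERAs in the demo:

```python
# Sketch: build consecutive-season pairs for the same pitcher, same park, same role.
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class PitcherSeason:
    player_id: str  # hypothetical identifier
    season: int
    park: str       # park the pitcher threw in that season
    role: str       # "SP" or "RP"
    era: float


def build_pairs(seasons: Iterable[PitcherSeason]) -> List[Tuple[PitcherSeason, PitcherSeason]]:
    """Pair each season with the next one when park and role are unchanged."""
    by_key = {(s.player_id, s.season): s for s in seasons}
    pairs = []
    for key in sorted(by_key):
        first = by_key[key]
        second = by_key.get((first.player_id, first.season + 1))
        # Requires consecutive years, identical park, and identical role.
        if second is not None and second.park == first.park and second.role == first.role:
            pairs.append((first, second))
    return pairs


if __name__ == "__main__":
    data = [
        PitcherSeason("a", 2010, "Park 1", "SP", 3.10),
        PitcherSeason("a", 2011, "Park 1", "SP", 3.40),
        PitcherSeason("b", 2010, "Park 2", "SP", 3.60),
        PitcherSeason("b", 2011, "Park 3", "SP", 3.20),
        PitcherSeason("c", 2010, "Park 4", "RP", 2.50),
        PitcherSeason("c", 2011, "Park 4", "SP", 4.20),
    ]
    print(len(build_pairs(data)))  # 1: only pitcher "a" keeps park and role
```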



2011 Disabled List Spreadsheet and Team Information

I have gone through all of the 2011 MLB transactions and compiled the disabled list (DL) data for the 2011 season. I have put all the information in a Google Doc for people to use.
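For readers who want to roll a sheet like this up into team totals, here is a sketch of summing DL days by team. It assumes each row records the team, player, date placed on the DL, and date activated, with a blank activation date meaning the player was not activated before the season ended; the `DLStint` layout and `dl_days_by_team` helper are hypothetical, not the spreadsheet's actual columns.

```python
# Sketch: total DL days per team from placement/activation dates.
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class DLStint:
    team: str
    player: str
    placed: date
    activated: Optional[date]  # None: not activated before the season ended


def dl_days_by_team(stints: Iterable[DLStint], season_end: date) -> Dict[str, int]:
    """Sum calendar days on the DL for each team; open stints run to season_end."""
    totals: Counter = Counter()
    for stint in stints:
        end = stint.activated if stint.activated is not None else season_end
        totals[stint.team] += max((end - stint.placed).days, 0)
    return dict(totals)
```

Passing the last day of the regular season as `season_end` keeps season-ending injuries from being counted into the offseason.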



Positional Differences in the Price of WAR

This week, I’ve talked about the retrospective price of WAR on an aggregate level. What I haven’t studied is the retrospective price of WAR by position. I thought this was particularly important in light of my finding that positional adjustments didn’t matter much for arbitration salaries: players at tougher defensive positions were underpaid in arbitration relative to those at easier defensive positions. As it turns out, WAR has been much more expensive at some positions than at others.
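Computed retrospectively, the price of WAR at a position is the dollars paid divided by the WAR produced. The sketch below assumes (position, salary, WAR) rows, a hypothetical layout, and pools totals (sum of salary over sum of WAR) rather than averaging per-player ratios, which blow up for players near zero WAR; positions whose pooled WAR is not positive are skipped.

```python
# Sketch: pooled dollars per WAR by position (hypothetical input layout).
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def dollars_per_war_by_position(rows: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """rows: (position, salary in dollars, WAR). Returns pooled $ per WAR by position."""
    salary: Dict[str, float] = defaultdict(float)
    war: Dict[str, float] = defaultdict(float)
    for pos, pay, wins in rows:
        salary[pos] += pay
        war[pos] += wins
    # Pool totals; skip positions whose combined WAR is zero or negative.
    return {pos: salary[pos] / war[pos] for pos in salary if war[pos] > 0}


if __name__ == "__main__":
    # Made-up rows for illustration only.
    rows = [("SS", 10e6, 4.0), ("SS", 2e6, 2.0), ("1B", 15e6, 3.0)]
    print(dollars_per_war_by_position(rows))  # {'SS': 2000000.0, '1B': 5000000.0}
```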



Effects of Intentional Walks on Non-Intentional Walks

Intentional walks (IBB) are usually given to good and/or unprotected hitters in a lineup; pitchers would rather face the next, weaker-hitting batter. IBBs inflate a hitter’s walk rate (BB%), so removing them from BB% reveals a truer walk rate. A problem I noticed was that when a player’s IBB% increases, so does their non-intentional walk rate (NIBB%). Here is an attempt at putting some numbers behind the assumption.
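The two rates can be written down explicitly. This sketch assumes BB counts include IBB and that intentional walks are removed from the plate-appearance denominator as well as the numerator, so NIBB% = (BB − IBB) / (PA − IBB); the article may define it slightly differently. The slope helper is an ordinary least-squares fit of NIBB% on IBB% across players, one simple way to put a number on the relationship. Function names are hypothetical.

```python
# Sketch: IBB% and NIBB% for a hitter, plus an OLS slope across hitters.
from typing import Sequence, Tuple


def walk_rates(pa: int, bb: int, ibb: int) -> Tuple[float, float]:
    """Return (IBB%, NIBB%) as fractions; assumes BB includes IBB."""
    if pa <= ibb:
        raise ValueError("plate appearances must exceed intentional walks")
    ibb_pct = ibb / pa
    nibb_pct = (bb - ibb) / (pa - ibb)  # IBB removed from numerator and denominator
    return ibb_pct, nibb_pct


def ols_slope(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope of ys on xs, e.g. change in NIBB% per unit of IBB%."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Made-up line: 600 PA, 80 BB, 20 of them intentional.
    ibb_pct, nibb_pct = walk_rates(600, 80, 20)
    print(round(ibb_pct, 4), round(nibb_pct, 4))  # 0.0333 0.1034
```

A positive slope across a large sample of hitters is what the "IBB% up, NIBB% up" pattern would look like.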



Park Factors and ERA Estimators: Part I

(Note: I noticed a coding issue in the data, which resulted in three parks having a different classification. The data has been re-run to reflect the new results and the article updated to reflect the findings.)

Researchers have gone to great pains to highlight and account for factors outside of an individual player’s control when evaluating their performance and value. The standard for this is of course Voros McCracken’s seminal research into defense independent pitching and Tom Tango’s fielding independent pitching (FIP). While baseball is arguably the most “individualistic” of the major team sports, players do not perform in isolation from each other or from their environment.

Lately I’ve become more interested in how the physical environment of a team and its players affects their outcomes on the field. My initial research led me to look at whether a team’s home park and the degree to which it inflated or suppressed run scoring put the team at a fundamental advantage or disadvantage in terms of winning. The results suggested that hitter-friendly parks do, in fact, put a team at a fundamental disadvantage, likely due to the stress that playing 81 games a year in that environment places on the pitching staff.

In this article, I am concerned with how park factors may affect the various constructs we’ve developed to help us better evaluate a player’s talent and likely future performance. Specifically, to what extent do park factors affect the usefulness of various ERA estimators? It seems reasonable to assume that much of what happens when a ball is put in play is not controlled by the pitcher. However, given that some extreme parks are likely to exert their own environmental influence over the outcome of batted balls, it stands to reason that ERA estimators that factor in a pitcher’s batted-ball profile may do a better job in certain types of parks than in others.



Mile Fly City?

Recently, one of our readers, Simon, noted that the Rockies might be targeting fly-ball pitchers with the recent additions of Guillermo Moscoso, Jamie Moyer and Jeremy Guthrie. I decided to examine whether going after fly-ball pitchers was a practical method for limiting runs at Coors Field.

In an ideal world, the Rockies would love to have a staff made up entirely of extreme sinkerball pitchers. Rockies GM Dan O’Dowd recently stated this position on Clubhouse Confidential:

In an ideal world, every single guy in Colorado would be a heavy sinker ball guy who would have a tremendous ground ball to fly ball ratio.

It is not an ideal world, and he knows it. He goes on to say:

Unfortunately not all of our decisions are made in an ideal world. When we balance fly ball rates, we really try to balance soft and hard.
