Archive for Research

Comparing FIPs and xFIPs Using Batted Ball Distance

In one of the World Series chats I hosted, it was suggested that Matt Cain gives up weak fly balls and that this is the reason his xFIPs (2010 = 4.19, lifetime = 4.43) are higher than his FIPs (2010 = 3.65, lifetime = 3.84). After finally getting all the wrinkles worked out, I am able to get the average distance of the fly balls a pitcher gives up. So, does the fly ball distance a pitcher gives up help to explain the difference between his xFIPs and FIPs?

I took just the pitchers who threw over 60 innings in 2010 and subtracted their FIPs from their xFIPs. Then I got the average distance of all the fly balls for these pitchers; the top five leaders and laggards are below the fold, and a sketch of the calculation follows:
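For those curious, here is a minimal Python sketch of the calculation. The file and column names are hypothetical, standing in for my actual data:

```python
import pandas as pd

# Hypothetical input: one row per pitcher with 2010 stats and the average
# distance of the fly balls he allowed. File and column names are
# illustrative, not the actual data source.
df = pd.read_csv("pitchers_2010.csv")  # pitcher, IP, FIP, xFIP, avg_fb_dist

# Keep only pitchers who threw over 60 innings in 2010.
qualified = df[df["IP"] > 60].copy()

# Positive values mean xFIP is higher than FIP, as with Matt Cain.
qualified["xFIP_minus_FIP"] = qualified["xFIP"] - qualified["FIP"]

# Top five leaders and laggards, alongside average fly ball distance.
cols = ["pitcher", "xFIP_minus_FIP", "avg_fb_dist"]
print(qualified.nlargest(5, "xFIP_minus_FIP")[cols])
print(qualified.nsmallest(5, "xFIP_minus_FIP")[cols])
```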

Read the rest of this entry »


Predicting a Team’s Wins Using Underlying Player Talent

I have been wanting to have this win prediction tool available for a while and finally have what I think is a rather simple working model. The spreadsheet can be filled out with the players anyone thinks will be playing, along with all their stats, and then the team’s projected wins will be calculated.

Note: An error was found on the spreadsheet dealing with the position adjustment and was corrected around 4:30 EST on 11/4. If you downloaded it before then, you will need to re-download it. Sorry for the inconvenience. -Jeff

While it can be used to get an idea of how many wins a team might get in the upcoming season, I plan on using it to evaluate changes to a team. Those changes could be a free agent signing, a trade, an injury or a rookie called up to the majors. The team’s expected wins before and after the roster change can then be compared.
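The spreadsheet’s internals aren’t shown here, but as a rough sketch of the idea, one common approach is to sum the roster’s projected WAR onto a replacement-level baseline. The baseline below is an assumption for illustration, not the spreadsheet’s exact constant:

```python
# Minimal sketch of a WAR-based win projection. This is one common approach,
# not necessarily the spreadsheet's exact method: sum the roster's projected
# WAR and add it to a replacement-level baseline.
REPLACEMENT_WINS = 48.0  # assumed baseline; the exact level varies by system

def projected_wins(player_war: dict[str, float]) -> float:
    """Team wins = replacement baseline + total projected WAR of the roster."""
    return REPLACEMENT_WINS + sum(player_war.values())

# Evaluating a roster change is just the difference between the projections
# before and after the swap (illustrative numbers only).
before = {"SP1": 2.5, "SP2": 1.0}
after = {**before, "SP2": 6.0}  # e.g., a hypothetical ace signing
print(projected_wins(after) - projected_wins(before))  # +5.0 wins
```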

Today, I am not going to look at any team. I just wanted to make it available, and once the Royals sign Cliff Lee, I can see how their expected wins compare before and after the acquisition.

Read the rest of this entry »


2010 Disabled List: Injury Locations

In a couple of previous articles, I looked at the 2010 disabled list data for all the teams and player positions. I have finally gone through and compiled the data on body part locations.

Last year, Josh Hermsmeyer compiled an injury database with 2002 to 2009 DL information. In it, he set a format for categorizing the various body parts, and I have used the same categories. I have made another spreadsheet (Sheet 2) with these designations for people to use. I will format the 2010 injury information so it can be combined with Josh’s data once the new Retrosheet data set is released after the playoffs are completed.

To make the body location data easier to interpret, I have combined Josh’s categories into a few broad categories (a small code sketch of this mapping follows the list):

Arms and Hands – hand, wrist, forearm, upperarm
Elbow – elbow
General – generalmedical
Hip – hip
Legs and Feet – ankle, foot, knee, thigh, lowerleg
Neck and Head – face, head, neck
Shoulder – shoulder
Trunk – upperback, back, chest, abdomen
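
Here is the same grouping as a small Python lookup table, in case anyone wants to apply it to Josh’s data directly:

```python
# Josh's body-part tokens grouped into the broad categories above.
BROAD_CATEGORY = {
    "hand": "Arms and Hands", "wrist": "Arms and Hands",
    "forearm": "Arms and Hands", "upperarm": "Arms and Hands",
    "elbow": "Elbow",
    "generalmedical": "General",
    "hip": "Hip",
    "ankle": "Legs and Feet", "foot": "Legs and Feet",
    "knee": "Legs and Feet", "thigh": "Legs and Feet",
    "lowerleg": "Legs and Feet",
    "face": "Neck and Head", "head": "Neck and Head",
    "neck": "Neck and Head",
    "shoulder": "Shoulder",
    "upperback": "Trunk", "back": "Trunk",
    "chest": "Trunk", "abdomen": "Trunk",
}

print(BROAD_CATEGORY["upperarm"])  # Arms and Hands
```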

First, here is the injury location data for total days and trips to the DL for pitchers and hitters:

[Figure: total DL days by injury location, pitchers vs. hitters]

[Figure: DL trips by injury location, pitchers vs. hitters]

No real surprise here, with pitchers dominating the days lost to elbow and shoulder injuries and hitters mainly losing time to arm and leg injuries.

Finally, here is a comparison of the average number of days lost for pitchers and hitters.

[Figure: average DL days lost per trip by injury location, pitchers vs. hitters]

Pitchers take longer to get off the DL, except for hip and general injuries. Well, that is it for today. My next project is a little more ambitious, as I want to find the chances of a player going on the DL given a set of conditions (e.g. previous injury history, age, etc.).


Are Umpires Expanding the Strike Zone as the Season Goes On?

David Ortiz of the Boston Red Sox recently complained that the strike zone has been expanding as the season has gone on. He stated that the zone is growing in order to speed up the games. I decided to have a look to see if there is any validity to this statement.

I examined the area of the strike zone where a taken pitch is called a strike about 50% of the time and a ball the other 50% of the time. This zone is just smaller than the rule book strike zone. Here is the percentage of called strikes compared to the sum of called strikes and balls in this zone for each month and year:

[Figure: called-strike percentage in the 50/50 zone, by month and year]

Note: The combined September data should be taken with a grain of salt, because none is available from 2010 and more pitches generally are being called strikes this year.

As a rule, the percentage of pitches called strikes in this zone increases by ~2.5% from the beginning of the season to the end. The called balls and strikes in this zone account for only 12.4% of all pitches thrown over the time frame. This works out to 0.3% of all pitches during a game, or about one extra called strike in a 300-pitch game. Not that much difference.
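As a quick sanity check on that arithmetic:

```python
# Quick check of the back-of-the-envelope math above.
zone_share = 0.124       # borderline-zone pitches as a share of all pitches
strike_increase = 0.025  # rise in the called-strike rate within that zone
pitches_per_game = 300   # rough number of pitches in a game

extra_rate = zone_share * strike_increase      # ~0.003, i.e. 0.3% of pitches
extra_strikes = pitches_per_game * extra_rate  # ~0.93, about one pitch a game
print(extra_rate, extra_strikes)
```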

To further show the difference, here are the called strike zones of all umpires from 2008 to 2010 for right-handed batters in the months of April and August.

Note: The numbers indicate the percentage of pitches called strikes, in decimal form. The circle means nothing; it is just there for visual reference.

April

[Figure: April called strike zone, right-handed batters, 2008-2010]

August

[Figure: August called strike zone, right-handed batters, 2008-2010]

The zone extends a bit on the right and left parts of the plate, but not by much. There does seem to be some expansion of the zone over the season, but not enough for teams or players to worry about.


Introducing Team NERD

This is a post introducing and explaining NERD scores for teams. I’m including the results first and then the background, methodology — all that junk — second.

Curious as to what NERD is? The short answer is: it’s a number, on a 0-10 scale, designed to express the “watchability” of teams for those of the sabermetric persuasion.

For more information, consult the index right after the results.

The Results
Here are the results for team NERD:

Read the rest of this entry »


How Much Is Fielding Weighted in WAR?

Occasionally (okay, rather frequently), I’ll see people debate the relative accuracy of the WAR displayed on FanGraphs and Rally’s WAR on Baseball-Reference.

Joe Posnanski speculated on the differences in a recent article about Josh Hamilton’s MVP chances:

*I could be reading this wrong, but Fangraphs seems to put more emphasis on defense. For instance, Carl Crawford’s WAR at Baseball Reference is 3.7 — his defense is worth eight runs above average. But Fangraphs credits him for 22 runs above average, which thrusts his WAR up to 5.6 and into the No. 4 spot in baseball.

I’ve seen similar sentiments echoed throughout the blogosphere and on Twitter.

In reality, summing across players in 2009, UZR distributed 441 fewer runs than TZ did, excluding pitchers and catchers. And there is not a year for which UZR is available where its total absolute value has been higher than TZ’s.

In 2009, the maximum spread of UZR was +31 to -37 and TZ showed a similar spread of +31 to -34. Here’s a graph of the full spread. The blue overlap shows the points at which TZ starts showing a greater spread.

This might not be a perfect comparison of how much defense actually contributes to WAR; a better one might be how much fielding as a whole contributes to total runs. In 2009, fielding made up about 14.2% of all positive and negative runs in FanGraphs WAR, while in Rally’s WAR fielding made up about 15.5% of all positive and negative runs.
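As a sketch of how that percentage can be computed (the file and column names are hypothetical, and exactly which run components to include is a judgment call):

```python
import pandas as pd

# Hypothetical input: one row per position player with his 2009 runs by
# WAR component. Which components to count is a judgment call; replacement
# runs (a constant positive credit) are excluded here.
players = pd.read_csv("war_components_2009.csv")

components = ["batting", "baserunning", "fielding", "positional"]
total_abs = players[components].abs().sum().sum()  # all positive and negative runs
fielding_abs = players["fielding"].abs().sum()

print(fielding_abs / total_abs)  # ~0.142 for FanGraphs WAR in 2009, per the text
```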

All in all, they are similar in how fielding is weighted as a whole. The biggest difference between the two is how each individual player’s fielding is evaluated.


Aggregate Defensive Evaluations – 2009 LF

Update: Tangotiger pointed out to me that the Fans Scouting Report numbers for Holliday and Wilson looked off. It turns out I was calculating the Fans Scouting Report incorrectly for players who switched teams; Matt Holliday and Jack Wilson have been adjusted accordingly, and the tables below have been updated.

Earlier this week I ran the Aggregate Defensive Evaluations (ADE) on 2009 shortstops. For those who missed it, this is an attempt to take five different fielding metrics (UZR, the Fans Scouting Report, John Dewan’s DRS, Total Zone, and Total Zone with Location), put them on the same scale, and then see which players’ defensive abilities we are fairly certain about and which we are not.

In response to some comments, I’ve added a weighted average and standard deviation (the last two columns). This excludes standard Total Zone in favor of TZL. It also weights UZR, DRS and TZL three times each and the Fans Scouting Report only once.
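For anyone who wants to replicate this, here is a rough Python sketch. It assumes “same scale” means standardizing each metric across the positional pool, which may differ from the actual ADE scaling; the file and column names are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical input: one row per qualifying player with his rating from
# each system. Standardizing across the positional pool is an assumption,
# not necessarily the actual ADE scaling.
lf = pd.read_csv("lf_2009_defense.csv")  # player, UZR, DRS, TZL, Fans

metrics = ["UZR", "DRS", "TZL", "Fans"]
weights = np.array([3.0, 3.0, 3.0, 1.0])  # Fans Scouting Report weighted once

# Standardize each metric so they are directly comparable.
z = (lf[metrics] - lf[metrics].mean()) / lf[metrics].std()

# Weighted average and weighted standard deviation for each player.
lf["wavg"] = (z * weights).sum(axis=1) / weights.sum()
dev = z.sub(lf["wavg"], axis=0)
lf["wstd"] = np.sqrt((dev.pow(2) * weights).sum(axis=1) / weights.sum())

print(lf[["player", "wavg", "wstd"]].sort_values("wavg", ascending=False))
```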

It’s probably not much of a surprise that Carl Crawford sits atop the list. UZR has actually rated him as 56 runs above average the past three years, more than double the next closest player. Same goes for pretty much all the defensive metrics.

Matt Holliday, I would say, has the highest level of disagreement of any player. The Fans Scouting Report hates him, DRS loves him, UZR thinks he’s above average, and Total Zone thinks he’s just average. I’d consider the inconsistency with Holliday different than Juan Rivera’s situation, where there’s also a high level of disagreement. In Rivera’s case, at least, all the metrics agree he’s average or better.

Ryan Braun is also pretty interesting in that the Fans rated him as +15, while all the other defensive models thought he was well below average. This season Ryan Braun continues to be rated poorly by the defensive models. It will be interesting to see what the fans think of him next season.

I’d still consider these reports a bit of a work in progress, but for those interested, here are the shortstops again, this time with the weighted averages column:


Aggregate Defense Evaluations

There’s no denying defensive metrics are controversial. Whether they clash with what you’ve seen with your own eyes, or you just don’t believe them, it seems like everyone has some sort of opinion to offer on their validity.

On FanGraphs, we carry no less than four different defensive metrics:

UZR – Mitchel Lichtman’s Ultimate Zone Rating
DRS – John Dewan’s Defensive Runs Saved
TZL – Rally’s Total Zone (location based version)
TZ – Rally’s Total Zone (standard version)

There’s no denying that we use some more frequently than others (cough, UZR), but the reason we have all four is that it’s great to see what different data sets and different models spit out. In addition to the four, there’s also a fifth, completely unrelated metric in the Fans Scouting Report that is run each and every year on insidethebook.com by Tangotiger.

It’s important to note that these defensive metrics are not all on the same scale, so it’s difficult to glance at all four (five if you use the Fans Scouting Report) and get a good sense of whether they’re in agreement. Which brings me to this preliminary look at the Aggregate Defensive Evaluations, where each metric is put on the same scale for each position and averaged, and a standard deviation is computed for each player. Here are the 2009 shortstops (min 82 games played):

As you can see, Paul Janish and Brendan Ryan are the clear leaders atop the list, and all the metrics are for the most part in agreement. Even +/- 5 runs in either direction still leaves them elite defenders.

Then there are players like Yunel Escobar, who is considered very good by Total Zone and DRS but more or less average by UZR and the Fans. On an aggregate level he still ends up as very good, though there is a good amount of disagreement as to just how good he is, even if no system thinks he’s below average.

All in all, it should be easy to go up and down the list and see which players we have a high level of confidence about defensively, and which we do not.

From a mere computational standpoint, is this the best way to go about combining defensive metrics? I’m really not sure, and it’s certainly worth looking into further. There are a lot of options in weighting the metrics differently and in how to scale them, but overall I feel this is at least a decent start and something I hope to delve into a bit more.

The point here is that there’s a lot of information in these metrics. With so many models out there, it’s becoming increasingly important to try to identify what we’re fairly confident about and what we’re not, instead of making the mistake of throwing them all away.


All Star Pitchers and the Cy Young Award

Ubaldo Jimenez and David Price got the nod this past Tuesday as the All-Star Game starters. Both pitchers had a decent first half, but do the pitchers chosen as starters end up winning that season’s Cy Young Award, as Tim Lincecum did last season?

I decided to go back and look at how often the Cy Young Award winner was that year’s All-Star Game starter. Both leagues started awarding separate Cy Youngs in 1967, and that is where I started my comparison. To begin with, there were seven instances where a reliever won the Cy Young. I removed these, since a reliever had no chance to start the game. So, out of a possible 79 Cy Young Award winners, 20 of them (25%) started the All-Star Game. Of these 20 instances, 16 have happened since 1988, the halfway point in years since the two-league award started.

Besides just looking at the All-Star Game starter, I looked to see whether that season’s Cy Young Award winner, starter or reliever, was selected to participate in the All-Star Game. Of the 86 Cy Young Award winners, 70 of them (81%) were selected for that year’s All-Star Game. Since 1998, when the All-Star rosters started to expand from 28 players to the current 34, 23 of the 24 Cy Young Award winners were selected for the All-Star Game. The only exception was Johan Santana in 2004.

Most of these numbers make common sense. For someone to win the Cy Young, he needs a good first and second half of the season. All the pitchers who had a good first half are on display at the All-Star Game, and these pitchers are seen as the top arms, creating a short list for the voters to watch in the second half of the season.

Ubaldo Jimenez and David Price are not guaranteed to win the Cy Young Award, but they shouldn’t be counted out either. There is a good chance that this year’s Cy Young Award winner is on the All-Star roster, though, especially considering recent trends.


Who Is Creating Outs Running the Bases?

Last Tuesday night, I watched the Royals score 2 runs in the tenth inning to beat the Mariners. The Mariners were actually lucky, because 2 Royals were thrown out trying to steal in that inning. The Royals seem to get 1-3 runners thrown out on the base paths each game (e.g. as I write this, Billy Butler gets thrown out making a wide turn at 1st base).

Well, how does the Royals’ base running compare to the rest of the league? I took all the times a runner was safe at first and was then picked off, caught stealing or thrown out running the bases in a non-force situation (e.g. trying to go from 1st to 3rd on a single). Some teams will have more chances to get thrown out because they have more base runners, so I found the percentage of times a runner was thrown out on a non-force play once he was safe at first.
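The rate itself is simple; here is a minimal sketch with made-up numbers:

```python
# The rate described above: non-force outs on the bases divided by the
# number of times a runner was safe at first. Numbers are illustrative only.
def baserunning_out_rate(times_safe_at_first: int, nonforce_outs: int) -> float:
    """Pickoffs, caught stealings and non-force outs per time reaching first."""
    return nonforce_outs / times_safe_at_first

# A team with 800 runners reaching first safely and 60 such outs would be
# thrown out 7.5% of the time, about the Royals' rate.
print(f"{baserunning_out_rate(800, 60):.1%}")  # 7.5%
```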

So how do the Royals stack up? Here is a list of the worst running teams so far this season (team and percentage of the time thrown out on a non-force play):

White Sox – 7.9%
Angels – 7.7%
Rays – 7.6%
Rangers – 7.6%
Padres – 7.6%
Royals – 7.5%

Now that the base path incompetents have had their 5 seconds of shame, here are the teams that get thrown out the least on the base paths:

Red Sox – 3.3%
Phillies – 3.6%
Blue Jays – 4.0%
Tigers – 4.2%
Braves – 4.4%

Note: Here is a Google Spreadsheet of all the teams for reference.

The Royals are not the league’s worst team, but they are not too far off. A few more running errors and they could quickly overtake the White Sox.

Unnecessary outs on the base paths are frustrating for fans to deal with. Some of us, though, have to deal with them more than others.