Author Archive

Improved Article Searching

Over the past two years, a couple thousand posts have been published on FanGraphs, and searching the archive has been difficult, to say the least. Enter Google Custom Search, which in my opinion offers a simple and effective way to search the FanGraphs archives.

On the homepage and in the blog sections, you can now search for any topic you like, including players, and hopefully you’ll be able to find what you’re looking for much more easily than in the past.

I’m sure at some point we’ll come up with a great way to do a unified article/player search on the site, but until then, I think this will do the trick.


Pitchf/x Page Fixes

Just some quick notes on the 2010 pitchf/x data:

– There appear to be some new pitch type categorizations that were breaking some pitchf/x pages: EP (Eephus Pitch), SC (Screwball), KC (Knuckle Curve) and FO (Forkball).

– I have temporarily lumped SC and FO into FT (Two-Seam Fastball). It’s been explained to me that screwballs (and possibly forkballs) are not really two-seam fastballs, but there have been a total of 19 screwballs thrown this year: 18 by Daniel Ray Herrera and one by Dallas Braden. There have been 12 forkballs thrown: 8 by Livan Hernandez and 4 by Kenshin Kawakami. Overall, these two pitches didn’t seem to warrant their own category, but I can be convinced otherwise.

– Eephus pitches (EP) I have lumped into the UN (Unknown) category. Vicente Padilla has thrown all 16 of them.
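
The lumping described in the notes above could be sketched as a simple lookup table. This is only an illustration: the names `PITCH_TYPE_REMAP` and `normalize_pitch_type` are hypothetical, not the actual site code, and since the notes don't say how KC is handled, it is assumed here to pass through unchanged.

```python
# Hypothetical remapping table. The pitch codes come from the notes above;
# the dict and function names are illustrative assumptions.
PITCH_TYPE_REMAP = {
    "SC": "FT",  # Screwball -> Two-Seam Fastball (temporary)
    "FO": "FT",  # Forkball  -> Two-Seam Fastball (temporary)
    "EP": "UN",  # Eephus    -> Unknown
}


def normalize_pitch_type(code: str) -> str:
    """Return the category a raw pitch type code is lumped into.

    Codes without an entry in the table (KC, for example, whose handling
    is not stated) are returned unchanged.
    """
    return PITCH_TYPE_REMAP.get(code, code)
```

Keeping the mapping in one table means a pitch type can be given its own category later by deleting a single entry.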


Catcher Defense in WAR

Along with the rollout of the improved UZR data, catcher defense is now incorporated into WAR from 2003 onward. In this case we’ve opted to use the Stolen Base Runs Saved (rSB on FanGraphs) from the Fielding Bible:

Stolen Base Runs Saved gives the catcher credit for throwing out runners and preventing them from attempting steals in the first place.

For the most part, catchers will remain about the same in value, especially on an individual season basis. But some see larger changes: Yadier Molina ends up with an extra 3.6 wins over the past 6 years. On the down side, Jason Varitek probably sees the biggest decrease in value, with -2.2 wins attributed to his catcher defense since 2003.


UZR Updates!

The first UZR updates of the 2010 season are in, and from here on out they’ll be updated every Sunday night.

There have been a few improvements made to UZR this year, which will also be reflected in prior years’ UZR data. The changes do impact a few players, but for the most part, each player’s UZR has remained unchanged or is within a couple of runs of what the player was rated before the improvements. Mitchel Lichtman, the man behind UZR, outlines the changes below:

Park factors have been improved, especially for “quirky parks and portions of parks,” such as LF and CF at Fenway, LF in Houston, RF in the Metrodome, and the entire OF in Coors Field. Of course, park factors in general are updated every year, as we get more data in each park, and as new parks come into existence and old parks make material (to fielding) changes.

In the forthcoming UZR splits section, we will also be presenting UZR home and road splits, as a sanity check for those of you who are skeptical of park factors. Please keep in mind that regardless of the quality of the park adjustments, there can and will be substantial random fluctuations in the difference between home and away UZRs and it is best to evaluate a fielder based on as much data as possible (e.g., using home and road stats combined), as we do with most metrics and statistics.

Adjustments have been added to account for the power of the batter as a proxy for outfielder positioning, so that, for example, if an outfielder happened to have “faced” a disproportionate percentage of batters with less than or more than average power, the UZR calculations will make the appropriate adjustments (as best as it can). Obviously, these kinds of adjustments are more important for smaller samples of data than for larger samples, since, in larger samples, these kinds of anomalies (in terms of opponents faced) tend to “even out.”

For infielders, similar adjustments are made for the speed of the batter, as a proxy for infielder positioning and how quickly the infielders have to field and release the ball, as well as the speed of the throw.

When a “shift” is on in the infield, according to the BIS stringers, if the play was affected by the shift, the UZR engine ignores the play. As well, if an air ball hits the outfield wall and in the judgment of the BIS stringers, no outfielder could have caught the ball, the play is similarly ignored.

Also keep in mind that UZR does not include first basemen “scoops” or the ability of the first baseman to influence hits and errors caused by errant throws from the other infielders. According to my (MGL) research, yearly “scoops” numbers are generally in the 1-4 run range, which means that the true talent range of most first basemen with respect to “scoops” is probably in the plus or minus 2 runs per year range – i.e., not much.


Dave Cameron Joins FanGraphs Full Time

I’m pleased to announce that Dave Cameron will be joining FanGraphs full time. Dave will continue to be the managing editor of FanGraphs and will have his duties expanded to other areas of the business.

Dave has played a pivotal role in the growth of FanGraphs since joining in early 2008. It’s very exciting to be able to bring Dave onboard in a larger capacity where he’ll be able to devote even more of his time to all things baseball, in addition to working on new and exciting FanGraphs projects.


Some Thoughts on Batted Ball Data

Colin Wyers wrote a post today about potential bias in batted ball data. While I don’t have anything in particular to say about the results of his bias study, I have to disagree with his conclusion and debunk some of the information provided about the differences between StatCorner tRA and FanGraphs tRA, which he uses to illustrate his point.

For starters, the difference in tRA between FanGraphs and StatCorner is a poor stat to illustrate GB/FB/LD bias because there are other differences in the way the two sites calculate the stat. Let’s take Felix Hernandez, for whom BIS and Gameday have very, very similar batted ball profiles for 2010.

         GB      LD     FB 
BIS     67.6    13.5   16.6 
GD      65.8    13.2   15.8

Now, here’s the difference in FanGraphs tRA vs. StatCorner tRA:

FG – 4.62
SC – 5.05

That’s almost half a run of difference. Why are they so different? It’s probably the component park factors, mainly on LD% and HR%, I would imagine.

Actually, I’ll plug both of those stat lines into the FanGraphs tRA calculator and see what I get: 4.62 and 4.70. So, about .08 of the difference is due to GB/FB/LD differences and the other .35 is park factors (or potentially slightly different weights).
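
The arithmetic behind that split can be spelled out in a few lines. This is just a sketch that uses the tRA figures quoted above; the function and variable names are my own, and rounding to two decimals is assumed to match how the numbers are reported.

```python
def split_tra_gap(fg_tra_bis: float, fg_tra_gd: float, sc_tra: float):
    """Split the FanGraphs-vs-StatCorner tRA gap into two parts.

    fg_tra_bis: FanGraphs calculator fed the BIS batted ball line
    fg_tra_gd:  FanGraphs calculator fed the Gameday batted ball line
    sc_tra:     tRA as listed on StatCorner
    """
    total_gap = round(sc_tra - fg_tra_bis, 2)
    # Same calculator, different batted ball inputs: this part of the gap
    # is attributable to GB/FB/LD classification differences.
    batted_ball_gap = round(fg_tra_gd - fg_tra_bis, 2)
    # Whatever remains comes from something else (park factors or weights).
    other_gap = round(total_gap - batted_ball_gap, 2)
    return total_gap, batted_ball_gap, other_gap


total, batted_ball, other = split_tra_gap(4.62, 4.70, 5.05)
print(total, batted_ball, other)  # 0.43 0.08 0.35
```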

Furthermore, if you look at individual player GB% correlation from 2003 to 2008 between BIS and Retrosheet data, you get .94. That’s among all players, whether they pitched 1 inning or 200 innings. Here are the others:

GB% – .94
FB% – .85
LD% – .72
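
For anyone who wants to run this kind of comparison on their own data, here is a minimal sketch of a Pearson correlation between two sources' batted ball rates. It assumes the figures above are ordinary Pearson correlations, which isn't spelled out, and the sample values below are made up purely for illustration; they are not BIS or Retrosheet numbers.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up GB% values for five hypothetical pitchers, one list per data source.
source_a_gb = [45.0, 52.3, 38.1, 60.2, 47.5]
source_b_gb = [44.1, 53.0, 39.5, 58.8, 46.9]
r = pearson_r(source_a_gb, source_b_gb)
```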

It’s not like the two data sources are telling you completely different things. For the most part, they agree, especially on GB%.

Baseball Info Solutions also rotates its scorers to try to avoid any scorer bias, as Ben Jedlovec stated here:

BIS Scorers are assigned “randomly”. We’re not using a random number generator, but it’s almost as effective. Scorers have a designated number (Ex. Scorer #11) which are then rotated through different slots in the schedule. If scorers 7 and 8 are scoring the late (west coast) games one day, they’ll be rotated to early games the next time around. There’s some miscellaneous switching to accommodate vacation, etc. too. In the end, everyone’s getting a good mix of every team in every park.

We also have several different quality control methods in place to make sure that scorers are consistent with their hit locations and types. We added some new tests this season using the hit timer to flag the batted ball data, so the 2009 data is better than ever.

Ben continues with:

BIS gets an almost entirely new set of video scouts each season. If you’re seeing the same “bias” in the same parks year after year, I can’t see how it would be related to the individual scorer.

It’s also important to note that BIS has an additional classification of batted ball data, Fliners, which is not displayed on FanGraphs and is instead lumped in with Line Drives and Fly Balls. Fliners come in two varieties, Fliner-Line Drives and Fliner-Fly Balls.

Colin tackled the line drive issue before on the Hardball Times, to which Cory Schwartz of MLBAM responded:

our trajectory data is indeed validated as thoroughly as all of our other data: not just once, but three times: first, by a game-night manager who monitors the data entered by the stringer, second by a next-day editor who reviews trajectories against video, and third by Elias Sports Bureau. We take great care in the accuracy of all our data, including trajectories.

None of this is to say that your original premise is not true: line drive vs. fly ball is indeed a somewhat subjective distinction that may be influenced by a number of factors, not just press box height. But I disagree with your assertion that the accuracy of our quality is inferior in this (or any other) regard.

Now we know that there is subjectivity in batted ball stats, but in Colin’s conclusion he writes:

In the meantime, consider this my sabermetric crisis of faith. It’s not that I don’t believe in the objective study of baseball. I’m just not convinced at this point that something dealing with batted-ball data is, at least wholly, an objective study. And where does this leave us with existing metrics that utilize batted-ball data? Again, I’m not sure.

For me, this is a bit of an extreme conclusion to make. For stats like GB% I think there is little to be concerned about, but once you get to LD%, I think you should realize there is some subjectivity involved. Is it worth disregarding entirely or having a “sabermetric crisis of faith” over? In my opinion, probably not.

We all want the best data possible and there are some exciting projects underway to collect more granular and precise data, but in the meantime, I don’t see any reason to dismiss the data that is currently available. Better batted ball data will certainly lead to more accurate results; I don’t think it will show completely different results.

Author’s Note: This was an expansion on my thoughts from a comment I posted on insidethebook.com


ZiPS In Season Projections

The ZiPS in-season projections have just gone live on the site and all the pre-season projections are now hidden by default. You can still see the pre-season projections by clicking on the “Show Projections” button right below each section’s heading.

There are two separate in-season ZiPS projections:

RoS = Rest of Season, or what a player will do for the remainder of the season.
Update = An updated full season projection for the player in question.

Huge thanks to Dan Szymborski of Baseball Think Factory for allowing us to use these again this season!


Defensive Runs Saved – Clarification

I’ve seen some confusion out there about the Fielding Bible runs saved metrics on FanGraphs. The Fielding Bible has two metrics, one in plays made above average and another in runs above average.

We are only displaying the information in Runs Above Average.

For those of you looking for a comparable to UZR, it is DRS. rPM (Plus Minus Runs Saved) would be comparable to RngR + ErrR.

Here’s the correlation between the two for players who played at least 500 innings in a season from 2003 to 2009:

Here’s another more in-depth look: A Quick Comparison of UZR and Plus/Minus

For the most part they will agree, but there are definitely some players where they don’t. You can read more about the differences between UZR and Defensive Runs Saved here:

MGL on the differences between UZR and +/-


+/-, RZR, and New Fielding Stats

There have been a few changes to the fielding sections of the site.

The biggest change is that John Dewan’s Fielding Bible +/- runs saved is now available on all the player pages and leaderboards going back to 2003. The stats that are associated with +/- runs saved are:

All in Runs Above Average:
rSB – Stolen Base Runs Saved (Catchers/Pitchers)
rBU – Bunt Runs Saved (1B/3B)
rGDP – Double Play Runs Saved (2B/SS)
rARM – Outfield Arms Runs Saved
rHR – HR Saving Catch Runs Saved
rPM – Plus Minus Runs Saved
DRS – Total Defensive Runs Saved

We’ve also added Revised Zone Ratings, which The Hardball Times used to carry. I’ll quote from THT:

Revised Zone Rating is the proportion of balls hit into a fielder’s zone that he successfully converted into an out. Zone Rating was invented by John Dewan when he was CEO of Stats Inc. John is now the owner of Baseball Info Solutions, where he has revised the original Zone Rating calculation so that it now lists balls handled out of the zone (OOZ) separately (and doesn’t include them in the ZR calculation) and doesn’t give players extra credit for double plays (Stats had already made that change). We believe both changes improve Zone Ratings substantially. To get a full picture of a player’s range, you should evaluate both his Revised Zone Rating and his plays made out of zone (OOZ). You can read more about the Revised Zone Ratings in this article.

BIZ – Balls in Zone
Plays Made – Total Plays Made
OOZ – Plays Made out of Zone
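
Going by the THT definition quoted above, Revised Zone Rating boils down to a single division. The sketch below is illustrative only: the function name is my own, and it assumes you already have the count of plays made inside the zone, with OOZ plays tracked separately and left out of the ratio.

```python
def revised_zone_rating(plays_in_zone: int, balls_in_zone: int) -> float:
    """Proportion of balls hit into a fielder's zone that became outs.

    Plays made out of zone (OOZ) are deliberately not an input here; per
    the definition above they are listed separately.
    """
    if balls_in_zone <= 0:
        raise ValueError("balls_in_zone must be positive")
    if not 0 <= plays_in_zone <= balls_in_zone:
        raise ValueError("plays_in_zone must be between 0 and balls_in_zone")
    return plays_in_zone / balls_in_zone


# Hypothetical fielder: 100 balls in zone, 80 converted into outs.
example_rzr = revised_zone_rating(80, 100)  # 0.8
```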

Some other fielding stats were added as well:

FE – Fielding Errors
TE – Throwing Errors
DPS – Double Plays Started
DPT – Double Plays Turned
DPF – Double Plays Finished
Scp – First Baseman Scoops

Everything is available in the player pages and leaderboards and will be updated nightly.

This season’s UZR updates will be coming soon and we’ll have more on that later.


Contact% and Swinging Strike% Correction

It’s come to my attention that the 2010 Contact% and Swinging Strike% numbers looked a bit off. The 4 stats impacted (Z-Contact%, O-Contact%, Contact%, and SwStr%) are now fixed.