Archive for Research

How Many Pitches Does it Take? Part Two

As promised, let’s split up the starters from the relievers. Rather than set an innings barrier, I instead opted to eliminate everyone with fewer than 50% of their appearances coming as starts. Our friendly neighborhood Ryan Franklin no longer qualifies for discussion, and the overall numbers drop as you would expect. Here’s the casualty breakdown:

5 P – 7 dropped
4 P – 34 dropped
3 P – 64 dropped
2 P – 30 dropped

That leaves:
5 P – 21 pitchers
4 P – 85
3 P – 70
2 P – 9
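The filter described above can be sketched in a few lines. This is only a minimal sketch: the record layout, the field names (games, starts, pitch_count) and the three sample pitchers are made up for illustration, and I am assuming a pitcher with exactly 50% starts is kept. The last three dictionaries use the counts quoted in these two posts, as a check that dropped plus remaining equals the Part One totals.

```python
from collections import Counter

# Hypothetical records: names and numbers are made up for illustration only.
pitchers = [
    {"name": "Starter A", "games": 90, "starts": 90, "pitch_count": 4},
    {"name": "Swingman B", "games": 100, "starts": 50, "pitch_count": 3},
    {"name": "Reliever C", "games": 200, "starts": 0, "pitch_count": 2},
]

def is_mostly_starter(p):
    """Keep a pitcher only if at least half of his appearances were starts."""
    return p["games"] > 0 and p["starts"] / p["games"] >= 0.5

kept = [p for p in pitchers if is_mostly_starter(p)]
dropped = [p for p in pitchers if not is_mostly_starter(p)]

# Tally survivors and casualties by number of qualifying pitches.
kept_by_pitches = Counter(p["pitch_count"] for p in kept)
dropped_by_pitches = Counter(p["pitch_count"] for p in dropped)

# Counts quoted in the posts: dropped + remaining should equal Part One's totals.
dropped_counts = {5: 7, 4: 34, 3: 64, 2: 30}
remaining_counts = {5: 21, 4: 85, 3: 70, 2: 9}
part_one_counts = {5: 28, 4: 119, 3: 134, 2: 39}
for n in part_one_counts:
    assert dropped_counts[n] + remaining_counts[n] == part_one_counts[n]
```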

How did they fare?

If you’re thinking to yourself that those numbers look a lot like those presented yesterday, then you have a good memory. In fact, here are the differentials between yesterday and today:

Note that a negative value indicates a drop in FIP.

5 P: 0.02 runs
4 P: 0.03 runs
3 P: 0.08 runs
2 P: 0.66 runs

As expected, looking at mostly starters sees the run averages increase.

To address some concerns from this analysis:

I’m not looking at grips, arm slots, release points, etc., only at the classification of the pitch. Some of the classifications are erroneous or too simplistic for the pitch style. That’s understandable.

I’m also not looking at the quality of the pitch; that would take examination on a pitch-by-pitch basis.

Looking at individual pitchers before and after the addition of a new pitch is definitely something I’ll look into pursuing. No guarantees though.


How Many Pitches Does it Take? Part One

I’ve been talking about pitches, pitching patterns, and pitch usage a lot lately, whether through PitchFx charts, simply sharing observations, or talking about a pitcher who needs an additional pitch. Finally, I broke down and gathered the data needed to see whether having a surplus of pitches or only a couple mattered to performance.

Most people have in mind the idea that quality matters more than quantity. I know I did. In fact, while running the query (last three years, at least 5% usage of the pitch, at least 150 innings), only one pitcher recorded more than five pitches, and that was Ryan Franklin with six. As you’ll see, Ryan Franklin is not a particularly good pitcher. Franklin is passable, but I think you would expect more from someone who has a constant advantage in game theory. Now, it is possible that Franklin falls into patterns, tips his pitches, or simply throws hittable garbage. I’ll leave that up to you to figure out; my only interest is the number of pitches used at least modestly and whether it makes for better pitchers.

We begin today with that query I mentioned earlier. There is no restriction on the number of games started and only a minimum of 150 innings over the last three years, meaning relievers like Joe Nathan, Mariano Rivera, and Jonathan Papelbon were eligible to make the cut. Let’s get to the data, shall we?

Franklin was the only pitcher with six pitches and 28 pitchers had five pitches qualify. Tradition has most starters throwing 3-4 pitches and most relievers having one or two. Tradition holds true here. 119 pitchers had four pitches qualify, 134 had three, and 39 had two. Franklin failed to make a start over the last three years. Meanwhile, pitchers with five pitches made 64% of their appearances as starters, compared with 60.2% for four-pitch qualifiers, 32.2% for three pitches, and 11.7% for two pitches.

Let’s look at how they actually performed:

Are the relievers skewing the two and three pitch numbers? Tomorrow we’ll separate the starters from the pack and see if that’s the case.


More on Catcher’s Fielding…WP&PB

Other than stolen bases, which I addressed a few weeks ago, very little has been published on catchers’ fielding numbers. Tom Tango first conceived his WOWY technique in studying catchers. Now I’ve extended my stolen base study back to the beginning of the current RetroSheet in 1953, and added the rates of wild pitches and passed balls allowed back to the same date. It should put a smile on Tango’s face that Gary Carter of his beloved Montreal Expos rates third in career SB_RAA behind Ivan Rodriguez and Jim Sundberg, fourth in career WP_RAA behind Bill Freehan, Bruce Benedict and Brad Ausmus, and second overall behind only Pudge. Carter also has the best single season, +28.2 in 1983; the worst belongs to Dick Dietz, -18.6 in 1970.

I had earlier included groundballs to catchers when I ran my infield defense. There just aren’t that many grounders fielded by catchers – the most in any one season over the past six years was 74 by Jason Kendall in 2006. Single season RAA on grounders ranges from Jason Phillips’ +1.1 in 2004 to Mike Lieberthal’s -2.4 in 2003. Totals for the last six years range from Carlos Ruiz’s +2.4 to Lieberthal’s -3.4. (I don’t yet have a groundball table built for seasons before 2003.)

The process is the same as I described in the previous article on stolen bases. I queried RetroSheet’s events table, creating a new table of every combination of catcher and pitcher in each year, how many batters were faced with runners on base, and how many wild pitches and passed balls occurred. A total was made of each catcher’s stats in each year (the “with” part) and also the stats of each pitcher he caught, while working with any other catcher (the “without”). These were weighted to the smaller of the sample sizes, and then summed into season and career totals.
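Here is a rough sketch of that “with” and “without” bookkeeping for one catcher. It is not the actual query: the field names, the two sample pitchers and their numbers are all hypothetical, and I am assuming that “weighted to the smaller of the sample sizes” means scaling both rates to the smaller number of plate appearances for each pitcher.

```python
def wowy_events(pairs):
    """Return (observed, expected) WP+PB for one catcher.

    For each pitcher, the catcher's rate ("with") and the pitcher's rate with
    all other catchers ("without") are both scaled to the smaller of the two
    plate-appearance samples, then summed.
    """
    observed = 0.0
    expected = 0.0
    for p in pairs:
        if p["with_pa"] == 0 or p["without_pa"] == 0:
            continue  # no matched sample for this pitcher
        weight = min(p["with_pa"], p["without_pa"])
        observed += weight * p["with_events"] / p["with_pa"]
        expected += weight * p["without_events"] / p["without_pa"]
    return observed, expected

# Hypothetical catcher-pitcher rows (PA with runners on, WP+PB allowed).
pairs = [
    {"pitcher": "P1", "with_pa": 400, "with_events": 4,
     "without_pa": 200, "without_events": 4},
    {"pitcher": "P2", "with_pa": 100, "with_events": 1,
     "without_pa": 300, "without_events": 6},
]

observed, expected = wowy_events(pairs)
ratio = observed / expected  # below 1.0 means fewer WP+PB than expected
```

Summing these observed and expected totals across seasons would give the career figures; the conversion to runs is left out because it depends on a run value that is not part of this sketch.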

The single best season for preventing wild pitches and passed balls, since 1953, was Bill Freehan of the Tigers in 1971. The pitchers he caught that year would have been expected to throw 62 wild pitches and 20 passed balls in Freehan’s playing time, but he only allowed 31 wild pitches and 7 passed balls to get by him, saving an estimated 12.6 runs that season. His total allowed of 38 was 46% of the expected 82. Freehan had the highest career RAA of +52.0, while Jorge Posada had the lowest at -38.2.
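To make the arithmetic behind those Freehan figures explicit, here is a small sketch using only the numbers quoted above. The variable names are mine, and the step from events saved to runs saved is left out because the run value per event isn’t stated here.

```python
# Figures quoted in the paragraph above for Bill Freehan, 1971.
expected_wp, expected_pb = 62, 20
allowed_wp, allowed_pb = 31, 7

expected_total = expected_wp + expected_pb     # 82
allowed_total = allowed_wp + allowed_pb        # 38
ratio = allowed_total / expected_total         # about 0.46
events_saved = expected_total - allowed_total  # 44
```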

On the other end is one of America’s favorites, Bob Uecker, who not only couldn’t hit, but apparently couldn’t catch either. In 1967, appropriately his last year in the majors, Uecker split time between the Phillies and Braves and, in only 80 games played, allowed 40 wild pitches and 25 passed balls, 222% above his expected totals of 18 and 12.

The major league average is .016 wild pitches and .004 passed balls per plate appearance with a runner on base. The best career normalized wild pitch rates go to Bruce Benedict, Yogi Berra and Mike Redmond at .010; Brian Downing, Del Crandall and Jason Varitek at .011; and Rod Barajas, Manny Sanguillen, Bill Freehan, Kirt Manwaring, Sherm Lollar and Steve Yeager at .012. The worst wild pitch rates are Earl Battey at .021; Junior Ortiz and Mike Macfarlane at .021; and Miguel Olivo, Johnny Roseboro, Tim Laudner, Jorge Posada, Pat Borders, Thurman Munson, Hal Smith, Darrell Porter and George Mitterwald at .020.

The lowest normalized passed ball rates were Brian Downing, Charlie O’Brien, Bruce Benedict, Dan Wilson, Yogi Berra, Brad Ausmus, Del Crandall, Sherm Lollar and Ron Karkovice at .002, with the worst being Miguel Olivo and Bob Brenly at .008; and Joe Azcue, Jorge Posada, Earl Battey and Lance Parrish at .007.

The top 5 ratios of reducing both are Bruce Benedict 56%, Yogi Berra 59, Brian Downing 60 and Mike Redmond and Del Crandall 64% each. The worst were Bob Brenly 142%, Earl Battey 140, Miguel Olivo 140, Jorge Posada 132, and Junior Ortiz and Mike Macfarlane 129% each.

In 2008, the best at runs saved blocking the plate were Kurt Suzuki +6.9, Kenji Johjima +5.8, Brian McCann +5.2, Ramon Hernandez +5.1 and Jason Varitek +4.1, while the worst were Miguel Montero -3.0, Miguel Olivo -2.8, Kevin Cash -2.7, Gregg Zaun -2.7 and Jesus Flores -2.6. In case you were thinking that one year might be a small sample size for some of these backup catchers, Montero, Olivo and Flores are also among the five worst career rates for active catchers, along with Mike Rivera and Jorge Posada.

Career WP&PB Records
Yearly WP&PB Records



Shift!

One of the really cool things that Baseball Info Solutions keeps track of is when there is a shift and it affects the outcome of the play. If it doesn’t affect the outcome of the play, it’s not recorded as a shift, even if one was employed.

In 2008, the top 5 players that were most affected by shifts (positively or negatively) were:

Carlos Delgado
Ryan Howard
Jim Thome
David Ortiz
Adam Dunn

Delgado’s BABIP on shift-affected plays was at the .191 mark, compared to his .284 BABIP on every play. This is entirely different from, say, Ortiz’s BABIP on shift-affected plays, which was .299, compared to his overall .273 BABIP. Makes you wonder if shifting on Ortiz is a good idea, though it would definitely take a deeper dive into the numbers to know for sure.

Anyway, this was really just a quick preliminary look at the data, but with everyone talking about shifts and BABIP lately, I thought this might be of some interest.


A-Rod’s Numbers

After results from MLB’s 2003 anonymous survey testing for performance-enhancing drugs were leaked, Alex Rodriguez admitted in an interview with ESPN’s Peter Gammons to using them from 2001 to 2003, the three years he played for Texas.

“When I arrived in Texas in 2001, I felt an enormous amount of pressure, felt all the weight of the world on top of me to perform and perform at a high level every day.”…When asked if his usage took place from 2001-2003, Rodriguez said, “That’s pretty accurate.”
Rangers owner Tom Hicks, who took over the team in 1998, was shocked by Rodriguez’s admission.
“I certainly don’t believe that if he’s now admitting that he started using when he came to the Texas Rangers, why should I believe that it didn’t start before he came to the Texas Rangers?”

If A-Rod started using PEDs in 2001, then they had no visible effect, as his three years in Texas are statistically indistinguishable from his previous two years in Seattle. His jump in performance came between the 1998 and 1999 seasons.



First Basemen “Scoops” – The Value of Handling Errant Throws at First Base

Most of you are familiar with Ultimate Zone Rating (UZR), a metric for evaluating fielding using detailed play-by-play data. For first basemen, like all other fielders, it measures the number of ground balls that are fielded and turned into outs as compared to an average-fielding first baseman, given the same number, type, and location of ground balls, as well as the same number of outs, the base runner configuration, and batter handedness (plus an adjustment for the parks and the ground/fly tendency of the pitchers). The difference between these two values is a fielder’s UZR. It is usually expressed as a number of runs saved or cost compared to an average player at that position over a specified period of time (usually defensive games). You will often see it as a rate stat, generally per 150 games.

What UZR does not measure for first basemen, because the requisite data is not readily available, are the theoretical runs saved or cost by virtue of a first baseman’s skill at successfully catching errant throws or throws in the dirt. For lack of a better word, I call these “scoops,” even though they necessarily include poor throws (e.g., high or off-line) that are not in the dirt.

Fortunately, there is a way to estimate this skill, using a relatively simple method “invented” by Tom Tango, called “with or without you”, or WOWY. The WOWY methodology was explained by Tango in an excellent article in the 2008 Hardball Times as well as in various online forums, including his (and my) blog, www.insidethebook.com.

Basically, it goes like this: Figure out what happens when a particular player is on the field (generally something that explicitly involves that player, or at least is affected by that player, such as the number of ground balls a particular SS – say Derek Jeter – fields and turns into outs). Then figure out the same thing when that player is not on the field, but all other relevant variables (in this example, mostly the pitcher, park, and opponent batters) remain constant. The difference between the two rates (per whatever you want) should reflect the difference in skill, or at least in performance (skill is usually performance regressed toward some mean, the amount of regression being a function of the sample size of the performance) of whatever you are measuring, between the player in question and the average player when the player in question is not on the field (but in the data set).

In other words, to use the Jeter example, if Derek fields 4 balls per 9 innings and all other SS (let’s say that our sample of “other SS” is large and unbiased enough to consider them league average) field 4.5 balls per 9 innings, given the exact same pool of pitchers, batters, and parks, then we can safely say (more or less – within the bounds of sample error) that Jeter is .5 balls per 9 innings worse than the average SS. As I said, it is simple and brilliant. The results are extremely telling if we can get large enough samples of Jeter and lots of other SS that are “matched” with Jeter based upon parks, pitchers, batters, etc.

The methodology is a little more complicated than that, but hopefully you get the general idea. Anyway, the same thing can be done with first basemen in order to estimate their “scooping” performance and ability. What I did was look at every ground ball that was thrown from each infielder to each first baseman. I put that ground ball into one of two buckets: One, when there was no error on the throw, or two, when there was an error on the throw. The assumption is that when a throwing error is made, there is an errant throw (no duh) and the first baseman is not able to somehow coax that bad throw into an out by scooping it out of the dirt, jumping in the air, catching it while off the bag and still making the play, etc. Obviously most of the time when a throwing error occurs, there is nothing that any first baseman can do about it. However, in the long run, we can assume that a certain fixed percentage of bad throws from each infield position will always result in an error while another fixed percentage of those bad throws have a chance to be “saved” by a skilled (or tall) first baseman.

We can also assume that when an error is not made, that sometimes a bad throw occurs and the first baseman “saves the day.” Again, most outs and non-errors occur on easily catchable throws, but a certain fixed percentage of them will occur on bad throws that are skillfully and successfully handled by the first baseman.

Given these assumptions (which are true, by the way), if we look at all throws by a particular player at a particular position to a certain first baseman and then compare the results (error or no error) to when throws are made from the same infielder at the same position to all other first basemen, the difference can be attributed to the “scooping skill” of that particular first baseman.

An example:

Say that all infielders threw to Todd Helton 1000 times in 2007. And say that those exact same infielders threw to some other first baseman (and we are going to assume that all of these “other” first basemen are average, collectively) around 1000 times also (it does not really matter how many throws went to these other first basemen, although we would like it to be a lot). Now let’s say that only 15 throwing errors were made on those throws to Helton and 20 were made on throws to all the other first basemen. We can safely say (with some uncertainty of course, due to sample error) that Helton was 5 plays per 1000 throws (about a full season actually) better than the other pool of first basemen, who we are assuming are average. So Todd is 5 plays, or around 4 runs, per season, above average at “scoops.”

In actuality what I did was to match up every player at every infield position (including pitcher and catcher) with every first baseman, and then for each specific first baseman, I did a “with and without” and prorated or weighted that difference by the minimum of these two numbers – the throws made to the first baseman in question and the throws made to all other first basemen – by the same fielder.

For example, let’s say that over the sample time period, Edgar Renteria threw 300 balls to Albert Pujols (by the way, I used all data from 2000 to 2008) and he made 6 throwing errors. Now let’s say that Renteria also threw 800 balls to other first basemen (on the Cardinals or any other team he played for) and made 20 errors. The difference in error rate is .5 errors per 100 throws in favor of Albert. Thus we give him credit so far for .5 errors per 100, weighted by 300 throws (the lesser of 300 and 800). We do that for every fielder who threw to Albert at least 20 times and 20 times to other first basemen, and then we add everything up to get a weighted average. This weighted average represents a first baseman’s “scooping” skill as compared to an average first baseman, or at least as compared to the average first baseman in that player’s “matched pool” of first basemen.
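The weighting in that example can be sketched as follows. This is a minimal sketch rather than the code actually used: the function and field names are mine, the second fielder is hypothetical and is there only to show the 20-throw minimum, and the result is left in errors per 100 throws instead of being converted to runs.

```python
def scoop_rating(matchups, min_throws=20):
    """Weighted average error-rate difference for one first baseman.

    Returns (errors saved per 100 throws, matched throws). A positive rating
    means fewer throwing errors with this first baseman than with the others.
    """
    total_weight = 0
    weighted_diff = 0.0
    for m in matchups:
        # Require at least min_throws to this first baseman and to the others.
        if m["throws_with"] < min_throws or m["throws_without"] < min_throws:
            continue
        rate_with = m["errors_with"] / m["throws_with"]
        rate_without = m["errors_without"] / m["throws_without"]
        weight = min(m["throws_with"], m["throws_without"])
        weighted_diff += weight * (rate_without - rate_with)
        total_weight += weight
    if total_weight == 0:
        return 0.0, 0
    return 100 * weighted_diff / total_weight, total_weight

matchups = [
    # The example from the text: Renteria throwing to Pujols vs. everyone else.
    {"fielder": "Renteria", "throws_with": 300, "errors_with": 6,
     "throws_without": 800, "errors_without": 20},
    # Hypothetical fielder with too few throws to Pujols; skipped.
    {"fielder": "Fielder X", "throws_with": 10, "errors_with": 1,
     "throws_without": 500, "errors_without": 10},
]

rating, matched_throws = scoop_rating(matchups)
```

With only the Renteria row counting, the rating comes out to .5 errors per 100 throws on 300 matched throws, which matches the figures in the example.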

That last part of the last sentence is the primary flaw in the methodology. Since I am only comparing each first baseman to all other first basemen who had the same fielders throwing to them, it is possible, and in some cases, inevitable, for one first baseman to be compared to a pool of primarily good or bad first basemen, since each one will tend to be compared with the others on his own team. For example, Helton will tend to be compared with whoever else manned first base for the Rockies when he was not on the field. If those few backup first basemen happened to be particularly bad at “scooping” balls, then Helton will be overrated by this methodology. In fact, because of this flaw, and because regular first basemen tend to be better at “scooping” than backup first basemen, any regular will tend to be overrated (because they are often compared to a backup) and any backup will tend to be underrated (because they are often compared to a regular). Keep in mind, for example, that because some of Helton’s infielder teammates played on other teams, he is not always going to be compared to another Rockies’ first baseman. Anyway, for now, it looks like we are going to have to live with this flaw or weakness in the system. The correct way to handle the data, of course, is to adjust each first baseman’s rating by that of his “others” by doing an iterative process. In the future I may do this, but for now, we are stuck with version 1.0.

Here are the results:

Again, I used play-by-play data from 2000 to 2008.

Interestingly, and not unexpectedly, there were significant across-the-board differences in the scooping ability of the average tall and short, lefty and righty, and regular and backup first baseman.

Tall (>6’1”) RH: .6 runs per 1000 matched throws, or around a full season.
Tall LH: 1.2 runs

Short RH: -.8 runs
Short LH: .6 runs

Fewer than 300 matched throws: -1.5 runs
300 to 1000 throws: .4 runs
More than 1000 throws: .6 runs

Players with at least 1000 matched throws in total, where each fielder contributes the smaller of his two throw counts (to this first baseman and to all others) to the total:

Best per 1000 throws

Berkman +4 runs, 2928 matched throws
Choi +4, 1361
Conine +3, 3100
Conor Jackson +3, 1790
Loney +3, 1679
Mientkiewicz +3, 4344
D Ward +3, 1119
Olerud +2.5, 4481
Sexson +2.5, 6471
Tony Clark +2.5, 2880
Dan Johnson +2.5, 1774
Overbay +2.5, 4834

Worst

Karros -4, 2986
Mo Vaughn -4, 1644
Galarraga -3, 1842
Stairs -3, 1122
Bagwell -2.5, 4931
Casey -2.5, 6444
Julio Franco -2.5, 2010
Swisher -2.5, 1134
Thome -2.5, 4476

When we regress these numbers in order to turn them into “true talent” scooping ability, we get slightly smaller values, depending on the number of matched throws (sample size) of course. In fact, it appears that the spread in talent between the best and worst “scoopers” at first base is on the order of 2-3 runs, plus or minus (a 4-6 run spread). So before you start opining about how your favorite first baseman is so great defensively because he “saves so many errors,” consider that scooping ability is probably worth less than ¼ of total defensive ability or value at first base. Fielding grounders is at least 75% of the package and “scooping” is the rest. But every little bit helps.


Stealing on Catchers (Not Named Molina)

I spent the last week on my newest tool, which analyzes play by play in order to rate catchers on their throwing.

The task is to separate the catcher’s ability to throw out base stealers from that of the pitchers they are teamed with. My initial table, extracted from the RetroSheet events for 2003-2008, contains IDs for the catcher, pitcher, baserunner, the hand of the pitcher and batter, natural or artificial turf, and the number of steal opportunities and total of each type of result for each combination of these factors. For each catcher, for each season and base, there is how he did with each pitcher (the observed values). In a WOWY fashion, that is compared to the results for each of those pitchers, over the past six seasons, with every other catcher (the expected values). The sum of all players forms the mean values. Using a variation of the Odds Ratio I’ve called the Inverse James Function, I then calculate what true talent level would give us that observed value, given the expected and the mean.

The mean CS% for the past six seasons is .243. If a player’s observed value is the same as his expected, then his normal value is equal to the league mean. If the observed is higher (or lower) than expected, then the normal will be higher (or lower) than the mean, with the normal value limited to between 0 and 1.
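The exact form of the Inverse James Function isn’t spelled out here, so the following is only a sketch of one formula that behaves the way the paragraph above describes: the standard odds ratio relation, solved for the unknown talent level. The function names and the sample observed and expected rates are assumptions for illustration; only the .243 league mean comes from the text.

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 into odds."""
    return p / (1.0 - p)

def normalized_rate(observed, expected, mean):
    """Solve odds(observed) = odds(talent) * odds(expected) / odds(mean) for talent.

    Assumes expected and mean are strictly between 0 and 1. The result always
    falls between 0 and 1, and equals the mean when observed == expected.
    """
    if observed <= 0.0:
        return 0.0
    if observed >= 1.0:
        return 1.0
    talent_odds = odds(observed) * odds(mean) / odds(expected)
    return talent_odds / (1.0 + talent_odds)

LEAGUE_CS = 0.243  # mean CS% for the past six seasons, from the text

# Hypothetical observed/expected pairs to show the three cases.
same = normalized_rate(0.300, 0.300, LEAGUE_CS)    # lands on the league mean
better = normalized_rate(0.350, 0.300, LEAGUE_CS)  # above the league mean
worse = normalized_rate(0.250, 0.300, LEAGUE_CS)   # below the league mean
```

Working in odds rather than raw percentages is what keeps the normal value from leaving the 0 to 1 range, however far the observed rate is from the expected one.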

Check out the top 5. What’s in that gene pool? Bengie, who’s always been a little better than average, had a great season despite adverse expectations, and jumped to the head of the list ahead of his brothers. Jason Kendall has been on a roller coaster, being average, average, poor, average, poor and very good the past six years. Despite the fluctuations, averaging all that together, he projects as league average for next year. Most of the values are consistent from year to year.

Henry Blanco is the “career” leader of the six year period, with a normal CS% of .445, and a runs allowed above average (RAA) of 10.1 per 1800 base stealing opportunities. He’s followed by Yadier Molina at .442 and Gerald Laird at .406. The worst rates are Gary Bennett .139, Michael Barrett .152 and Mike Piazza .160, with Piazza having the worst R/1800 at -12.1. At age 36, Blanco has shown no signs of slowing down, having a normal CS% over .500 in 3 of the past 4 seasons.

On the other hand, Jason Varitek and Brad Ausmus, two catchers with a past record of defensive accolades, have both shown sharp downward trends. Varitek’s normal CS% has been .419, .255, .211, .160, .139 and .118, dropping from very good to very bad. Ausmus similarly has marks of .308, .222, .180, .169, .103 and .136.

Despite four consecutive years of poor throwing, the rate of steal attempts versus Ausmus has been at or below average, likely based on his past reputation and the ability of his pitchers to hold runners. Ivan Rodriguez, who in the 1990’s routinely had observed CS% of over .500, has the past two seasons only been slightly better than average at .273 and .275. Although the attempt rate against Rodriguez has risen to .040 (average=.047), over the last six years he has the lowest rate at .029, followed by Rod Barajas at .036, Joe Mauer .037, Toby Hall .038, Chris Snyder .038 and Ausmus .040. The catchers run against most often have been Mike Piazza .074, Brandon Inge .066, Victor Martinez .060, Paul LoDuca .058 and Michael Barrett .055.



Historical Position Adjustments

Here on Fangraphs there have been a lot of posts on the relative value of different positions. I agree with the idea that position adjustments should be based on relative defensive skill and player scarcity (as the infield positions draw talent from a smaller pool; lefthanders can’t play there) as opposed to the inverse of batting production.

Using my TotalZone defensive data (based on retrosheet play by play files) I looked at defensive differentials by decade, from the 1950’s (actually starting with 1953) to the 2000’s. I look at players who played two different positions in the same year, and aggregate all such player-seasons grouped by decade. With grouping by decade, I’ve increased the sample size of players. I’m also limiting my sample size by only looking at players who played multiple positions in one season, but this is necessary to avoid distortion caused by aging – such as a player who was a shortstop in 1961 and a third baseman in 1968. I’m leaving out catchers and first basemen and focusing on the relative value of outfielders, second and third basemen, and shortstops.

For the most recent time period, I use Tango Tiger’s position adjustments, which are also used here on Fangraphs on the player valuation sections. They are:

SS +7.5
2B, 3B, CF +2.5
LF, RF –7.5

The TotalZone data is reasonably close to this. Center fielders are 8.7 runs per full season better than corner outfielders. Shortstops are 4.7 runs better than third basemen and 4.2 runs better than second basemen. Second basemen are 1.4 runs better than third basemen. The results are not identical, but close enough that I don’t see value in arguing about them. You could make 2B +3 and 3B +2, and still be in balance, but it’s only half a run.

Tango’s adjustments show an average gap of 8.3 runs between the three infield positions and the 3 outfield positions. TotalZone shows only a 3.2 run difference when looking at players who played both infield and outfield in the same season. Are we overvaluing infielders?

There are two problems here. One is handedness: all players can potentially play the outfield, but only right-handed throwers play the infield. In addition, movement between the positions is almost entirely one-way. Teams have no trouble taking an infielder and asking him to play the outfield. Some examples off the top of my head are Jerry Hairston, Willie Bloomquist, Ryan Freel, and Chone Figgins. How many outfielders are sent to play second, third, and short? Very, very few, mostly in emergency situations, such as when multiple players get hurt or the game goes deep into extra innings. I’m not forgetting Juan Rivera’s second base appearance last year, but I doubt the Angels plan on him doing it again.

I think that in the case of infielders vs. outfielders it is appropriate to look at the relative offense at the positions. I don’t think this is appropriate for comparing center fielders to left fielders: if center fielders outhit left fielders and outfield them, then they are better players, period, and we should not artificially set the positions as equal in value. For 2000 to 2008, outfielders hit better than infielders to the tune of 11.3 runs per season. Average this with the observed defensive difference of 3.2 runs and we get 7.3 runs, just one off from the 8.3 that Tango’s adjustments imply. For previous decades, I will use the average of the offensive and defensive differences between the infield and outfield groups. Within the groups, my adjustments will be completely based on the defensive differences.
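For anyone who wants to check the arithmetic, here is a small sketch of the two numbers being compared. The position values are Tango’s adjustments listed earlier and the 11.3 and 3.2 run gaps are the 2000 to 2008 figures from the text; the function and variable names are mine.

```python
# Tango's position adjustments, as listed earlier in the post.
tango = {"SS": 7.5, "2B": 2.5, "3B": 2.5, "CF": 2.5, "LF": -7.5, "RF": -7.5}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Gap implied by Tango: average infield value minus average outfield value.
infield_avg = mean(tango[p] for p in ("SS", "2B", "3B"))
outfield_avg = mean(tango[p] for p in ("CF", "LF", "RF"))
tango_gap = infield_avg - outfield_avg  # about 8.3 runs

def blended_gap(offense_gap, defense_gap):
    """Simple average of the offensive and defensive infield/outfield gaps."""
    return (offense_gap + defense_gap) / 2

# 2000 to 2008: outfielders outhit infielders by 11.3 runs, and infielders
# were 3.2 runs better on defense.
gap_2000s = blended_gap(11.3, 3.2)  # 7.25, quoted as 7.3 in the text
```

The same blended_gap call can be repeated with each earlier decade’s offensive and defensive gaps.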

1990’s:

The gap between infield and outfield is very similar- 12.7 on offense, 3.5 on defense, an average of 8.1. Center fielders were 10.6 runs better than corner outfielders. So far, it looks pretty similar. Shortstops, however, were 7.6 runs better than third basemen and 6.3 runs better than second basemen. There was only a 0.2 gap between second and third, in favor of the second basemen. For this decade, I’ll change things a bit by giving more credit to the shortstops. The 1990s adjustments:

SS +9
2B +2
3B +1.5
CF +2.5
RF, LF –7.5

The 1980’s

Here we have a larger gap between infield and outfield, 15.8 on offense, 5.1 on defense, for a 10.5 average. There is only a 7.5 run gap between corner outfielders and center, and this should not be a surprise. In recent years we’ve seen Adam Dunn, Manny Ramirez, Raul Ibanez, Josh Willingham, Carlos Lee, Hideki Matsui, Pat Burrell, and Jack Cust play left field. That lumbering group represents more than 25% of starting left fielders. In the 1980’s we had Rickey!, Tim Raines, Vince Coleman, Willie Wilson, and Gary Redus playing left field. I’m sure there were some terrible left fielders back then as well, and guys like Carl Crawford, Eric Byrnes, and Dave Roberts buck the trend, but I think that on average the left fielder of 1985 was faster than his 2005 counterpart.

Shortstops were 6.6 runs better than third basemen and 3.3 better than second basemen. Second basemen were 4.7 runs ahead of third basemen.

For the 1980’s I use these adjustments:

SS +9
2B +5
3B +1
CF +0
RF, LF –7.5

The 1970’s

The infield/outfield gap keeps getting bigger as we go back in time; for this decade it’s 20.2 on offense and 8.6 on defense. Center fielders had only a 5.6 run advantage on the corners. Second basemen were 7.6 runs worse than shortstops, but 3.4 runs better than third basemen. So if we add that up, shortstops must have been 10 or more runs better than third basemen, right?

If only it was that easy. In fact, players who played both third and short in the 1970’s were 1.1 runs worse as third basemen. Sometimes the pieces of this data puzzle do not fit very well together. In every other decade, shortstops were at least 4.7 runs better than third basemen and at least 6.6 runs excluding the 2000’s. For the 1970’s the other pieces – shortstop to 2B, 2B to 3B, show the normal pattern. Chalk this one up to a fluke, and I’ll try to make the adjustments make sense.

The adjustments:

SS +10
2B, 3B +4
CF –2
LF, RF –8

The 1960’s

The infield/outfield gap was the same as the 1970’s on offense, 20.2 runs, and 3.7 runs on defense. Center fielders were 8.7 runs ahead of corner outfielders. Shortstops had a 7.2 run advantage on third and 5.1 on second. 2B and 3B were essentially even with a 0.2 run difference (2B slightly better). Since those two were equal, I kept them as equals in the adjustments, and averaged their gap with shortstop to make the shortstops 6 runs better than other infielders.

The adjustments:

SS +10
2B, 3B +4
CF +0
RF, LF –9

And finally the 1950’s

Outfielders out-hit infielders by 19.9 runs, while infielders had an 8.7 run defensive advantage. Center fielders were 7.2 runs better than corner outfielders. Shortstops were 7.5 runs ahead of third base, and 5.3 runs ahead of second. Those last two figures are similar to the 1960’s, but since second basemen had a 3 run advantage on third base, I changed their relative value a bit in the adjustments:

SS +10
2B +5
3B +2
CF –1
LF, RF –8

Using these position adjustments will lead to conclusions that not all positions are created equal. Sometimes we might find that center fielders hit as well as or better than corner outfielders, or that shortstops hit as well as second basemen. Their defensive value is still superior, and in such situations we’ll find that one position has a better collection of baseball talent than another has. Most likely you’ll find that teams recognize this, and pay the positions differently as well.


The Great Derek Jeter Conspiracy

or Part III of “Things Aren’t Always as They Appear”.

How is it that Derek Jeter can win three consecutive Gold Glove awards (2004-2006) for being the best defensive shortstop in the American League, but virtually every saber fielding metric rates him among the worst?

The image of a fielder standing over a muffed grounder as the batter crosses first is easily burned into our memory. It’s the avoidance of not only the errors but also infield hits that has shaped our impression of who the Gold Glovers should be. Over the past six seasons (2003-2008), the most recent period for which RetroSheet has complete batted ball information, Jeter is third at 91.7% among major league shortstops in “sure handedness”, the percentage of infield grounders where an out is recorded. The top spot is held by Omar Vizquel at 92.2%, and Vizquel has won two Gold Gloves during that period, and nine more earlier in his career. Second is Alex Rodriguez at 91.9%, with two Gold Gloves, and fourth is Cesar Izturis at 91.2%, with one award. Eight of the last twelve Gold Gloves at shortstop have gone to the four players with the highest rate of converting ground balls.

However, making outs on the balls you get to is not nearly the total measure of an infielder’s range. While it is easy to remember the booted grounder, it seems that we don’t mentally catalogue how many extra grounders make their way to the outfield for a hit. This is where Jeter falls down.

I counted the number of ground ball hits to each outfield position, along with the fielder at each of the four infield spots and the handedness of the batter. I assigned which infielder was responsible for each hit based on the ratio of infield grounders to each position, split by bat hand. It’s an estimate, and it can be improved by adding vector data that is available from GameDay, but even the preliminary results match very well to who is expected to be in the top, middle and bottom.

The player with the highest rate of grounders kept in the infield is Adam Everett at 83.5%, while the worst is Ramon Vazquez at 76.5%. Jeter is next to last at 77.3%. No other shortstop today has such a wide divergence between the highly visible “hands” and the nearly invisible “range” as Jeter.

Let’s say we are designing a table top baseball game (that’s what we played before PCs were invented), and then let’s rate the shortstops on their range. 76.5% of groundballs to short are always outs, 16.5% are always hits. That leaves 7.0% to be contested. For those, we have to roll a 20-sided die. Vazquez is a 0, Everett is a 20, Jeter is a 2. If we roll a 1 or a 2, Jeter gets to the ball – anything from 3 to 20, it goes to the outfield. The difference from best to worst, over a full season, is about 40 hits.
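For anyone who wants to check the dice, here is a minimal sketch of that rating in Python. It assumes the rating is a straight linear scaling between the worst and best range rates quoted above; the function names are mine, and the 570 chances per season is a hypothetical figure chosen only because it roughly reproduces the "about 40 hits" gap, not a number from the data.

```python
# Range rates (share of grounders kept in the infield) quoted in the post.
WORST = 0.765  # Ramon Vazquez
BEST = 0.835   # Adam Everett


def die_rating(range_rate, sides=20):
    """Scale a range rate linearly onto a 0..sides die rating (assumed method)."""
    contested = BEST - WORST  # the 7.0% of grounders that are up for grabs
    return round((range_rate - WORST) / contested * sides)


def hits_gap(rate_a, rate_b, chances):
    """Extra grounders reaching the outfield for the weaker fielder."""
    return (rate_a - rate_b) * chances


print(die_rating(0.773))           # Jeter's rating on the 20-sided die
print(hits_gap(BEST, WORST, 570))  # 570 chances is a hypothetical season total
```

Run as written, Jeter's 77.3% comes out as a 2, and Everett and Vazquez land on 20 and 0 by construction.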

There’s another problem. Six of the top 11 in “hands” are also in the bottom 16 in “range”. If a player doesn’t get to that many balls, the ones he does get to are likely closer and thus easier to field. This is a bias in the “hands” rating, as those players with less range will have a higher expected value on the balls they do get to. Therefore, players with a high “hands” rating combined with low “range” (Jeter, A-Rod, Keppinger, Betancourt) likely don’t really rate as high, because their expected rate is likely closer to their observed rate. I will account for this when I process the GameDay vector data.

What really counts is when the ball is hit, does the fielder make an out? That’s the definition of Defense Efficiency Rating (DER) on a team level. Whether it’s by range, throwing arm or good hands, it’s the out that counts. With 1000 or more ground balls, the bottom five at shortstop are Angel Berroa 71.1%, Michael Young 71.0%, Jeter 70.9%, Felipe Lopez 70.2% and Carlos Guillen 69.8%. At the top are Adam Everett 75.7%, Omar Vizquel 74.9%, Troy Tulowitzki 74.3%, Julio Lugo 74.1% and Khalil Greene 74.1%.
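One way to see how the two ratings combine: if a grounder has to be reached ("range") and then converted ("hands"), the overall out rate should be roughly the product of the two. That is my reading rather than a stated formula, but Jeter's quoted figures are consistent with it. A short sketch, with the function name being my own:

```python
def out_rate(hands, range_rate):
    """Overall out rate, assuming it is the product of 'hands' and 'range'."""
    return hands * range_rate


# Jeter's quoted rates: 91.7% hands, 77.3% range.
jeter = out_rate(0.917, 0.773)
print(f"{jeter:.1%}")  # 70.9%
```

That matches the 70.9% listed for Jeter among the bottom five, which is the point: very good hands can't make up for that many balls getting through.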

Don’t let your eyes fool you.


Using Leaderboard Splits

Did you know that FanGraphs leaderboards featured splits? It’s a feature I feel goes underutilized, and thus I wanted to devote a post to it. There’s a ton of interesting tidbits you can pull from using the split options, and while the “month” options are interesting, the “Past 3 Calendar Years” filter has my eye. Here’s some of what I’ve learned from using it.

Albert Pujols pretty much owns everything. Pujols leads all qualifiers in WPA (21.04), while only Lance Berkman (16.22) sits over 15, and only a couple of handfuls of others sit above 10. The anti-Albert is Ivan Rodriguez (-5.3). Fellow senior citizen Omar Vizquel (-4.58) sits in “second”, while fellow “pudge” Yuniesky Betancourt (-4.09) and Jeff Francoeur (-4.06) round out the players with a WPA of -4 or worse.

Pujols also leads the league in wOBA (.442), wRAA, wRC, and basically everything that matters. Chipper Jones (.438) is second, and the rest of the top five includes David Ortiz (.438), Alex Rodriguez (.421), and Matt Holliday (.420). Following our theme of the good and the bad, the bottom five are Omar Vizquel (.295), Jason Kendall (.298), Pedro Feliz (.303), Yuniesky Betancourt (.307), and Khalil Greene (.308).

Some other offensive stats:

BB%
Best: Pat Burrell (17.7%), Adam Dunn (17.2%), Jim Thome (17.1%), Ortiz (16.5%), Todd Helton (16.1%)
Worst: Betancourt (2.9%), Bengie Molina (3.6%), I. Rodriguez (3.8%), Jose Lopez (4%), A.J. Pierzynski (4.2%)

K%
Best: Juan Pierre (5.7%), Placido Polanco (6.2%), Betancourt (8.9%), Kendall (9.1%), B. Molina (9.4%)
Worst: Ryan Howard (33.5%), Dunn (32.7%), Thome (30%), Bill Hall (29.6%), Mike Cameron (28.3%)

BABIP
Best: Derek Jeter (.367), Holliday (.365), C. Jones (.361), Ichiro Suzuki (.359), Miguel Cabrera (.353)
Worst: Andruw Jones (.255), Pedro Feliz (.261), Ken Griffey Jr. (.272), Greene (.273), Kevin Millar (.275)

LD%
Best: Mark Loretta (25.3%), Michael Young (24.8%), Freddy Sanchez (24.7%), Helton (23.7%), Chone Figgins (23.2%)
Worst: Gary Matthews Jr. (15.9%), Feliz (16%), Dan Uggla (16.1%), Luis Castillo (16.2%), Jason Bay (16.3%)

O-Swing%
Best: Castillo (12.9%), Brian Giles (14.8%), Bobby Abreu (15.7%), Troy Glaus (15.9%), C. Jones (16.2%)
Worst: Vladimir Guerrero (43.9%), Pierzynski (39.1%), I. Rodriguez (38.5%), Alfonso Soriano (37.3%), B. Molina (36.7%)

Contact%
Best: Pierre (94.5%), Castillo (94%), Polanco (93.6%), B. Giles (92.7%), Vizquel (91.9%)
Worst: Howard (66.4%), Thome (71.3%), Dunn (71.4%), A. Jones (72.1%), Brad Hawpe (72.3%)

Dollars
Best: Pujols ($100), Chase Utley ($93.1), David Wright ($84.5), Grady Sizemore ($81.3), Alex Rodriguez ($81.2)
Worst: Griffey Jr. ($0.8), Melky Cabrera ($1.8), Millar ($3.1), Jose Bautista ($3.5), Craig Biggio ($3.7)

Try using the leaderboards yourself to look at the last three years of defensive and pitching performances.