It’s almost opening day, and it seems like everyone is talking about projections.
When considering a projection, there are really two questions to be answered – what is the player’s “True Talent Level” right now, and how will he perform next year? Between now and the end of next year, his talent level very well might change, as he’s a year older and might recover from or succumb to injuries. Even then, there’s still the random variance of a single season performance. In this article I’d like to explore how some of the major projection systems work when predicting different subgroups of players.
I tested the following projections: PECOTA (2006-2009), ZiPS (2006-2009), CHONE (2007-2009) and my own Oliver (2006-2009).
The first test was to group the yearly projections to the nearest .010 of wOBA, and then see how that group of players actually performed. There were 468 players who had projections from all four systems and had at least 350 plate appearances in the major leagues in the following season. As 2009 is yet to be played, and CHONE is not available for 2006, these projection-to-next-season comparisons cover the 2007 and 2008 seasons. All four projections were tested on the same 468 players. The observed results were unadjusted major league stats, so that the results of the test would not be influenced by which park factors or MLE formulas I chose to normalize stats.
To read the results: CHONE projected a group of players to have a wOBA between .375 and .385, averaging .380. Of those, 25 had 350 or more PAs in MLB in the following season, and those 25 players had an average wOBA of .363, so at that level CHONE was .017 high. Oliver was .008 high on 21 projections, PECOTA .027 high on 26, and ZiPS .014 high on 26. The last line of the table shows the root mean square error (weighted by number of players). Oliver had the lowest error at .006, followed by CHONE at .011 and PECOTA and ZiPS at .012 each.
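The bucketing test described above can be sketched in a few lines of Python. This is only a minimal illustration, not the code used for the article: the function names and the sample numbers in the usage note are made up, and each bucket is compared at its center (for example .380) rather than at the exact mean of the projections inside it.

```python
from collections import defaultdict
from math import sqrt

def bucket_errors(projected, observed, width=0.010):
    """Group players by projected wOBA rounded to the nearest `width`, then
    compare each bucket's center with that group's mean observed wOBA.
    Returns rows of (bucket_center, n_players, center_minus_observed)."""
    groups = defaultdict(list)
    for proj, obs in zip(projected, observed):
        key = round(proj / width)  # integer bucket index
        groups[key].append(obs)
    rows = []
    for key in sorted(groups):
        obs_list = groups[key]
        center = key * width
        mean_obs = sum(obs_list) / len(obs_list)
        rows.append((round(center, 3), len(obs_list), center - mean_obs))
    return rows

def weighted_rmse(rows):
    """Root mean square of the bucket errors, weighted by players per bucket."""
    total = sum(n for _, n, _ in rows)
    return sqrt(sum(n * err ** 2 for _, n, err in rows) / total)
```

As a usage example with hypothetical inputs, two players projected at .381 and .379 who then posted .360 and .366 fall in the .380 bucket with an observed mean of .363, so that row shows the projection as .017 high.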
FanGraphs is now presenting five different player projection systems: Bill James, ZiPS (Dan Szymborski), CHONE (Sean Smith), Marcel (Tom Tango) and Oliver (Brian Cartwright). The natural question from you, the readers, is “Just how good is this projection?”
First, we need to understand the importance of sample size. Season statistics are just a sample of a player’s true talent. You might catch a player during a hot or cold streak, and without enough data, be misled into forming an incorrect perception of the player. On one hand, the more data, the better. On the other, we’re trying to capture a moving target. By the time we get a sufficient sample, the player’s true ability might well have changed.
My first test for a sufficient sample size was to compare wOBAs in all consecutive single seasons of unadjusted major league batting stats in my projections database, weighted by the smaller of the two plate appearances. I could presumably get more accurate results by normalizing with park factors, but I did not want to bias these results by using any of my proprietary formulas.
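One plausible way to run such a comparison is a weighted correlation between each player's wOBA in consecutive seasons, with each pair weighted by the smaller of the two plate appearance totals. The sketch below assumes that reading; the choice of a correlation statistic, the function name and the tuple layout are illustrative assumptions, not a description of the actual database query.

```python
from math import sqrt

def weighted_season_corr(pairs):
    """pairs: iterable of (woba_year1, pa_year1, woba_year2, pa_year2).
    Each pair is weighted by the smaller of the two PA totals."""
    pairs = list(pairs)
    w = [min(pa1, pa2) for _, pa1, _, pa2 in pairs]
    x = [w1 for w1, _, _, _ in pairs]
    y = [w2 for _, _, w2, _ in pairs]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y)) / sw
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x)) / sw
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y)) / sw
    return cov / sqrt(vx * vy)
```

Weighting by the smaller sample keeps a 600 PA season paired with a 50 PA season from counting as if both were full seasons.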
Other than stolen bases, which I addressed a few weeks ago, very little has been published on catchers’ fielding numbers. Tom Tango first conceived his WOWY technique in studying catchers. Now I’ve extended my stolen base study back to the beginning of the current RetroSheet data in 1953, and added the rates of wild pitches and passed balls allowed back to the same date. It should put a smile on Tango’s face that Gary Carter of his beloved Montreal Expos rates third in career SB_RAA behind Ivan Rodriguez and Jim Sundberg, fourth in career WP_RAA behind Bill Freehan, Bruce Benedict and Brad Ausmus, and second overall behind only Pudge, along with the best single season of +28.2 in 1983. The worst single season belongs to Dick Dietz, at -18.6 in 1970.
I had earlier included groundballs to catchers when I ran my infield defense. There just aren’t that many grounders fielded by catchers – the most in any one season over the past six years was 74 by Jason Kendall in 2006. Single season RAA on grounders ranges from Jason Phillips’ +1.1 in 2004 to Mike Lieberthal’s -2.4 in 2003. Totals for the last six years range from Carlos Ruiz’s +2.4 to Lieberthal’s -3.4. (I don’t yet have a groundball table built for seasons before 2003).
The process is the same as I described in the previous article on stolen bases. I queried RetroSheet’s events table, creating a new table of every combination of catcher and pitcher in each year, how many batters were faced with runners on base, and how many wild pitches and passed balls occurred. A total was made of each catcher’s stats in each year (the “with” part) and also the stats of each pitcher he caught, while working with any other catcher (the “without”). These were weighted to the smaller of the sample sizes, and then summed into season and career totals.
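A minimal sketch of that with/without bookkeeping for a single catcher follows. It assumes that “weighted to the smaller of the sample sizes” means scaling both the “with” and “without” rates to the smaller opportunity count of each catcher-pitcher pair; the function name and tuple layout are hypothetical, and the conversion from events to runs is left out because the run values are not given here.

```python
def wowy_totals(pairs):
    """pairs: one tuple per pitcher the catcher worked with, as
    (with_opps, with_events, without_opps, without_events), where
    'without' is the same pitcher throwing to any other catcher.
    Returns (observed_events, expected_events) on a matched-sample basis."""
    observed = 0.0
    expected = 0.0
    for with_opps, with_events, without_opps, without_events in pairs:
        if with_opps == 0 or without_opps == 0:
            continue  # no comparison possible for this pitcher
        weight = min(with_opps, without_opps)
        observed += weight * with_events / with_opps
        expected += weight * without_events / without_opps
    return observed, expected
```

The ratio of observed to expected then gives the kind of percentage quoted for individual catcher seasons, with values under 100% meaning fewer wild pitches and passed balls than the same pitchers allowed with other catchers.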
The single best season for preventing wild pitches and passed balls, since 1953, was Bill Freehan of the Tigers in 1971. The pitchers he caught that year would have been expected to throw 62 wild pitches and 20 passed balls in Freehan’s playing time, but he only allowed 31 wild pitches and 7 passed balls to get by him, saving an estimated 12.6 runs that season. His total allowed of 38 was 46% of the expected 82. Freehan had the highest career RAA of +52.0, while Jorge Posada had the lowest at -38.2.
On the other end is one of America’s favorites, who not only couldn’t hit, but apparently couldn’t catch either, Bob Uecker. In 1967, appropriately his last season in the majors, Uecker split time between the Phillies and Braves and, in only 80 games played, allowed 40 wild pitches and 25 passed balls, 222% of his expected totals of 18 and 12.
The major league average is .016 wild pitches and .004 passed balls per plate appearance with a runner on base. The best career normalized wild pitch rates go to Bruce Benedict, Yogi Berra and Mike Redmond at .010; Brian Downing, Del Crandall and Jason Varitek at .011; and Rod Barajas, Manny Sanguillen, Bill Freehan, Kirt Manwaring, Sherm Lollar and Steve Yeager at .012. The worst wild pitch rates are Earl Battey at .021; Junior Ortiz and Mike Macfarlane at .021; and Miguel Olivo, Johnny Roseboro, Tim Laudner, Jorge Posada, Pat Borders, Thurman Munson, Hal Smith, Darrell Porter and George Mitterwald at .020.
The lowest normalized passed ball rates were Brian Downing, Charlie O’Brien, Bruce Benedict, Dan Wilson, Yogi Berra, Brad Ausmus, Del Crandall, Sherm Lollar and Ron Karkovice at .002, with the worst being Miguel Olivo and Bob Brenly at .008; and Joe Azcue, Jorge Posada, Earl Battey and Lance Parrish at .007.
The top 5 ratios of reducing both are Bruce Benedict 56%, Yogi Berra 59, Brian Downing 60 and Mike Redmond and Del Crandall 64% each. The worst were Bob Brenly 142%, Earl Battey 140, Miguel Olivo 140, Jorge Posada 132, and Junior Ortiz and Mike Macfarlane 129% each.
In 2008, the best at runs saved blocking the plate were Kurt Suzuki +6.9, Kenji Johjima +5.8, Brian McCann +5.2, Ramon Hernandez +5.1 and Jason Varitek +4.1, while the worst were Miguel Montero -3.0, Miguel Olivo -2.8, Kevin Cash -2.7, Gregg Zaun -2.7 and Jesus Flores -2.6. In case you were thinking that one year might be a small sample size for some of these backup catchers, Montero, Olivo and Flores are also among the five worst career rates for active catchers, along with Mike Rivera and Jorge Posada.
Career WP&PB Records
Yearly WP&PB Records
After a leak of the results of MLB’s 2003 anonymous survey testing for performance enhancing drugs, Alex Rodriguez admitted in an interview with ESPN’s Peter Gammons to using them from 2001 to 2003, the three years he played for Texas.
“When I arrived in Texas in 2001, I felt an enormous amount of pressure, felt all the weight of the world on top of me to perform and perform at a high level every day.”…When asked if his usage took place from 2001-2003, Rodriguez said, “That’s pretty accurate.”
Rangers owner Tom Hicks, who took over the team in 1998, was shocked by Rodriguez’s admission.
“I certainly don’t believe that if he’s now admitting that he started using when he came to the Texas Rangers, why should I believe that it didn’t start before he came to the Texas Rangers?”
If A-Rod started using PEDs in 2001, then they had no visible effect, as his three years in Texas are statistically indistinguishable from his previous two years in Seattle. His jump in performance came between the 1998 and 1999 seasons.
I spent the last week on my newest tool, which analyzes play by play in order to rate catchers on their throwing.
The task is to separate the catcher’s ability to throw out base stealers from that of the pitchers they are teamed with. My initial table, extracted from the RetroSheet events for 2003-2008, contains IDs for the catcher, pitcher, baserunner, the hand of the pitcher and batter, natural or artificial turf, and the number of steal opportunities and total of each type of result for each combination of these factors. For each catcher, for each season and base, there is how he did with each pitcher (the observed values). In a WOWY fashion, that is compared to the results for each of those pitchers, over the past six seasons, with every other catcher (the expected values). The sum of all players forms the mean values. Using a variation of the Odds Ratio I’ve called the Inverse James Function, I then calculate what true talent level would give us that observed value, given the expected and the mean.
The mean CS% for the past six seasons is .243. If a player’s observed value is the same as his expected, then his normal value is equal to the league mean. If the observed is higher (or lower) than expected, then the normal will be higher (or lower) than the mean, with the normal value limited to between 0 and 1.
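The exact form of the Inverse James Function is not spelled out here, so the following is only a sketch of one way to invert the standard odds ratio (log5) relationship so that it behaves as described: observed equal to expected returns the league mean, and the result always stays between 0 and 1. The function names are hypothetical.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)

def normalized_rate(observed, expected, mean):
    """Solve odds(observed) = odds(talent) * odds(expected) / odds(mean)
    for talent. `expected` and `mean` must lie strictly between 0 and 1."""
    if observed <= 0.0:
        return 0.0
    if observed >= 1.0:
        return 1.0
    talent_odds = odds(observed) * odds(mean) / odds(expected)
    return talent_odds / (1.0 + talent_odds)
```

With a league mean of .243, a catcher who throws out 30% of runners with pitchers who are expected to allow a 30% CS rate comes out at exactly .243, while the same 30% with pitchers expected at 20% comes out well above the mean.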
Check out the top 5. What’s in that gene pool? Bengie, who’s always been a little better than average, had a great season despite adverse expectations, and jumped to the head of the list ahead of his brothers. Jason Kendall has been on a roller coaster, being average, average, poor, average, poor and very good the past six years. Despite the fluctuations, averaging all that together, he projects as league average for next year. Most of the values are consistent from year to year.
Henry Blanco is the “career” leader of the six year period, with a normal CS% of .445, and a runs allowed above average (RAA) of 10.1 per 1800 base stealing opportunities. He’s followed by Yadier Molina at .442 and Gerald Laird at .406. The worst rates are Gary Bennett .139, Michael Barrett .152 and Mike Piazza .160, with Piazza having the worst R/1800 at -12.1. At age 36, Blanco has shown no signs of slowing down, having a normal CS% over .500 in 3 of the past 4 seasons.
On the other hand, Jason Varitek and Brad Ausmus, two catchers with a past record of defensive accolades, have both shown sharp downward trends. Varitek’s normal CS% has been .419, .255, .211, .160, .139 and .118, dropping from very good to very bad. Ausmus similarly has marks of .308, .222, .180, .169, .103 and .136.
Despite four consecutive years of poor throwing, the rate of steal attempts versus Ausmus has been at or below average, likely based on his past reputation and the ability of his pitchers to hold runners. Ivan Rodriguez, who in the 1990s routinely had an observed CS% of over .500, has the past two seasons only been slightly better than average at .273 and .275. Although the attempt rate against Rodriguez has risen to .040 (average=.047), over the last six years he has the lowest rate at .029, followed by Rod Barajas at .036, Joe Mauer .037, Toby Hall .038, Chris Snyder .038 and Ausmus .040. The catchers run against most often have been Mike Piazza .074, Brandon Inge .066, Victor Martinez .060, Paul Lo Duca .058 and Michael Barrett .055.
or Part III of “Things Aren’t Always as They Appear”.
How is it that Derek Jeter can win three consecutive Gold Glove awards (2004-2006) for being the best defensive shortstop in the American League, but virtually every saber fielding metric rates him among the worst?
The image of a fielder standing over a muffed grounder as the batter crosses first is easily burned into our memory. It’s the avoidance of not only the errors but also infield hits that has impressed us as to who the Gold Glovers should be. Over the past six seasons (2003-2008), the most recent period for which RetroSheet has complete batted ball information, Jeter is third at 91.7% among major league shortstops in “sure handedness”, the percentage of infield grounders where an out is recorded. The top spot is held by Omar Vizquel at 92.2%, and Vizquel has won two Gold Gloves during that period, and nine more earlier in his career. Second is Alex Rodriguez at 91.9%, with two Gold Gloves, and fourth is Cesar Izturis at 91.2%, with one award. Eight of the last twelve Gold Gloves at shortstop have gone to the four players with the highest rate of converting ground balls.
However, making outs on the balls you get to is not nearly the total measure of an infielder’s range. While it is easy to remember the booted grounder, it seems that we don’t mentally catalogue how many extra grounders make their way to the outfield for a hit. This is where Jeter falls down.
I counted the number of ground ball hits to each outfield position, along with the fielder at each of the four infield spots and the handedness of the batter. I assigned which infielder was responsible for each hit based on the ratio of infield grounders to each position, based on bat hand. It’s an estimate, and it can be improved by adding vector data that is available from GameDay, but even the preliminary results match very well to who is expected to be in the top, middle and bottom.
The player with the highest rate of grounders kept in the infield is Adam Everett at 83.5%, while the worst is Ramon Vazquez at 76.5%. Jeter is next to last at 77.3%. No other shortstop today has such a wide divergence of the highly visible “hands” and the nearly invisible “range” as Jeter.
Let’s say we are designing a table top baseball game (that’s what we played before PCs were invented), and then let’s rate the shortstops on their range. 76.5% of groundballs to short are always outs, 16.5% are always hits. That leaves 7.0% to be contested. For those, we have to roll a 20-sided die. Vazquez is a 0, Everett is a 20, Jeter is a 2. If we roll a 1 or a 2, Jeter gets to the ball – anything from 3 to 20, it goes to the outfield. The difference from best to worst, over a full season, is about 40 hits.
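The die rating is just a linear rescaling of each shortstop's rate of grounders kept in the infield onto a 0 to 20 scale. A small sketch, using the 76.5% and 83.5% endpoints quoted above (the function name is made up for illustration):

```python
def range_rating(kept_in_infield, worst=0.765, best=0.835, sides=20):
    """Map a shortstop's rate of grounders kept in the infield onto a
    0..sides scale, where `worst` rates 0 and `best` rates `sides`."""
    share_of_contested = (kept_in_infield - worst) / (best - worst)
    return round(sides * share_of_contested)
```

Jeter's 77.3% sits 0.8 points above the floor out of a contested band of 7.0 points, which is about 11% of the way up and rounds to a 2 on the die.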
There’s another problem. 6 of the top 11 in “hands” are also in the bottom 16 in “range”. If a player doesn’t get to that many balls, the ones he does get to are likely closer and thus easier to field. This is a bias in the “hands” rating, as those players with less range will have a higher expected value on the balls they do get to. Therefore, players with a high “hands” rating combined with low “range” (Jeter, A-Rod, Keppinger, Betancourt) likely don’t really rate as high, because their expected rate is likely closer to their observed. I will account for this when I process the GameDay vector data.
What really counts is when the ball is hit, does the fielder make an out? That’s the definition of Defense Efficiency Rating (DER) on a team level. Whether it’s by range, throwing arm or good hands, it’s the out that counts. With 1000 or more ground balls, the bottom five at shortstop are Angel Berroa 71.1%, Michael Young 71.0%, Jeter 70.9%, Felipe Lopez 70.2% and Carlos Guillen 69.8%. At the top are Adam Everett 75.7%, Omar Vizquel 74.9%, Troy Tulowitzki 74.3%, Julio Lugo 74.1% and Khalil Greene 74.1%.
Don’t let your eyes fool you.
This is my first post at FanGraphs, and I would like to thank David Appelman for inviting me onboard. I have previously written for Seamheads.com and StatSpeak.net, and frequent “The Book” blog. If you’d like to know some more about my background, check out this article I wrote a few months ago.
Today I am going to start off by climbing up on my soapbox to address one of my pet peeves, the use of Line Drive rates as a predictor for Batting Average on Balls in Play (BABIP). The standard practice is to estimate BABIP by LD/Balls in Play + .12. It is claimed that LD rates are more stable than BABIP from year to year, and that when the actual observed BABIP varies from the predicted by a large margin, this indicates a future regression to the mean.
I’m in the process of updating my park factors for 2008, along with adding in 1999, 1955 and 1953 that the folks at RetroSheet have included in their most recent release. I’ve added a couple more categories, foul flies and line drives. Now, I’ve never heard anyone mention park factors when using LD rates, but in fact they are quite large. I might guess that there could be different opinions of what is a line drive from one ballpark to another, or maybe it’s the air or the hitting background. I limited my LD factors to 2003-2008, when the RetroSheet data has complete information on whether a ball is a line drive, ground ball, fly ball or popup on every batted ball, including hits. In Arlington, a batter is 18% more likely to have a batted ball coded as a LD, which may have helped Milton Bradley to have the 2nd highest LD rate in 2008 – while in Minneapolis, it’s 20% less likely. Four of the lowest six LD rates belong to Astros hitters Michael Bourn, Geoff Blum, Ty Wigginton and Hunter Pence, and Minute Maid Park has the second lowest LD park factor at 0.82. This is not saying that Houston batters hit fewer line drives – it’s that Houston and its opponents both have 18% fewer balls scored as liners in Houston than they do on the road.
PARK_ID PARK_NAME First Last PAw LDf
PHI12 Veterans Stadium 2003 2003 4768 1.23
ARL02 Ballpark Arlington 2003 2008 26850 1.18
TOK01 Tokyo Dome 2004 2008 283 1.13
CIN09 Great American 2003 2008 28827 1.11
DEN02 Coors Field 2003 2008 29158 1.10
STL10 Busch Stadium III 2006 2008 13967 1.09
KAN06 Kauffman Stadium 2003 2008 27530 1.09
WAS11 Nationals Park 2008 2008 4790 1.09
TOR02 Rogers Centre 2003 2008 27513 1.08
SFO03 Phone Co Park 2003 2008 29439 1.07
MON02 Stade Olympique 2003 2004 7684 1.07
STL09 Busch Stadium II 2003 2005 14280 1.06
STP01 Tropicana Field 2003 2008 27830 1.06
DET05 Comerica Park 2003 2008 28008 1.06
PHI13 Citizens Bank Park 2004 2008 24640 1.06
MIL06 Miller Park 2003 2008 29354 1.06
WAS10 RFK Stadium 2005 2007 14885 1.05
OAK01 Oakland Coliseum 2003 2008 26719 1.03
SEA03 Safeco Field 2003 2008 26683 1.01
CHI12 Comiskey Park II 2003 2008 28644 1.00
NYC16 Yankee Stadium 2003 2008 28722 1.00
MIA01 Dolphin Stadium 2003 2008 29849 1.00
CLE08 Jacobs Field 2003 2008 28136 0.99
BAL12 Camden Yards 2003 2008 29103 0.99
PIT08 P.N.C. Park 2003 2008 27652 0.98
PHO01 Bank One Ballpark 2003 2008 28810 0.98
SJU01 Hiram Bithorn 2003 2004 2598 0.98
SAN01 Jack Murphy 2003 2003 4943 0.98
LOS03 Dodger Stadium 2003 2008 29555 0.98
CHI11 Wrigley Field 2003 2008 28663 0.96
SAN02 PetCo Park 2004 2008 24432 0.95
NYC17 Shea Stadium 2003 2008 29299 0.92
BOS07 Fenway Park 2003 2008 28311 0.86
ATL02 Turner Field 2003 2008 29016 0.86
ANA01 Anaheim Stadium 2003 2008 26490 0.86
HOU03 Minute Maid Park 2003 2008 28271 0.82
MIN03 Metrodome 2003 2008 28048 0.80
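In its simplest reading, an LD factor like those in the table is a home/road ratio: the line drive rate for a team and its opponents in that park, divided by the rate for the same teams in the club's road games. The sketch below shows only that simplified ratio plus a naive adjustment of a hitter's LD rate; the actual factors are weighted (the PAw column) and presumably computed with more care, and every name and number here is a hypothetical illustration.

```python
def ld_park_factor(home_ld, home_bip, road_ld, road_bip):
    """Ratio of the LD rate in a team's home games to the LD rate in its
    road games. Counts should include both the team and its opponents."""
    home_rate = home_ld / home_bip
    road_rate = road_ld / road_bip
    return home_rate / road_rate

def park_adjusted_ld_rate(ld, bip, factor, home_share=0.5):
    """Neutralize a hitter's LD rate, assuming `home_share` of his balls
    in play came in his home park and the rest in neutral parks."""
    blended = home_share * factor + (1.0 - home_share)
    return (ld / bip) / blended
```

Under those assumptions, a hitter with an 18.2% raw LD rate who plays half his games in a park with a 0.82 factor would adjust to about 20%.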
Point Two – are line drives really more predictive? It’s said that if a player’s BABIP is not close to his LD+.12, it’s because of luck, and this should be expected to correct itself next season. Expect the overachiever to come back to Earth.
For all the batters from 2003-2008, in non-bunt plate appearances, I added up the base hits, line drives, ground balls, fly balls and popups. I compared the predicted BABIP to the observed one in each season, which showed a root mean square (RMS) error of .045. Then I compared each year’s predicted value to the next year’s observed, and the RMS was .048 – slightly larger. For pitchers, the RMS was .039 in the same season, .039 in the next. I don’t see the evidence of future regression.
Complete line drive data is only available since 2003, and for a few seasons in the 1990s. In the seasons when it was not available, a “true talent level” of BABIP can be estimated by using a rolling weighted mean of past data, commonly referred to as Marcel. I used a seasonal weight of 0.7 – the most recent season is weighted at 1.00, the one before that at 0.70, two seasons back at 0.49, etc., each previous year 0.7 times the next. In this test, I did not use any regression to the league mean. The RMS of LD+.12 compared to the Marcel for the same season was .048 for batters, .046 for pitchers. The Marcel compared to the observed BABIP in the NEXT season was .041 for batters, .039 for pitchers. Historical BABIP data is better than the current season’s LD rate.
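The 0.7 seasonal weighting can be sketched as below. One assumption to flag: the weights are applied to the hit and ball-in-play counts, so that seasons with more balls in play carry more of the estimate; the text does not say whether counts or rates were weighted. As in the test described, there is no regression to the league mean, and the function name is hypothetical.

```python
def marcel_babip(seasons, decay=0.7):
    """seasons: list of (hits_on_balls_in_play, balls_in_play), most
    recent season first. Season k back is weighted by decay ** k
    (1.00, 0.70, 0.49, ...). Returns None if there is no data."""
    weighted_hits = 0.0
    weighted_bip = 0.0
    for years_back, (hits, bip) in enumerate(seasons):
        weight = decay ** years_back
        weighted_hits += weight * hits
        weighted_bip += weight * bip
    if weighted_bip == 0:
        return None
    return weighted_hits / weighted_bip
```

For a hypothetical batter with 150, 140 and 160 hits on 500 balls in play in each of the last three seasons, the weights pull the estimate toward the most recent .300 rather than the simple three-year average.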
If LD data is available, so are GB, FB & PU. I tried a more complex model using .15*FB+.24*GB+.73*LD to estimate BABIP. This worked much better at reducing the mean errors, even surpassing historical BABIP. For batters, the yearly RMS came down from .048 to .036, for pitchers from .041 to .031.
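Both estimators and the RMS comparison are simple enough to write out. In this sketch the coefficients are the ones quoted above, applied to each batted ball type as a share of balls in play (an assumption, with popups counting in the denominator but contributing no expected hits); the function names and sample counts are illustrative only.

```python
from math import sqrt

def babip_from_ld(ld, bip):
    """The standard rule of thumb: line drive rate plus .12."""
    return ld / bip + 0.12

def babip_from_batted_balls(fb, gb, ld, bip):
    """.15*FB + .24*GB + .73*LD, with each type as a share of balls in play."""
    return (0.15 * fb + 0.24 * gb + 0.73 * ld) / bip

def rms_error(predicted, observed):
    """Unweighted root mean square difference between two equal-length lists."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return sqrt(sum(squared) / len(squared))
```

For a hypothetical batter with 150 fly balls, 220 grounders, 100 liners and 30 popups in 500 balls in play, the rule of thumb gives .320 while the three-term model gives about .297.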
Still, you can’t assume that every batter has the same rate of hits on their ground balls. Some batters hit more balls to the left side than the right, some run fast and some run slow. Instead of trying to profile each batter on each type of batted ball, I will continue to use Marcel to weight each batter’s historical BABIP in my projections.
On the other hand, DIPS theory states that a pitcher has little control over the outcome once a ball has been put into play. There is clearly an ability to be a flyball or groundball pitcher. Line drives are considered mistakes, and that may be evidenced by looking at the six-year totals, which show the lowest LD rates belonging to Mariano Rivera, Fausto Carmona and Derek Lowe, while the highest belong to guys like John Van Benschoten, Edwin Jackson and Tony Armas Jr. Using the FB-GB-LD estimator on the six-year totals drops the pitchers’ RMS all the way down to .016.
Even so, some pitchers consistently defy the estimates. Roger Clemens, Brian Bannister, Chien-Ming Wang, Carlos Zambrano, Dan Haren, Brandon Webb, Chris Young and Greg Maddux all do at least .020 better than estimated. On the other end, Zach Duke, Sidney Ponson and Glendon Rusch all underperform by at least .020. Is it the ballpark? Is it their defense? The batters they faced? Or is it their own skill or lack of it?
Here’s my plan (I won’t have the answers next week): I want to compile park factors for each type of batted ball in each ballpark – what is the normalized rate of hits for flyballs to left in Dodger Stadium? Then do a WOWY analysis of fielders, showing the rate that each fielder allows more or fewer hits than expected on each groundball, flyball, line drive and popup. Finally, each batter’s rates. Then go back and look at how many times each pitcher faced each batter, and with which fielders, and in which ballparks. Once those are controlled, see how many hits, plus or minus, are left over for each pitcher.