Author Archive

Carlos Zambrano?

There’s been a ton of news about Carlos Zambrano’s contract recently: how the Cubs may or may not give him a five-year deal, how he deserves more money than Barry Zito, and so on. Now it’s true that Zambrano has been a rock in the ERA department the past 4 years, never posting an ERA above 3.50, not to mention he’s thrown 861 innings, the 5th most in baseball since 2003. So it seems that everything has gone swimmingly for Zambrano the past 4 years, with no cause for concern, right?

Well, his walks per 9 innings (BB/9) hasn’t exactly been the model of consistency that his ERA has been. Actually, it shot through the roof last year to a poor 4.84. You know who had a 4.84 BB/9? Daniel Cabrera in 2005. Oddly enough, they had similar strikeout rates (if you compare Cabrera’s 2005 and Zambrano’s 2006). Now, I’m not saying Carlos Zambrano is Daniel Cabrera because, well, Daniel Cabrera’s BB/9 hasn’t been on the right side of “4”, ever.
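
For anyone who wants to check a BB/9 figure by hand, here is a minimal sketch of the calculation. The function name is made up for illustration, and the inputs are illustrative values chosen to land on a 4.84 rate; they are not taken from the post.

```python
def walks_per_nine(walks: int, outs_recorded: int) -> float:
    """Walks allowed per nine innings.

    Innings are passed as outs recorded (3 per inning) so that partial
    innings do not need the awkward .1/.2 notation.
    """
    if outs_recorded <= 0:
        raise ValueError("outs_recorded must be positive")
    innings = outs_recorded / 3
    return 9 * walks / innings


# Illustrative inputs only: 115 walks over 214 innings (642 outs).
rate = walks_per_nine(115, 642)
print(f"BB/9 = {rate:.2f}")
```

Passing outs instead of innings is just a convenience; any consistent innings figure gives the same rate.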

305_p_season_blog_2_20061001.png

But doesn’t it make you wonder where his control went and if this is something that’s going to rear its ugly head again in 2007? It’s not like he had an isolated spot in 2006 where he was completely awful and then returned to his usual 3-something BB/9. He was not so wonderful all season long.

305_p_daily_blog_2_20061001.png

I guess the question is, was it just an off season for walks, or is there something else going on? Perhaps he’s wearing down from his rather crazy workload? He has thrown over 14,000 pitches (the third most in baseball) since the 2003 season. He also has the second most 100-pitch games, with 107, during that same time period. Only Barry Zito had more.

So, we have a skyrocketing walk rate and a pretty incredible workload to pin on Carlos Zambrano going into the 2007 season. Maybe it’s nothing and he’ll post another sub-4 ERA with workhorse numbers, but I’d say these are two warning signs of a potentially disappointing 2007.


Minor League Leaders

As promised, the Minor League Leaderboards are up! It’s worth noting that instead of the actual minor league team, the leaderboards show the affiliated team in their respective league. Also, to qualify, batters need 2.7 plate appearances per game and pitchers need 0.8 innings per game. Other than that, just pick a league and sort/filter away!
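
As a rough sketch of how those qualification cutoffs work, assuming “per game” means games played by the player’s team and that meeting the threshold exactly counts as qualified (both are assumptions, and the function names are hypothetical):

```python
BATTER_PA_PER_GAME = 2.7    # plate appearances per game, from the post
PITCHER_IP_PER_GAME = 0.8   # innings per game, from the post


def batter_qualifies(plate_appearances: int, team_games: int) -> bool:
    """True if the batter meets the plate-appearance threshold."""
    return plate_appearances >= BATTER_PA_PER_GAME * team_games


def pitcher_qualifies(innings_pitched: float, team_games: int) -> bool:
    """True if the pitcher meets the innings threshold."""
    return innings_pitched >= PITCHER_IP_PER_GAME * team_games


# Hypothetical 140-game season: the cutoffs work out to about 378 PA and 112 IP.
print(batter_qualifies(400, 140))     # comfortably above the cutoff
print(pitcher_qualifies(95.0, 140))   # short of the cutoff
```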

If you have any ideas or suggestions on how to make these better, just send us a note and we’ll throw it on the to-do list.


2006 Minor League Stats

After some painfully annoying work, we’ve managed to integrate the full set of 2006 Minor League stats into FanGraphs. Our plan is to eventually backfill minor league stats for all major league and minor league players. We’ll also have nightly updates for Minor League stats during the 2007 season.

Our plan is to have Minor League leaderboards (in the usual FanGraphs style) up and running by the end of the week. Additionally, we’ll be figuring out the best way to include the Minor League stats in the season graphs.

As always, please let us know if you have any problems, comments, or suggestions!


Last But Not Least, ZiPS

Dan Szymborski was kind enough to let us post the ZiPS projections he provides to Baseball Think Factory. Here’s a bit about ZiPS:

“Disclaimer: ZiPS projections are computer-based projections of performance. Performances have not been allocated to predicted playing time in the majors – many of the players listed above are unlikely to play in the majors at all in 2007. ZiPS is projecting equivalent production – a .240 ZiPS projection may end up being .280 in AAA or .300 in AA, for example. Whether or not a player will play is one of many non-statistical factors one has to take into account when predicting the future.”

Just like with the Bill James projections, we’ve used the Batting Average, On-Base Percentage, Slugging Percentage, Runs Created, Runs Created/27 and ERA supplied in the original file instead of calculating them ourselves.

I’m fairly certain this will be the last addition to the projections this season, which leaves us with a grand total of four projections (Bill James, CHONE, Marcels, and ZiPS) to choose from.

As always, if you notice any problems or errors, please let us know and we’ll do our best to fix the problem immediately.


Bill James Projections

In addition to the Marcel & CHONE projections, the projections from the Bill James Handbook are now available in the stats pages courtesy of Baseball Info Solutions. They will not be available in the leaderboard format on FanGraphs, but you can purchase them from Baseball Info Solutions here.

Just a quick note about how the projections are integrated: Batting Average, On-Base Percentage, Slugging Percentage, ERA, Runs Created, and Runs Created/27 were left as-is from the Bill James Handbook and were not calculated using the projected raw statistics, unlike the Marcel and CHONE projections.

This will probably be the last set of projections we add, unless someone else volunteers to throw their hat in the ring.


Stats Pages Updated

The 2007 Marcel & CHONE projections are now available in the regular stat pages. They will remain there until the 2007 regular season starts and will then be hidden to make way for the real 2007 stats. You’ll still be able to use the “Show Projections” button to see the stats after they are hidden; they just won’t be visible by default.

We’ve also added Balls and Strikes for batters, and corrected a bug in the daily split graphs.

Hopefully you won’t find the projections intrusive to your usual stat browsing. Please let us know what you think, especially if you don’t like it.


CHONE & Marcel Projections

Tom Tango and Chone Smith were both kind enough to let me post their 2007 player projections.

The Marcel (the Monkey) Forecasts are “the minimum level of competence that you should expect from any forecaster.” You can read exactly how they’re computed here: Marcel Methodology.

Chone Smith, on the other hand, has put a great deal of time and effort into his CHONE Projections. You can read more about his projection system and efforts here: Chone Projections.

There are two things I should make note of: I have used the FanGraphs positions for filtering by position where we have the player in our database. There are a number of minor league players in CHONE that aren’t in our database (yet) and for their positions we use those supplied in CHONE’s original file. And for the Marcels, we took the average of projected earned runs and base runs-earned runs and used that to display ER. This was how ERA was calculated in the original file anyway.

While I’m at it, there have been a few excellent discussions of projection systems recently:

The Hardball Times’ David Gassko did a five-part series: Projection Roundtable, II, III, IV, V.

And Tom Tango recently asked “Who’s Smarter Than a Monkey?” followed by an insightful discussion in the comments.


Fun With BaseRuns

With a brand new (to me) historical database of all players, my project for the day was to calculate BaseRuns for all batters. BaseRuns models run creation, much like Bill James’ Runs Created, but BaseRuns is a more accurate model. As for the calculations, I decided to stick to David Smyth’s BaseRuns Primer. I used the “simple” version for seasons prior to 1955 and the more “complex” version for anything 1955 to the present. Here’s the more complex version, where BaseRuns = A*B/(B+C)+D

A = H + BB + HBP – HR – .5*IBB
B = [1.4*TB -.6*H -3*HR +.1*(BB+HBP-IBB) +.9*(SB-CS-GDP)] * X
C = AB – H + CS + GDP
D = HR
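
The four components above translate fairly directly into code. This is only a sketch: the `BattingLine` class, the function names, and the sample team totals are all made up for illustration, and X defaults to 1.0 until a team multiplier has been worked out.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats needed for the "complex" BaseRuns version."""
    ab: int
    h: int
    tb: int
    hr: int
    bb: int
    hbp: int
    ibb: int
    sb: int
    cs: int
    gdp: int


def components(s: BattingLine, x: float = 1.0):
    """Return (A, B, C, D) as defined above, with B scaled by the multiplier X."""
    a = s.h + s.bb + s.hbp - s.hr - 0.5 * s.ibb
    b = (1.4 * s.tb - 0.6 * s.h - 3 * s.hr
         + 0.1 * (s.bb + s.hbp - s.ibb)
         + 0.9 * (s.sb - s.cs - s.gdp)) * x
    c = s.ab - s.h + s.cs + s.gdp
    d = s.hr
    return a, b, c, d


def base_runs(s: BattingLine, x: float = 1.0) -> float:
    """BaseRuns = A*B/(B+C) + D."""
    a, b, c, d = components(s, x)
    return a * b / (b + c) + d


# Hypothetical team totals, not real data.
team = BattingLine(ab=5500, h=1450, tb=2300, hr=180, bb=550,
                   hbp=50, ibb=40, sb=100, cs=40, gdp=120)
print(f"{base_runs(team):.1f}")
```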

The quick and dirty version of what I did was, determine the B multiplier or X for each major league team by season, use BaseRuns to calculate the number of runs a team would have had without a particular player, and then subtract that from the actual runs the team had, to get that player’s BaseRuns.

To determine the B multiplier, I dug up my 8th grade algebra skills to solve the following equation for X: Runs = A * (B * X)/((B * X) + C) + D

X = ((Runs – D) * C) / B / (A – (Runs – D))
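
One way to sanity-check that algebra is to solve for X and then plug it back into the BaseRuns equation to see whether the actual run total comes back out. The sketch below does that with hypothetical component values (not real team data); the function names are mine.

```python
def base_runs(a, b, c, d, x=1.0):
    """BaseRuns = A*(B*X)/((B*X)+C) + D."""
    bx = b * x
    return a * bx / (bx + c) + d


def solve_b_multiplier(runs, a, b, c, d):
    """X = ((Runs - D) * C) / B / (A - (Runs - D))."""
    non_hr_runs = runs - d
    if b == 0 or a == non_hr_runs:
        raise ValueError("X is undefined for these inputs")
    return (non_hr_runs * c) / b / (a - non_hr_runs)


# Hypothetical team components and actual runs scored.
a, b, c, d = 1850.0, 1812.0, 4210.0, 180.0
actual_runs = 760

x = solve_b_multiplier(actual_runs, a, b, c, d)
round_trip = base_runs(a, b, c, d, x)
print(f"X = {x:.4f}, BaseRuns with X = {round_trip:.4f}")
```

If the formula for X is right, the round-trip value matches the actual runs up to floating-point error.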

Hopefully, even with my rusty algebra skills, this was (and still is) correct. Now that I had my B multipliers (X), I could go ahead and calculate what teams would have done without a particular player and then finally get a player’s BaseRuns. So just for kicks, let’s look at a few lists:
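
Here is a rough sketch of that “team without the player” step. It assumes the team’s X is held fixed when the player is removed, and it relies on A, B, C and D all being plain sums of counting stats, so removing a player’s stat line is the same as subtracting his components from the team’s. The component values are hypothetical and the function names are mine.

```python
def base_runs(a, b, c, d, x=1.0):
    """BaseRuns = A*(B*X)/((B*X)+C) + D."""
    bx = b * x
    return a * bx / (bx + c) + d


def solve_b_multiplier(runs, a, b, c, d):
    """X that makes the team's BaseRuns equal its actual runs."""
    return ((runs - d) * c) / b / (a - (runs - d))


def player_base_runs(team_runs, team, player):
    """Actual team runs minus the BaseRuns of the team without the player.

    `team` and `player` are (A, B, C, D) tuples with B computed before X
    is applied.
    """
    x = solve_b_multiplier(team_runs, *team)
    team_without = [t - p for t, p in zip(team, player)]
    return team_runs - base_runs(*team_without, x)


# Hypothetical components for a team and one of its hitters.
team = (1850.0, 1812.0, 4210.0, 180.0)
player = (215.0, 223.7, 397.0, 35.0)
print(f"{player_base_runs(760, team, player):.1f}")
```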

Top 20 All Time:

Name                  BSR      RC
Babe Ruth             2638     2757
Ty Cobb               2534     2524
Cap Anson             2514     1794
Barry Bonds           2451     2791
Hank Aaron            2400     2553
Stan Musial           2382     2569
Willie Mays           2238     2369
Ted Williams          2231     2384
Tris Speaker          2208     2176
Lou Gehrig            2199     2264
Rickey Henderson      2166     2167
Pete Rose             2116     2220
Mel Ott               2104     2085
Jimmie Foxx           2072     2146
Honus Wagner          2064     1888
Carl Yastrzemski      2050     2147
Frank Robinson        2012     2127
Eddie Collins         1997     1799
Roger Connor          1949     1498
Rafael Palmeiro       1922     2040

Top 20 Seasons: All Time

Name                 Season    BSR     RC
Babe Ruth            1921      212    233
Hugh Duffy           1894      204    187
Tip O'Neill          1887      202    173
Babe Ruth            1923      199    216
Jimmie Foxx          1932      191    206
Babe Ruth            1920      190    206
Billy Hamilton       1894      188    148
Joe Kelley           1894      186    152
Lou Gehrig           1927      186    211
Lou Gehrig           1930      185    197
Lou Gehrig           1936      184    190
Babe Ruth            1927      183    203
Babe Ruth            1924      183    199
Lou Gehrig           1931      183    183
Babe Ruth            1931      182    185
Babe Ruth            1930      181    187
Rogers Hornsby       1922      180    206
Rogers Hornsby       1929      178    188
Ted Williams         1949      177    180
Jimmie Foxx          1938      175    184

Interesting how only 9 players are in the top 20 seasons of all time. Of the modern day players, Barry Bonds’ 2001 season and Todd Helton’s 2000 season make the top 30. Ryan Howard’s 2006 MVP season amounts to the 157th best of all time and Justin Morneau’s 2006 is 967th best.

But since we’re looking at a player’s production in the context of his own team, it might be interesting to see who is responsible for the highest percentage of BaseRuns by a single player.

Top 25 All Time (> 500 BSR):

Name                 BSR       BSR%
Ralph Kiner          1100    16.68%
Albert Pujols         817    16.62%
Barry Bonds          2451    15.72%
Roger Connor         1949    15.48%
Jesse Burkett        1867    15.42%
Babe Ruth            2638    15.20%
Stan Musial          2382    15.12%
Hank Aaron           2400    15.02%
Ted Williams         2231    14.98%
Bob Johnson          1369    14.94%
Ty Cobb              2534    14.82%
Honus Wagner         2064    14.70%
Jeff Bagwell         1658    14.62%
Willie Mays          2238    14.61%
Mickey Mantle        1901    14.59%
Tris Speaker         2208    14.43%
Harry Stovey         1447    14.16%
Lou Gehrig           2199    14.06%
Paul Hines           1401    13.98%
Todd Helton          1192    13.92%
Billy Hamilton       1669    13.89%
Ichiro Suzuki         650    13.86%
Cap Anson            2514    13.79%
Ed Delahanty         1786    13.77%
Eddie Mathews        1656    13.73%

What I like about expressing BaseRuns as a percentage of a team’s total runs is that you can see just how big a part of the offense that particular player is.
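
The percentage itself is simple division. The sketch below assumes that the all-time figure is the player’s summed BaseRuns over his teams’ summed runs for the same seasons (the post doesn’t spell that out), and the season numbers are hypothetical.

```python
def bsr_share(player_bsr: float, team_runs: float) -> float:
    """Fraction of a team's runs attributed to one player's BaseRuns."""
    return player_bsr / team_runs


def career_bsr_share(seasons) -> float:
    """Career share from (player_bsr, team_runs) pairs, one per season.

    Assumption: totals are summed first, then divided, rather than
    averaging the individual season percentages.
    """
    total_bsr = sum(bsr for bsr, _ in seasons)
    total_runs = sum(runs for _, runs in seasons)
    return total_bsr / total_runs


# Hypothetical seasons: (player BaseRuns, team runs scored).
seasons = [(120, 750), (133, 780)]
print(f"{bsr_share(133, 780):.2%}")
print(f"{career_bsr_share(seasons):.2%}")
```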

Top 10 in 2006:

Name                 BSR       BSR%
Albert Pujols        133     17.06%
Lance Berkman        125     17.01%
Jason Bay            117     16.90%
Ryan Howard          144     16.69%
Alfonso Soriano      124     16.63%
David Ortiz          134     16.29%
Miguel Cabrera       120     15.81%
Garrett Atkins       126     15.52%
Matt Holliday        124     15.20%
Grady Sizemore       132     15.18%

Jason Bay was an extremely large part of the not-so-wonderful Pirates offense in 2006. Other notables include Justin Morneau falling in at 12th with 14.65% of the offense.

Anyway, at some point in the future, I’d like to include BaseRuns in the FanGraphs player pages and leaderboards. Since this is my first shot at calculating BaseRuns, I want to make sure I’m calculating them in a way that makes sense. If you see any problems with my methodology, please let me know, as I’d hate to have blatantly wrong data on the player pages.

For more information on BaseRuns, Tangotiger had an excellent series on BaseRuns and Linear Weights.


Current Happenings at FanGraphs

Happy (belated) New Year everyone. If you haven’t noticed, FanGraphs now has all historical players dating back to 1871, which took some heavy massaging of the data found at baseball-databank.org. More on that later.

With the 2007 season sneaking up on us, I thought I’d announce two things that FanGraphs has in store for next season that I’m very excited about.

1. FanGraphs will have nightly, accurate, minor league stats updates for all affiliated minor league teams. Sometime this month the 2006 minor league stats will be up on the site and we’ll continue to backdate minor league stats for as many players as possible in the hopes of having as complete a minor league database as possible.

2. Real Time Win Probability will be available for all major league games. We’ll possibly have a beta/preview version up and running midway through spring training and try to squash all the bugs by the time the season starts.

On a side note, we’ve stopped collecting news data from blogs for the time being. I was not happy with how things were being categorized and found the whole thing not that useful. I’m hoping to have a new and improved system in place eventually.

In the meantime, enjoy the new historical stats & graphs and feel free to give us your feedback.


More on Plate Discipline

Last year I did a two-part series on plate discipline that delved into a few statistics that I thought better represented a batter’s actual plate discipline than your traditional metrics. The stats are in for the 2006 season, so I figured it’d be worth taking another look. Here’s a quick recap of last year’s findings:

Z% (Zone Percentage) – The percentage of pitches a batter sees inside the strike zone. Correlates with walk rate (BB%) and home runs per fly ball (HR/FB). Batters with more power are pitched more cautiously, resulting in a lower Z% and a higher BB%.

OSwing (Outside Swing Percentage) – The percentage of pitches a batter swings at that are outside the strike zone. Correlates with walk rate (BB%). This year, OSwing will be represented as OSwing above the MLB average.

Contact (Contact Percentage) – The percentage of times a batter makes contact with the ball when he swings the bat. Correlates with strikeout rate (K%) and home runs per fly ball (HR/FB). Batters who can’t make contact with the ball obviously strike out more often, and batters who swing “harder” often make less contact, resulting in higher HR/FB and more strikeouts.

So, with the recap out of the way, let’s look at some year to year correlations for all three for the first time.

YtY-PitchZone.png

They all correlate well from year to year, but both OSwing and Contact correlate extremely well. I consider OSwing about the best measure of plate discipline since not swinging at pitches outside the strike zone is pretty much the definition of plate discipline.
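
For anyone who wants to reproduce a year-to-year correlation like this, here is a minimal sketch. It assumes a plain Pearson correlation (the post doesn’t say which measure was used) and a 300 at-bat minimum in both seasons; the player names and values are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length, at least 2 long")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def paired_values(year_a, year_b, min_at_bats=300):
    """Pair a metric across two seasons for players with enough at-bats in both.

    Each season maps player name -> (at_bats, metric value).
    """
    common = sorted(year_a.keys() & year_b.keys())
    kept = [p for p in common
            if year_a[p][0] >= min_at_bats and year_b[p][0] >= min_at_bats]
    return [year_a[p][1] for p in kept], [year_b[p][1] for p in kept]


# Hypothetical OSwing values (percent), not real data.
season_2005 = {"Player A": (520, 18.0), "Player B": (410, 25.5),
               "Player C": (350, 30.1), "Player D": (120, 22.0)}
season_2006 = {"Player A": (540, 17.2), "Player B": (390, 26.9),
               "Player C": (310, 28.4), "Player D": (450, 35.0)}

xs, ys = paired_values(season_2005, season_2006)
r = pearson(xs, ys)
print(f"{len(xs)} players, r = {r:.3f}")
```

Player D is dropped from the hypothetical sample because he falls short of the at-bat minimum in one of the two seasons.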

That it correlates so well from year to year (at least in 2005 & 2006) suggests that players do not quickly develop plate discipline. Perhaps it’s a skill that can be learned over time, but there are few players who saw drastic changes in OSwing from 2005 to 2006. Fewer than 10% of all players with 300 at-bats in 2005 and 2006 saw more than a 5% change in OSwing from 2005 to 2006.

Name                Dif    Name                  Dif
Andruw Jones      8.09%    Jeromy Burnitz     -5.10%
Geoff Jenkins     7.13%    Dave Roberts       -5.34%
So Taguchi        6.97%    Mark Loretta       -5.38%
Willy Taveras     6.62%    Freddy Sanchez     -5.66%
Aaron Miles       6.16%    Vladimir Guerrero  -5.71%
Scott Hatteberg   6.08%    Jay Payton         -5.89%
Joe Crede         5.92%    Kevin Mench        -6.82%
Jorge Cantu       5.13%    A.J. Pierzynski    -7.50%
Eric Chavez       5.08%    Clint Barmes       -8.39%

Contact showed an even higher correlation from year to year than OSwing, also suggesting that players don’t really change their approach from year to year. In fact, there were only 10 players who had more than a 5% change in Contact from 2005 to 2006.

Name                Dif    Name                  Dif
Corey Patterson   5.10%    Brad Wilkerson     -8.87%
Mike Piazza       5.17%    Bill Hall          -7.41%
Adam Everett      5.51%    Nick Swisher       -7.02%
Reed Johnson      5.71%    Chris Shelton      -6.37%
Troy Glaus        6.71%    Craig Monroe       -5.21%

Finally, Z% showed the least amount of correlation from year to year, but it wasn’t a poor correlation by any means. I suspect the decreased correlation is due to this metric not being entirely within the batter’s control. While how a batter is pitched is indicative of his various skills (mainly power and overall plate discipline), it’s still up to the pitcher to decide how to proceed.

Between both Contact and OSwing there appears to be a sort of “sweet spot” for batters. Let’s apply some filters to OSwing and see what happens. Particularly, let’s look at power batters who have a HR/FB greater than 15%.

For the first list, let’s limit the batters to those who have “considerably better” plate discipline than the rest of the league. Let’s call “considerably better” an OSwing of 5% or more above league average.

Name                  Contact     HR      HR/FB
Jason Giambi           80.97%     37     20.00%
Morgan Ensberg         74.65%     23     16.43%
Barry Bonds            85.78%     26     16.56%
Nick Johnson           84.53%     23     15.97%
Pat Burrell            79.50%     29     18.13%
Jim Thome              71.92%     42     27.81%
Chipper Jones          82.27%     26     19.12%
Frank Thomas           86.58%     39     17.41%
Carlos Beltran         84.08%     41     21.13%
Adam Dunn              70.42%     40     22.22%
Troy Glaus             75.66%     38     18.72%
Jason Bay              75.26%     35     18.82%
Nick Swisher           71.07%     35     17.86%
Austin Kearns          74.11%     24     15.29%

The next list uses the same criteria, but instead of batters who are “considerably better”, it covers batters who have “above average” plate discipline (OSwing between 0% and 5% above league average).

Name                  Contact    HR     HR/FB
Josh Willingham        79.31%    26    15.85%
David Ortiz            77.92%    54    26.09%
Albert Pujols          86.24%    49    22.48%
Casey Blake            82.82%    19    16.67%
Raul Ibanez            80.26%    33    16.50%
Jim Edmonds            73.31%    19    16.81%
Travis Hafner          72.73%    42    30.22%
Lance Berkman          79.24%    45    24.59%
Phil Nevin             72.79%    22    21.57%
Jermaine Dye           78.26%    44    25.43%
Bill Hall              71.97%    35    19.44%
Brad Hawpe             73.98%    22    16.18%
Paul Konerko           82.11%    35    17.50%
Moises Alou            85.24%    22    17.46%
Andruw Jones           72.94%    41    22.04%
Richie Sexson          69.40%    34    19.32%
Mark Teixeira          79.93%    33    15.94%
Alex Rodriguez         74.27%    35    20.23%
Mike Piazza            81.88%    22    17.05%
Ken Griffey Jr.        80.70%    27    18.00%
Manny Ramirez          78.46%    35    23.49%
Miguel Cabrera         80.65%    26    15.57%
Adrian Gonzalez        79.93%    24    15.69%

Next up are the batters who have “below average” plate discipline (OSwing between 0% and 5% below league average).

Name                  Contact     HR      HR/FB
Ray Durham             88.15%     26     15.95%
Adam LaRoche           76.81%     32     21.19%
Marcus Thames          73.19%     26     17.11%
Carlos Delgado         74.39%     38     22.89%
Ty Wigginton           76.38%     24     16.90%
Carlos Lee             86.49%     37     16.09%
Ryan Howard            67.49%     58     39.46%
Michael Cuddyer        76.26%     24     15.69%
Aramis Ramirez         84.28%     38     15.14%
Juan Rivera            84.38%     23     17.69%
Mark Teahen            79.05%     18     16.51%
Craig Wilson           70.20%     17     15.74%
Craig Monroe           74.79%     28     15.14%
Matt Holliday          78.84%     34     20.00%
Prince Fielder         76.55%     28     15.82%
Vernon Wells           83.45%     32     15.02%
Torii Hunter           78.02%     31     18.34%
Wilson Betemit         76.67%     18     18.00%
Preston Wilson         76.26%     17     16.67%
Miguel Tejada          84.33%     24     15.48%

And finally, the batters who have “considerably worse” plate discipline (OSwing of 5% or more below average).

Name                  Contact    HR     HR/FB
Jeromy Burnitz         72.47%    16    16.00%
Ben Broussard          77.00%    21    15.56%
Justin Morneau         81.07%    34    16.43%
Rocco Baldelli         77.09%    16    16.00%
Jacque Jones           73.82%    27    25.47%
Alfonso Soriano        73.92%    46    18.25%
Jeff Francoeur         76.47%    29    15.26%
Vladimir Guerrero      83.15%    33    16.34%

Now that you’ve seen the lists, it seems clear to me at least, that it’s preferable to have above average plate discipline. Some of the guys in the below average list are borderline, but for the most part, it’s just not as prestigious a list.
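
The four OSwing groups used for the lists above can be sketched as a simple bucketing function. This assumes “OSwing above average” means the league rate minus the player’s rate in percentage points (so a positive number means fewer chases than the league), and the handling of values that land exactly on a boundary is my own choice. The league rate in the example is hypothetical.

```python
def oswing_above_average(player_oswing: float, league_oswing: float) -> float:
    """Positive when the batter chases fewer outside pitches than the league."""
    return league_oswing - player_oswing


def discipline_bucket(diff: float) -> str:
    """Map an OSwing-above-average value (percentage points) to a group."""
    if diff >= 5.0:
        return "considerably better"
    if diff >= 0.0:
        return "above average"
    if diff > -5.0:
        return "below average"
    return "considerably worse"


# Hypothetical league OSwing of 23% and four hypothetical batters.
league = 23.0
for player_rate in (16.0, 21.0, 25.0, 30.0):
    diff = oswing_above_average(player_rate, league)
    print(f"{player_rate:.1f}% -> {diff:+.1f} -> {discipline_bucket(diff)}")
```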

The “considerably worse” list is fascinating, since some of these players actually get away with such an aggressive approach. Vladimir Guerrero, who swings at pretty much everything, is talented enough to get away with it. Justin Morneau got away with it last year, but it’s worth noting his plate discipline didn’t improve from 2005 to 2006 and his 2005 season was fairly forgettable. For what it’s worth, his contact rate did rise by about 3%.

It looks like high contact rates may be able to counter poor plate discipline. It would seem to me that the truly “special” players (with exceptions like Vladimir Guerrero) have that rare combination of power, plate discipline, and contact rates. You see this in players like Barry Bonds, Jason Giambi, Frank Thomas, Carlos Beltran and of course Albert Pujols. Of course, this isn’t the be-all-end-all filter, since there are a few players who sneak in like Casey Blake, who I wouldn’t consider particularly special.

So far we’ve looked at players who had at least 300 at-bats, but maybe it’s possible to identify some breakout players from batters who had less-than-desired playing time.

Here are the players in 2006 who had an OSwing no worse than 2% below average and a HR/FB over 12%. I relaxed the HR/FB filter slightly since the ability to hit for power might not be quite there yet in younger players.

Name                  Contact    HR     HR/FB
Hideki Matsui          87.70%     8    12.12%
Gabe Gross             76.60%     9    14.06%
Chris Snyder           80.31%     6    12.24%
David Dellucci         73.62%    13    14.44%
Greg Norton            76.61%    17    17.89%
J.J. Hardy             85.78%     5    13.89%
Derrek Lee             81.04%     8    15.09%
Corey Koskie           77.05%    12    17.14%
Damion Easley          83.13%     9    14.06%
Aaron Guiel            77.69%     7    17.07%
Luke Scott             78.83%    10    14.71%
Jason LaRue            71.65%     8    15.69%
Wes Helms              77.88%    10    14.71%
Freddie Bynum          72.73%     4    15.38%
Ben Johnson            72.95%     4    12.90%
Michael Napoli         68.34%    16    17.20%
Chris Duncan           77.25%    22    29.33%
Scott Spiezio          80.91%    13    13.83%
Corey Hart             75.84%     9    12.16%
Russell Branyan        63.90%    18    22.50%
Dave Ross              71.51%    21    23.86%
Yorvit Torrealba       78.96%     7    16.28%
Ryan Doumit            74.43%     6    14.63%
Marlon Anderson        81.94%    12    13.79%
Daryle Ward            77.90%     7    15.22%
Josh Bard              85.16%     9    15.79%
Carlos Quentin         74.38%     9    18.00%
Joe Borchard           70.60%    10    17.54%
Cody Ross              76.85%    13    14.77%
Adam Melhuse           73.71%     4    14.81%

This is by no means a “magic bullet” list, but I’d consider it one of many starting points for narrowing down possible breakout players. There are certainly a few players on this list, such as Josh Bard, Chris Duncan, Luke Scott, and others, who appear to be quite promising. It’s also a reminder that injured players such as Derrek Lee, Hideki Matsui, and David Dellucci shouldn’t be forgotten.

If you were to look at the same list last year, out of about 30 players, you’d have identified Frank Thomas, Jim Thome, J.D. Drew, Mark DeRosa, Milton Bradley, Matt Murton, Ty Wigginton, Nomar Garciaparra, Marcus Thames and Curtis Granderson. So, about one third of the players ended up being at least decent to excellent sleepers.

If you’re still with me, we’ll look at one last list of filters, which I’d consider a sort of potential breakout power hitter list with already established players. I’ll filter on players with an above average OSwing, a contact rate between 70% and 85%, a HR/FB greater than 7.5%, and players all under the age of 30.

Name                  Contact    HR    HR/FB
Ryan Langerhans        76.50%     7     8.54%
Jhonny Peralta         73.48%    13     9.22%
Bobby Crosby           77.88%     9     9.09%
Jonny Gomes            70.95%    20    13.33%
Rickie Weeks           73.58%     8     9.09%
Jose Bautista          77.72%    16    11.59%
Chris Shelton          73.46%    16    12.60%
Edwin Encarnacion      80.15%    15    12.10%
Curtis Granderson      70.84%    19    11.66%
Matt Murton            83.57%    13    13.54%
Jeremy Hermida         78.88%     5     6.17%

Last year, the same filter yielded a group of 15 batters who hit 190 home runs in 2005 and 266 home runs in 2006. The group included Carlos Beltran, Grady Sizemore, Mark Teahen, Nick Swisher, Brad Hawpe, and others.
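
The filter behind this last list can be sketched in a few lines. The `Batter` class, its field names, and the sample players are hypothetical, and “above average OSwing” is taken to mean a positive OSwing-above-average value; the thresholds themselves come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Batter:
    name: str
    age: int
    oswing_above_avg: float  # percentage points; positive = fewer chases than league
    contact: float           # contact rate, percent
    hr_per_fb: float         # home runs per fly ball, percent


def breakout_candidates(batters, min_contact=70.0, max_contact=85.0,
                        min_hr_per_fb=7.5, max_age=30):
    """Batters with above-average OSwing, mid-range contact, some power, under 30."""
    return [
        b for b in batters
        if b.oswing_above_avg > 0
        and min_contact <= b.contact <= max_contact
        and b.hr_per_fb > min_hr_per_fb
        and b.age < max_age
    ]


# Hypothetical players, not real data.
pool = [
    Batter("Young Patient Hitter", 25, 2.1, 76.5, 11.0),
    Batter("Free Swinger", 24, -4.0, 78.0, 14.0),
    Batter("Veteran Slugger", 33, 6.0, 80.0, 19.0),
    Batter("Slap Hitter", 26, 1.5, 90.2, 3.0),
]
for b in breakout_candidates(pool):
    print(b.name)
```

Loosening any one threshold, as I did with HR/FB for the part-time players, is just a matter of passing a different keyword argument.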

The bottom line is that, since stats like OSwing and Contact do correlate so well from year to year, it would definitely make sense to include them in a projection system (instead of me boring you to death with random filters). I’d say Contact is arguably better than using strikeouts, and OSwing is really unlike any of the traditional statistics that would go into projections.