
THT Projections: A (Quick) Closer Look

Earlier this week the much-anticipated Hardball Times 2007 Season Preview was released, and with it a brand new projection system. I recently took a look at the Bill James, CHONE, ZiPS, and Marcel projection systems to see how they differed. Let’s throw THT into the mix and see where its major differences lie.

First off, let’s see how closely THT and the other projection systems track the Marcel projection system (the simplest of the five) in OPS and ERA as a whole.

System        ERA-R^2    OPS-R^2
ZiPS             .725       .908
Bill James       .714       .875
CHONE            .699       .865
THT              .681       .837

In plain English: of the four systems compared against Marcel, THT’s is the least similar. (This looks at batters with 300+ at-bats and pitchers with 100+ innings.)
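The post does not say how the R^2 values were computed. A minimal sketch, assuming they are squared Pearson correlations between one system's projections and Marcel's for the same players, might look like this (the function name and the sample numbers are made up for illustration):

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant list has no defined correlation")
    return cov * cov / (var_x * var_y)


# Hypothetical OPS projections for the same four hitters (not real data).
marcel = [0.750, 0.800, 0.850, 0.900]
other_system = [0.760, 0.790, 0.870, 0.880]

print(round(r_squared(marcel, other_system), 3))
```

A value near 1 means the two systems rank and space players almost identically; a lower value means more disagreement.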

So which batters does THT disagree on the most in terms of OPS?

Name            Bill James    CHONE   Marcel     THT    ZiPS
Frank Thomas          .939     .853     .874    .982    .892
Hanley Ramirez        .801     .791     .843    .714    .777
Robinson Cano         .860     .842     .852    .766    .836
Chris Duncan          .862     .776     .891    .753    .803
Melky Cabrera         .766     .796     .787    .715    .800

Except for Frank Thomas, whom THT projects to have a phenomenal season, THT is the low projection for the other four players. It’s interesting to note that those four are also first- or second-year major league players. There’s generally a lot of disagreement about Chris Duncan and Hanley Ramirez, but for Robinson Cano and Melky Cabrera the THT projection appears to be the sole point of difference. Let’s look at the pitchers:

Name            Bill James    CHONE   Marcel     THT    ZiPS
Tony Armas Jr.        4.85     4.64     4.96    5.81    4.88
Carlos Zambrano       3.40     3.47     3.48    2.77    3.46
Cliff Lee             4.43     4.20     4.48    5.04    4.55
James Shields         4.03     4.29     4.72    5.03    4.70
Brandon Webb          3.53     3.60     3.65    3.07    3.85
Randy Johnson         4.31     3.77     4.33    3.43    3.63

THT clearly hates Tony Armas Jr., projecting his ERA about a point higher than the other systems do, while it loves Carlos Zambrano, whose projected ERA is about 0.7 lower than in the other systems. I threw in Randy Johnson since he was next on the list. The projections for him look fairly evenly split between a 4.30-ish ERA and a 3.50-ish ERA.
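The post does not state how "disagreement" was ranked. One plausible measure, sketched here as an assumption, is the gap between THT's projection and the average of the other four systems, using the ERA rows from the pitcher table above:

```python
# ERA projections copied from the pitcher table above, in the order
# (Bill James, CHONE, Marcel, THT, ZiPS).
projections = {
    "Tony Armas Jr.": (4.85, 4.64, 4.96, 5.81, 4.88),
    "Carlos Zambrano": (3.40, 3.47, 3.48, 2.77, 3.46),
    "Cliff Lee": (4.43, 4.20, 4.48, 5.04, 4.55),
    "James Shields": (4.03, 4.29, 4.72, 5.03, 4.70),
    "Brandon Webb": (3.53, 3.60, 3.65, 3.07, 3.85),
    "Randy Johnson": (4.31, 3.77, 4.33, 3.43, 3.63),
}


def tht_gap(row):
    """THT's projection minus the mean of the other four systems."""
    bill_james, chone, marcel, tht, zips = row
    others = (bill_james, chone, marcel, zips)
    return tht - sum(others) / len(others)


gaps = {name: tht_gap(row) for name, row in projections.items()}

# Largest absolute disagreement first.
ranked = sorted(gaps, key=lambda name: abs(gaps[name]), reverse=True)
for name in ranked:
    print(f"{name:<16} {gaps[name]:+.2f}")
```

Under this measure Armas comes out at roughly +0.98 and Zambrano at roughly -0.68, and the ranking happens to match the order of the table, though that does not prove it is the measure the author used.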

Anyway, the THT projections are certainly similar to the others, but there are clearly a number of key differences which are definitely worth a look. There’s also a lot more to projections than ERA and OPS, so I’m sure you’ll find many other unique aspects to THT’s projection system. Like with any projection system, we’ll have to wait and see which one happens to be the most accurate for 2007.


Batted Ball Splits

There was an excellent study done by Dave Studeman in the 2007 Hardball Times: Annual that looked at the run value of each event in baseball using linear weights. I thought it might be fun to look at your typical splits by batted ball type instead of by run value:

Type     AB      H     2B    3B     HR    RBI   AVG   SLG    OPS 
FB    43439  11512   3434   483   5127  11734  .265  .720  0.978 
GB    59246  13996   1212    67      0   4300  .236  .259  0.495 
IFFB   5083     15      4     0      0      1  .003  .004  0.007 
LD    26447  19005   4485   402    259   6028  .719  .948  1.663

And a few more stats:

Type      ISO  BABIP   HR/Type      RC   RC/G 
FB       .455   .167    11.47%    8227   6.70 
GB       .023   .236     0.00%    2614   1.44 
IFFB     .001   .003     0.00%       0   0.00 
LD       .229   .716     0.97%   17947  63.62

Clearly line-drives are the cream of the crop. Oddly enough, about 1% of line drives turn out to be home runs. On the fly-ball side, about 10.5% of all fly balls (including infield fly balls) end up being home runs.

Fly balls are a tricky one because as long as you’re hitting 11.5% of them out of the park, you’re better off hitting them than groundballs. But if you’re hitting them in the park, then it’s a completely different story. Fly-balls that aren’t home runs have a mere .167 batting average compared to groundballs that have a .236 batting average.
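The rate stats quoted above can be recomputed from the counting columns of the aggregate table. This is a sketch rather than the author's actual code; in particular, BABIP is approximated here as (H - HR) / (AB - HR), which ignores sacrifice flies because they are not in the table:

```python
# Counting stats copied from the aggregate table above: (AB, H, 2B, 3B, HR).
counts = {
    "FB": (43439, 11512, 3434, 483, 5127),
    "GB": (59246, 13996, 1212, 67, 0),
    "IFFB": (5083, 15, 4, 0, 0),
    "LD": (26447, 19005, 4485, 402, 259),
}


def rates(ab, h, doubles, triples, hr):
    """Return AVG, SLG, ISO and an approximate BABIP for one batted-ball type."""
    singles = h - doubles - triples - hr
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    avg = h / ab
    slg = total_bases / ab
    return {
        "AVG": avg,
        "SLG": slg,
        "ISO": slg - avg,
        # Approximation: no sacrifice flies available, and no strikeouts on batted balls.
        "BABIP": (h - hr) / (ab - hr),
    }


for ball_type, row in counts.items():
    stats = rates(*row)
    print(ball_type, {key: round(value, 3) for key, value in stats.items()})

# Share of all fly balls (outfield plus infield) that left the park.
fb_ab, _, _, _, fb_hr = counts["FB"]
iffb_ab = counts["IFFB"][0]
hr_share_all_fly_balls = fb_hr / (fb_ab + iffb_ab)
print(round(hr_share_all_fly_balls, 3))
```

The fly-ball BABIP from this calculation is the .167 figure used in the comparison with groundballs, since groundballs have no home runs and their BABIP equals their .236 average.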

Infield fly-balls, or pop-ups, are completely worthless. Of the roughly 5,000 of them in 2006, only 15 landed for hits. Pretty amazingly, 4 of them were doubles. I’m not sure how that’s even possible. If you’re going to hit pop-ups all day long, you’re better off just not swinging the bat and hoping for a walk.

Anyway, that’s just a quick look at the aggregates. Not all players hit line-drives, fly balls, and ground balls the same as you’ll soon see. Let’s look at the best and worst fly ball batters first.

Name             AVG    SLG    OPS    ISO  BABIP     RC  HR/FB  RC/27    FB% 
Ryan Howard     .507  1.824  2.309  1.316   .173    122  38.7%   45.1  36.2% 
Travis Hafner   .448  1.425  1.873  0.978   .237     86  27.6%   31.2  40.3% 
Chris Duncan    .418  1.463  1.868  1.045   .133     40  31.9%   26.5  35.2% 
Lance Berkman   .435  1.367  1.780  0.932   .210     85  27.1%   25.1  41.8% 
Jim Thome       .415  1.400  1.803  0.985   .160     77  29.5%   25.0  43.1% 
Manny Ramirez   .406  1.256  1.639  0.850   .194     66  24.8%   20.3  42.0% 
Wilson Betemit  .404  1.096  1.500  0.691   .273     42  18.1%   20.1  36.6% 
Adam LaRoche    .396  1.245  1.624  0.849   .215     67  22.1%   20.0  40.9% 
Preston Wilson  .414  1.103  1.503  0.690   .282     39  17.8%   19.5  26.7% 
Jacque Jones    .388  1.155  1.540  0.767   .213     46  22.1%   19.4  25.5% 
David Ortiz     .366  1.274  1.632  0.909   .119     86  27.4%   18.9  46.8% 
Alex Rodriguez  .385  1.142  1.517  0.757   .202     64  22.4%   18.2  39.6% 
Carlos Beltran  .376  1.194  1.554  0.818   .183     72  22.7%   17.8  46.6% 
Jermaine Dye    .376  1.178  1.540  0.803   .176     68  23.3%   17.7  40.4% 
Derek Jeter     .393  1.056  1.433  0.663   .280     36  15.1%   16.8  18.3% 
Richie Sexson   .366  1.131  1.487  0.765   .192     62  21.0%   16.7  39.9% 
Andruw Jones    .356  1.228  1.564  0.872   .111     63  26.0%   16.2  41.6% 
Nick Johnson    .372  1.047  1.410  0.674   .243     50  16.7%   16.0  35.6% 
Vlad. Guerrero  .377  1.017  1.386  0.640   .243     66  17.3%   15.9  37.2% 
Jason Bay       .371  1.106  1.458  0.735   .219     68  18.4%   15.8  44.0%

On this list, three names really stand out to me: Preston Wilson, Jacque Jones and Derek Jeter. Even though Jeter hit fly-balls an extremely low 18.3% of the time, he really did make the most of them. I’ve written several times about Jacque Jones’ “hidden power”, and clearly when he gets the ball in the air he’s really quite successful. Same goes for Preston Wilson. Let’s have a look at the worst fly-ball batters.

Name             AVG    SLG    OPS    ISO  BABIP     RC  HR/FB  RC/27    FB% 
D. Eckstein     .123  0.211  0.331  0.088   .107      3   1.7%    0.8  29.1% 
P. Polanco      .119  0.284  0.402  0.165   .086      4   3.6%    1.0  27.9% 
Joey Gathright  .114  0.314  0.417  0.200   .088      1   2.6%    1.0  16.9% 
Jason Kendall   .148  0.235  0.376  0.087   .140      4   0.8%    1.1  25.9% 
Neifi Perez     .133  0.267  0.396  0.133   .114      3   2.2%    1.1  41.0% 
Abraham Nunez   .130  0.296  0.426  0.167   .096      2   3.7%    1.2  23.1% 
So Taguchi      .145  0.303  0.444  0.158   .122      3   2.6%    1.4  30.5% 
Kenny Lofton    .152  0.333  0.483  0.182   .132      7   2.2%    1.6  33.4% 
Nick Punto      .168  0.307  0.467  0.139   .160      5   0.9%    1.6  30.1% 
Jack Wilson     .139  0.391  0.525  0.252   .083      6   5.8%    1.7  30.3% 
Mark Loretta    .167  0.312  0.475  0.145   .144     10   2.6%    1.7  37.6% 
Juan Pierre     .153  0.343  0.496  0.190   .134      7   2.2%    1.7  23.8% 
Alf. Amezaga    .156  0.377  0.530  0.221   .122      5   3.9%    1.9  32.6% 
Yadier Molina   .159  0.373  0.530  0.214   .117      7   4.7%    1.9  39.1% 
Luis Castillo   .163  0.370  0.531  0.207   .135      6   3.2%    1.9  20.8% 
Y. Betancourt   .162  0.372  0.533  0.209   .121      9   4.7%    1.9  35.7% 
Clint Barmes    .173  0.358  0.524  0.185   .141     10   3.6%    2.0  47.9% 
Brian Roberts   .155  0.423  0.573  0.268   .101     11   5.8%    2.0  35.5% 
Aaron Miles     .177  0.367  0.538  0.190   .156      5   2.4%    2.1  24.5% 
Brad Ausmus     .179  0.358  0.533  0.179   .161      6   2.1%    2.1  28.1%

No surprises here really. These guys are not your power hitters and as mentioned before, if you’re not a power hitter, you’re better off hitting groundballs. Maybe Clint Barmes and Neifi Perez are trying to be something they’re not. Moving on to groundballs, here are the best groundball batters:

Name             AVG    SLG    OPS    ISO  BABIP     RC   IFH%  RC/27    GB% 
Rocco Baldelli  .342  0.389  0.732  0.047   .342     19  10.1%    5.2  50.5% 
Carl Crawford   .321  0.366  0.687  0.045   .321     28  10.6%    4.1  52.2% 
Hanley Ramirez  .303  0.376  0.679  0.073   .303     23  10.6%    3.9  43.8% 
Esteban German  .338  0.369  0.708  0.031   .338     13   7.7%    3.8  58.0% 
S. Victorino    .316  0.354  0.671  0.038   .316     16   8.2%    3.8  44.5% 
Wily Mo Pena    .355  0.382  0.737  0.026   .355      8  11.8%    3.7  39.8% 
Ichiro Suzuki   .307  0.316  0.623  0.009   .307     30  13.0%    3.7  50.7% 
Ryan Freel      .312  0.351  0.662  0.039   .312     15  12.3%    3.7  43.9% 
Daniel Uggla    .310  0.330  0.640  0.020   .310     19   9.5%    3.6  41.0% 
Rickie Weeks    .320  0.352  0.672  0.033   .320     12   9.8%    3.5  46.2% 
Ben Broussard   .328  0.351  0.679  0.022   .328     13   3.7%    3.5  40.2% 
Chris Burke     .324  0.353  0.676  0.029   .324     10   5.9%    3.4  36.0% 
Marcus Thames   .273  0.333  0.606  0.061   .273      6   7.6%    3.4  25.7% 
Mike Lamb       .328  0.351  0.679  0.022   .328     12   3.7%    3.2  40.5% 
Y. Betancourt   .303  0.333  0.637  0.030   .303     20   6.8%    3.2  46.4% 
Chris Duffy     .284  0.306  0.590  0.022   .284     11  10.5%    3.2  58.0% 
Alf. Amezaga    .298  0.319  0.617  0.021   .298     12  11.4%    3.1  50.5% 
Mike Cameron    .299  0.344  0.643  0.045   .299     13  12.3%    3.0  37.6% 
G. Matthews     .284  0.321  0.604  0.037   .284     22   7.1%    3.0  51.0% 
Rafael Furcal   .285  0.311  0.596  0.026   .285     21   6.4%    2.9  49.9%

The one name that really stands out for me here is Wily Mo Pena. He just hits the ball hard, so chances are it makes his groundballs just that much more difficult to field. The rest of these guys are pretty much groundball batters, many of them quite fast. And now the worst groundball batters:

Name             AVG    SLG    OPS    ISO  BABIP     RC   IFH%  RC/27    GB% 
Barry Bonds     .135  0.135  0.271  0.000   .135      1   1.0%    0.2  30.3% 
Adam Dunn       .136  0.146  0.282  0.010   .136      1   1.0%    0.2  27.8% 
Bengie Molina   .153  0.153  0.307  0.000   .153      1   2.0%    0.3  38.7% 
Adam Kennedy    .161  0.168  0.329  0.006   .161      2   2.6%    0.3  40.7% 
Yadier Molina   .156  0.181  0.338  0.025   .156      2   3.7%    0.3  42.5% 
Gregg Zaun      .168  0.189  0.358  0.021   .168      1   2.1%    0.3  37.6% 
Phil Nevin      .168  0.192  0.360  0.024   .168      2   4.8%    0.4  42.7% 
Alex Cintron    .157  0.165  0.322  0.009   .157      1   3.5%    0.4  46.0% 
Damian Miller   .173  0.182  0.355  0.009   .173      1   5.5%    0.4  44.2% 
Dd. Navarro     .171  0.184  0.355  0.013   .171      1   4.0%    0.4  35.0% 
B. Schneider    .172  0.172  0.344  0.000   .172      2   3.1%    0.4  47.3% 
Brad Ausmus     .183  0.198  0.381  0.015   .183      3   4.1%    0.4  53.2% 
Jason Giambi    .171  0.200  0.371  0.029   .171      2   2.9%    0.5  30.3% 
Adr. Gonzalez   .194  0.219  0.413  0.025   .194      3   1.0%    0.5  43.8% 
Khalil Greene   .204  0.239  0.442  0.035   .204      2   0.9%    0.5  34.6% 
Kevin Millar    .189  0.220  0.409  0.031   .189      2   3.9%    0.5  35.5% 
Russell Martin  .187  0.192  0.379  0.005   .187      3   2.2%    0.5  50.4% 
Mike Lowell     .194  0.230  0.423  0.036   .194      4   5.6%    0.6  37.8% 
Brian Giles     .183  0.188  0.372  0.005   .183      4   4.1%    0.6  39.8% 
Eric Chavez     .212  0.232  0.444  0.020   .212      3   2.7%    0.6  38.6%

It’s not often you find out that Barry Bonds is the worst at something. All in all, I find this a rather bizarre mix of players and I’m really not sure what to make of it. Let’s look at the best line-drive batters:

Name             AVG    SLG    OPS    ISO  BABIP     RC  HR/LD  RC/27    LD% 
Eric Hinske     .875  1.188  2.063  0.313   .875     33   0.0%  224.4  16.2% 
J.D. Drew       .865  1.216  2.081  0.351   .865     78   0.0%  210.2  18.8% 
Wily Mo Pena    .872  1.179  2.029  0.308   .868     40   2.5%  177.9  20.9% 
Mig. Cabrera    .842  1.123  1.965  0.281   .841    108   0.9%  161.7  24.2% 
Jason Bay       .848  1.045  1.894  0.197   .844     59   3.0%  158.1  15.6% 
Austin Kearns   .833  1.154  1.987  0.321   .831     75   1.3%  155.8  19.2% 
Brad Hawpe      .829  1.134  1.963  0.305   .829     77   0.0%  148.7  21.7% 
G. Sizemore     .810  1.170  1.980  0.360   .806     95   2.0%  134.7  19.8% 
Scott Spiezio   .805  1.171  1.976  0.366   .800     39   2.4%  130.4  19.9% 
Russ. Martin    .817  1.169  1.975  0.352   .814     67   1.4%  129.8  19.9% 
Matt Stairs     .826  1.000  1.826  0.174   .826     38   0.0%  128.3  17.4% 
Jay Gibbons     .809  1.085  1.894  0.277   .809     41   0.0%  123.7  15.9% 
Reed Johnson    .808  1.055  1.863  0.247   .808     62   0.0%  120.0  19.7% 
G. Matthews     .788  1.192  1.980  0.404   .781     93   3.0%  119.5  18.8% 
Jose Valentin   .796  1.122  1.918  0.327   .796     44   0.0%  118.2  15.6% 
Chase Utley     .804  1.118  1.914  0.314   .798     91   2.9%  117.2  19.5% 
Todd Helton     .807  1.088  1.888  0.281   .805    100   0.9%  116.9  23.6% 
Matt Holliday   .788  1.144  1.933  0.356   .780     94   3.9%  115.2  21.0% 
David Wright    .824  1.033  1.839  0.209   .822     77   1.1%  115.0  19.5% 
Bill Hall       .789  1.225  2.003  0.437   .783     68   2.8%  114.9  19.2%

Obviously there are a lot of solid to excellent players on this list, but nothing especially noteworthy. And last but not least, the worst line-drive batters:

Name             AVG    SLG    OPS    ISO  BABIP     RC  HR/LD  RC/27    LD% 
Cliff Floyd     .540  0.740  1.280  0.200   .540     20   0.0%   23.5  18.1% 
David Bell      .600  0.730  1.313  0.130   .596     43   1.0%   27.3  23.4% 
Endy Chavez     .593  0.780  1.373  0.186   .593     27   0.0%   28.6  20.1% 
Juan Uribe      .585  0.862  1.437  0.277   .578     32   1.5%   29.5  17.2% 
Rondell White   .600  0.767  1.357  0.167   .593     27   1.6%   29.7  21.3% 
Alf. Amezaga    .617  0.702  1.319  0.085   .617     20   0.0%   30.5  16.9% 
Ronny Cedeno    .594  0.841  1.435  0.246   .594     34   0.0%   33.2  16.4% 
Chone Figgins   .627  0.745  1.373  0.118   .627     48   0.0%   33.9  20.7% 
Carl Crawford   .609  0.848  1.450  0.239   .609     47   0.0%   34.5  18.3% 
Moises Alou     .594  0.906  1.500  0.313   .587     34   1.6%   35.8  20.1% 
Damon Hollins   .600  0.940  1.528  0.340   .583     28   3.9%   35.9  19.0% 
Willy Taveras   .620  0.817  1.437  0.197   .620     36   0.0%   35.9  17.5% 
B. Phillips     .635  0.800  1.428  0.165   .635     43   0.0%   36.3  19.2% 
Chris Duncan    .622  0.822  1.444  0.200   .622     23   0.0%   36.6  21.1% 
Jason Kendall   .642  0.758  1.400  0.117   .642     58   0.0%   36.7  23.9% 
Aaron Boone     .639  0.778  1.417  0.139   .639     36   0.0%   37.2  24.7% 
W. Betemit      .643  0.857  1.478  0.214   .636     30   1.7%   37.4  21.3% 
Cory Sullivan   .651  0.831  1.459  0.181   .651     44   0.0%   37.4  31.5% 
Joey Gathright  .622  0.844  1.467  0.222   .622     24   0.0%   37.6  16.2% 
S.Hatteberg     .651  0.779  1.423  0.128   .651     43   0.0%   37.9  20.8%

Line-drive percentage will fluctuate from year to year, but I wonder if how a player hits line-drives changes much from year to year. I suppose you could ask that question for any of the batted ball types. When I get the data I’ll be sure to take a look at that, but just thinking off the top of my head, I’ll bet the fly-balls and groundballs remain fairly constant, while line-drives do not.

Furthermore, at some point this season, we’re hoping to have batted ball splits available for all players for 2002 onward.


Fun With BaseRuns

With a brand new (to me) historical database of all players, my project for the day was to calculate BaseRuns for all batters. BaseRuns models run creation, much like Bill James’ Runs Created, but BaseRuns is a more accurate model. As for the calculations, I decided to stick to David Smyth’s BaseRuns Primer. I used the “simple” version for seasons prior to 1955 and the more “complex” version for anything from 1955 to the present. Here’s the more complex version, where BaseRuns = A*B/(B+C)+D:

A = H + BB + HBP – HR – .5*IBB
B = [1.4*TB -.6*H -3*HR +.1*(BB+HBP-IBB) +.9*(SB-CS-GDP)] * X
C = AB – H + CS + GDP
D = HR

The quick and dirty version of what I did: determine the B multiplier (X) for each major league team by season, use BaseRuns to calculate the number of runs a team would have scored without a particular player, and then subtract that from the team’s actual runs to get that player’s BaseRuns.

To determine the B multiplier, I dug up my 8th grade algebra skills to solve the following equation for X (here B is the bracketed sum above, before it is multiplied by X): Runs = A * (B * X)/((B * X) + C) + D

X = ((Runs – D) * C) / B / (A – (Runs – D))
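The whole procedure can be sketched in a few lines. This is an illustration under assumptions, not the code actually used: the function names and the team and player stat lines are hypothetical, and only the "complex" formula given above is implemented.

```python
def base_runs_parts(s):
    """Return A, the unscaled B (before multiplying by X), C and D."""
    a = s["H"] + s["BB"] + s["HBP"] - s["HR"] - 0.5 * s["IBB"]
    b = (1.4 * s["TB"] - 0.6 * s["H"] - 3 * s["HR"]
         + 0.1 * (s["BB"] + s["HBP"] - s["IBB"])
         + 0.9 * (s["SB"] - s["CS"] - s["GDP"]))
    c = s["AB"] - s["H"] + s["CS"] + s["GDP"]
    d = s["HR"]
    return a, b, c, d


def base_runs(s, x):
    """BaseRuns = A*(B*X)/((B*X)+C) + D."""
    a, b, c, d = base_runs_parts(s)
    return a * (b * x) / (b * x + c) + d


def solve_multiplier(team, runs):
    """Solve Runs = A*(B*X)/((B*X)+C) + D for X, as in the formula above."""
    a, b, c, d = base_runs_parts(team)
    return ((runs - d) * c) / b / (a - (runs - d))


def player_base_runs(team, player, team_runs):
    """Actual team runs minus the BaseRuns of the team without the player."""
    x = solve_multiplier(team, team_runs)
    team_without = {key: team[key] - player.get(key, 0) for key in team}
    return team_runs - base_runs(team_without, x)


# Hypothetical season totals, chosen only to exercise the formulas.
team = {"AB": 5500, "H": 1500, "TB": 2400, "HR": 180, "BB": 550,
        "HBP": 50, "IBB": 40, "SB": 100, "CS": 40, "GDP": 120}
player = {"AB": 600, "H": 190, "TB": 350, "HR": 35, "BB": 80,
          "HBP": 5, "IBB": 10, "SB": 10, "CS": 3, "GDP": 15}
team_runs = 800

x = solve_multiplier(team, team_runs)
print(round(x, 4))
print(round(base_runs(team, x), 6))  # should reproduce team_runs
print(round(player_base_runs(team, player, team_runs), 1))
```

Plugging the solved X back into the formula and recovering the team's actual runs is a quick check that the algebra holds.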

Hopefully, even with my rusty algebra skills, this was (and still is) correct. Now that I had my B multipliers (X), I could go ahead and calculate what teams would have done without a particular player and then finally get a player’s BaseRuns. So just for kicks, let’s look at a few lists:

Top 20 All Time:

Name                  BSR      RC
Babe Ruth             2638     2757
Ty Cobb               2534     2524
Cap Anson             2514     1794
Barry Bonds           2451     2791
Hank Aaron            2400     2553
Stan Musial           2382     2569
Willie Mays           2238     2369
Ted Williams          2231     2384
Tris Speaker          2208     2176
Lou Gehrig            2199     2264
Rickey Henderson      2166     2167
Pete Rose             2116     2220
Mel Ott               2104     2085
Jimmie Foxx           2072     2146
Honus Wagner          2064     1888
Carl Yastrzemski      2050     2147
Frank Robinson        2012     2127
Eddie Collins         1997     1799
Roger Connor          1949     1498
Rafael Palmeiro       1922     2040

Top 20 Seasons: All Time

Name                 Season    BSR     RC
Babe Ruth            1921      212    233
Hugh Duffy           1894      204    187
Tip O'Neill          1887      202    173
Babe Ruth            1923      199    216
Jimmie Foxx          1932      191    206
Babe Ruth            1920      190    206
Billy Hamilton       1894      188    148
Joe Kelley           1894      186    152
Lou Gehrig           1927      186    211
Lou Gehrig           1930      185    197
Lou Gehrig           1936      184    190
Babe Ruth            1927      183    203
Babe Ruth            1924      183    199
Lou Gehrig           1931      183    183
Babe Ruth            1931      182    185
Babe Ruth            1930      181    187
Rogers Hornsby       1922      180    206
Rogers Hornsby       1929      178    188
Ted Williams         1949      177    180
Jimmie Foxx          1938      175    184

Interesting how only 9 players are in the top 20 seasons of all time. Of the modern-day players, Barry Bonds’ 2001 season and Todd Helton’s 2000 season make the top 30. Ryan Howard’s 2006 MVP season ranks as the 157th best of all time, and Justin Morneau’s 2006 is the 967th best.

But since we’re looking at a player’s production in the context of his own team, it might be interesting to see who is responsible for the highest percentage of BaseRuns by a single player.

Top 25 All Time (> 500 BSR):

Name                 BSR       BSR%
Ralph Kiner          1100    16.68%
Albert Pujols         817    16.62%
Barry Bonds          2451    15.72%
Roger Connor         1949    15.48%
Jesse Burkett        1867    15.42%
Babe Ruth            2638    15.20%
Stan Musial          2382    15.12%
Hank Aaron           2400    15.02%
Ted Williams         2231    14.98%
Bob Johnson          1369    14.94%
Ty Cobb              2534    14.82%
Honus Wagner         2064    14.70%
Jeff Bagwell         1658    14.62%
Willie Mays          2238    14.61%
Mickey Mantle        1901    14.59%
Tris Speaker         2208    14.43%
Harry Stovey         1447    14.16%
Lou Gehrig           2199    14.06%
Paul Hines           1401    13.98%
Todd Helton          1192    13.92%
Billy Hamilton       1669    13.89%
Ichiro Suzuki         650    13.86%
Cap Anson            2514    13.79%
Ed Delahanty         1786    13.77%
Eddie Mathews        1656    13.73%

What I like about expressing BaseRuns as a percentage of a team’s total runs is that you can see just how big a part of the offense that particular player is.

Top 10 in 2006:

Name                 BSR       BSR%
Albert Pujols        133     17.06%
Lance Berkman        125     17.01%
Jason Bay            117     16.90%
Ryan Howard          144     16.69%
Alfonso Soriano      124     16.63%
David Ortiz          134     16.29%
Miguel Cabrera       120     15.81%
Garrett Atkins       126     15.52%
Matt Holliday        124     15.20%
Grady Sizemore       132     15.18%

Jason Bay was an extremely large part of the not-so-wonderful Pirates offense in 2006. Other notables include Justin Morneau, who falls in at 12th with 14.65% of the offense.
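To make the percentage concrete, here is a small sketch that assumes BSR% is simply a player's BaseRuns divided by his team's run total. The team totals are not listed in the post, so the code backs out the totals implied by three rows of the 2006 table; the helper names are hypothetical.

```python
# (player BSR, BSR%) copied from the 2006 table above.
top_2006 = {
    "Albert Pujols": (133, 17.06),
    "Jason Bay": (117, 16.90),
    "Ryan Howard": (144, 16.69),
}


def bsr_share(player_bsr, team_total):
    """Player BaseRuns as a percentage of the team's total."""
    return 100.0 * player_bsr / team_total


def implied_team_total(player_bsr, share_pct):
    """Team total implied by a player's BSR and his BSR%."""
    return player_bsr / (share_pct / 100.0)


for name, (bsr, pct) in top_2006.items():
    print(f"{name:<14} implied team total: {implied_team_total(bsr, pct):.0f}")
```

Bay's 117 BaseRuns translate into a larger share than Howard's 144 because the implied total for Bay's team is so much smaller.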

Anyway, at some point in the future, I’d like to include BaseRuns in the FanGraphs player pages and leader-boards. Since this is my first shot at calculating BaseRuns, I want to make sure I’m calculating them in a way that makes sense. If you see any problems with my methodology, please let me know as I’d hate to have blatantly wrong data on the player pages.

For more information on BaseRuns, Tangotiger had an excellent series on BaseRuns and Linear Weights.


More on Plate Discipline

Last year I did a two part series on plate discipline that delved into a few statistics that I thought better represented a batter’s actual plate discipline than your traditional metrics. The stats are in for the 2006 season, so I figured it’d be worth taking another look. Here’s a quick recap of last year’s findings:

Z% (Zone Percentage) – The percentage of pitches a batter sees inside the strike zone. Correlates with walk rate (BB%) and home runs per fly ball (HR/FB). Batters with more power are pitched more cautiously, resulting in a lower Z% and a higher BB%.

OSwing (Outside Swing Percentage) – The percentage of pitches a batter swings at that are outside the strike zone. Correlates with walk rate (BB%). This year, OSwing will be represented as OSwing above the MLB average.

Contact (Contact Percentage) – The percentage of times a batter makes contact with the ball when he swings the bat. Correlates with strikeout rate (K%) and home runs per fly ball (HR/FB). Batters who can’t make contact with the ball obviously strike out more often, and batters who swing “harder” often make less contact, resulting in higher HR/FB and more strikeouts.

So, with the recap out of the way, let’s look at some year to year correlations for all three for the first time.

[Figure: year-to-year correlations for Z%, OSwing, and Contact (YtY-PitchZone.png)]

They all correlate well from year to year, and both OSwing and Contact correlate extremely well. I consider OSwing about the best measure of plate discipline since not swinging at pitches outside the strike zone is pretty much the definition of plate discipline.

Seeing that it correlates so well from year to year (at least between 2005 and 2006) suggests that players do not quickly develop plate discipline. Perhaps it’s a skill that can be learned over time, but few players saw drastic changes in OSwing from 2005 to 2006. Fewer than 10% of all players with 300 at-bats in both 2005 and 2006 saw more than a 5% change in OSwing.

Name                Dif    Name                  Dif
Andruw Jones      8.09%    Jeromy Burnitz     -5.10%
Geoff Jenkins     7.13%    Dave Roberts       -5.34%
So Taguchi        6.97%    Mark Loretta       -5.38%
Willy Taveras     6.62%    Freddy Sanchez     -5.66%
Aaron Miles       6.16%    Vladimir Guerrero  -5.71%
Scott Hatteberg   6.08%    Jay Payton         -5.89%
Joe Crede         5.92%    Kevin Mench        -6.82%
Jorge Cantu       5.13%    A.J. Pierzynski    -7.50%
Eric Chavez       5.08%    Clint Barmes       -8.39%
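A list like the one above can be produced by differencing the two seasons and keeping the large movers. The sketch below assumes the "5% change" means five percentage points; the function name and the player values are hypothetical placeholders, not real data.

```python
def big_movers(year1, year2, threshold=5.0):
    """Return (movers, share): players present in both seasons whose metric
    changed by more than `threshold` points, and their share of that group."""
    shared = year1.keys() & year2.keys()
    diffs = {name: year2[name] - year1[name] for name in shared}
    movers = {name: diff for name, diff in diffs.items() if abs(diff) > threshold}
    share = len(movers) / len(shared) if shared else 0.0
    return movers, share


# Hypothetical OSwing values relative to league average, in percentage points.
oswing_2005 = {"Player A": 3.0, "Player B": -4.0, "Player C": 1.5, "Player D": 0.0}
oswing_2006 = {"Player A": 9.5, "Player B": -3.0, "Player C": 0.5, "Player E": 2.0}

movers, share = big_movers(oswing_2005, oswing_2006)
print(movers, round(share, 3))
```

Players who appear in only one of the two seasons are dropped, which mirrors the requirement of 300 at-bats in both years.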

Contact showed an even higher correlation from year to year than OSwing, also suggesting that players don’t really change their approach from year to year. In fact, there were only 10 players who had more than a 5% change in Contact from 2005 to 2006.

Name                Dif    Name                  Dif
Corey Patterson   5.10%    Brad Wilkerson     -8.87%
Mike Piazza       5.17%    Bill Hall          -7.41%
Adam Everett      5.51%    Nick Swisher       -7.02%
Reed Johnson      5.71%    Chris Shelton      -6.37%
Troy Glaus        6.71%    Craig Monroe       -5.21%

Finally, Z% showed the least amount of correlation from year to year, but it wasn’t a poor correlation by any means. The decreased correlation, I suspect, is due to this metric not being entirely within the batter’s control. While how a batter is pitched is indicative of his various skills (mainly power and overall plate discipline), it’s still up to the pitcher to decide how to proceed.

Between both Contact and OSwing there appears to be a sort of “sweet spot” for batters. Let’s apply some filters to OSwing and see what happens. Particularly, let’s look at power batters who have a HR/FB greater than 15%.

For the first list, let’s limit the batters to those who have “considerably better” plate discipline than the rest of the league. Let’s call “considerably better” an OSwing 5% or more above league average.

Name                  Contact     HR      HR/FB
Jason Giambi           80.97%     37     20.00%
Morgan Ensberg         74.65%     23     16.43%
Barry Bonds            85.78%     26     16.56%
Nick Johnson           84.53%     23     15.97%
Pat Burrell            79.50%     29     18.13%
Jim Thome              71.92%     42     27.81%
Chipper Jones          82.27%     26     19.12%
Frank Thomas           86.58%     39     17.41%
Carlos Beltran         84.08%     41     21.13%
Adam Dunn              70.42%     40     22.22%
Troy Glaus             75.66%     38     18.72%
Jason Bay              75.26%     35     18.82%
Nick Swisher           71.07%     35     17.86%
Austin Kearns          74.11%     24     15.29%

The next list uses the same criteria, but instead of batters who are “considerably better”, it covers batters who have merely “above average” plate discipline (OSwing between 0% and 5% above league average).

Name                  Contact    HR     HR/FB
Josh Willingham        79.31%    26    15.85%
David Ortiz            77.92%    54    26.09%
Albert Pujols          86.24%    49    22.48%
Casey Blake            82.82%    19    16.67%
Raul Ibanez            80.26%    33    16.50%
Jim Edmonds            73.31%    19    16.81%
Travis Hafner          72.73%    42    30.22%
Lance Berkman          79.24%    45    24.59%
Phil Nevin             72.79%    22    21.57%
Jermaine Dye           78.26%    44    25.43%
Bill Hall              71.97%    35    19.44%
Brad Hawpe             73.98%    22    16.18%
Paul Konerko           82.11%    35    17.50%
Moises Alou            85.24%    22    17.46%
Andruw Jones           72.94%    41    22.04%
Richie Sexson          69.40%    34    19.32%
Mark Teixeira          79.93%    33    15.94%
Alex Rodriguez         74.27%    35    20.23%
Mike Piazza            81.88%    22    17.05%
Ken Griffey Jr.        80.70%    27    18.00%
Manny Ramirez          78.46%    35    23.49%
Miguel Cabrera         80.65%    26    15.57%
Adrian Gonzalez        79.93%    24    15.69%

Next up are the batters who have “below average” plate discipline (OSwing between 0% and 5% below league average).

Name                  Contact     HR      HR/FB
Ray Durham             88.15%     26     15.95%
Adam LaRoche           76.81%     32     21.19%
Marcus Thames          73.19%     26     17.11%
Carlos Delgado         74.39%     38     22.89%
Ty Wigginton           76.38%     24     16.90%
Carlos Lee             86.49%     37     16.09%
Ryan Howard            67.49%     58     39.46%
Michael Cuddyer        76.26%     24     15.69%
Aramis Ramirez         84.28%     38     15.14%
Juan Rivera            84.38%     23     17.69%
Mark Teahen            79.05%     18     16.51%
Craig Wilson           70.20%     17     15.74%
Craig Monroe           74.79%     28     15.14%
Matt Holliday          78.84%     34     20.00%
Prince Fielder         76.55%     28     15.82%
Vernon Wells           83.45%     32     15.02%
Torii Hunter           78.02%     31     18.34%
Wilson Betemit         76.67%     18     18.00%
Preston Wilson         76.26%     17     16.67%
Miguel Tejada          84.33%     24     15.48%

And finally, the batters who have “considerably worse” plate discipline (OSwing of 5% or more below average).

Name                  Contact    HR     HR/FB
Jeromy Burnitz         72.47%    16    16.00%
Ben Broussard          77.00%    21    15.56%
Justin Morneau         81.07%    34    16.43%
Rocco Baldelli         77.09%    16    16.00%
Jacque Jones           73.82%    27    25.47%
Alfonso Soriano        73.92%    46    18.25%
Jeff Francoeur         76.47%    29    15.26%
Vladimir Guerrero      83.15%    33    16.34%
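The four groupings above boil down to a power filter plus a bucketing of OSwing relative to league average. A minimal sketch follows, assuming positive values mean better-than-average discipline; the handling of values exactly on a boundary and all sample values are assumptions, since the post lists neither.

```python
def discipline_bucket(oswing_vs_avg):
    """Map OSwing relative to league average (percentage points,
    positive = better) to one of the four groupings."""
    if oswing_vs_avg >= 5.0:
        return "considerably better"
    if oswing_vs_avg >= 0.0:
        return "above average"
    if oswing_vs_avg > -5.0:
        return "below average"
    return "considerably worse"


def group_power_hitters(players, min_hr_fb=15.0):
    """Group batters with HR/FB above `min_hr_fb` by discipline bucket."""
    groups = {}
    for name, stats in players.items():
        if stats["hr_fb"] > min_hr_fb:
            bucket = discipline_bucket(stats["oswing_vs_avg"])
            groups.setdefault(bucket, []).append(name)
    return groups


# Hypothetical batters; placeholders, not real 2006 values.
sample = {
    "Player A": {"oswing_vs_avg": 6.2, "hr_fb": 20.0},
    "Player B": {"oswing_vs_avg": 1.1, "hr_fb": 16.5},
    "Player C": {"oswing_vs_avg": -7.4, "hr_fb": 18.3},
    "Player D": {"oswing_vs_avg": 8.0, "hr_fb": 9.0},  # dropped: HR/FB too low
}
groups = group_power_hitters(sample)
print(groups)
```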

Now that you’ve seen the lists, it seems clear, to me at least, that it’s preferable to have above average plate discipline. Some of the guys on the below average list are borderline, but for the most part, it’s just not as prestigious a list.

The “considerably worse” list is fascinating, since some of these players actually get away with such an aggressive approach. Vladimir Guerrero, who swings at pretty much everything, is talented enough to get away with it. Justin Morneau got away with it last year, but it’s worth noting his plate discipline didn’t improve from 2005 to 2006, and his 2005 season was fairly forgettable. For what it’s worth, his contact rate did rise by about 3%.

It looks like high contact rates may be able to counter poor plate discipline. It would seem to me that the truly “special” players (with exceptions like Vladimir Guerrero) have that rare combination of power, plate discipline, and contact rates. You see this in players like Barry Bonds, Jason Giambi, Frank Thomas, Carlos Beltran and of course Albert Pujols. Of course, this isn’t the be-all-end-all filter, since there are a few players who sneak in like Casey Blake, who I wouldn’t consider particularly special.

So far we’ve looked at players who had at least 300 at bats, but maybe it’s possible to identify some breakout players from batters who had less than desired playing time.

Here are the players in 2006 who had an OSwing no worse than 2% below average (that is, greater than -2%) and a HR/FB over 12%. I relaxed the HR/FB filter slightly since younger players may not have fully developed their power yet.

Name                  Contact    HR     HR/FB
Hideki Matsui          87.70%     8    12.12%
Gabe Gross             76.60%     9    14.06%
Chris Snyder           80.31%     6    12.24%
David Dellucci         73.62%    13    14.44%
Greg Norton            76.61%    17    17.89%
J.J. Hardy             85.78%     5    13.89%
Derrek Lee             81.04%     8    15.09%
Corey Koskie           77.05%    12    17.14%
Damion Easley          83.13%     9    14.06%
Aaron Guiel            77.69%     7    17.07%
Luke Scott             78.83%    10    14.71%
Jason LaRue            71.65%     8    15.69%
Wes Helms              77.88%    10    14.71%
Freddie Bynum          72.73%     4    15.38%
Ben Johnson            72.95%     4    12.90%
Michael Napoli         68.34%    16    17.20%
Chris Duncan           77.25%    22    29.33%
Scott Spiezio          80.91%    13    13.83%
Corey Hart             75.84%     9    12.16%
Russell Branyan        63.90%    18    22.50%
Dave Ross              71.51%    21    23.86%
Yorvit Torrealba       78.96%     7    16.28%
Ryan Doumit            74.43%     6    14.63%
Marlon Anderson        81.94%    12    13.79%
Daryle Ward            77.90%     7    15.22%
Josh Bard              85.16%     9    15.79%
Carlos Quentin         74.38%     9    18.00%
Joe Borchard           70.60%    10    17.54%
Cody Ross              76.85%    13    14.77%
Adam Melhuse           73.71%     4    14.81%

This is by no means a “magic bullet” list, but I’d consider it one of many starting points for narrowing down possible breakout players. There are certainly a few players on this list, such as Josh Bard, Chris Duncan, Luke Scott, and others, who appear to be quite promising. It’s also a reminder that injured players such as Derrek Lee, Hideki Matsui, and David Dellucci shouldn’t be forgotten.

If you were to look at the same list last year, out of about 30 players, you’d have identified Frank Thomas, Jim Thome, J.D. Drew, Mark DeRosa, Milton Bradley, Matt Murton, Ty Wigginton, Nomar Garciaparra, Marcus Thames and Curtis Granderson. So, about one third of the players ended up being at least decent to excellent sleepers.

If you’re still with me, we’ll look at one last list of filters, which I’d consider a sort of potential breakout power hitter list with already established players. I’ll filter on players with an above average OSwing, a contact rate between 70% and 85%, a HR/FB greater than 7.5%, and players all under the age of 30.

Name                  Contact    HR    HR/FB
Ryan Langerhans        76.50%     7     8.54%
Jhonny Peralta         73.48%    13     9.22%
Bobby Crosby           77.88%     9     9.09%
Jonny Gomes            70.95%    20    13.33%
Rickie Weeks           73.58%     8     9.09%
Jose Bautista          77.72%    16    11.59%
Chris Shelton          73.46%    16    12.60%
Edwin Encarnacion      80.15%    15    12.10%
Curtis Granderson      70.84%    19    11.66%
Matt Murton            83.57%    13    13.54%
Jeremy Hermida         78.88%     5     6.17%

Last year, the same filter yielded a group of 15 batters who hit 190 home runs in 2005 and 266 home runs in 2006. The group included Carlos Beltran, Grady Sizemore, Mark Teahen, Nick Swisher, Brad Hawpe, and others.
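
For anyone who wants to try this at home, the four-part filter described above can be sketched roughly as follows. This is only an illustration: the Batter fields, the sign convention for OSwing (positive meaning better than league average), and the sample rows are all assumptions made up for the example, not real players or real stats.

```python
from dataclasses import dataclass


@dataclass
class Batter:
    name: str
    age: int
    oswing_vs_avg: float  # assumed convention: points better (+) or worse (-) than league average
    contact: float        # contact rate, in percent
    hr_fb: float          # home runs per fly ball, in percent


def is_power_breakout_candidate(b: Batter) -> bool:
    """Apply the four filters: above-average OSwing, 70-85% contact, HR/FB > 7.5%, under 30."""
    return (
        b.oswing_vs_avg > 0
        and 70.0 <= b.contact <= 85.0
        and b.hr_fb > 7.5
        and b.age < 30
    )


# Made-up rows for illustration only; none of these are real players.
sample = [
    Batter("Hypothetical A", 26, 1.5, 78.0, 11.0),   # passes every test
    Batter("Hypothetical B", 31, 2.0, 80.0, 12.0),   # fails the age test
    Batter("Hypothetical C", 25, -3.0, 82.0, 15.0),  # below-average OSwing
    Batter("Hypothetical D", 27, 0.5, 88.0, 9.0),    # contact rate above 85%
]

candidates = [b.name for b in sample if is_power_breakout_candidate(b)]
print(candidates)
```

Keeping each condition on its own line makes it easy to loosen or tighten one threshold at a time and see how the list changes.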

The bottom line is that, since stats like OSwing and Contact correlate so well from year to year, it would definitely make sense to include them in a projection system (instead of me boring you to death with random filters). I’d say Contact is arguably better than using strikeouts, and OSwing is really unlike any of the traditional statistics that would go into projections.


Pitch Location & Groundballs

Last week Baseball Analysts published my article Generalities in Pitch Location, which led Tangotiger to ask the following question:

“…how often does Brandon Webb and his brothers get a GB on balls thrown down and balls thrown up the zone. That is, are they “true” groundball pitchers, who can get batters to hit the ball on the ground, because they can. Or, are they groundball pitchers, as a byproduct of them throwing the ball low?”

First let’s take a look at ground ball percentage by pitch location on a major league level.

MLB GBP Location.png

I don’t think there are too many surprises here. The lower the pitch, the greater the chance that it will be hit on the ground. So, let’s look at what Brandon Webb’s (extreme groundball pitcher) chart looked like the past two seasons, compared to, say, Barry Zito’s (extreme fly ball pitcher).

Webb GBP Location.png

Starting with Webb, we can see that no matter where he throws the ball, there’s a pretty good chance it will end up being a groundball. Zito on the other hand, will have a greater chance of inducing a fly ball despite the location of the pitch.

Zito GBP Location.png

Now if you were to calculate a so-called “expected” groundball percentage based on the pitch locations of balls hit into play for a particular player and the league average groundball percentage for each of those pitch locations, you’d see that Webb has an expected GB% of about 48%, while Zito’s is 44%.

All in all, a pretty similar “expected” groundball percentage based on pitch location and major league averages, but in reality the two couldn’t be further apart. Webb’s actual GB% the past two years is about 66% with Zito’s being around 39%.

It would seem, at least in the case of these two pitchers, that their ability (or lack thereof) to induce groundballs is not entirely a function of where they throw the ball, but is probably reliant on several other factors.
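
For the curious, the “expected” groundball percentage described above boils down to a weighted average: each pitch location’s league-wide GB% is weighted by how many of the pitcher’s balls in play came from that location. Here’s a minimal sketch of that idea. The three zones and every number in it are made up for illustration (the real charts use a much finer grid), and the function name is my own.

```python
def expected_gb_pct(balls_in_play_by_zone, league_gb_pct_by_zone):
    """Weight the league GB% in each zone by the pitcher's balls in play there."""
    total = sum(balls_in_play_by_zone.values())
    if total == 0:
        raise ValueError("pitcher has no balls in play")
    weighted = sum(
        count * league_gb_pct_by_zone[zone]
        for zone, count in balls_in_play_by_zone.items()
    )
    return weighted / total


# Hypothetical league-average GB% by zone (not real data).
league = {"low": 55.0, "middle": 45.0, "high": 30.0}

# Hypothetical pitcher: count of balls hit into play from each zone (not real data).
pitcher = {"low": 50, "middle": 30, "high": 20}

expected = expected_gb_pct(pitcher, league)
print(round(expected, 1))  # 47.0
```

Comparing that expected figure with a pitcher’s actual GB% is what separates location from whatever else is going on.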


On Baseball Analysts: Pitch Location

Thought I’d mention that I have an article on Pitch Location running this week on Baseball Analysts. Here’s a little teaser….

“How often have you heard a player attribute his success to “throwing more pitches inside,” or heard a manager say a pitcher was “hitting his spots?” Pretty much everyone talks about pitch location, but how often is it actually quantified? Thankfully, our pals over at Baseball Info Solutions tracked the x-y coordinates of nearly all 1.5 million pitches thrown the past two seasons. Let’s start by looking at the average major league pitch locations broken down by batter/pitcher handedness.”

You can find the rest of the article here: Generalities in Pitch Location


N.L Cy Young: Who to Choose?

With the regular season finally over, it’s time to start thinking about who should be the recipient of the National League Cy Young award. A month ago, I thought Chris Carpenter was a shoo-in to win for the second straight year, but over the past month, the landscape has significantly changed.

Roy Oswalt won 6 of his last 8 starts to put himself in contention while Brandon Webb righted the ship with a strong September posting a 2.43 ERA including two complete games. Then of course there are the relievers, who aren’t typically in Cy Young talks, but would the Padres be in the playoffs without Trevor Hoffman, or the Dodgers without Takashi Saito? Maybe you could even throw the Mets’ Billy Wagner into the discussion.

Just looking at the three starting pitcher candidates of Carpenter, Oswalt and Webb, they had freakishly similar seasons:

Name             W   L    Inn  ShO  CG   ERA   SO  BB  WHIP   WPA
Chris Carpenter  15  8  221.2    3   5  3.09  184  43  1.07  3.38
Roy Oswalt       15  8  220.2    0   2  2.98  166  38  1.17  4.15
Brandon Webb     16  8  235.0    3   5  3.10  178  50  1.13  3.69

How do you choose between these three? Webb has the most innings and wins. Oswalt has the best ERA and Win Probability Added (WPA). Carpenter has the most strikeouts and the best WHIP. May as well flip a coin (a three sided coin). Their offenses all gave them about the same amount of run support too, so you can’t even say one of them should have more wins.

If I had a vote, my personal preference of the three would lean towards Oswalt. If you take away his one relief appearance, his WPA jumps to 4.43, which is nearly one win more than either Webb or Carpenter. Also, he’s coming off back to back 20 win seasons which were certainly Cy Young worthy, but just slightly worse than the eventual winners.

But what about those relievers? Their seasons were pretty similar too:

Name             W   L   SV   BS    Inn    ERA   SO   BB  WHIP   WPA    LI
Takashi Saito    6   2   24    2    78.1  2.07  107   23  0.91  4.09  1.50
Trevor Hoffman   0   2   46    5    63.0  2.14   50   13  0.97  4.04  2.08
Billy Wagner     3   2   40    5    72.1  2.24   94   21  1.11  3.85  1.88

This is also a tough group of pitchers to pick a winner from. Even though Saito had about 20 fewer saves than Hoffman or Wagner, he still managed to top them both in WPA, not to mention his 107 strikeouts are pretty off the charts. Hoffman was used in the most difficult situations of the three, according to his Leverage Index (LI), and he did lead the majors in saves. Wagner falls a bit short of both Hoffman and Saito, but he still had a stellar season, though probably not Cy Young worthy.

If I had to choose one, I’d go with Hoffman, since he’s pitched in more pressure packed situations than either of the other two and he’s been nothing but stellar all season long. Saito should probably take home the NL Rookie of the Year award, but that’s an entirely different discussion.

So for me at least, it comes down to either Oswalt or Hoffman and I’m seriously torn between the two of them. I really think Oswalt will (and should) win a Cy Young award eventually, but I’d really love to see Hoffman win now, especially in a year where there’s no clear cut starting pitcher. Capturing the career lead in saves, leading his team to the playoffs, and winning the Cy Young award, all in the twilight of his career, sure would make a feel good story.


More Home Runs Than Strikeouts

As I was browsing the new leaderboards, I noticed that Albert Pujols has the 10th fewest strikeouts among qualified players with only 43. That’s pretty damn impressive for a guy who’s hit 45 home runs this season. Actually, it’s a little more than impressive, as there have only been six other players who have hit 40 or more home runs with more home runs than strikeouts.

Name            Season       HR       SO
---------------------------------------------
Mel Ott           1929       42       38
Lou Gehrig        1934       49       31
Lou Gehrig        1936       49       46
Joe DiMaggio      1937       46       37
Johnny Mize       1947       51       42
Johnny Mize       1948       40       37
Ted Kluszewski    1953       40       34
Ted Kluszewski    1954       49       35
Ted Kluszewski    1955       47       40
*Barry Bonds      2004       45       41

* - denotes MVP Season
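
If you’d like to check the club membership yourself, here’s a quick sketch using only the rows from the table above plus the Pujols totals quoted earlier. The function name and the choice to treat exactly 40 home runs as qualifying (since the table includes 40-homer seasons) are my own assumptions.

```python
# (name, season, home runs, strikeouts) copied from the table above.
seasons = [
    ("Mel Ott", 1929, 42, 38),
    ("Lou Gehrig", 1934, 49, 31),
    ("Lou Gehrig", 1936, 49, 46),
    ("Joe DiMaggio", 1937, 46, 37),
    ("Johnny Mize", 1947, 51, 42),
    ("Johnny Mize", 1948, 40, 37),
    ("Ted Kluszewski", 1953, 40, 34),
    ("Ted Kluszewski", 1954, 49, 35),
    ("Ted Kluszewski", 1955, 47, 40),
    ("Barry Bonds", 2004, 45, 41),
]


def more_hr_than_so(hr, so, min_hr=40):
    """True when a season has at least min_hr home runs and more homers than strikeouts."""
    return hr >= min_hr and hr > so


qualifying = [s for s in seasons if more_hr_than_so(s[2], s[3])]
players = sorted({name for name, _, _, _ in qualifying})

print(len(qualifying), "seasons by", len(players), "players")
print(more_hr_than_so(45, 43))  # the Pujols totals quoted above
```

Dropping min_hr to 30 gives the test for the less exclusive 30-homer version of the club.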

Only Barry Bonds has accomplished the 40-plus home run season with fewer strikeouts since 1955 and he’s the only one to win an MVP award in the same season. Hitting 30 home runs with fewer strikeouts has been slightly more rewarding in the MVP department and is still a very exclusive club.

Name            Season       HR       SO
---------------------------------------------
Ken Williams      1922       39       31
Lefty O'Doul      1929       32       19
Al Simmons        1930       36       34
Joe DiMaggio      1938       32       31
*Joe DiMaggio     1939       30       20
Joe DiMaggio      1940       31       30
Ted Williams      1941       37       27
*Joe DiMaggio     1941       30       13
Willard Marshall  1947       36       30
*Stan Musial      1948       39       34
Joe DiMaggio      1948       39       30
Andy Pafko        1950       36       32
Yogi Berra        1952       30       24
Yogi Berra        1956       30       29
Ted Kluszewski    1956       35       31

* - denotes MVP Season

Is there a point to this? Not really, but it’s fun trivia and maybe fodder for your MVP discussions.

And speaking of the MVP, a few days ago (September 9th), Ryan Howard briefly overtook Pujols for the major league lead in WPA. Before that, Pujols had led the majors in WPA since April 16th (145 days). Last night’s 2-run walk-off double put Pujols back on top by a margin of 0.73 wins.


Carlos Lee and Kevin Mench

A week ago, the Rangers and Brewers swapped leftfielders and a few other players. Texas acquired Carlos Lee and minor leaguer Nelson Cruz for Kevin Mench, Laynce Nix, Francisco Cordero and a minor leaguer. At the time, I thought Cordero (an A reliever most of his career) was the key to the deal, and I called Mench a “poor man’s Lee.”

Most Internet posters seem to think that the Rangers got the best of this deal. For instance, ESPN’s Keith Law opined…

Unless the Brewers have a second move in mind involving Mench, Cordero, or Turnbow, it’s hard to see how this is a good return on arguably the most attractive position player on the trade market.

But the erstwhile MGL, in this thread, feels that once you include fielding and baserunning, Mench is actually a better player than Lee, and he’s cheaper to boot. So I thought it would be fun to compare the two. Let’s start with a basic Runs Created graph, showing each player’s Runs Created over their career:

243_1261_OF_cseason_blog_8_20060802.png

You’ve got to say that the Brewers picked a fine time to trade Lee, who is having the best year of his career and will be a free agent at the end of the season. Mench isn’t having as good a year, but his production was very similar to Lee’s prior to 2006.

Breaking down their stats a little, Mench and Lee have exhibited the same level of on-base skill throughout the years…

OBP

…but the difference between the two this year has been their power.

ISO

Almost 18% of Lee’s outfield flies have been home runs, compared to a previous average of about 13%. Given his track record, I’d say it’s highly unlikely he will maintain that rate for the rest of the year.

In comparison, Mench has kept his home run/outfield fly rate at about 11% (aided by his old home park), but his 2006 slugging decline is more related to a higher groundball rate (42% vs. a previous career average of 36%). That could be a disturbing trend, because changes in batted ball rates can signal abrupt changes in a batter’s true performance. At least, that’s my hypothesis. Maybe I’ll test that someday…

If you had compared Lee and Mench at the end of last year, you might have said that Lee has a slight edge in power but not much else. Does this year–particularly Mench’s increase in groundballs–change that assessment? I will leave that to you.

As for fielding, Mench ranked 15th among leftfielders last year and Lee ranked 23rd, according to John Dewan’s Fielding Bible. Lee truly looks like a bad baserunner, however. The Hardball Times Annual gave him -2.8 baserunning runs and Mench received a positive 2.2.

Overall, there appears to be about a 10-run edge for Mench in fielding and baserunning (equal to one win) and prior to this year, you might have rated Lee and Mench relatively even in batting prowess. Add in the fact that Mench is younger and won’t be a free agent for two years, and you might actually believe that Mench really isn’t that “poor” a relation to Lee. In the meantime, watch his groundball rate.


2004 Red Sox-Yankees Win Probability

I’ve received a few requests to do Win Probability charts for the 2004 Red Sox-Yankees ALCS. Enjoy!

2004-ALCS-Small.png

2004_Sox_Yankees_1.png

2004_Sox_Yankees_2.png

2004_Sox_Yankees_3.png

2004_Sox_Yankees_4.png

2004_Sox_Yankees_5.png

2004_Sox_Yankees_6.png

2004_Sox_Yankees_7.png