Archive for Research

Hudson & Perez: BIS & Pitchf/x

I thought it’d be interesting to take a look at how Baseball Info Solutions pitch type data matches up with Pitchf/x’s pitch type identification data for just the two starters in the Nationals Opener. Before I begin, Pitchf/x has a new field in their data called “type_confidence” which appears to be a measure of how accurate their classification is. All the Pitchf/x aggregates were found on Josh Kalk’s player cards.

Tim Hudson:

Fastball - BIS: 70.5% (89.3mph) |  Pitchf/x: 71.8% (91.9mph)
Slider   - BIS: 19.2% (84.5mph) |  Pitchf/x: 18.0% (86.9mph)
Changeup - BIS:  5.1% (78.2mph) |  Pitchf/x:  7.7% (80.5mph)
Cutter   - BIS:  2.6% (86.5mph) |  Pitchf/x:  2.6% (89.1mph)
Splitter - BIS:  2.6% (78.0mph) |  -------------------------

Upon closer inspection the pitch logs are incredibly similar for Hudson, with BIS and Pitchf/x disagreeing on a mere 3 pitches. Two pitches classified as changeups by Pitchf/x were classified as splitters by BIS. The third was classified as a fastball by Pitchf/x, but as a slider by BIS. That third pitch in disagreement had the lowest Pitchf/x type_confidence of any pitch in the game.

Odalis Perez:

Fastball - BIS: 59.7% (86.0mph) |  Pitchf/x: 78.8% (88.7mph)
Cutter   - BIS: 28.4% (83.6mph) |  -------------------------
Changeup - BIS: 10.4% (79.3mph) |  Pitchf/x:  4.4% (81.5mph)
Curve    - BIS:  1.5% (74.0mph) |  -------------------------
Slider   - -------------------- |  Pitchf/x: 14.5% (86.5mph)
Splitter - -------------------- |  Pitchf/x:  2.9% (85.3mph)

Needless to say, things were not as rosy for Perez. There were 27 pitches in disagreement, not counting 3 pitches that disagree because of non-identification on either BIS’s or Pitchf/x’s part. Of those 27, 9 pitches that BIS classified as cutters were classified as sliders by Pitchf/x. The average type_confidence for the other 18 pitches in disagreement was .56, compared with .68 for pitches in agreement.
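
The comparison above boils down to lining up the two pitch logs pitch by pitch and splitting Pitchf/x’s type_confidence by whether the two sources agree. Here is a minimal sketch in Python, assuming each pitch is stored as a (BIS label, Pitchf/x label, type_confidence) tuple. The sample rows are made up for illustration and are not Perez’s actual log.

```python
from statistics import mean

# Hypothetical pitch log: (BIS label, Pitchf/x label, Pitchf/x type_confidence).
# None marks a pitch that one of the sources did not identify.
pitches = [
    ("Fastball", "Fastball", 0.80),
    ("Cutter", "Slider", 0.60),
    ("Cutter", "Fastball", 0.50),
    ("Changeup", "Changeup", 0.70),
    ("Fastball", None, 0.00),
]

def compare_logs(log):
    """Split type_confidence values by whether BIS and Pitchf/x agree."""
    agree, disagree, unidentified = [], [], 0
    for bis, pfx, conf in log:
        if bis is None or pfx is None:
            unidentified += 1      # kept out of both groups
        elif bis == pfx:
            agree.append(conf)
        else:
            disagree.append(conf)
    return agree, disagree, unidentified

agree, disagree, unidentified = compare_logs(pitches)
print(len(disagree), "pitches in disagreement,", unidentified, "unidentified")
print("avg confidence (agree):   ", round(mean(agree), 2))
print("avg confidence (disagree):", round(mean(disagree), 2))
```

A systematic relabeling such as cutter versus slider could be pulled out of the disagreement group before averaging, the same way the 9 cutter/slider pitches are set aside above.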

For what it’s worth, Perez claims to throw a fastball, cutter, changeup and curveball.

That’s all I got for now. I haven’t had time to take a hard look into why the pitches might have been classified differently by using any of the break data, and nothing is standing out as different about those pitches at first glance.

Update: Josh Kalk over at The Hardball Times took an in-depth look at Pitchf/x’s pitch classification on Tim Hudson’s first start.


Spring Training and Strikeouts

First off: Welcome back baseball! Today is the first day of the Cactus and Grapefruit leagues which means we get our first taste of tangible 2008 baseball stats, sort of.

I’m a big proponent of taking spring training stats with a grain of salt. Spring training is the time to examine opening day position battles and try and get a glimpse of which players are healthy. But can you gain any insight into how a player’s regular season will be based on how he performs in spring training?

For this particular exercise, let’s look at pitchers’ strikeout rates (K/9). If you look at the correlation between K/9 in 2006 and 2007, you get an R^2 of about .58, which is a pretty strong correlation. When you look at the correlation between spring training 2007 and the 2007 regular season, the R^2 drops to .32. So as a whole, 2006 is a much better indicator of a player’s 2007 strikeout rate than spring training.

The most innings a pitcher will pitch in spring training is roughly 25, so we’re looking at a fairly small sample size which is problematic. But what if a pitcher during spring training has shown extraordinary improvement in his strikeout ability?

For instance, Rafael Betancourt’s K/9 in 2006 was 7.6 and in spring training it jumped to an impressive 12.5. His was the biggest jump from 2006 to spring training and he did indeed show improvement during the 2007 regular season with a K/9 of 9.1.

Of the 161 pitchers sampled, whichever direction their K/9 moved in during spring training when compared with the 2006 season, 63% of them had their K/9 move in that same direction when comparing 2006 with the 2007 season. The correlation of the difference between 2006 and spring training and the difference between 2006 and 2007 had an R^2 of .19.
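
For anyone who wants to run the same two checks on their own numbers, here is a rough sketch in Python. The five pitchers below are invented for illustration (only the first row borrows Betancourt’s figures from above), so the results will not match the 63% and .19 from the full 161-pitcher sample.

```python
from statistics import mean

# Hypothetical K/9 values; one entry per pitcher in each list.
k9_2006   = [7.6, 6.1, 8.4, 5.2, 9.0]
k9_spring = [12.5, 5.0, 9.5, 6.0, 7.0]
k9_2007   = [9.1, 5.8, 8.0, 5.9, 9.4]

def r_squared(xs, ys):
    """Square of the Pearson correlation between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov ** 2 / (var_x * var_y)

# Change from the 2006 baseline, in spring training and in the regular season.
spring_delta = [s - b for s, b in zip(k9_spring, k9_2006)]
season_delta = [n - b for n, b in zip(k9_2007, k9_2006)]

# Share of pitchers whose K/9 moved the same way both times
# (a change of exactly zero is lumped in with "did not go up").
same_direction = sum((s > 0) == (n > 0) for s, n in zip(spring_delta, season_delta))
share = same_direction / len(k9_2006)

print(f"Moved in the same direction: {share:.0%}")
print(f"R^2 of the two differences:  {r_squared(spring_delta, season_delta):.2f}")
```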

So there is something there, but it’s not anything you want to bet the bank on. By no means would I suggest looking at a player’s huge K/9 jump in spring training and thinking that it would definitely translate into 2008 success. Oh, and while I’m at it, spring training ERA should not just be taken with a grain of salt; it should be ignored completely.

As a side note, we’ll be carrying 2008 spring training stats starting soon for your own amusement.


Beckett Wins ALCS MVP?

After last night’s Red Sox game 7 victory, I was rather curious to see who would win the ALCS Most Valuable Player award. Without looking at the numbers, I thought it should go to Kevin Youkilis. I thought there was a chance it would go to Manny Ramirez and also a chance it could go to Josh Beckett. When Beckett was announced the winner, I was a little surprised and went to check the numbers.

It turns out that Josh Beckett did lead all players in Win Probability Added (WPA) in the ALCS. In two starts his WPA was .516 wins compared to Youkilis’ .203 wins. Manny Ramirez’s WPA was just slightly below Beckett’s at .483 wins. Jon Papelbon and Hideki Okajima were also contenders for the WPA title with .467 and .286 wins respectively.

If you look at things just in terms of run production, taking the context out of the situation, Youkilis was indeed the most productive player with a Batting Runs Above Average (BRAA) of 9.3 runs. Ramirez came in second with 6.2 runs, while Beckett was the next best with 4.7 runs.


Making Up For Ortiz (Sorta)

As of today, the Red Sox have the best record in baseball. If you look at their Win Probability numbers, their batting has contributed 4.38 wins, their starting pitchers 1.54 wins and their bullpen 4.08 wins. The bullpen, which was considered a big question mark to begin the season, has surprisingly been nearly as valuable as their offense, thanks in large part to Boston’s lesser known Japanese import, Hideki Okajima.

For the past two seasons, David Ortiz has racked up more WPA than any other player in baseball by adding an incredible 17 wins to his team. About one-third of the way into this current season, Ortiz has accumulated a mere 1 win, but still leads all Red Sox batters in WPA.

Last season, David Ortiz had 15 hits worth more than .2 wins, not to mention a pair of home runs worth .78 wins and .90 wins. This season he has no hits over the .2 wins mark. That’s not to say he’s not having an excellent season. When you take the context out of his wins using WPA/LI he’s 3rd in baseball with 1.93 wins, but the hits just haven’t been as timely as last season. The huge disparity between his WPA and WPA/LI gives him the 5th worst “Clutch” with -.88 wins.

So who has been getting the big hits for the Red Sox if their previously “clutch” star hasn’t? Let’s take a look:

On May 13th, newcomer Julio Lugo was the catalyst for the biggest play of the season. With 2 outs in the bottom of the 9th, down by 1 and the bases loaded, Lugo singled in the tying run while Erik Hinske scored on an error. The whole thing was worth .718 wins and capped off an incredible 9th inning rally (including a Jason Varitek double worth .343 wins) that overcame a 5 run deficit. At the start of the inning the Sox had a mere .9% chance of winning the game.

In a classic Yankees-Red Sox battle on April 20th, Coco Crisp tripled in two runs to tie the game at 6-6 off Yankees closer Mariano Rivera. This triple was worth .472 wins. Immediately following Crisp’s triple, Alex Cora singled, allowing Crisp to score the go-ahead and final run of the game. In contrast, Cora’s hit was worth .123 wins.

On April 26th, Orioles closer Chris Ray (who also gave up the hit to Lugo on May 13th) gave up a grand slam to Wily Mo Pena which put the Sox up by 3 to win the game. Down by 1 at the time, the home run was worth .43 wins.

Alex Cora on April 19th, in the top of the 9th with 1 out, tripled, allowing Julio Lugo to score the go-ahead run. While not quite as big as Crisp’s triple on the 20th, this one was still worth .373 wins.

Erik Hinske’s two-run blast in the 7th inning of a tie game back on May 17th rounds out the top 5 most important Red Sox hits so far this season. Manny Ramirez owns the next two biggest, which were both worth just over .3 wins.

David Ortiz is nowhere to be seen on this list, with his biggest hit worth .197 wins, which came on April 25th in the 7th inning of a tie game. In the 9th inning or later in a game, Ortiz is actually at -.213 wins, while in 2006 he was at 2.34 wins, far and away the most in baseball.

I’m sure as the season goes on, things will start to even out and there will be some big hits here and there, but fortunately for the Red Sox, they haven’t exactly needed Ortiz to be the savior he’s been the past two seasons.


Fly Balls and Groundball Pitchers

In today’s Hardball Times, Matthew Carruth did an analysis on extreme groundball pitchers and how they do not really give up more home runs-per-fly ball (HR/FB) than your typical pitcher. There has been some thought that extreme groundball pitchers do tend to give up more HR/FB because they’re only allowing fly balls when they throw a bad pitch, thus making it easier to hit the ball out of the park. Carruth’s analysis even suggests that the opposite might be true, though the correlation was quite weak.

I decided to run a similar analysis using data from 2002 to the present. The average HR/FB rate during that same period is 10.7%. If we look at the 2002-present totals of all pitchers with a groundball percentage (GB%) greater than 55% and more than 100 innings pitched, they have an average HR/FB of 12.2%. That 12.2% is not a weighted average; it’s just a simple average of each qualified pitcher’s HR/FB.

Using the same method, if you look at pitchers with a GB% less than 35%, they have an average HR/FB of 9.9%.
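
The difference between a simple and a weighted average matters here, because a simple average lets a pitcher with few fly balls count just as much as one with many. A small sketch in Python shows the two side by side; the pitchers and their totals are hypothetical and are not BIS data.

```python
# Hypothetical pitchers: (name, home runs allowed, fly balls allowed).
pitchers = [
    ("Pitcher A", 30, 200),   # 15% HR/FB
    ("Pitcher B", 10, 100),   # 10% HR/FB
    ("Pitcher C", 40, 500),   #  8% HR/FB
]

def simple_avg_hr_fb(rows):
    """Average each pitcher's own HR/FB, so every pitcher counts equally."""
    rates = [hr / fb for _, hr, fb in rows]
    return sum(rates) / len(rates)

def weighted_avg_hr_fb(rows):
    """Pool all home runs and fly balls, so heavier workloads count more."""
    total_hr = sum(hr for _, hr, _ in rows)
    total_fb = sum(fb for _, _, fb in rows)
    return total_hr / total_fb

print(f"Simple average HR/FB:   {simple_avg_hr_fb(pitchers):.3f}")
print(f"Weighted average HR/FB: {weighted_avg_hr_fb(pitchers):.3f}")
```

In this made-up group the simple average comes out higher because the pitcher with the fewest fly balls has the highest rate, which is the kind of effect to keep in mind when reading the 12.2% and 9.9% figures.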

Now I’ll admit this is a much simpler approach than the route Carruth took, but the results seem to be considerably different and I wondered why this would be the case.

First off, if you use my approach with the 20 pitchers Carruth selected in each group, you come to the same conclusions as Carruth did. This leads me to believe the batted ball data from Retrosheet (which Carruth used) and the batted ball data from Baseball Info Solutions don’t quite match up.

Just taking a quick look at the top 10 players, the GB% figures don’t match. For example, Retrosheet has Brandon Webb with a GB% of over 70% and BIS has him at 65%. That’s the first difference.

The second difference is that the time period Carruth used was between 1988 and 2006, where the HR/FB according to Retrosheet was 13.57%. This is considerably different than the HR/FB of 10.7% that Baseball Info Solutions reports for the 2002-present time period. Using the 13.57% for all pitchers over an 18-year period where there’s been some considerable influx in home run totals is probably going to cause some issues as well.

To me it seems there is at least some evidence that extreme groundball pitchers as a group do give up more HR/FB than your typical pitcher. The two most extreme groundball pitchers in the past 5 years have an average HR/FB of 15.5% (Brandon Webb) and 13.1% (Derek Lowe).

The other option is it really has nothing to do with GB% at all (sample size issue maybe?) and it’s just that some extreme groundball pitchers tend to have higher HR/FB. In their case, instead of regressing to the league average, you’d just regress to the player’s actual average, treating it more like you’d treat a batter’s batting average on balls in play (BABIP) than you would a pitcher’s BABIP.


Is WPA Predictive for Batters?

One of the biggest complaints I see about WPA is that it’s not predictive. The mere mention of its non-predictability seems to be enough for many to write it off as a mere toy used by some of the stats community.

So let’s see how it actually correlates from year to year compared to the stats we all know, like AVG, OBP, SLG, and OPS. I’ll throw in Batting Runs Above Average for fun too.

Looking at the r-squared from 2005 to 2006 for batters with over 300 plate appearances, here’s how WPA stacks up against the regulars:

AVG: .12
WPA: .27
BRAA: .35
OBP: .36
OPS: .36
SLG: .38

Here’s the same deal, 2004 to 2005.

AVG: .14
WPA: .24
OBP: .27
OPS: .30
BRAA: .31
SLG: .33

It’s true, WPA doesn’t correlate as well from year to year as OBP, SLG, or OPS, but it does have some correlation from year to year. From 2004 to 2005, a player’s WPA was almost as indicative of his next-season WPA as his OBP was of his next-season OBP. Yet, that wasn’t quite the case from 2005 to 2006. BRAA, which is calculated by using Run Expectancy on a play-by-play basis (much like WPA uses Win Expectancy), holds its own against the regulars.

Anyway, the point is, let’s stop using the argument that WPA isn’t predictive as a crutch, because it does actually show some correlation from year to year.


Unlike 2006: A-Rod Wins the Game!

In yesterday’s 10-7 victory against the Orioles, Alex Rodriguez’s game-winning grand slam was the second biggest hit he’s had in the past 6 years according to Win Probability Added (WPA). It brought his team from a mere 28.8% chance of winning to a complete victory.

20070407_orioles_yankees_0_blog.png

In 2006 however, Rodriguez was about as far from being a clutch hitter as you could possibly get. But before we delve into why, let’s get familiar with two stats: REW and OPS Wins. REW is calculated much like WPA, except it uses Run Expectancy (as opposed to Win Expectancy), which doesn’t take the score or inning into account. It does however account for how well a batter does with runners on base. OPS Wins on the other hand is how a player would do in a completely context neutral environment.

Looking at 2006, Rodriguez’s 3.18 OPS Wins and his REW of 3.34 wins are fairly close, but in general he did a little bit better than expected with runners on base. When you take into account the inning and the score (or late and close situations), he accumulated just 1.18 wins. Basically he performed much worse than he should have in high leverage or “clutch” situations. This is measured by a stat called “Clutch” which is the difference between WPA and OPS Wins once leverage adjusted. Rodriguez’s Clutch was -2.16 wins; the third worst among qualified players in 2006.

Last season was the worst season he’s had in the past 5 years in terms of clutch hitting and probably his worst season ever. Yet in his previous two seasons with the Yankees he was actually a clutch hitter with a Clutch of .76 wins in 2004 and .41 wins in 2005.

Since joining the Yankees, he’s still the 9th most valuable player in baseball according to WPA. If we look at just the Yankees batters since 2004, he ranks first in terms of WPA.

Batter                WPA
Alex Rodriguez      11.27
Derek Jeter         10.54
Gary Sheffield       8.91
Jason Giambi         7.61
Hideki Matsui        6.46
Jorge Posada         2.92
Bobby Abreu          1.96
Johnny Damon         1.76
Tony Clark           1.05
Tino Martinez         .77

Whether you like him or not, he has been the most valuable Yankees batter according to WPA the past 3 seasons including the few games played this season. Of course, Mariano Rivera bests him by half-a-win with a WPA of 11.73.


And That’s Why They Play the Game

The Nationals just had a great comeback against the Marlins this afternoon. The final game graph (unofficially) looks like this:

270404120_marlins_nationals_334479_lbig_blog.png

The Marlins’ newly acquired Jorge Julio pretty much blew the entire game for them with a WPA of -.903 wins. The Nationals’ low point in the game was in the bottom of the 6th with 2 outs when they had a mere 3.7% chance of winning the game.

But let’s draw our attention to one very specific play at the end of the game: the sacrifice bunt when the score was 6-5 in the bottom of the 9th. Before the sacrifice bunt there was a runner on first with no outs. The Nationals at the time had a 34.4% chance of winning. Manny Acta, the Nationals’ new manager, had Felipe Lopez lay down a sacrifice bunt. It was successful, but it didn’t improve their chances of winning the game. Instead of increasing their chances, it actually decreased them by about 5.6 percentage points to give the Nationals a 28.8% chance of winning.
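
In WPA terms, a play is worth the change in win expectancy it produces for the batting team. Here is a minimal sketch in Python using the two win expectancy figures quoted above; the function name play_wpa is just an illustrative label, not anything official.

```python
def play_wpa(we_before, we_after):
    """Win probability added by a single play, from the batting team's side."""
    return we_after - we_before

# Felipe Lopez's sacrifice bunt: 34.4% chance of winning before, 28.8% after.
bunt = play_wpa(0.344, 0.288)
print(f"Sacrifice bunt: {bunt:+.3f} wins")
```

A negative number means the play cost the team win expectancy even though it “worked” in the sense of moving the runner over.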

If you were watching the game on FanGraphs, you got to see exactly why the following is true:

To quote The Book: “With a non-pitcher at the plate, and a runner on first and no outs, advancing the runner in exchange for an out is a terrible strategy. It significantly reduces the RE in almost any run environment. It also reduces the WE in almost any run environment, even late in a close game.”

Fortunately for Nationals fans (while unfortunate for my hopes and dreams), they ended up winning anyway.


Pretty Good Daisuke, Pretty Good

After causing a major panic with his March 11th “bombing”, Daisuke Matsuzaka threw quite the gem yesterday. He struck out 7, while allowing only 1 walk and 1 hit in 5 2/3 innings of work against the Pirates. This no doubt gave Red Sox fans that warm fuzzy feeling that was sorely lacking in the 10 days between Daisuke’s starts.

While we learned last week that his March 11th start was fairly typical of high priced pitchers, the 7 strikeouts he recorded yesterday were a rare feat indeed. There have been only nine times this spring that a pitcher has struck out seven or more batters:

Ian Snell – Way back on March 6th, Snell threw 3 innings while striking out 7. Snell showed a lot of promise last year and this spring he’s showing why he’ll be the ace of the Pirates pitching staff (even if no one knows who he is).

Rich Harden – On March 15th the oft-injured Harden threw just 3-plus innings and struck out 9. Then he struck out 7 on March 20th in 5 innings. Overall, Harden has struck out 25 batters in a mere 13 innings of work this spring. Please stay healthy this year! There’s nothing I enjoy more than watching the Ks pile up.

Oliver Perez – He matched Harden on March 15th with 9 strikeouts, but it took him 5 innings to do it. He’s having a fine spring and he was dazzling to watch just three years ago. Perhaps he’ll find some of his 2004 magic in the Mets rotation this season.

Aaron Harang – Three days after Harden and Perez fanned 9, so did Harang. His spring has not been so stellar. He’s given up 28 hits which sets his H/9 at a mere 17-something. On the bright side, he’s still striking out a batter-an-inning, and has given up zero walks.

Scott Kazmir – He struck out 7 in five plus innings of work on March 18th. Yet he’s walked 6 in 12-plus innings so far this spring. It will be interesting to see if he can recapture the much improved control he exhibited in 2006.

Brett Myers – He also struck out 7 on March 18th. He’s one of those guys who took the off-season “seriously” by shedding 25 pounds off his frame. He’s mentioned that he’s been a bit “uncomfortable” pitching at his new weight, but the discomfort isn’t showing in his stats.

Josh Beckett – The 2006 home run king struck out 8 on March 20th. He’s only given up a single home run in 16-plus innings this spring. He’s also given up just a single walk while he’s struck out 17 batters.


Daisuke Matsuzaka – You’re Not Alone

All anyone can seem to talk about today is how the 103 million dollar pitcher, Daisuke Matsuzaka, was “bombed” yesterday in a spring training start. He gave up 2 home runs, to two “non-roster” players, and ended up surrendering 4 runs (3 earned) in 4 innings, which raised his ERA from 0 to 3.86. He also struck out 3 and walked none.
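
The jump from 0 to 3.86 follows from the standard ERA formula: nine times earned runs, divided by innings pitched. Here is a quick sketch in Python. The 3 scoreless innings before this start are an inference from the quoted ERA, not something stated above.

```python
def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per nine innings."""
    return 9 * earned_runs / innings_pitched

# Assumption: 3 scoreless spring innings before this start (inferred, not stated).
prior_innings = 3
start_innings = 4
start_earned_runs = 3

spring_era = era(start_earned_runs, prior_innings + start_innings)
print(f"Spring ERA after the start: {spring_era:.2f}")
```

With so few innings on the books, one rough outing swings the number by almost four runs.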

What about the highly paid pitchers not named Matsuzaka? Surely some of them had an equally atrocious day. Here were the highlights from Sunday’s action:

Brad Penny ($8.5 Million): He gave up 9 hits and 4 runs yesterday in only 3 innings. He also struck out none and has a 12.86 ERA this spring.

B.J. Ryan ($9.4 Million): 1 inning, 4 hits, 3 runs, 1 strikeout.

Freddy Garcia ($9 Million): 3 innings, 5 hits, 3 runs, and 2 walks. He didn’t strike anyone out.

Mark Buehrle ($9.5 Million): 4 innings, 6 hits, 6 runs, 2 walks, and 4 strikeouts. His ERA stands at 11 this spring. It’s only 1.5 higher than he makes in millions.

And that was only yesterday. On Saturday:

Barry Zito ($18 Million): 4 innings, 5 hits, 3 runs, 2 walks and 4 strikeouts.

Everyone panic!