Thinking Your Way Through Spring Training Statistics

February 11, 2013

Players have begun to report to their various camps, and while full teams aren’t yet together, spring training is officially on the doorstep, preparing to knock. For some, this is the most wonderful time; for others, this is a time most miserable. Soon, there will be practices, and then there will be games. When there are games, there will be statistics, and when there are statistics, there will be attempted interpretations of the statistics. There’s no such thing as a baseball number that goes unanalyzed.

On countless occasions before, you’ve probably seen attempted correlations between spring-training statistics and regular-season statistics. What we care about, after all, are those statistics that might be meaningful, so it’s important to check the spring numbers for meaning. Is there anything in there? Is it possible to identify imminent breakouts or collapses? Plenty of people have examined plenty of correlations. I know I’ve done it myself, and I didn’t come up with the idea. It comes around every year.

But some people don’t like numbers. Granted, those people probably aren’t reading FanGraphs, but we don’t necessarily need to open a spreadsheet. It’s possible, I think, to just think your way through spring-training statistics, and you’ll be left with the right idea. Spring numbers are of limited utility after the spring is over. Following is a thought process.

(1) There should be some correlation.

Ultimately, baseball is baseball, and spring-training games are played between players at high professional levels. The good players will most often look like the good players, while the worst players will most often bring up the rear. There’s nothing particular about spring training that might make Mike Trout play like Lucas Duda and Lucas Duda play like Mike Trout. Power hitters will be power hitters. Strikeout pitchers will be strikeout pitchers. We should not expect spring-training statistics to be completely and utterly random, because spring baseball isn’t entirely dissimilar from summer baseball.

(2) But it’s not even a month’s worth of data.

Every year, people need to be reminded not to buy into spring numbers. Every year, people need to be reminded not to buy into April numbers. It’s clear that spring training is a limited sample size, but we’re not even talking about a sample size of a month. Based on plate appearance and innings counts, we’re basically left with two or three weeks’ worth of numbers. The league leader in plate appearances last spring had 95. The league leader in plate appearances over 2012’s last 30 days had 133. The league leader in innings last spring had 30.1. The league leader in innings over 2012’s last 30 days had 42.2. Would you ever attempt to reach conclusions based on two or three weeks of performance? You’re going to come across some signal and so, so much noise.
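The "two or three weeks" figure can be checked with a little arithmetic. What follows is a rough sketch, not a rigorous conversion: it assumes the 30-day leader's total is a fair proxy for everyday usage, and the helper names (innings_to_float, equivalent_days) are made up for illustration. It also treats the innings figures as standard baseball notation, where ".1" and ".2" mean one and two outs.

```python
def innings_to_float(ip: str) -> float:
    """Convert baseball innings notation ('30.1' = 30 and 1/3) to a decimal."""
    whole, _, outs = ip.partition(".")
    return int(whole) + (int(outs) if outs else 0) / 3


def equivalent_days(spring_total: float, last_30_total: float, window_days: int = 30) -> float:
    """Scale a spring total to days of regular-season playing time.

    Assumption: the leader over the final 30 days of the season played at a
    typical full-time pace, so that pace can be used as the yardstick.
    """
    return window_days * spring_total / last_30_total


# Figures quoted in the paragraph above.
pa_days = equivalent_days(95, 133)
ip_days = equivalent_days(innings_to_float("30.1"), innings_to_float("42.2"))

print(f"hitters:  about {pa_days:.1f} days of regular-season playing time")
print(f"pitchers: about {ip_days:.1f} days of regular-season playing time")
```

Both leaders land at roughly three weeks, and they are the heaviest-used players in camp. Most players get less work than that, which is where the "two" in "two or three weeks" comes from.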

(3) And you have to consider the quality of competition.

Camps are populated by major-league players and minor-league players. The schedules are unbalanced, and the players who are playing the first few innings of a game are often way different from the players who are playing the last few innings of a game. These are not regular major-league games between groups of 25 major-league players. Younger guys or worse guys get shuffled in, and over such a limited sample size this might make a real difference.

(4) Then there’s the matter of players who are working on specific things.

Maybe this hitter is trying to learn to go the other way. Maybe this pitcher is focusing on developing his changeup. Not every player is going to approach every plate appearance the way he would during the regular season, and this applies to both players and their opponents. In part, spring training is about getting ready for the season, but it’s also part instructional, which can influence the numbers.

(5) Additionally, the players aren’t 100%.

Kind of the whole point is that the players are working to get to 100%, or close to it. Some people are of the opinion that, early in camp, the pitchers are ahead of the hitters, because the hitters are cold. We see evidence that early-season pitch velocity is lower than midseason pitch velocity, so in spring pitchers are still getting stretched out. Spring-training games are organized, competitive warm-ups, practice, and the players aren’t yet what the players will be.

(6) And we should note the unexamined ballpark environments.

To whatever extent it matters, there might be park-factor influence, but to my knowledge no one has bothered to investigate Arizona and Florida spring-training park factors. I wouldn’t even really be interested in the results, because I’m not real interested in the overall spring-training results. But because we don’t understand which places might be hitter-friendly or pitcher-friendly, we don’t understand quite what to do with the numbers. It’s a small thing, but it’s a thing.

Let’s think of spring-training statistics as being based on about two to three weeks’ worth of playing time. In the regular season, you would expect that sort of sample size to show correlations of X to overall season performance. It’s a very small sample, but it’s also a somewhat meaningful sample. In spring training, you would expect that sort of sample size to show correlations of < X to overall season performance, because of the reasons noted above, and possibly others too. X is already low; in spring training, it’s only lower. You have the near-randomness of small-sample data, and then still other factors to consider as well.
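For anyone who does want to open a spreadsheet after all, the X versus less-than-X argument can be sketched as a toy simulation. Everything in it is an assumption chosen for illustration rather than an estimate from real data: the spread of true talent, the 80 and 600 plate-appearance samples, and especially context_sd, a hypothetical stand-in for everything on the list above (competition, players working on things, conditioning, parks). The small regular-season sample is also simulated separately from the full season rather than as a part of it.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def observed_rate(p, pa, rng):
    """Simulate `pa` plate appearances with on-base probability `p`."""
    return sum(rng.random() < p for _ in range(pa)) / pa


def clamp(p):
    """Keep a simulated probability inside a sane range."""
    return min(max(p, 0.05), 0.95)


def simulate(n_players=5000, small_pa=80, season_pa=600, context_sd=0.05, seed=2013):
    """Return (X, spring X): correlations of two small samples with a full season."""
    rng = random.Random(seed)
    season, regular, spring = [], [], []
    for _ in range(n_players):
        # Assumed true talent: an OBP-like rate centered on .320.
        talent = clamp(rng.gauss(0.320, 0.030))
        # Hypothetical spring distortion: same player, shifted by camp context.
        spring_p = clamp(talent + rng.gauss(0.0, context_sd))
        season.append(observed_rate(talent, season_pa, rng))
        regular.append(observed_rate(talent, small_pa, rng))
        spring.append(observed_rate(spring_p, small_pa, rng))
    return pearson(regular, season), pearson(spring, season)


x_regular, x_spring = simulate()
print(f"regular-season stretch vs. full season: r = {x_regular:.2f}")
print(f"spring stretch vs. full season:         r = {x_spring:.2f}")
```

The exact values depend entirely on the made-up parameters, so they shouldn't be read as estimates of anything. The point is only the ordering: any extra source of variation that applies in spring but not in the regular season pushes the spring correlation below an already modest X.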

According to the MLB.com leaderboards, last year’s leader in spring training batting average was Munenori Kawasaki. Albert Pujols tied for the league lead in homers, then he immediately had that extended home-run drought. Brennan Boesch and Ryan Raburn had big springs, followed by miserable seasons. In spring, Francisco Liriano posted nearly seven strikeouts for every walk. In spring, Jarrod Parker finished with more walks than innings pitched. Nobody allowed more runs than Edwin Jackson. Nobody posted a lower ERA than Luis Mendoza.

You see good players in the leaderboards, and also mediocre players. Sometimes, you can see in retrospect that a player gave a sign as to what his season would be like, but most of those would-be signs don’t pan out. What’s most interesting about spring-training performance is scouting observation. Scouting observation leans less on a sufficient sample size, and it matters if this guy is throwing harder than usual, or if this guy is suddenly cranking homers to the opposite field. That’s the part of spring training worthy of interpretation and investigation. The rest? The rest are spring-training statistics. You should know by now what to make of spring-training statistics.
