Francisco Sosa, Silly Parks, and Half-Samples
by Nathaniel Stoltz
January 28, 2014

If you’re the sort of person who likes to scan minor league statistical leaderboards for under-the-radar performance prospects, the name Francisco Sosa may have caught your eye in 2013. Sosa’s 2013 statline has several intriguing numbers: his triple-slash was .315/.397/.529, he clubbed 20 homers and ripped 35 doubles, and he also swiped 30 bases. No other player in the minor leagues matched that combination of doubles, homers, and steals.

Sosa doesn’t really show up on prospect lists, though, for a few reasons. First, he was a 23-year-old left fielder in Low-A, on the wrong side of the defensive spectrum and much older than most legitimate prospects who have yet to sniff the upper minors. Second, 2013 was the first time in his six-year career that he posted remotely interesting numbers: only twice before had he managed an on-base percentage over .310, and only once had he slugged over .400. Finally, he played in the silliest home park in organized baseball.

It’s well documented that Asheville’s historic McCormick Field is a hitter’s paradise; the park is particularly noted for its Green Monster-esque monolithic right-field wall, which stands just 297 feet from home plate at the foul line. That’s hitter-friendly enough, but it might be only the third-most shockingly reachable dimension of the park. The first two: right-center is a mere 320 feet from home and lacks the high wall, and no part of the outfield wall is more than 372 feet from the plate.

It’s debatable which park in organized baseball is the most beneficial to hitters. Some say Asheville, but others point to Lancaster, High Desert, Colorado Springs, Albuquerque, and Las Vegas, with occasional others mentioned as well. Still, it’s hard to dispute that Asheville’s beyond-cozy dimensions make it a uniquely easy place to hit.
There’s no other park that’s going to reward hitters for popping a ball 321 feet to right-center or 373 feet to dead center, regardless of the thinness of the air or the patterns of the wind.

Those familiar with McCormick Field and its quirks rightly tend to discount statistics compiled by Asheville players. But how can we evaluate a player like Sosa statistically? There are a few common approaches.

The first is essentially to throw up our hands and reserve judgment on every player in this situation, waiting for him to reach a higher level and a fairer environment before definitively putting him on or taking him off the prospect map. This strategy comes with obvious drawbacks, especially in the Colorado system, which features inflated environments all the way to the majors; the most normal of the lot is Tulsa in the Double-A Texas League, and even it is moderately hitter-friendly. One cannot just pretend Colorado hitting prospects don’t exist, after all.

A second way to make sense of the numbers is through park factors. Asheville has a park factor of 160 for right-handed home runs, meaning that a right-handed hitter who spends the year in a Tourists uniform will likely hit about 60% more homers than he otherwise would have; equivalently, a Tourists player would hit only 100/160, or 5/8, as many home runs if he played in a neutral home park. That would put Sosa at 12.5 homers for 2013. Likewise, the 145 park factor for doubles would give him 24 two-baggers instead of 35.

Such an analysis is fairly cumbersome to do by hand, though, and it comes with its own set of problems. Sure, Sosa might project to hit 12 or 13 homers in a neutral park, but what if all 20 of his 2013 homers traveled 400+ feet? Then it wouldn’t matter if they were hit in a more pitcher-friendly environment; they’d clear any and all fences.
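For readers who would rather not do the park-factor arithmetic by hand, here is a minimal Python sketch of the adjustment described above. It assumes, as the paragraph does, that a factor of 100 is neutral and that the whole-season total is simply divided by the factor over 100; the function name `neutralize` is hypothetical, not part of any library.

```python
def neutralize(count: float, park_factor: float) -> float:
    """Scale a season counting stat to a neutral (factor 100) park.

    Hypothetical helper. A factor of 160 means the event occurred about 60%
    more often than it would have in a neutral park, so the neutral estimate
    is count * 100 / park_factor. Like the article's example, this applies
    the factor to the full-season total.
    """
    if park_factor <= 0:
        raise ValueError("park factor must be positive")
    return count * 100.0 / park_factor


# Sosa's 2013 totals with the Asheville factors quoted above.
hr_neutral = neutralize(20, 160)  # right-handed HR factor
doubles_neutral = neutralize(35, 145)  # doubles factor
print(f"HR: {hr_neutral:.1f}, 2B: {doubles_neutral:.1f}")  # HR: 12.5, 2B: 24.1
```

The sketch reproduces the 12.5 homers and roughly 24 doubles quoted above, and it inherits the same blind spot: it scales every homer equally, whether it barely cleared the 297-foot wall or would have left any park.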
Home run totals, and even doubles totals, tend to be small enough numbers in a single season that adjusting them is problematic, because a handful of no-doubt blasts can make a blanket factor misleading. We can probably safely say that Sosa would have collected fewer extra-base hits in 2013 had he worn another SAL uniform, but it’s tough to say how many fewer. A park-adjusted total is an abstraction rather than a record of where each ball actually landed, which makes it hard to feel comfortable with the reconciled figure.

For that reason, many people opt for a third, more concrete strategy: simply citing the road numbers of players who play in extreme parks. This strategy is particularly popular when those numbers tell a story, something they readily do in the case of Francisco Sosa:

McCormick Field: 269 PA, 27 2B, 2 3B, 15 HR, 17 SB, 37 BB, 50 K, .374/.466/.709
Road: 262 PA, 8 2B, 0 3B, 5 HR, 13 SB, 20 BB, 75 K, .256/.325/.355

That’s a pretty vivid story, and the story is that Francisco Sosa is not a prospect: he’s a 23-year-old left fielder in A-ball who couldn’t crack a .700 OPS in parks of remotely reasonable size. That’s not a prospect, nor is it even an organizational player; it’s someone teetering on the edge of being out of pro ball.

Or is it? The problem with this method is that we’ve now gone from analyzing 531 plate appearances of Francisco Sosa to analyzing just 262 of them. 531 PA isn’t exactly a robust sample, but it’s certainly more reliable than 262. Any familiarity with MLB players should tell us that: we’ve all seen players come up and set the world on fire for a couple of months, only to fall on their faces (Jeff Francoeur, Brennan Boesch, etc.); likewise, we’ve seen players struggle for half-seasons and then catch fire. And we’re still faced with the fact that, while we know Asheville is unfair, we don’t know exactly what the hits in question were or how they would have played in more standard parks.
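The sample-size point can be made a little more concrete. As a rough sketch, treat each plate appearance as an independent trial (a simplification, since real plate appearances are not identical coin flips) and compute the binomial standard error of an on-base rate. Holding Sosa’s .325 road OBP fixed isolates the effect of sample size; `rate_std_error` is a hypothetical helper name.

```python
import math


def rate_std_error(rate: float, plate_appearances: int) -> float:
    """Binomial standard error of a per-PA rate such as OBP (hypothetical helper).

    Simplifying assumption: every plate appearance is an independent trial
    with the same success probability.
    """
    if plate_appearances <= 0:
        raise ValueError("need at least one plate appearance")
    return math.sqrt(rate * (1.0 - rate) / plate_appearances)


road_obp = 0.325  # Sosa's 2013 road OBP, held fixed to isolate sample size
se_road = rate_std_error(road_obp, 262)  # road-only sample
se_full = rate_std_error(road_obp, 531)  # full-season sample
print(f"road-only SE: {se_road:.3f}, full-season SE: {se_full:.3f}")
print(f"ratio: {se_road / se_full:.2f}")  # equals sqrt(531 / 262)
```

With these inputs the road-only standard error comes out near .029 against about .020 for the full season, so one standard error on either side of a .325 OBP already spans nearly 60 points at 262 PA.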
We do know that Sosa’s power output was not the only thing affected by road trips (just look at the K/BB difference), so maybe some of the split is a product of his feeling more comfortable at home, wherever home is. The previous year, playing for short-season Tri-City (a more pitcher-friendly park), Sosa hit .280/.357/.456 at home and just .270/.352/.333 on the road, with another wide K/BB split (23/13 at home to 34/11 on the road). The massive home-road split in 2013 is probably part comfort level, part park inflation, and part luck and small sample size, but it is difficult to partition those factors and assign each a share of the split.

So there are definite drawbacks to simply citing the road numbers of Sosa or a similar player, but how big a problem are they? I decided to test it. Baseball-Reference now has minor league splits going back to 2008, so I went through all of Asheville’s full-season hitters from 2008 to 2012 and examined how their home, road, and overall numbers translated to what they did in High-A Modesto the following season.* It’s a sample of a mere 36 hitters, but it still yielded some interesting findings.

*To keep things on an even plane, hitters who were not promoted to Modesto were excluded from the sample. For hitters who spent multiple years in Asheville, I used the statistics from their final season before the promotion.

First, it should come as no surprise that the hitters fared much better at McCormick Field (.297/.368/.468 on average) than on the road (.252/.320/.381). As one might expect, the hitters’ Isolated Discipline (OBP minus AVG) stayed fairly consistent on the road (unlike Sosa’s, actually), but both batting average and Isolated Power (SLG minus AVG) dropped precipitously.
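Both derived stats in the paragraph above are simple differences of the triple-slash components. A small Python sketch using the averages quoted above and Sosa’s 2013 splits; the `SlashLine` type is a hypothetical convenience, not from any stats package.

```python
from typing import NamedTuple


class SlashLine(NamedTuple):
    """A batting line as AVG/OBP/SLG (hypothetical helper type)."""

    avg: float
    obp: float
    slg: float

    @property
    def iso_discipline(self) -> float:
        # Isolated Discipline, as defined above: OBP minus AVG.
        return self.obp - self.avg

    @property
    def iso_power(self) -> float:
        # Isolated Power: SLG minus AVG, i.e. extra bases per at-bat.
        return self.slg - self.avg


lines = {
    "Asheville hitters, home": SlashLine(0.297, 0.368, 0.468),
    "Asheville hitters, road": SlashLine(0.252, 0.320, 0.381),
    "Sosa 2013, home": SlashLine(0.374, 0.466, 0.709),
    "Sosa 2013, road": SlashLine(0.256, 0.325, 0.355),
}

for label, line in lines.items():
    print(f"{label}: ISO D {line.iso_discipline:.3f}, ISO {line.iso_power:.3f}")
```

Run against these lines, the group’s Isolated Discipline barely moves (.071 at home, .068 on the road) while its Isolated Power falls from .171 to .129; Sosa’s Isolated Discipline, by contrast, drops from .092 to .069.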
Unsurprisingly, the hitters’ numbers after the move to Modesto (.255/.328/.389) much more closely mirrored the Asheville road numbers than either the McCormick Field numbers or, indeed, the overall average statline of .275/.345/.425. That in itself is useful: it means that, on average, we can expect an Asheville hitter to lose roughly 20 points of average and OBP and 40 points of slugging when he moves up to High-A. Maintaining Asheville production in Modesto isn’t just holding one’s own; it’s a fairly significant step forward on the statistical front.

None of that should come as a surprise; it all follows pretty intuitively from what we know about park factors. But does that mean we can take the Asheville road numbers as the true arbiters of talent? Let’s look at each of the triple-slash numbers.

Say we want to predict the batting average of an Asheville hitter moving to Modesto. Which has the strongest predictive value: the batter’s overall batting average with Asheville (home and road), his batting average at McCormick Field, or his batting average on the road? Here’s a bar graph of the relative r² values, with overall on the left, home in the middle, and road on the right:

It’s unmistakable: the overall batting average is easily the best predictor. The batters’ road averages were more predictive than their performances at their home bandbox, but both had some predictive value, and combining the two into a larger sample yielded by far the strongest correlation.

What about OBP?

Whoa. This one shocked me too. Asheville hitters’ road OBPs have nearly no correlation with how often they get on base after moving up to Modesto, so much so that including them actually drags the overall totals down to being less predictive than the home ones!
I’m skeptical that the OBP result would continue to hold if we had split data from several years before 2008 and could assemble a more robust sample, but it rather dramatically underscores the point that treating road data as true-talent performance might create more problems than it solves.

Of course, of the three triple-slash stats, OBP isn’t the one McCormick Field would seem to affect most profoundly; that’s slugging. How did slugging fare?

So across the three triple-slash stats, the road numbers beat out the home ones in average, lost in OBP, and virtually tied in slugging. Overall, they didn’t prove any more predictive than the statistics compiled in the silliest park in organized ball, let alone the overall numbers, which boasted twice the sample size.

It may seem sensible, then, to conclude that Francisco Sosa was really a below-average Low-A hitter rather than an excellent one, but this investigation reveals some problems with jumping to that conclusion. Even though the Tourists’ home park skews data in some challenging ways, halving the club’s players’ numbers in the name of removing bias actually seems to obscure more than it illuminates. For example, in 2011 Corey Dickerson posted an even more dramatic split than Sosa’s: .354/.418/.844 at home and .193/.280/.363 on the road. Yet he’s now a solid MLB outfielder. Current Rockies third baseman Nolan Arenado hit a meek .258/.278/.435 away from Asheville in 2010; now he’s a three-win player. Conversely, several players who hit well away from McCormick Field (Jared Clark, Will Swanner, Lars Davis, Jordan Ribera, Trevor Story) struggled to adjust to Modesto.

Home/road splits for minor leaguers are certainly interesting, and they can serve as a warning about who may or may not translate well. Still, as this analysis shows, a warning is far from a guarantee.
No matter how extreme the environment, cutting a sample in half is counterproductive to building the most complete statistical picture of a player. Don’t run away from extreme-park data just because it’s inconvenient and messy: “fair” sixty-game samples are far less polished than they may appear.