Sal Perez At 198
[Author’s note, Saturday, July 7, about 12:30 PM EST: I changed some of the formulations in the post that I felt were unclear, or, in some cases, incorrectly expressed. The essential point remains the same, but I just wanted to note this in case one wonders what some of the comments are about. I note this so no one feels like a “fast one” has been pulled with the changes.]
Despite some shaky late moments from their usually excellent bullpen, the Royals defeated the Blue Jays 9-6 last night behind their usual combination of a “That was good? Or okay? At least he didn’t give up 8 runs” start from Luke Hochevar, a barrage of singles, and a random home run from Yuniesky Betancourt. The primary engine of the singles train was young catcher Salvador Perez, who hit four singles in five plate appearances.
Perez has been on fire since returning from the disabled list (due to an injury in Spring Training), and his current 2012 line stands at .425/.425/.725 (216 wRC+). Of course, that is over only 40 plate appearances, but hey, if you look at his major-league line from 2011 (.331/.361/.473), he does seem to have the ability to hit for a high average with some pop. Over parts of two seasons, Perez has a .351/.374/.527 career line.
That is only 198 total career plate appearances. It appears that a strange phenomenon might have taken hold among some observers. I suspect that if Perez had gone to the plate 198 times in a single season, it might have received less attention. However, since the sample has been broken up over two seasons, some people seem to have bought the idea that his hot (belated) start to 2012 somehow confirms that 2011 was “for real.” That seems obviously silly: we are still talking about fewer than 200 plate appearances. Perhaps this will sound just like another “small sample size” post, but nonetheless, let’s see what we can glean from Perez’s performance at the plate in the majors so far, and what we can learn about him and about how we use numbers.
There is little doubt that Perez is a pretty exciting young player with plenty of potential. He was a hot sleeper prospect after 2010 with a reputation for excellent defense (what presence!) and a bat that had some potential. Nonetheless, many understandably questioned his promotion to the majors in 2011 given his less-than-impressive showing with the bat at AA and his very brief (49 plate appearances) sojourn at AAA. But however one parses the numbers, few would argue that the 22-year-old Perez has looked overmatched at the plate since getting called up. The Royals did the smart thing and signed him to a contract during the off-season that is basically a no-lose proposition for the team.
Let’s take a quick look at that career line: .351/.374/.527. There is some power there, but the main value has come from batting average, especially on balls in play. I do not intend to pick on anyone, but if one thinks that saberists are immune from the seductions of a small sample of batting average, think again. Leaving that aside for a moment, here are two other lines from the first approximately 200 plate appearances of two different players’ careers:
Player A: .318/.343/.516, 201 PA. Obviously a future stalwart. Like Perez, not too many walks, but he obviously has a gift for making contact, and he’s got some pop, too!
Player B: .196/.251/.246, 197 PA. Nothing here to like, really. Hopefully he can play catcher; if he’s really good with the glove and improves with the bat, he might have a future as a journeyman backup.
[/Neyer’d]
Player A is Mike Aviles in 2008. Aviles got hurt in 2009, then dog-housed and jerked around by the Royals in 2010 and 2011 until he was traded to the Red Sox for nothing (and most Royals fans thought it was good riddance), but hey, the team had to make sure Johnny Giavotella was not blocked! “Don’t worry, Yuni is hardly going to play at all!”
Player B is Mike Moustakas last season. I assume that all the people elevating Perez to savior status for the Royals now wanted to give up on Moustakas after that miserable start? Obviously, those first 197 plate appearances told us all we needed to know: dude obviously can’t hit.
That is not to say that things have turned out completely opposite from what the first 200 plate appearances might have led one to think in either case: Aviles is a useful player, and while Moustakas is having a very good year (and really should have made the All-Star Game), there are plenty of chapters left to be written in his story. The point is simply that their first 200 or so plate appearances really told us very little about what sort of hitters they would turn out to be.
Why should we have any more confidence in the predictive value of Perez’s first 200 or so plate appearances, even if he is only 22? If you have read The Book, you may remember that at 220 plate appearances, a player’s observed wOBA is regressed halfway to league average. That does not mean that at 220 plate appearances we have a very good idea of a player’s true hitting talent. Regressing halfway to league average still means that regression is doing “half” the work, so there is a ton of uncertainty there. I would take any projection based on 220 plate appearances with a salt lick.
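To make the “regressed halfway” idea concrete, here is a rough sketch of the arithmetic. It assumes the common “add a fixed number of league-average plate appearances” form of regression, with 220 as that fixed number for wOBA. The function name and both wOBA inputs are hypothetical, chosen only for illustration; they are not Perez’s actual figures or the real league average.

```python
def regress_to_mean(observed, league_avg, pa, halfway_pa=220):
    """Blend an observed rate with league average.

    Equivalent to adding `halfway_pa` plate appearances of league-average
    performance to the player's line. When pa == halfway_pa, the player's
    numbers and league average get equal weight.
    """
    weight = pa / (pa + halfway_pa)  # share of the estimate from the player
    return weight * observed + (1 - weight) * league_avg


# Hypothetical inputs for illustration only.
observed_woba = 0.390
league_woba = 0.320

for pa in (198, 220):
    estimate = regress_to_mean(observed_woba, league_woba, pa)
    print(pa, round(estimate, 3))
```

Even at the halfway mark, half of the estimate is simply league average, which is another way of saying how little the observed line has told us by itself.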
If we should not be drawing any serious conclusions about Perez’s overall hitting ability on the basis of his 198 big league plate appearances, can we learn anything about any of his skills from them? Let’s quickly look at Eric Seidman’s classic post based on Pizza Cutter’s research from a few years back, When Samples Become Reliable. Just sticking with some basic stats: after 50 plate appearances, swing percentage stabilizes; after 100, contact rate stabilizes; after 150, strikeout rate stabilizes; and at 200 (which Perez should reach this weekend! Hooray!), walk rate stabilizes.
[Keep in mind that “stabilizes” does not mean “at this point, observed performance does not need to be regressed much.” Pizza Cutter is using a different sort of method.] Taking a look at some of the “stabilization points” listed in the post, Perez’s major league track record to date tells us very little. Of the stats I tend to use, only swing rate, contact rate, strikeout rate, and (this weekend) walk rate have approached their stabilization points. His major league numbers with respect to his power, for example, have not reached that point. As for Perez’s BABIP, the research did not find a “stabilization point” even at 650 plate appearances. The tiny bit we might have learned about Perez from his major league numbers so far is that he is going to take his swings and make contact. We might also initially say that he is not a fan of the free pass.
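For the bookkeeping-minded, the check above can be sketched in a few lines. The thresholds are the four quoted from When Samples Become Reliable; the function names are made up for this example. Keep the caveat in mind: reaching a threshold here only flags which stats have hit their listed point, not that they no longer need regression.

```python
# Plate-appearance thresholds as quoted above from "When Samples Become Reliable".
STABILIZATION_PA = {
    "swing rate": 50,
    "contact rate": 100,
    "strikeout rate": 150,
    "walk rate": 200,
}


def stats_reached(career_pa, thresholds=STABILIZATION_PA):
    """Stats whose listed threshold is at or below career_pa."""
    return [stat for stat, pa in thresholds.items() if career_pa >= pa]


def stats_pending(career_pa, thresholds=STABILIZATION_PA):
    """Stats not yet at their threshold, with the PA still needed."""
    return {stat: pa - career_pa
            for stat, pa in thresholds.items() if career_pa < pa}


print(stats_reached(198))  # swing, contact, and strikeout rate
print(stats_pending(198))  # walk rate is 2 PA short
```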
Now, those things are neither very exciting nor surprising if you looked at Perez’s minor league numbers or read what people were saying about him as a prospect. But that is not the issue here. The issue is how some people take his major league numbers on their own to mean something. Overall, they mean very little, and even what they can tell us about his plate approach is only slightly “stable.”
That is not to dismiss Perez’s minor league performances or the scouting reports derived from them. On the contrary, given the near-irrelevant sample (with respect to estimating his true talent) of his major league statistics to date, his minor league stats and especially scouting reports are much more important. Perez is a very young player with loads of potential. But if you want to make that point, scouting reports are going to do far more work than what happened during Perez’s 198 major-league plate appearances to date.
Matt Klaassen reads and writes obituaries in the Greater Toronto Area. If you can't get enough of him, follow him on Twitter.
I see this over and over again on FG, and this is simply incorrect and a vast misstatement of the original research.
The point at which a stat “stabilizes” is the point at which the sample is sufficiently big that you can predict future performance by equally weighting sample performance and league average; no more and no less.
“Before” a stat stabilizes, you weight league average more (“regress more”) to account for the greater variance in a smaller sample, and you have wider error bars/less precision in your prediction; “after” a stat stabilizes, you weight league average less and have more confidence. But that’s a quantitative difference, not a qualitative difference.
Even a small sample size does have some meaning; otherwise, you couldn’t add up a whole bunch of 0s from individual data points and get non-zero meaning from the sum, or have a discontinuity from “no meaning” at 199 PA to “meaning” at 200.
So you can predict, even at this point, that Perez’s BABIP will be higher than league average — just not as much higher than league average as you could/should predict if you had a much bigger sample. [I’m ignoring his minor league BABIP — if you add that to your data set, you might well get a different answer, but that’s not relevant here.]
So the fact that the data you have about Perez says far more about his Swing% than his BABIP doesn’t mean that it says nothing about his BABIP, let alone that it says “‘less’ than nothing” about it.
Thanks for the comment.
I may have been hyperbolic in some of my original statements above (I’ve toned them down a bit to hopefully lessen confusion), but as for your main point about regression/prediction/stability, I asked about the distinction between the point of “stability” and the point at which we weight league average for regression. Pizza Cutter (the original researcher) and Tango have different approaches, but unless I’m misreading them, they both seem to make a distinction (although in different ways) between the point at which a stat “stabilizes” in Pizza Cutter’s sense and the number of PA at which one weights observed individual PA equally against the same number of league-average PA. See their responses to my question along those lines (a long time ago) here:
http://www.fangraphs.com/blogs/index.php/when-samples-become-reliable/#comment-76819
I’m not saying that I’ve necessarily got things exactly right here, but they are both making a distinction between Pizza’s point about “stability” and Tango’s way of using regression to the mean.
Essentially, Pizza is saying that “stability” has to do with measuring a *change* in baseline — if you see an uptick in Swing% over a small sample, it may suggest that the player’s actual skillset has changed.
Tango is saying that “stability” is just measuring the current skill — once you reach a “stability point,” you can statistically expect that X% of the player’s deviation from average is repeatable. It has nothing to do with change.
In this post, it’s a distinction without a difference. If you’re starting from scratch, then the “change” is from your expected baseline, which means both definitions wind up with exactly the same outcome.
The only time you would really distinguish between the two is if you were looking for evidence that a player’s skill has actually changed — at which point you have a very complex problem ahead of you in determining how much to “age” old data so that it carries less weight.
That’s the crux of Tango’s issue with Pizza — he thinks that Pizza is asserting that a lower “stability point” indicates a skill that actually changes more rapidly than a skill with a higher “stability point,” but that the research doesn’t actually show that — instead, the research just shows that some skills are easier to identify with a smaller data set than others without giving any indication as to which skills might quickly change (let alone how you would age old data to properly incorporate it).
And I agree. I think it’s entirely possible that some skills do change more quickly than others, but I don’t see any evidence to support that. Instead, I see good research on how big of a data set we need before we can make a specific prediction about a specific skill, and how that varies based on the skill. Which is still meaningful but very different.
But in either case, neither Tango nor Pizza is saying that “stability point” is a discontinuity where your sample suddenly goes from meaningless to meaningful. It’s just crossing a line and reaching a certain “amount” of meaning.
I don’t believe I ever expressly said that, although it sounds like something that I would say if I had thought of it.
When I wrote the original article, nearly 5(!) years ago, I was actually most interested in an empirical way to determine what my cutoff point should be when I said “min X PA” in research. The issue of whether this models change is something that people (including me) have probably inadvertently assumed.
More to come.