# Familiarity and Framing, Investigated

Something that makes total sense is that catchers need to know their pitchers. Catchers, after all, are the guys calling all of the pitches, and catching many of the pitches, and you often hear about guys who either are or are not on the same page. It makes sense how familiarity could have an effect on pitch-calling. It also makes sense how familiarity could have an effect on pitch-receiving, as greater familiarity will yield a greater understanding of how pitches move and where they’re likely to go.

Earlier this very Monday, Eno posted an article titled “Familiarity Breeds Better Framing”. Eno was passing along material he got in speaking with Oakland catcher Stephen Vogt, and Vogt used Luke Gregerson as an example of a guy he doesn’t know well enough yet. Vogt needs to learn Gregerson’s tendencies and movement in order to maximize his own ability to catch him. This all got me wondering: can we see anything in the PITCHf/x data? What do the framing numbers look like for pitchers who’ve changed teams?

The familiarity I’m talking about is the familiarity between a pitcher and a catcher. I’m going to use changed teams as a proxy for changed catchers, since most pitchers don’t have an R.A. Dickey/Josh Thole thing going on. You’re most accustomed to seeing framing data broken down by catcher, since it’s the catchers who’re doing the actual framing, but you can also see the same numbers by pitcher. That’s what this study focuses on. The expectation is that pitchers who change teams end up with worse framing numbers, because they’re suddenly throwing to new catchers.

I looked at this in two ways. First, I looked at pitchers who changed teams midseason. Then, I looked at pitchers who changed teams between seasons. Out of convenience, I elected to use a different methodology for each. For the former, I used data pulled from StatCorner. For the latter, I did some calculations using data pulled from FanGraphs leaderboards. Both methodologies have agreed well with one another in the past, in different studies, so I was comfortable with this approach.

First, the guys who changed teams midseason. During the PITCHf/x era, there have been 36 pitchers who threw at least 50 innings for two different teams in the same season. StatCorner keeps track of a couple of stats — balls called within the usual umpire strike zone, and strikes called outside of the usual umpire strike zone. Here are the average numbers for those pitchers before and after getting moved:

**First team:** 16% balls on called pitches in zone

**Second team:** 16% balls on called pitches in zone

**First team:** 7.8% strikes on called pitches out of zone

**Second team:** 7.9% strikes on called pitches out of zone

Absolutely nothing there. Now for the guys who changed teams between seasons. For this analysis, I used my own metric that I put together a while ago. Using FanGraphs plate-discipline data, you can calculate expected strikes. Then you can look at the difference between expected strikes and actual strikes, and the numbers you get out of these calculations agree well with the more rigorous pitch-by-pitch analyses. I calculated something called Diff/200 — the difference between actual strikes and expected strikes, above or below average, per 200 innings. I examined the six-year PITCHf/x era.
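The article doesn’t spell out the formula, so here’s one plausible sketch of a Diff/200-style calculation from FanGraphs-type plate-discipline rates. The function names and the expected-strikes definition (every pitch in the zone, plus every swing at a pitch outside the zone) are my assumptions, not necessarily the exact method used:

```python
def expected_strikes(pitches, zone_pct, o_swing_pct):
    """Expected strikes under a simple model: every pitch in the zone
    counts as a strike, plus every swing at an out-of-zone pitch
    (a swing is a strike regardless of location)."""
    in_zone = pitches * zone_pct
    out_of_zone = pitches - in_zone
    return in_zone + out_of_zone * o_swing_pct

def diff_per_200(pitches, innings, zone_pct, o_swing_pct, actual_strikes):
    """Actual strikes minus expected strikes, scaled to 200 innings."""
    diff = actual_strikes - expected_strikes(pitches, zone_pct, o_swing_pct)
    return diff * 200.0 / innings

# Made-up example: 3,000 pitches over 200 innings, 45% in the zone,
# 30% O-Swing%, 1,848 actual strikes.
print(round(diff_per_200(3000, 200, 0.45, 0.30, 1848), 1))  # → 3.0
```

A pitcher three strikes above expectation per 200 innings would register as +3.0 Diff/200, which is the general scale of the numbers below.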

I looked for guys who threw at least 50 innings in consecutive years. I then excluded the guys who were traded during one or both of those seasons, to make the data cleaner. I was left with a sample of 921 season-pairs. Of those, 164 featured pitchers who changed teams over the winter. The remaining 757 featured pitchers who stayed on the same team. Here’s how the numbers compare:

**Year 1, pitchers who changed:** -2.1 Diff/200

**Year 1, pitchers who stayed:** 0.8

**Year 2, pitchers who changed:** -0.7 Diff/200

**Year 2, pitchers who stayed:** 2.4

**Difference, pitchers who changed:** +1.4 Diff/200

**Difference, pitchers who stayed:** +1.6

If I wanted, I could analyze the little patterns in these numbers. The pitchers on new teams regressed toward average, but were still a little worse than the pitchers who stuck around. However, consider the scale. We’re talking about one, two, three strikes per 200 innings. So, a small fraction of one run, which is hardly worth talking about. If you squint, you can see a familiarity effect. If you squint any harder, your eyes will be closed and you’ll see nothing but darkness. There just isn’t anything here that screams “I’M SIGNIFICANT!”
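To put “a small fraction of one run” in rough numbers: a marginal called strike is commonly valued at something like 0.13 runs (that run value is my assumption, a rule-of-thumb figure, not from the article):

```python
# Rough sanity check on the scale of the effect. Assumption (mine,
# not the article's): one extra called strike is worth about 0.13
# runs, a commonly used rule-of-thumb value.
RUNS_PER_CALLED_STRIKE = 0.13

for strikes in (1, 2, 3):
    runs = strikes * RUNS_PER_CALLED_STRIKE
    print(f"{strikes} strikes per 200 IP ≈ {runs:.2f} runs")
```

Even at three strikes per 200 innings, that’s about four-tenths of a run over a full starter’s workload.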

Intuitively, you’d think, when a pitcher changes teams, he’ll get received a little worse, because the catchers don’t know him. I couldn’t really find anything with pitchers who changed teams in the middle of the year. I couldn’t really find anything with pitchers who changed teams between two years. And the more I think about it, the more I think *this* is the result I should’ve expected. A result of nothing meaningful.

For one thing, most catchers have been catchers for a while, and as catchers, they’ve had to catch a lot of different pitchers. Catchers in the majors are selected for meeting at least some kind of bare-minimum threshold of adequacy, so it stands to reason they’re quick learners. They basically have to be. They have to learn pitchers quickly, and they have to learn hitters quickly, or else they probably won’t be very good catchers. I don’t doubt that a new pitcher takes a little getting used to. It might just be that the getting used to takes very little time at all.

And for another thing, how different are most pitchers, really? How different are their pitch types? The catcher knows what pitch is coming, because the catcher called for it. Most fastballs move similarly. Most changeups move similarly. Most pitchers tend to miss in similar ways. Luke Gregerson throws a bunch of different sliders, but he’s unusual in that regard. He might take a little extra getting used to. It’s presumably still a quick process. Pitchers and catchers are in constant communication, and catchers seem able to learn pitchers in a hurry.

For purposes, at least, of receiving. Familiarity might have a bigger effect on game-calling, and therefore on results. It could take longer for a catcher to figure out a pitcher’s real strengths and weaknesses in different situations. But as far as just pitches and movements? Plenty of reasons to think it’s a process. Plenty of reasons to think it doesn’t take long.

Jeff made Lookout Landing a thing, but he does not still write there about the Mariners. He does write here, sometimes about the Mariners, but usually not.

I think you should do a statistical significance test before deciding whether or not the results are significant.

Pretty sure I can eyeball this one.

Remember what significance tests are for. In this case, I think it’s fair to say that the author isn’t working with samples of data; he’s working with populations (all pitchers in the PITCHf/x era for which the stated conditions are true/false). Assuming you would use a Student’s t-test here, a test of statistical significance to determine whether a given parameter differs between two populations violates one of the test’s assumptions (the “sample” consists of >10% of the population).

Further, such a test would be a useless exercise because the purpose of the test would be to obtain evidence as to whether the population means differ. The data provided in the article clearly show that the population means do differ. In this case, statistics cannot tell you whether the observed difference in means is significant; you have to use common sense to tell you that. As the author implies, common sense is pretty clear on this one.

I like that it is called a “Student’s T-Test.” Not sure why, but that is the way I know it as well.

William Sealy Gosset, Guinness’s master brewer, used a pseudonym (“Student”) when publishing his t-test methodology.

http://en.wikipedia.org/wiki/Student%27s_t-test

Gosset, we should also note, would/did abhor the practice of “significance testing.” He (and I) consider it an arbitrary, intellectually bankrupt practice, both as it is now and as it was when first proposed by R.A. Fisher. Fisher and Gosset debated, but because Fisher was in the academic community and Gosset in the business world of proprietary practices, Fisher’s flawed beliefs propagated; Gosset’s got lost.

I agree. In a nutshell, using significance tests as “proof” is akin to using a nonzero correlation coefficient as proof of causation.

And I also heartily endorse being a Student of Guinness.

I don’t even get what you’re saying. You can probably eyeball this case, yes, but you seem to be saying that if the numbers are different you don’t need to test for significance. The entire point of a significance test is to see if the “different” data you have are actually different and not simply a fluke. This is actually the perfect data set for a t-test.

No. The perfect data set for a t-test would not violate the assumptions that make the test mathematically valid. I’m not sure if the data follow a normal distribution, but I would assume not. More importantly, the data are not independent observations.

But more important still, we use frequentist inferential statistics to estimate population parameters using sample data and test hypotheses concerning the equality of those parameters. This is useful when it is impractical to gather data for a given parameter over an entire population. What you are proposing is that we conduct a hypothesis test that two population means are the same (with the alternative hypothesis being that the population means are different) using the complete dataset for both populations with known population means. This is analogous to estimating the population of the United States in 2010 by taking a sample of 2010 census data. You already have a census, so what are you estimating for?

Moreover, the data reveal that the population means ARE different in this case. “Statistically significantly different” and “not equal” are the same thing. That just doesn’t mean anything because they’re clearly not that different here. That we’re eyeballing this one isn’t really a philosophical problem either. Statistical tests are merely a form of evidence. Sometimes hypothesis testing is unnecessary to convince people of the existence or non-existence of a relationship between two or more variables because those relationships/non-relationships are self-evident.

Ok, I might have gotten carried away with that last paragraph. By this point I’ve consumed a significant amount of Hopslam (self-evident). But you get the idea.
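For anyone who wants to actually try the test the thread is arguing about, here is a minimal Welch’s two-sample t-statistic sketch in stdlib Python. The Diff/200 values below are synthetic, invented purely for illustration (the article doesn’t publish pitcher-level data), and whether the test is even appropriate here is exactly what the thread disputes:

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances)."""
    var_a = statistics.variance(a)  # sample variances
    var_b = statistics.variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

# Synthetic year-to-year Diff/200 changes, made up for illustration:
changed = [1.4, -0.5, 2.0, 3.1, 0.2, 1.8]
stayed = [1.6, 0.9, 2.5, 1.1, 2.2, 0.8]

print(round(welch_t(changed, stayed), 2))
```

With differences on the order of a strike or two per 200 innings and pitcher-to-pitcher spread far larger than that, a t statistic this close to zero is the numerical version of the article’s eyeball test.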