Evaluating Two-Pitch Pitchers
by Carmen Ciardiello, June 11, 2021

About a month ago, I wrote about Jack Flaherty and looked at his increased reliance on both his fastball and his slider. I posited that through his first seven starts, Flaherty had effectively been a two-pitch pitcher, with the aforementioned combination of pitches making up about 80% of his total pitches. (His curveball was his third-most used pitch, thrown sparingly at about a 13% clip.) To investigate whether this was a negative development that could account for Flaherty’s drop in strikeouts relative to his career norms, I conducted a series of analyses. I grouped pitcher seasons from 2010-20 and counted the number of pitches each pitcher threw with a usage over 15%. This was somewhat arbitrary; I chose the 15% cutoff so pitchers with mixes like Flaherty’s in 2021 would appear in the bucket with two pitches. I then took each bucket and looked at the group’s strikeout rate, walk rate, FIP-, and WAR per 180 innings pitched. I found virtually no difference among the two-, three-, and four-pitch buckets on any of the measures: the strikeout and walk rates were within a percentage point of each other, the FIP- figures were within a point, and the prorated WAR numbers were within hundredths of a win. Next, I calculated the third time through the order (TTO) effect for pitchers in each bucket. To my surprise, there again was little difference between the pitcher buckets. My hypothesis was that two-pitch pitchers would struggle to get through the order as effectively as their peers who utilized more pitches. But based on my cutoff for a relevant pitch (15% usage), this did not seem to be the case. From there I concluded that Flaherty leaning on his fastball and slider more was not inherently bad; there seemed to be no evidence that being a two-pitch starter was inherently detrimental to striking out batters, preventing runs, or turning a lineup over multiple times.
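The original classification from the Flaherty piece amounts to counting pitches above a usage cutoff. Below is a minimal Python sketch of that rule, assuming each pitcher season is summarized as a dictionary mapping pitch type to usage percentage. The function name and the example mix are hypothetical; only the roughly 80% fastball/slider share and the ~13% curveball come from the text.

```python
# Minimal sketch of the original 15%-cutoff classification (since abandoned).
# Assumption: a pitcher season is a dict of pitch type -> usage in percent.
USAGE_CUTOFF = 15.0


def count_relevant_pitches(usage_pct, cutoff=USAGE_CUTOFF):
    """Count the pitch types thrown more than `cutoff` percent of the time."""
    return sum(1 for pct in usage_pct.values() if pct > cutoff)


# Hypothetical mix shaped like Flaherty's early 2021: fastball + slider ~80%,
# curveball ~13%, a little of everything else.
flaherty_like = {"fastball": 45.0, "slider": 35.0, "curveball": 13.0, "changeup": 7.0}
print(count_relevant_pitches(flaherty_like))  # 2
```

The 13% curveball simply drops out of the count, which is the sensitivity to an arbitrary cutoff that the rest of the piece takes issue with.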
But upon further reflection, I was dissatisfied with the process that led me to this conclusion. The basis for my dissatisfaction was the criterion I used to decide whether a pitcher was a two-pitch pitcher or a pitcher with three or four credible offerings. As I explained above, I chose that cutoff based on the tendencies of a single player I was interested in, and in a way that would fit the narrative I was trying to tell. I also felt (anecdotally) that there had been an influx of pitchers in the majors who have found success by primarily relying on two pitches; some of those pitchers happened to represent clubs the public deems “smart.” Thus, I reasoned, two-pitch starters were not actually more flawed than their peers with more diverse repertoires. I will address the latter part of this line of thinking later (spoiler: it is extremely flawed), but this is how I was trying to rationalize my findings. I had seen Luis Patiño and Shane McClanahan in 2021, and Tyler Glasnow last year (he added a slider this season), succeed in Tampa with two pitches and thought the Rays might be on to something. Same for the Astros with Framber Valdez, Cristian Javier, and Lance McCullers Jr. (until this year, when he also added a slider). Two of the most surprising breakthrough pitchers of the past two-plus seasons have been Kevin Gausman and Lucas Giolito, both of whom rely primarily on a fastball/offspeed combination (for Gausman, the offspeed pitch is a splitter; for Giolito, it is a changeup). Dinelson Lamet is another pitcher with exceptional results (when healthy) relying only on a four-seamer and slider. As I mentioned above, this is all anecdotal evidence backing up a potentially faulty conclusion. There is no empirical support here, and it is not the most rigorous approach to research. That led me to redo my analysis, this time with more rigor in classifying the “two-pitchedness” of a player.
Before I get into my methodology for this determination, I would be remiss if I did not at least introduce the main concept I am trying to measure: the third time through the order effect (which I will denote as TTO for the remainder of this piece). This phenomenon has played a massive part in determining pitching roles and deployment in this era of major league baseball. It consists of the degradation of a pitcher’s performance as he moves through the opposing lineup. No matter how you measure it (wOBA allowed, RA9, ERA), the pitcher population pitches worse the second time through the order than the first, and worse the third time through than the second. Generally, the effect is measured relative to the first time through the order. Since I will be using wOBA allowed in this piece, the second time through the order effect is the difference in wOBA allowed between the second and first times through the order, and the third time through the order effect is the difference in wOBA allowed between the third and first times through the order. For more background on the subject, I would recommend this piece at Baseball Prospectus by Mitchell Lichtman, which was my introduction to the phenomenon. More recently, Rob Mains did a multi-part series on the TTO penalty for BP. I would also recommend these two articles from Chris Teeter at Beyond the Boxscore; the first measures the TTO for groundball versus fly ball pitchers and the second gauges the TTO by the type of secondaries a pitcher employs. Now, on to my analysis. First, let’s walk through how I grouped pitcher seasons this time around. For every pitcher season from 2010-19 (I threw out the shortened 2020 season) in which the pitcher threw at least 100 innings, I looked at the percentage of pitches he threw of each pitch type and ranked those pitch types in descending order of usage.
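The penalty definitions above reduce to simple differences against the first pass through the order. A minimal Python sketch, assuming wOBA allowed has already been computed for each pass (the function name is hypothetical); the example uses the 60-70% bucket’s figures from the TTO table in this piece.

```python
def tto_penalties(woba_by_pass):
    """Return {pass: penalty}, where the penalty for pass n is the wOBA allowed
    on pass n minus the wOBA allowed on the first pass through the order."""
    baseline = woba_by_pass[1]
    return {n: woba - baseline for n, woba in woba_by_pass.items() if n > 1}


# wOBA allowed on the first, second, and third passes (60-70% bucket).
penalties = tto_penalties({1: 0.307, 2: 0.318, 3: 0.332})
print({n: round(p, 3) for n, p in penalties.items()})  # {2: 0.011, 3: 0.025}
```

Because every penalty is anchored to the first pass, a positive number always means hitters are doing better than they did the first time they saw the pitcher.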
I then pulled the two most used pitches for each pitcher and added their usage together. The sum of the usage of the top two pitches was my gauge of the “two-pitchedness” of that pitcher season. To give an example, Walker Buehler’s two most used pitches in 2019 were his fastball and slider. He threw the former 53.2% of the time and the latter 14.2% of the time. Add those two figures together and you get 67.4%, the number I was concerned with for each pitcher season. A pitcher who has only two credible offerings will have a value close to 90%; pitchers with the most egalitarian mixes will be down towards 50%. So instead of using an arbitrary cutoff to decide whether a pitcher was a two-pitch pitcher, I used a continuous measure that places every pitcher season on a spectrum, without a threshold tailored to one pitcher (unlike my analysis in the Flaherty piece). I bucketed the combined usage of the top two pitches in increments of 10 percentage points: all pitchers with a combined top-two usage greater than 50% and at most 60% were grouped together, then greater than 60% and at most 70%, and so on. Note that we are dealing with pitchers who threw at least 100 innings in a season. This means we are considering starters and, in recent seasons, “bulk” guys: pitchers who appear after openers and are tasked with starter-level workloads without being designated as the starter. With the pitchers bucketed, I went to pitch-by-pitch data from Baseball Savant. Each plate appearance in each regular season game was labeled with the number of times that pitcher had faced that spot in the batting order (first time through, second time through, and so on). I appended the pitch usage bucket the pitcher fell into and then aggregated the data for each bucket.
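The bucketing and plate-appearance labeling described above can be sketched in Python. This assumes per-season usage arrives as a dictionary of pitch type to percentage, and that a pitcher’s plate appearances in a game are available in order along with each batter’s lineup spot. The function names are hypothetical, not Baseball Savant fields.

```python
import math
from collections import defaultdict


def top_two_usage(usage_pct):
    """Sum of the usage (in percent) of a pitcher's two most-used pitch types."""
    ranked = sorted(usage_pct.values(), reverse=True)
    return sum(ranked[:2])


def usage_bucket(top_two, width=10):
    """Label a combined usage with its (lower, upper] bucket, e.g. 67.4 -> '60-70'.

    Rounding first keeps float noise (70.00000000001) from bumping a value
    that sits exactly on a boundary into the next bucket.
    """
    upper = math.ceil(round(top_two, 6) / width) * width
    return f"{upper - width}-{upper}"


def times_through_order(lineup_spots):
    """Label each plate appearance in one pitcher's game with how many times he
    has now faced that batting-order spot (1 = first time through, and so on)."""
    faced = defaultdict(int)
    labels = []
    for spot in lineup_spots:
        faced[spot] += 1
        labels.append(faced[spot])
    return labels


# Buehler's 2019 fastball and slider usage from the text; his other pitches are
# omitted because only the top two matter here.
buehler_2019 = {"fastball": 53.2, "slider": 14.2}
combined = top_two_usage(buehler_2019)
print(round(combined, 1), usage_bucket(combined))  # 67.4 60-70

# A hypothetical outing: lineup spots 1-9, then the top of the order again.
print(times_through_order([1, 2, 3, 4, 5, 6, 7, 8, 9, 1, 2]))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2]
```

Labeling by lineup spot rather than by batter means a pinch hitter in the third spot still counts as the pitcher’s second look at that spot, which matches the “faced that spot in the batting order” wording above.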
Before I get to the TTO figures, let me show you the information I described towards the beginning of this article about the performance of pitchers in each bucket, now with the refined pitcher designations:

Performance by Reliance on Top Two Pitches

| Top Two % | No. of Pitchers | K% | BB% | FIP- | WAR per 180 IP |
|---|---|---|---|---|---|
| 10-20 | 9 | 17.4 | 7.1 | 107.9 | 1.73 |
| 20-30 | 1 | 15.3 | 6.2 | 100.0 | 2.33 |
| 30-40 | 1 | 19.4 | 8.7 | 90.0 | 3.34 |
| 40-50 | 41 | 18.6 | 6.9 | 104.1 | 1.94 |
| 50-60 | 322 | 19.6 | 7.2 | 99.0 | 2.35 |
| 60-70 | 487 | 20.1 | 7.3 | 97.6 | 2.44 |
| 70-80 | 349 | 20.9 | 7.5 | 97.1 | 2.49 |
| 80-90 | 179 | 20.9 | 7.1 | 96.2 | 2.61 |
| 90-100 | 28 | 21.5 | 7.5 | 97.0 | 2.58 |

For the rest of the piece, I am going to neglect the bins with very few pitchers, because the aggregate results in those bins carry essentially no signal given how small the samples are. Interestingly, pitchers in the 80-90% bucket (those who threw their top two pitches more than 80% and at most 90% of the time) performed best. They tied for the highest strikeout rate and posted the lowest park- and league-adjusted FIP and the highest WAR accumulation rate of all the relevant bins, along with one of the lower walk rates. Strikeout rate and WAR steadily decline, and FIP- steadily rises, as pitch mixes become less concentrated in the top two pitches. Case closed! We shouldn’t care if a pitcher throws a useful third and/or fourth pitch, right? I made this same point in my Flaherty piece. But it is the incorrect conclusion. The pitchers in the 80-90% bucket faced the fewest batters per appearance, followed by the pitchers in the 70-80% bucket. This means that these pitchers are being pulled earlier and do not have to combat the second or third time through the order penalty as often as their peers, so their numbers absorb less of the resulting degradation in performance. Managers and front offices have recognized this effect and made a conscious effort to pull these types of pitchers before the opposition gets too comfortable in the batter’s box.
So pitchers with only two heavily used pitches post better results than those who leverage more offerings, but we know those performance indicators are biased in favor of the two-pitch pitchers. The source of that bias shows up in the TTO effect, which I calculated for the buckets in the table above.

TTO Effect by Top Two Pitch Usage

| Top Two Usage Bin | First Time wOBA | Second Time wOBA | Third Time wOBA | Second Penalty | Third Penalty |
|---|---|---|---|---|---|
| 40-50 | .319 | .331 | .337 | .012 | .018 |
| 50-60 | .312 | .323 | .335 | .011 | .022 |
| 60-70 | .307 | .318 | .332 | .011 | .025 |
| 70-80 | .303 | .319 | .332 | .016 | .029 |
| 80-90 | .308 | .316 | .340 | .008 | .033 |

SOURCE: Baseball Savant

The second penalty is the difference in wOBA allowed between the second and first times through the order, and the third penalty (the last column) is the TTO penalty, the difference between the third and first times through. From the pitcher’s perspective, positive penalties are disadvantageous because they indicate hitters are performing better. The results here are stark. There seems to be no signal in how well a pitcher performs the second time through a lineup based on his propensity to throw his top two pitches. The TTO penalty, on the other hand, steadily increases from the lowest bucket in this set to the highest. For pitchers who use their top two pitches at most 50% of the time, the TTO penalty is worth just 18 points of wOBA. By the time we get to pitchers who are effectively throwing two pitches, the TTO penalty almost doubles, ballooning to 33 points of wOBA. The magnitude of the penalty increases steadily across the buckets: the second bucket (more than 50%, at most 60%) is four points higher than the lowest, the third is three points higher than that, the fourth is four points higher than the third, and the last bucket is four points higher than the fourth. This is almost a perfectly linear trend. Adding pitches clearly gives pitchers more viable options to eat up innings and go deeper into games.
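The “almost perfectly linear” observation can be checked with a few lines of arithmetic on the third-penalty figures. A minimal Python sketch, expressing the penalties in points of wOBA (thousandths) and fitting an ordinary least-squares line across the five buckets; treating the buckets as equally spaced steps is a simplifying assumption, and the helper name is hypothetical.

```python
# Third-time penalties from the table, in points of wOBA (wOBA x 1000),
# ordered from the 40-50% bucket to the 80-90% bucket.
third_penalty = [18, 22, 25, 29, 33]

# Bucket-to-bucket increases.
steps = [b - a for a, b in zip(third_penalty, third_penalty[1:])]


def ols_slope(ys):
    """Least-squares slope of ys against equally spaced x = 0, 1, 2, ..."""
    n = len(ys)
    x_bar = (n - 1) / 2
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(ys))
    den = sum((x - x_bar) ** 2 for x in range(n))
    return num / den


print(steps)                                           # [4, 3, 4, 4]
print(round(ols_slope(third_penalty), 2))              # 3.7
print(round(third_penalty[-1] / third_penalty[0], 2))  # 1.83
```

Steps of three to four points and a fitted slope of about 3.7 points per bucket are what “almost linear” looks like here, and the top-to-bottom ratio of roughly 1.83 is the “almost doubles” figure.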
That is not to say pitchers with broader repertoires do not suffer the consequences of the TTO penalty; rather, the magnitude of the penalty is muted relative to their peers with arsenals concentrated in just a couple of pitches. Along these lines, and with the TTO penalty results on hand, I tried to determine whether adding a pitch in a given season would improve a pitcher’s ability to get through a lineup by dampening the TTO penalty. I took two approaches. The first was more restrictive: the new pitch in question could not have been thrown at all in the prior season. I took every pitcher season from 2010-19 (with the same 100-inning minimum as before) and, for every pitch that pitcher threw, cross-checked his prior season to see whether he had thrown the pitch at all. If he had, the pitcher was not flagged as using a new pitch; if he had not, I flagged the pitcher as having a new pitch. Because so few pitchers add a completely new pitch after not using it the prior year, the restrictive nature of this flag made me skeptical that the results would be meaningful. My skepticism was borne out in the results. (Note: a previous version of this table was the exact same as the table you will see later in the article. That mistake has been rectified and the following has the updated results.)

Changes in TTO Penalty When Adding New Pitch

| New Pitch | Second Penalty | Previous Second Penalty | Change in Second Penalty | Third Penalty | Previous Third Penalty | Change in Third Penalty |
|---|---|---|---|---|---|---|
| No | .012 | .013 | -.001 | .027 | .024 | .003 |
| Yes | .013 | .012 | .001 | .025 | .023 | .001 |

SOURCE: Baseball Savant

For both the second time through the order penalty and the TTO penalty, there is basically no change across seasons when a pitcher adds a new pitch from scratch; the changes are on the scale of single points of wOBA, which is noise.
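The restrictive flag can be sketched as a comparison between consecutive seasons. This Python sketch assumes each season’s mix is a dictionary of pitch type to usage percentage and that pitch types are labeled consistently across seasons (in practice, pitch classifications can shift from year to year, which this ignores). The function name and example mixes are hypothetical.

```python
def added_new_pitch(prev_usage, curr_usage):
    """Restrictive definition: True if any pitch thrown this season was not
    thrown at all in the prior season."""
    return any(
        pct > 0 and prev_usage.get(pitch, 0.0) == 0.0
        for pitch, pct in curr_usage.items()
    )


# Hypothetical mixes (percent of pitches).
prev = {"four-seam": 55.0, "slider": 30.0, "changeup": 15.0}
curr = {"four-seam": 50.0, "slider": 28.0, "changeup": 12.0, "curveball": 10.0}
print(added_new_pitch(prev, curr))                                 # True
print(added_new_pitch(prev, {"four-seam": 60.0, "slider": 40.0}))  # False
```

A pitcher who threw even a handful of curveballs the year before would not be flagged here, which is why so few pitcher seasons qualify under this definition.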
There is also no discernible difference between those who add a new pitch and those who do not under this criterion. However, the population of pitchers who truly add a new pitch, one they did not throw prior to the season at hand, is very small. So I changed the definition of what constituted a new pitch. For the second go-around, a new pitch was one the pitcher threw at least 10 percentage points more often than in the prior season. Yes, 10 percentage points is arbitrary, and yes, I took issue with arbitrary cutoffs at the start of this piece. But I would offer that the cutoff had to be set somewhere, and my choice was not influenced by the pool of pitchers I was analyzing. I also realize that my new criterion does not technically denote a “new” pitch like the first. But the spirit of this portion of the investigation is to flag pitchers who add a pitch the opposing hitter must account for differently in a plate appearance compared to how he would have approached the pitcher in a prior season. So if a pitcher goes from throwing a pitch 5% of the time in year n-1 to 20% of the time in year n, that is a fundamental change in his repertoire, one that will have massive ripple effects on how he is scouted and what a hitter is looking for in any count. The results of my second query were more promising but hardly groundbreaking.

Changes in TTO Penalty When Adding 10 Percentage Points of Usage to a Pitch

| New Pitch | Second Penalty | Previous Second Penalty | Change in Second Penalty | Third Penalty | Previous Third Penalty | Change in Third Penalty |
|---|---|---|---|---|---|---|
| No | .011 | .013 | -.002 | .026 | .021 | .005 |
| Yes | .016 | .014 | .002 | .029 | .031 | -.002 |

SOURCE: Baseball Savant

Pitchers who added a new pitch by this criterion shave about two points of wOBA from their TTO penalty, while the rest of the population adds about five points year-over-year.
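The looser criterion changes only the comparison: instead of asking whether a pitch was thrown at all the prior season, it asks whether its usage rose by at least 10 percentage points. A minimal Python sketch under the same hypothetical dictionary format (function name and mixes are illustrative), followed by the arithmetic behind the gap between the two groups’ year-over-year changes in the TTO penalty.

```python
def increased_pitch(prev_usage, curr_usage, threshold=10.0):
    """Looser definition: True if any pitch's usage rose by at least `threshold`
    percentage points over the prior season (a brand-new pitch counts from 0)."""
    return any(
        pct - prev_usage.get(pitch, 0.0) >= threshold
        for pitch, pct in curr_usage.items()
    )


# The example from the text: a pitch going from 5% to 20% usage (other pitches hypothetical).
prev = {"four-seam": 65.0, "slider": 30.0, "cutter": 5.0}
curr = {"four-seam": 50.0, "slider": 30.0, "cutter": 20.0}
print(increased_pitch(prev, curr))  # True

# Year-over-year change in the TTO penalty, in points of wOBA, from the table.
change_no = 26 - 21   # did not add a pitch: +5
change_yes = 29 - 31  # added a pitch: -2
print(change_no - change_yes)  # 7
```

The seven-point figure is simply the difference between those two year-over-year changes.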
One possible explanation for this seven-point wOBA discrepancy is that if a pitcher does not make a fundamental shift to his repertoire, major league hitters can get a better handle on him the following season, yielding a more substantial TTO penalty. Another explanation, which goes hand in hand with the fact that pitchers who do not meaningfully add a new pitch actually perform slightly better the second time through the order, is that the population of pitchers who did not add a new pitch includes pitchers who decreased their usage of certain pitches. So this population includes pitchers who became more of a two-pitch pitcher season over season, choosing to lean into their best pitches more. As I showed earlier, these two-pitch pitchers perform better on a rate basis but do not pitch as deep into games and suffer harsher TTO penalties. This, at least to me, is the most likely explanation for why pitchers in the first row of the table improve the second time through the order but feel the effects of a harsher TTO penalty. On the flip side, pitchers who make a pitch a more substantial part of their arsenals get worse the second time through the order but make up for it by dampening the TTO penalty. Is this a worthwhile tradeoff? Would you rather have a pitcher who is more dominant on a per-plate-appearance level but taxes your bullpen more? Or would you want your starter/bulk guy to go deeper into the game? It obviously depends on your roster construction and how often your bullpen has been used leading up to a game, but this is a question front offices and field staff constantly juggle throughout the season and in the offseason when building their teams. Close to 3,000 words later, what have we learned? First and foremost, when attempting to measure anything or test a hypothesis, it is important to reflect on the research once it is finished and ask critical questions about how you approached the problem.
After my initial study of the viability of two-pitch starting pitchers, centered on Jack Flaherty, I concluded that two-pitch pitchers were just as effective on a rate basis and that they suffered no additional TTO penalty. Therefore, I surmised, rostering these types of starting pitchers should have no detrimental effect on how you build your roster, and two-pitchedness is not a reason to be skeptical of a pitcher as a viable option to churn through an opposing lineup. The issue I found was that my definition of a two-pitch pitcher was flawed, based on an arbitrary cutoff chosen to diagnose Flaherty’s lack of strikeouts in the early going. When I eliminated the arbitrary cutoff and used a more continuous measure of how much a pitcher relies on his top two pitches, I found that pitchers with more limited repertoires were a little more effective than the rest of their peers but did not go as deep into games. Furthermore, they suffered a much harsher TTO penalty, which is most likely the explanation for those pitchers not facing as many opposing hitters. The idea that pitchers with only two viable pitches are better suited for short starts, bulk work, or high-leverage innings is not a groundbreaking finding, but I hope that putting some empirical justification behind it is useful and that this approach is relatively new (at least on the public side). This confirmation of what many evaluators believed to be true should help us ask critical questions about how players should be deployed and developed, and what sorts of pitchers a roster requires. If the Rays invest in pitchers like Shane McClanahan and Luis Patiño, how should they be used, and how does that affect Tampa’s roster? Well, it seems they are following what the research demonstrates: roster a deep bullpen and use these pitchers in three- to five-inning stints. The same concept holds true for the Astros with Cristian Javier and Framber Valdez, or the Padres with Adrian Morejon, Ryan Weathers, and Dinelson Lamet.
Another essential part of this calculus is how we should evaluate players in the minor leagues, or amateurs in the draft. The starting viability of players like Garrett Crochet and Max Meyer has been called into question in recent draft classes; the same goes for Sam Bachman in this upcoming draft. Rigidly binning these types of pitchers (high-end fastball velocity, wipe-out breaking pitches, and a history of starting) as either starters or relievers seems foolhardy. Instead, we know pitchers with this skill set can effectively get through a lineup twice, but any more than that and the manager is playing with fire. Given this breed of pitcher’s effectiveness per plate appearance, actively avoiding pitchers with only two viable pitches is narrow-minded. If they make it to the major leagues, teams should be trying to surround these elite talents with other pitchers who fit the roles required to maximize the skills of a Max Meyer or Garrett Crochet type. I do not believe this is lost on much of the league. I am merely suggesting that two-pitch starting pitchers can be excellent players in the correct environment. But given a TTO penalty almost twice that of starting pitchers with more diverse arsenals, two-pitch pitchers need to be monitored closely. If the league allows teams to carry as many pitchers as they would like, two-pitchedness and flame-throwing bullpens are here to stay. Until a rule limiting the number of pitchers on a roster takes effect, pitchers with limited pitch mixes, used correctly, will continue to be valuable assets to major league clubs, provided those two pitches are high-end offerings.