Chris Mitchell on KATOH and Forecasting Prospects

by Carson Cistulli
January 8, 2015

Just before the start of the new year, Chris Mitchell published some expanded results at The Hardball Times from his work on KATOH. That is the name he’s given to a methodology for estimating two things: the probability that a particular minor-league prospect will graduate to the majors, and, in this expanded version, the probability that a prospect will cross certain WAR thresholds given his minor-league resume. Mitchell’s work advances our understanding of which minor-league metrics correlate most strongly with major-league success.

Below are five questions I asked about these most recent findings, along with Mitchell’s answers.

*****

Because you’ll do a better job of it than I would, could you provide a brief explanation of KATOH, in particular of the variables that most directly inform it?

KATOH aims to answer a series of questions about a minor-league baseball player: “How likely is this player to play at least one game in the majors through age 28?” and “How likely is it that he’ll reach certain performance benchmarks (4, 6, 8, 12, and 16 WAR) through age 28?” I arrived at these probabilities by running probit regressions, which estimate how a variety of inputs influence a binary outcome, one that either happens or doesn’t. In this case, the inputs include a player’s age and several of his offensive stats relative to league average: strikeout percentage, walk percentage, isolated slugging, batting average on balls in play, and frequency of stolen-base attempts.
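To make the probit idea concrete, here is a minimal Python sketch of how a fitted probit model turns a player’s inputs into a probability. The inputs are combined linearly and passed through the standard normal cumulative distribution function, Φ. The coefficient values, the input names, and the choice to express each rate stat as a simple difference from league average are all hypothetical and invented for illustration. They are not KATOH’s actual specification or estimates.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(intercept: float,
                       coefs: dict[str, float],
                       inputs: dict[str, float]) -> float:
    """Probit model: P(outcome = 1) = Phi(intercept + sum(coef * input))."""
    z = intercept + sum(coefs[name] * inputs[name] for name in coefs)
    return normal_cdf(z)


# HYPOTHETICAL coefficients for illustration only; not KATOH's estimates.
# Assumed encoding: rate stats are (player rate - league rate), in percentage
# points for K% and BB% and in thousandths ("points") for ISO and BABIP.
HYPOTHETICAL_COEFS = {
    "age": -0.30,          # being older lowers the index
    "k_pct_diff": -0.06,   # striking out more than the league lowers it
    "bb_pct_diff": 0.04,   # walking more than the league raises it
    "iso_diff": 0.010,     # more power than the league raises it
    "babip_diff": 0.004,   # a higher BABIP than the league raises it
}
HYPOTHETICAL_INTERCEPT = 5.0

# A hypothetical 19-year-old who beats his league in every category.
player = {"age": 19, "k_pct_diff": -4.0, "bb_pct_diff": 2.0,
          "iso_diff": 60.0, "babip_diff": 20.0}

p = probit_probability(HYPOTHETICAL_INTERCEPT, HYPOTHETICAL_COEFS, player)
print(f"Hypothetical probability of reaching the benchmark: {p:.3f}")
```

Because Φ is increasing, a negative coefficient on the strikeout input means a higher strikeout rate always lowers the estimated probability, all else equal. In a sketch like this, each benchmark (reaching the majors, 4 WAR, 6 WAR, and so on) would get its own fitted set of coefficients.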
You write in your THT piece that “even as high as Double-A, a one percent change in strikeout rate affects a player’s projection by about 1.5 times as much as a one percent change in walk rate.” This sort of information, which is to say the relative predictive ability of one metric versus another, seems incredibly useful and also relatively unplumbed (except perhaps in a more anecdotal, less rigorous context). Are there any other guidelines you can provide to this effect? Like, on the importance of a prospect’s isolated-slugging figures at one level, or the significance of his contact rates at another?

Each of the variables I considered has some predictive power for players in the high minors, but only some of them (age, ISO, K%, and BABIP) tell us much about players at the lower levels. A player’s BB% and SB numbers both tell us relatively little about players below Double-A.

I think the biggest takeaway, and possibly the most surprising one, relates to a hitter’s walk rate. In the lower levels of the minor leagues, a player’s walk rate has little to no bearing on his future big-league success. I found that walk rate has no predictive value at all for players in Rookie ball and very little for A-ball hitters. There are only three levels where both BB% and K% are predictive (Triple-A, Double-A, and Short-Season-A), but there’s some insight to be gleaned from how predictive these metrics are relative to each other. In Short-Season-A, a one percent change in strikeout rate affects a player’s projection more than twice as much as a one percent change in walk rate. That ratio drops to around 1.5 to 1 for players in Double-A and is essentially 1 to 1 for hitters in Triple-A. Clearly, a hitter’s walk rate becomes more and more revelatory the higher he climbs on the minor-league ladder.

Other than walk rate, there aren’t really any eye-popping trends regarding which statistics matter most at different minor-league levels.
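One way to read a statement like “1.5 times as much” in a probit model is as a ratio of marginal effects. The marginal effect of an input is φ(z)·β, the normal density at the player’s index times that input’s coefficient. The Python sketch below assumes that reading, and that K% and BB% enter the model in the same units. The per-level coefficient pairs are hypothetical, chosen only so their ratios mirror the ones Mitchell describes; they are not KATOH’s estimates.

```python
import math


def normal_pdf(z: float) -> float:
    """Standard normal density, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def marginal_effect(beta: float, z: float) -> float:
    """Change in probit probability per one-unit change in an input at index z."""
    return normal_pdf(z) * beta


# HYPOTHETICAL (K% coefficient, BB% coefficient) pairs per level, invented so
# that their ratios mirror the interview's description; not KATOH's values.
HYPOTHETICAL_LEVEL_COEFS = {
    "Short-Season-A": (-0.066, 0.030),
    "Double-A": (-0.060, 0.040),
    "Triple-A": (-0.050, 0.050),
}


def k_to_bb_weight(level: str, z: float = 0.0) -> float:
    """How many times more a one-point K% change moves the projection than a one-point BB% change."""
    beta_k, beta_bb = HYPOTHETICAL_LEVEL_COEFS[level]
    return abs(marginal_effect(beta_k, z)) / abs(marginal_effect(beta_bb, z))


for level in HYPOTHETICAL_LEVEL_COEFS:
    print(level, round(k_to_bb_weight(level), 2))
```

Because φ(z) multiplies both coefficients, it cancels in the ratio. Under this assumed setup, the relative weight of K% and BB% at a given level is therefore the same for every player at that level, which is one way a single figure like “1.5 times” could summarize a whole level.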
A hitter’s SB numbers also proved insignificant at the lower levels, and even where this metric is predictive, it doesn’t add all that much to the model.

Some of the names that appear atop the current KATOH list aren’t particularly surprising. Mookie Betts, Kris Bryant, Joc Pederson: each has either already played in the majors or is very close to doing so, and all three have received optimistic Steamer projections for 2015. Of more interest, perhaps, is the case of Padres infield prospect Luis Urias. He was just 17 in 2014, playing Rookie-level ball, and yet, among players who recorded 100-plus plate appearances in the minors last season, he’s forecast to produce the seventh-highest WAR total (14.5 WAR) through age 28. What does KATOH detect here?

In my analysis of players on the lowest rung of Rookie ball, I found only three variables that proved predictive of future big-league success: age, strikeout rate, and isolated power. And as a 17-year-old who struck out just 7% of the time, Luis Urias does very well in two of those categories. It’s relatively rare for a 17-year-old to spend the year in a domestic rookie league, and the fact that he did so while consistently putting the ball in play is certainly encouraging. The one fly in the ointment, and it’s a big one, is Urias’ lack of power. He managed just a .045 ISO last season, which dampened his projection a bit.

Despite his optimistic projection, I’d hold off on deeming Urias a can’t-miss prospect. Urias spent 2014 in the Arizona League, which is the absolute lowest rung of the minor-league ladder outside of the foreign rookie leagues, so projections for players at this level should be taken with a huge grain of salt. Historically, the average difference between a player’s projected and actual WAR totals grows the farther he is from the major-league level.
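The comparison described here, the average gap between projected and actual WAR grouped by minor-league level, could be computed along the lines of the hypothetical Python sketch below. The function name and the toy records are invented for illustration and are not KATOH data. The sketch measures the gap as a mean absolute error, which is one reasonable reading of “average difference” but not necessarily the one Mitchell used.

```python
from collections import defaultdict
from statistics import mean


def mean_abs_error_by_level(records):
    """Average |projected WAR - actual WAR| for each minor-league level."""
    gaps = defaultdict(list)
    for level, projected, actual in records:
        gaps[level].append(abs(projected - actual))
    return {level: mean(values) for level, values in gaps.items()}


# TOY records (level, projected WAR, actual WAR), invented for illustration;
# these are not KATOH projections or real players.
toy_records = [
    ("Triple-A", 6.0, 5.0), ("Triple-A", 3.0, 4.0),
    ("Double-A", 8.0, 5.0), ("Double-A", 4.0, 6.0),
    ("Rookie", 10.0, 2.0), ("Rookie", 5.0, 11.0),
]

for level, mae in mean_abs_error_by_level(toy_records).items():
    print(f"{level}: {mae:.1f}")
```

Taking the absolute value keeps over- and under-projections from cancelling each other out, so a level can show a large average miss even if its projections are unbiased overall.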
I’d easily take the under on Urias’ 14.5 WAR projection, but I still think he’s worth keeping an eye on. His low-strikeout, low-power profile may not seem all that exciting, but several players have reached the big leagues after similar showings in Rookie ball, some of them with considerable success: Edgardo Alfonzo (28.5 fWAR through age 28), Rafael Furcal (21.4), Alex Ochoa (4.4), Joe McEwing (2.9), and Norris Hopper (2.8).

In addition to supplying a spreadsheet of KATOH figures for minor leaguers who played in 2014, you also provide a link in your THT piece to historical KATOH projections. According to KATOH, a 20-year-old version of Cliff Floyd from 1993 is basically the best prospect ever. Or maybe “best prospect ever” isn’t the precise way to summarize it. Either way, Floyd’s KATOH projection is the highest on the aforementioned spreadsheet. What did Floyd do to earn it?

What didn’t Floyd do in his 1993 campaign at Double-A Harrisburg? At the tender age of 20, he hit an impressive .329/.417/.600 on the strength of a 12% walk rate, 16% strikeout rate, .271 ISO, and .347 BABIP, all of which were better than average for the Eastern League in 1993. In fact, his ISO was the very best in the league. On top of all that, he attempted to steal a whopping 41 times in just 441 PAs. In sum, Floyd hit for elite power and performed very well across the board.

Those historical KATOH projections also reveal certain prospects whose minor-league resumes gave cause for optimism but who ultimately did little in the majors. The 2005 version of Jeremy Hermida (18.9 projected WAR, 0.9 actual WAR) belongs to this class of player, for example, as do both the 1993 and 1994 versions of Karim Garcia (13.8 and 14.4 projected WAR, respectively; 0.0 actual WAR). Are these cases instructive in any way for you? Merely a product of variance? Something else?
As with any predictive model, there are going to be cases where the actual outcome differs from the prediction, and with KATOH these differences can sometimes be pretty large. Much of the discrepancy can be chalked up to random variation: it’s not easy to predict how a teenager will perform ten, five, or even two years from now. But aside from the random noise, there are certain instances where we can see why KATOH might have missed on a particular hitter.

For example, KATOH doesn’t directly account for a player’s defensive abilities, which causes it to overrate or underrate players at certain positions. This probably explains at least some of the overly optimistic projections for Karim Garcia and Jeremy Hermida, who were both below-average corner outfielders.

There’s also the possibility that a player’s stolen-base totals might slightly misrepresent his true-talent level. I think this applies to Hermida’s 2005 season, in which he went 23-for-25 on stolen-base attempts. Hermida wasn’t a slow player, but he probably wasn’t as fast as most players who run that frequently in Double-A. Plus, since KATOH doesn’t directly take defense into account, a player’s stolen-base numbers act as something of a proxy for his defensive ability: most players who post high stolen-base totals also happen to be strong defenders, often manning premium positions like center field or shortstop. Hermida, however, was a pretty poor defender, as evidenced by his negative UZRs in both outfield corners.
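For scale, the misses discussed in this exchange can be tallied directly from the projected and actual WAR figures quoted in the question, along with Hermida’s 23-for-25 stolen-base line. The Python sketch below is illustrative; the helper names are hypothetical, and the only data are the numbers given in the interview.

```python
# Projected and actual WAR figures quoted in the interview.
cases = {
    "Jeremy Hermida (2005)": (18.9, 0.9),
    "Karim Garcia (1993)": (13.8, 0.0),
    "Karim Garcia (1994)": (14.4, 0.0),
}


def projection_miss(projected: float, actual: float) -> float:
    """Projected minus actual WAR; positive when the projection was too optimistic."""
    return projected - actual


def sb_success_rate(steals: int, attempts: int) -> float:
    """Share of stolen-base attempts that succeeded."""
    return steals / attempts


for name, (projected, actual) in cases.items():
    print(f"{name}: overshot by {projection_miss(projected, actual):.1f} WAR")

# Hermida went 23-for-25 on stolen-base attempts in 2005.
print(f"Hermida 2005 SB success rate: {sb_success_rate(23, 25):.0%}")
```

A success rate that high, on that many attempts, is the kind of line a model using stolen-base frequency as a speed and defense proxy would read as a plus, which is the mechanism Mitchell suggests may have inflated Hermida’s projection.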