Chris Mitchell on KATOH and Forecasting Prospects

Just before the start of the new year, Chris Mitchell published at The Hardball Times some expanded results from his work on KATOH, the name he’s given to a methodology for estimating not only the probability of a particular minor-league prospect graduating to the majors but also, in this expanded version, the WAR thresholds that prospect is likely to cross given his minor-league resume.

Mitchell’s work advances our understanding of which metrics at the minor-league level correlate most highly with major-league success. Below are five questions I asked regarding these most recent findings, and Mitchell’s answers concerning same.


Because you’ll do a better job of it than I would, could you provide a brief explanation of KATOH — in particular, of the variables that most directly inform it?

KATOH aims to answer a series of questions about a minor-league baseball player: “How likely is this player to play at least one game in the majors through age 28?” and “How likely is it that he’ll reach certain performance benchmarks (4, 6, 8, 12, and 16 WAR) through age 28?” I arrived at these probabilities by running probit regression analyses, which tell us how a variety of inputs influence an outcome that has only two possible values, such as whether or not a player reaches the majors. In this case, the inputs include a player’s age and some of his offensive stats relative to league average: strikeout percentage, walk percentage, isolated slugging, batting average on balls in play, and frequency of stolen base attempts.
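For readers unfamiliar with probit models, here is a minimal sketch of how one turns a set of inputs into a probability. The coefficient values, the feature names, and the choice to express each stat as a difference from league average are all assumptions made for illustration; they are not KATOH's fitted values or its actual feature definitions.

```python
from math import erf, sqrt


def normal_cdf(z: float) -> float:
    """Standard normal CDF, the link function used by a probit model."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def probit_probability(intercept: float, coefs: dict, features: dict) -> float:
    """P(outcome = 1) = Phi(intercept + sum of coefficient * feature)."""
    index = intercept + sum(coefs[name] * features[name] for name in coefs)
    return normal_cdf(index)


# HYPOTHETICAL values, invented for this example (not KATOH's estimates).
# Rate stats are in percentage points above or below league average,
# ISO and BABIP in points (thousandths), age in years relative to the
# league's average age.
EXAMPLE_INTERCEPT = -0.50
EXAMPLE_COEFS = {
    "age_vs_league": -0.30,
    "k_pct_vs_league": -0.06,
    "bb_pct_vs_league": 0.04,
    "iso_vs_league": 0.005,
    "babip_vs_league": 0.003,
    "sb_attempt_pct_vs_league": 0.02,
}

# A made-up hitter: young for his level, low strikeouts, modest power.
example_hitter = {
    "age_vs_league": -2.0,
    "k_pct_vs_league": -5.0,
    "bb_pct_vs_league": 1.0,
    "iso_vs_league": -20.0,
    "babip_vs_league": 10.0,
    "sb_attempt_pct_vs_league": 2.0,
}

p_reach_majors = probit_probability(EXAMPLE_INTERCEPT, EXAMPLE_COEFS, example_hitter)
print(f"Illustrative probability of reaching the majors: {p_reach_majors:.2f}")
```

Under this setup, one such model would be fit for each yes-or-no question (reaching the majors, reaching 4 WAR, and so on), each with its own coefficients.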

You write in your THT piece that “even as high as Double-A, a one percent change in strikeout rate affects a player’s projection by about 1.5 times as much as a one percent change in walk rate.” This sort of information — which is to say, the relative predictive ability of one metric versus another — seems incredibly useful and also relatively unplumbed (except in maybe a more anecdotal, less rigorous context). Are there any other guidelines you can provide to this effect? Like, on the importance of a prospect’s isolated-slugging figures at one level, or the significance of his contact rates at another?

Each of the variables I considered has some predictive power for players in the high minors, but only some of them (age, ISO, K%, and BABIP) tell us much about players at the lower levels. A player’s BB% and SB numbers tell us relatively little about players below Double-A.

I think the biggest takeaway, and possibly the most surprising one, relates to a hitter’s walk rate. For a player in the lower levels of the minor leagues, walk rate has little to no bearing on his future big-league success. I found that walk rate has literally no predictive value at all for players in Rookie ball and very little for A-ball hitters. There are only three levels where both BB% and K% are predictive (Triple-A, Double-A, and Short-Season-A), but there’s some insight to be gleaned by looking at how predictive these metrics are relative to each other. In Short-Season-A, a one percent change in strikeout rate affects a player’s projection over twice as much as a one percent change in walk rate. This ratio drops to around 1.5:1 for players in Double-A, and is essentially 1:1 for hitters in Triple-A. Clearly, a hitter’s walk rate becomes more and more revelatory the higher he gets on the minor-league ladder.
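One way to read the strikeout-to-walk comparison above is as a ratio of coefficient magnitudes within each level's model: if both rates are measured in the same units, the size of one coefficient relative to the other says how much more a one-point change in K% moves the projection than a one-point change in BB%. The sketch below assumes that reading. The coefficient values are invented, chosen only so the ratios match the figures quoted above; they are not Mitchell's estimates.

```python
# HYPOTHETICAL per-level coefficients (effect of a one-point change in
# each rate). Invented for illustration; not taken from KATOH.
LEVEL_COEFS = {
    "Short-Season-A": {"k_pct": -0.050, "bb_pct": 0.024},
    "Double-A": {"k_pct": -0.045, "bb_pct": 0.030},
    "Triple-A": {"k_pct": -0.040, "bb_pct": 0.040},
}


def k_to_bb_ratio(coefs: dict) -> float:
    """How many times larger the K% effect is than the BB% effect."""
    return abs(coefs["k_pct"]) / abs(coefs["bb_pct"])


for level, coefs in LEVEL_COEFS.items():
    print(f"{level}: K% matters about {k_to_bb_ratio(coefs):.1f}x as much as BB%")
```

A ratio near 1.0 means a point of walk rate carries about as much information as a point of strikeout rate, which is the pattern described for Triple-A.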

Other than a player’s walk rate, there aren’t really any eye-popping trends regarding which statistics matter most at different minor league levels. A hitter’s SB numbers also proved to be insignificant at the lower levels, but even when this metric is predictive, it doesn’t add all that much to the model.

Some of the names which appear atop the current KATOH list aren’t particularly surprising. Mookie Betts, Kris Bryant, Joc Pederson: each has either already played or is very close to playing in the majors, and all three have received optimistic Steamer projections for 2015. Of more interest, perhaps, is the case of Padres infield prospect Luis Urias. He was just 17 in 2014, playing Rookie-level ball — and yet, among players who recorded 100-plus plate appearances in the minors last season, he’s forecast to produce the seventh-highest total WAR figure (14.5 WAR) through age 28. What does KATOH detect here?

In my analysis of players in the lowest rung of Rookie ball, I found only three variables that proved to be predictive of future big league success: age, strikeout rate, and isolated power. And as a 17-year-old who struck out just 7% of the time, Luis Urias does very well in two of those categories. It’s relatively rare for a 17-year-old to spend the year in a domestic rookie league, and the fact that he did so while consistently putting the ball in play is certainly encouraging. The one fly in the ointment – and it’s a big one – is Urias’ lack of power. He managed just a .045 ISO last season, which dampened his projection a bit.

Despite his optimistic projection, I’d hold off on deeming Urias a can’t-miss prospect. Urias spent 2014 in the Arizona League, which is the absolute lowest rung of the minor-league ladder outside of the foreign rookie leagues. As a result, projections for players at these levels should be taken with a huge grain of salt. Looking at historical players, the average difference between a player’s projected and actual WAR total gets larger the further you move away from the Major League level.

I’d easily take the under on Urias’ 14.5 WAR projection, but I still say he’s worth keeping an eye on. His low-strikeout, low-power profile may not seem all that exciting, but several players have gone on to become successful big leaguers after similar showings in Rookie ball, including Edgardo Alfonzo (28.5 fWAR through age 28), Rafael Furcal (21.4), Alex Ochoa (4.4), Joe McEwing (2.9), and Norris Hopper (2.8).

In addition to supplying a spreadsheet of the KATOH figures for minor leaguers who played in 2014, you also provide a link in your THT piece to historical KATOH projections. A 20-year-old version of Cliff Floyd from 1993 is the best prospect ever, basically, according to KATOH. Or, maybe “best prospect ever” isn’t the precise way to summarize it. In either case, Floyd’s KATOH projection is the highest on the aforementioned spreadsheet. What did Floyd do to earn it?

What didn’t Floyd do in his 1993 campaign in Double-A Harrisburg? At the tender age of 20, he hit an impressive .329/.417/.600 on the strength of a 12% BB%, 16% K%, .271 ISO, and .347 BABIP, all of which were better than average for the Eastern League in 1993. In fact, his ISO was the very best in the league. On top of all of that, he attempted to steal a whopping 41 times in just 441 PAs. In sum, Floyd hit for elite power, and performed very well across the board otherwise.

Those historical KATOH projections also reveal certain prospects whose minor-league resumes produced cause for optimism, while the player himself ultimately did little in the majors. The 2005 version of Jeremy Hermida (18.9 projected WAR, 0.9 actual WAR) belongs to this class of player, for example, as do both the 1993 and -94 versions of Karim Garcia (13.8 and 14.4 projected WAR, respectively, 0.0 actual WAR). Are these cases instructive in any way for you? Merely a product of variance? Something else?

As with any predictive model, there are going to be cases where the actual outcome differs from the prediction; and with KATOH, these differences can sometimes be pretty large. Much of this discrepancy can be chalked up to random variation — it’s not easy to predict how a teenager will perform ten, five, or even two years from now. But aside from the random noise, there are certain instances where we can see why KATOH might have missed on a particular hitter.

For example, KATOH doesn’t directly account for a player’s defensive abilities, which causes it to overrate or underrate players at certain positions. This probably explains at least some of the overly optimistic projections for Karim Garcia and Jeremy Hermida, who were both below-average corner outfielders. There’s also the possibility that a player’s stolen-base totals might slightly misrepresent a player’s true-talent level. I think this applies to Hermida’s 2005 season, where he went 23 out of 25 on the basepaths. Hermida wasn’t a slow player, but probably wasn’t as fast as most who run that frequently in Double-A. Plus, since KATOH doesn’t directly take defense into account, a player’s stolen-base numbers act as something of a proxy for his defensive ability. Most of the players who post high stolen-base totals also happen to be strong defenders, often manning premium positions like center field or shortstop. Hermida, however, was a pretty poor defensive player as evidenced by his negative UZRs in both outfield corners.

Carson Cistulli has published a book of aphorisms called Spirited Ejaculations of a New Enthusiast.

I’m fascinated by this, and impressed at someone assembling the necessary data in order to set up the probit regressions in question–always the hardest part of this sort of analysis.

A couple questions/comments:

~It would seem that you’re mainly interested in predicting whether a player will be able to hit major-league pitching–that is, fWAR might not actually be the best dependent variable. Why not separate out only the offensive component of WAR, or even use something like wRC+?

~Have you explored using a normal regression rather than probit regression? You could also simply change the WAR thresholds, but I’m guessing you’ve already done that.

~I think adding a fixed effect for primary position could improve KATOH’s accuracy immensely. I remember in the THT article that one of the principal problems with KATOH was that it underrated the potential success of premium defensive positions. This wouldn’t be a perfect solution given position changes (e.g. Michael Morse coming up as a shortstop), but I’m guessing it would be valuable.

~What about interaction terms between BB%, K%, and ISO? Might be an interesting way to dive deeper into different ‘profiles’ of players.



If you use OLS you’re stuck with the question of what to do with the huge number of people who never make it. Are they 0 WAR? If so, are they really better than the people who get called up and post negative WAR? Seems unlikely. I’ve been playing with something really similar to this using OLS, and I limited it initially to guys who are actually called up so that I can weed out some of the crap and answer, from a fantasy perspective, what kind of hitting we’re likely to see from a guy over the next 3 years that I can keep him on my fantasy roster. Consequently, my version overvalues players who are lower in the minors, because I weeded out all the bad ones who never get a sniff of major league action while keeping quite a few bad AAA players who get called up because their parent club is desperate to cover an injury or needs a glove-only guy who really can’t hit at all. Mitchell’s probabilistic approach with probit regression lets him account more accurately for the garbage sitting around in the minor leagues, because he’s got binary dependent variables: “makes majors=true(1) or false(0),” “Earns at least 1 WAR=true(1) or false(0),” etc.
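The binary setup described in this comment can be sketched as follows. The thresholds are the ones Mitchell lists in the interview (4, 6, 8, 12, and 16 WAR through age 28); the function name, the use of `None` for a player who never reached the majors, and the choice of an inclusive "at least" comparison are assumptions made for this example.

```python
from typing import Dict, Optional

# Benchmarks named in the interview; the data layout is hypothetical.
WAR_THRESHOLDS = (4, 6, 8, 12, 16)


def binary_outcomes(war_through_28: Optional[float]) -> Dict[str, int]:
    """Turn a career outcome into one 0/1 label per question.

    war_through_28 is None for a player who never reached the majors,
    so no WAR value has to be invented for him.
    """
    made_majors = war_through_28 is not None
    labels = {"made_majors": int(made_majors)}
    for threshold in WAR_THRESHOLDS:
        reached = made_majors and war_through_28 >= threshold
        labels[f"war_{threshold}_plus"] = int(reached)
    return labels


print(binary_outcomes(None))   # never reached the majors
print(binary_outcomes(-0.5))   # called up, below replacement level
print(binary_outcomes(4.4))    # e.g. Alex Ochoa's total cited above
print(binary_outcomes(28.5))   # e.g. Edgardo Alfonzo's total cited above
```

Both the player who never debuts and the one who posts negative WAR get a 0 for every WAR benchmark, and they differ only on the "made majors" label, which sidesteps the question of which of the two deserves the lower number.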

Chris Mitchell

Thanks Joshua. I can assure you that gathering and cleaning the data wasn’t easy, but the FanGraphs leaders page helped a lot.

As Matt mentioned below, I used fWAR instead of wRC+ since a player’s speed somewhat gets at his defensive value. Things also get a little messy when you consider that players at premium defensive positions are likely to stick around even when they’re not hitting. Say a shortstop and first baseman are both 70 wRC+ hitters at age 22, but 100 wRC+ hitters by 28. The first baseman is obviously the worse player overall, but would have the higher career wRC+ since he would have spent his early 20s in the minors, while the shortstop was racking up PAs at a 70 wRC+.

As Corey mentions, OLS struggles with the issue of so many players who don’t make it. No matter how I classified those players, I always ended up with projected WARs that were crazy low. Also, using a probit allows for a little more depth IMO. It allows me to see which players might be high-risk/high-reward, for example.

I tested for interaction terms, but none ended up being statistically significant.