There’s a strident (pun intended) debate about who should win the NL Cy Young Award, the Padres’ Blake Snell or the Braves’ Spencer Strider. The reason the debate is so strident has to do with the peculiar contrast between the two players’ seasons.
Both pitchers have thrown roughly the same number of innings, but Strider has struck out many more (274) and walked far fewer (55). In fact, Snell leads the majors in walks (99) and is on track to be the only pitcher since 1913 to do so while also leading qualifying pitchers in ERA (2.25).
Now if Strider were second or third in ERA, he’d win the award, hands down. Unfortunately, his ERA is 3.81 — in other words he’s allowed 77 earned runs to score on his watch while Snell has let in only 45.
Those in the Strider camp would never point to his 19 wins (that’s a team stat!), and in fact you hear them dismissing Snell’s ERA on similar grounds, as both a team stat (defense, park) and a luck stat (BABIP, HR/FB). They argue that among the things pitchers control (walks, strikeouts and fly balls), Strider was better, and that’s what should matter if we’re trying to determine who should win the award for the league’s best pitcher.
Let’s set aside the fact that Snell had a 44.4% ground ball rate to Strider’s 34.3%. More ground balls mean fewer chances for home runs, and in fact Snell gave up fewer (15) to Strider’s 22. Strider’s HR/FB rate was slightly higher, but you can chalk that up to their respective home parks: on a park-factor scale where 100 is average, Snell’s rated 93 (21st) and Strider’s 112 (5th). And let’s further set aside that Snell gave up hard contact on only 30.9 percent of batted balls, while Strider was ninth overall at 35 percent.
I say “let’s set aside” because while these factors partly explain why Strider’s ERA was high relative to his underlying metrics, and Snell’s low, they don’t come close to doing the heavy lifting needed to explain the cavernous gap in runs allowed, especially given Snell’s insanely high number of walks.
The only thing left, according to the Strider backers, is luck: Snell’s .256 BABIP to Strider’s .312 (a gap that is even more pronounced when you factor in the seven extra hits that left the park against Strider, which don’t count toward his BABIP), and Snell’s insane 86.7% left-on-base (LOB) rate, which leads the majors. (Strider’s LOB% is 37th out of 46 qualifying pitchers at 70.4%.)
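For readers who haven’t seen the definitions, here is a minimal Python sketch of the two rate stats in play, assuming the standard BABIP formula and the FanGraphs-style LOB% formula. The counting stats in the example are hypothetical, not either pitcher’s actual line; they only show why, with hits held constant, extra home runs lower a pitcher’s BABIP rather than raise it.

```python
# Sketch of two "luck" stats, assuming the standard BABIP definition and
# the FanGraphs-style LOB% formula. All inputs in the example are hypothetical.

def babip(h: int, hr: int, ab: int, k: int, sf: int) -> float:
    """Batting average on balls in play.

    Home runs are removed from both numerator and denominator, because a
    ball that leaves the park is never "in play" for the defense.
    """
    return (h - hr) / (ab - k - hr + sf)


def lob_pct(h: int, bb: int, hbp: int, r: int, hr: int) -> float:
    """Left-on-base percentage (FanGraphs-style).

    Estimated share of baserunners stranded; the 1.4 * HR term is
    FanGraphs' adjustment for home runs.
    """
    return (h + bb + hbp - r) / (h + bb + hbp - 1.4 * hr)


if __name__ == "__main__":
    # Hypothetical pitcher lines: identical hits and at-bats, different home runs.
    print(round(babip(h=140, hr=15, ab=650, k=230, sf=5), 3))  # 0.305
    print(round(babip(h=140, hr=22, ab=650, k=230, sf=5), 3))  # 0.293
```

In other words, the more of a pitcher’s hits that leave the park, the more flattering his BABIP looks, which is why the seven extra homers make the gap between the two pitchers even wider than .256 vs. .312 suggests.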
But as MLB.com’s Brett Maguire pointed out in an excellent article, Snell had even better LOB numbers in 2018, when he won his first Cy Young. Moreover, as Maguire notes:
As if a mid-90's heater and elite curveball weren't enough, Snell also mixes in a spectacular changeup and slider. Both pitches are not too far behind in terms of bat-missing ability and overall dominance. Snell's changeup has the ninth-highest whiff rate (47.7%) among individual pitch types (min. 150 swings) and is second only to Shane McClanahan's changeup. His slider, meanwhile, has a 53.6% whiff rate that trails only four pitches, including his own curveball.
When you combine the overall excellence of his non-fastballs, Snell is producing one of the most dominant seasons on breaking balls and offspeed pitches by a starting pitcher in recent memory. He has a combined 51.1% whiff rate on his non-fastballs, the second-best single-season rate by a pitcher behind only Strider, who has a 55.3% whiff rate on his slider and changeup this year.
So Snell’s season is not just unusual cosmetically (leading the majors in walks and ERA), but unusual under the hood too, with elite secondary stuff doing so much of the work. That he’s doing it differently (walking people rather than giving in, and punching people out with three elite secondary pitches), and that he sustained this absurd “luck” over a full season once before, suggests that Snell, when he’s healthy and right, may simply be an outlier.
As I wrote in my piece on Outliers:
One of my favorite stats of all time is that Mariano Rivera had a career BABIP of .265 (league average over that span was .298) and a career HR/FB rate of 6.2 percent (league average was around 10 percent). The BABIP mark was the lowest of any pitcher over that 20-year span (minimum 1000 IP), and the HR/FB rate was second-lowest. How could anyone get so lucky as to be No. 1 and No. 2 in two different “luck-based” metrics?…
Outliers break models, they ruin the smooth distribution curves that make us feel like we understand what there is to know, they don’t dutifully regress to the mean over time the way they’re supposed to. But we watch to find the outlier, not the average. We want to witness greatness because it reminds us of what’s possible rather than what’s likely.
When you’re dealing with a skill set this unusual, it’s folly to regress it to the mean and declare how much of it is luck. When we say “BABIP is luck” or “pitchers don’t control balls in play,” we mean BABIP is usually luck, or pitchers don’t usually control balls in play. We mean something is the case on average, and, by extension, that because it’s the case on average, it’s probably the case for Snell in particular.
That’s well and good, but it’s a specious leap to conflate “probably” with “definitely.” And it’s not a matter of saying, “Well, if only 1 in 1000 pitchers is a Rivera, then ‘probably’ means 99.9%, and I’ll take that bet.” That’s bad reasoning because we’ve already identified Snell’s profile (leading the majors in walks and ERA, absurd LOB rates twice in his career, dominant on three different secondary pitches) as highly unusual and uncannily successful. You can’t lump him in with the 1000 or so garden-variety qualifying pitchers over the decades. He’s already on the short list of pitchers about whom something unlikely might be true.
Just as we can feel pretty confident Mariano Rivera didn’t just get insanely lucky over 20 years in two separate metrics (Rivera pitched even better in the postseason, presumably against better competition too), we should be open to the plausibility of Snell’s uncanny “luck” having a skill component to it that defies facile quantification via garden-variety regression. In sum, you don’t know the extent to which Snell was lucky even if you can on average separate luck from skill fairly reliably.
. . .
But let’s set all of that aside for a moment and take a step back. So what if Snell were just insanely lucky? What if his defense helped him a ton, and when he did give up hard-hit balls with men on base, they were just disproportionately hit right at his fielders? Do you not own your own luck?
Put differently, does a banked-in three-pointer not count on the scoreboard? Should we give extra points in basketball for swishes and fewer when the ball hits the rim and drops in? What does it matter how lucky a pitcher was in preventing runs so long as he prevented runs?
One might be tempted to answer: “Because if it’s luck and not skill based, then it’s not sustainable.” Someone on Twitter put it this way:
To which I responded:
This is a typical point of confusion for people who have mastered the 101 course: the idea that certain stats are more reliable indicators of performance than others, and that many of our traditional measures of what happened, like ERA or wins, are not especially predictive going forward. The reason they’re not predictive is that they depend too much on luck or environment rather than on the pitcher’s actual skills. Should the environment change, or the luck simply regress to the mean, the pitcher’s performance will regress too. As such, signing a pitcher with a great ERA but mediocre peripherals to a big contract is likely to be a mistake.
But as I posted in my response, none of what you learned in your 101 course (congratulations on getting your diploma, by the way!) is relevant to this discussion. I am not arguing Snell should be paid more than Strider going forward. I am not ranking them for purposes of my 2024 fantasy drafts. I am casting my (virtual) vote for who should win the award for something that happened in 2023, which is in the past.
This is a category error people make time and again: predictive metrics (e.g., strikeout-minus-walk percentage) are useful only insofar as they help inform you about likely future outcomes. The outcome (run prevention) is the important thing; it’s just that past run prevention isn’t as predictive of future run prevention as some other indicators are.
Hence, when looking toward the future, you should find the most reliable indicators (salt them with the understanding that all general indicators are imperfect and might not apply entirely to outliers) and act accordingly — that is, if you’re an MLB GM or fantasy baseball drafter.
But do not, under any circumstances, make the category error of substituting the indicator (what’s likely to predict future results) for the results themselves. Just because K%-BB% predicts future ERA better than past ERA does, it doesn’t follow that run prevention is anything other than the most desirable result for a pitcher. Looking backward, every team would rather its pitcher prevent runs, by hook or by crook, than post impressive forward-looking metrics.
If you don’t understand the foregoing paragraph, please reread it until you do. If you still don’t understand it, you’ve likely been indoctrinated into a religion I like to call But My Process Was Good, and this post is a waste of your time. Stop reading it!
To summarize, for those of you still here: what matters is not how Snell allowed a whopping 42 percent fewer runs than Strider, but that he actually allowed 42 percent fewer runs on his watch. How he did it matters only when we look to the future, where questions of sustainability make the skill/luck discussion relevant. For 2023, where the results are all in, indicators go into the woulda/coulda/shoulda bin with all the other excuses, failed speculations and what-ifs. That’s why I made the Jacob deGrom joke on Twitter: but for his bad injury luck, he was the league’s best pitcher!
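The 42 percent figure follows directly from the earned-run totals cited earlier (45 for Snell, 77 for Strider), as a quick worked check shows:

$$
1 - \frac{45}{77} \approx 1 - 0.584 = 0.416 \approx 42\%
$$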
. . .
There is one final argument for Snell, though in my opinion the first two are already more than sufficient: substituting indicators for results leads to a reductio ad absurdum.
If we argue that how a pitcher threw (strikeouts, walks, fly balls) is what really matters rather than how many runs he gave up, then by the same logic we can say that strikeouts, walks and fly balls are also results, and what really matters is the velocity, movement, command and control of his pitches. After all, who knows whether he got unlucky with bad umpiring, poor pitch framing, tough opponents, or bad hitters luckily getting the bat on pitches they almost always miss? The pitcher can’t really control the results (strikeouts, walks), only the location and quality of his pitches.
Let’s then measure the quality of the pitches and award the player who by some chosen metrics made the best pitches. Who cares how many actual Ks he had? Let’s give it to the guy with the most xKs™ and the fewest xBB™.
But even then the pitcher doesn’t really control his xKs and xBBs, because how he throws is partly a product of his pitching coaches, his genetics, his trainers, his regimen, his diet, his health, etc. He has a say in it, of course, but there’s a lot of noise in which team drafts you or who your high school coach or parents happened to be. We need a luck-neutral metric for accidents of birth! If Strider had been born to unusually short parents on the island nation of Palau, it’s highly unlikely he’d put up these numbers.
The bottom line is baseball is a game with certain rules and objectives. The primary objective for hitters is to produce runs, for pitchers to prevent them. Those are the results with respect to those players. We can see them on the scoreboard, and we know how they came about.
If you are looking to predict future results, past results, taken at face value, are often poor indicators — or at least poorer than other ones we have since come up with. Only in that context do runs scored and prevented lose signal and contain noise.
But the Cy Young isn’t about the future (otherwise, give it to the injured guy who projects as healthy next year, or the elite prospect who dominated from July on). It’s only about the past, and the past, unlike the future, does not need to be predicted! Stop using predictive metrics to measure it when the results are in front of your face.