Response of Jason Wilson and Jarvis Greiner to David Hoaglin’s Letter to the Editor
We would like to express our thanks to David Hoaglin for his careful reading of our work and for his critical feedback. We are honored that someone of his stature in the community would take the time. To frame our response, we remind the reader that we wrote our paper not to assert a final model for quantifying curveballs, but rather to put the idea that a curveball can be quantified to a public test. This is essentially what we stated in our conclusion: “If our concept were adopted, the coefficients would need to be updated with more data from different pitchers with different skill levels rated by different coaches.”
The critiques raised by Hoaglin pertain only to technical limitations of our model and technical interpretive details of our presentation. The critiques do not question the validity of our fundamental concept and can therefore, in some sense, be viewed as an implicit affirmation. We readily admit the specifics of the model need refinement.
Hoaglin focuses his critique of our paper on five main points, and below we offer a more detailed response to each of these.
First, he writes “predictions that involve low values of these two variables would not be supported by the data.” We agree with Hoaglin and we nowhere claimed the entire range of predictions would be supported by the data. Our data set was limited and the model was preliminary, as stated above.
Second, regarding the “held constant” interpretation, we have followed Raymond Myers’ Classical and Modern Regression with Applications, and we invite the reader to compare it with Hoaglin’s forthcoming paper “Regressions Are Commonly Misinterpreted” in The Stata Journal.
Third, Hoaglin points out that one of our sentences is technically incorrect: “If the correlation is significant, then the shape of the scatterplot will have a linear trend and a simple linear regression line may be fit to the data.” He is right, as Jason teaches in his statistics classes. Most of the time, two randomly generated variables will have non-zero sample correlation. In the context of the paper, this sentence occurs inside the “Multiple Regression” explanatory feature that the editor required us to write as an introduction for the nonstatisticians in our audience. We intended simply to convey that correlation is supposed to signal a linear relationship, as illustrated by our accompanying scatterplot.
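The remark about randomly generated variables can be illustrated with a small simulation. The Python sketch below is not part of our original analysis. The sample size of 30, the number of trials, the seed, and the function names are all assumptions chosen for illustration. It draws two independent standard normal samples many times and records the sample Pearson correlation of each pair.

```python
import math
import random


def pearson_r(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_correlations(trials=1000, n=30, seed=0):
    """Correlate pairs of independent standard normal samples.

    trials, n, and seed are illustrative choices, not values from the paper.
    """
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]
        results.append(pearson_r(xs, ys))
    return results


rs = simulate_correlations()
nonzero = sum(1 for r in rs if r != 0.0)
print(f"{nonzero} of {len(rs)} sample correlations are non-zero")
print(f"largest |r| observed: {max(abs(r) for r in rs):.2f}")
```

Although the two variables are independent by construction, the sample correlation is essentially never exactly zero, which is why a non-zero or even a significant correlation does not by itself establish a linear trend.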
Fourth, Hoaglin questions the independence of our 30 pitches. The point here is well taken. All we mean by it is that the pitches are thrown under the same experimental conditions, with sufficient rest time. They do not have the dependence and non-randomness that a sequence of pitches influenced by game conditions would have, which was part of the reason given for this experimental approach. The inclusion of a pitcher effect in the model showed no significance, as stated in our article (p-values: 0.41, 0.69, 0.58).
Fifth, Hoaglin caught an embarrassing typo: the name should read “Wilk,” not “Wilks.” This error had been corrected in an earlier version of the manuscript and somehow crept back in. We again thank David Hoaglin for these clarifications.
Since publication of our original article, we have advanced the concept of the paper in two ways. First, we successfully tested the concept on major league pitchers, using more than 500 scored pitches from the 2014 baseball season of PITCHf/x data. (We changed the rating scale to run from -10 to 10.) Second, we extended the model from curveballs to all pitches. The response in our new model therefore represents the overall curvature quality of the pitch. We are actively refining the model, but the current version is:
rating = -4.3rise + 0.3breakpoint + 0.5total_break + 0.7location + 0.9horizontal_break
Note that “location” improves on “knee distance” by including two dimensions. Location values are negative, so a positive coefficient is desired, which corresponds with our old model. The current new model statistics are: Adj R-sq = 0.60; F-stat = 152.9 on 5 and 501 DF; p-value < 2.2e-16.
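As a rough sketch of how the current equation might be applied, the Python snippet below evaluates the linear combination for one pitch and checks that the reported F-statistic and adjusted R-sq are mutually consistent. The function names and the input values are hypothetical, the units of each variable are not specified here, and the consistency check assumes an ordinary least squares fit under the usual definitions. No intercept is included because none appears in the equation as printed.

```python
# Coefficients copied from the equation above; no intercept is printed there,
# so none is used in this sketch.
COEFFICIENTS = {
    "rise": -4.3,
    "breakpoint": 0.3,
    "total_break": 0.5,
    "location": 0.7,
    "horizontal_break": 0.9,
}


def predict_rating(pitch):
    """Linear combination of the five pitch measurements (hypothetical helper)."""
    return sum(COEFFICIENTS[name] * pitch[name] for name in COEFFICIENTS)


def r_squared_from_f(f_stat, df_model, df_resid):
    """Recover R-squared from an overall F-statistic, assuming an OLS fit."""
    return f_stat * df_model / (f_stat * df_model + df_resid)


def adjusted_r_squared(r_sq, df_model, df_resid):
    """Adjusted R-squared, assuming the total degrees of freedom equal
    df_model + df_resid, as in the usual OLS summaries."""
    return 1.0 - (1.0 - r_sq) * (df_model + df_resid) / df_resid


# Made-up measurements for one pitch, for illustration only.
example_pitch = {
    "rise": 1.0,
    "breakpoint": 1.0,
    "total_break": 1.0,
    "location": 1.0,
    "horizontal_break": 1.0,
}
print(f"example rating: {predict_rating(example_pitch):.2f}")

r_sq = r_squared_from_f(152.9, 5, 501)
adj = adjusted_r_squared(r_sq, 5, 501)
print(f"implied R-sq: {r_sq:.3f}, implied adjusted R-sq: {adj:.3f}")
```

Under these assumptions, the F-statistic of 152.9 on 5 and 501 DF implies an adjusted R-sq that rounds to the reported 0.60.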
About the Authors
Jarvis Greiner—from Edmonton, Alberta, Canada—earned his BA in cinema and media arts production as a multi-sport, dean’s list student athlete at Biola University in La Mirada, California. As a former varsity baseball player, Greiner has three passions: baseball, statistics, and video production. This makes him excited about making curveball quantification practical to the baseball community. His goal is to make this index a standard, much like mph is to judging velocity.
Jason Wilson earned his PhD from the University of California, Riverside, and is an associate professor of mathematics and statistics at Biola University. Wilson’s research specialization is in the application of statistical methods to biotechnology. When not too busy teaching, he enjoys statistical consulting. Most recently, he has been doing undergraduate student-collaborative research in such areas as Yahtzee calculations, crime data, and the integration of statistics with the Christian faith.