Interview with Carl Morris

CHANCE magazine invited Carl Morris, professor of statistics at Harvard University, to talk with Jim Albert, professor of statistics at Bowling Green State University and editor of the Journal of Quantitative Analysis in Sports. Morris is a pioneer in the development of statistical thinking and methodology in sports and competition. Here, Albert talks with Morris about his interests in sports, how he used sports examples in his research, and his thoughts about the current and future use of statistical thinking in sports.

Portrait and collage of Carl Morris by Jim Harrison

Jim Albert: I appreciate your willingness to be interviewed for this special issue of CHANCE. When I was working on my dissertation in the ’70s, my advisor told me to read the Efron and Morris Stein-estimation papers, and they were a big influence on my research. As a Purdue student, I didn’t think I could blend my statistics and sports interests. So I was very surprised to see the batting average estimation example in your papers.

The intent of this interview is to learn about your interests in sports, how you were able to use sports examples in your statistics research, and your thoughts about the current and future use of statistical thinking in sports.

Let’s start by talking about your interest in sports. When you were growing up, what sports were you interested in as a participant and a fan?

Carl Morris: As a participant, I did what other schoolboys did. My favorites were softball and touch football, and tennis as a teen and in my adult life. As a schoolboy, I loved poring over the statistics for the major U.S. sports. Every day started early for me while I was a newspaper boy in San Diego. I started reading newspapers and the baseball box scores daily at 5:00 a.m. before delivering them on my bike. I learned about baseball statistics from that.

In California in the 1950s, we only had the Pacific Coast League, the highest of the minor leagues. The nearest major league team was 2,000 miles away. There was almost no television. Still, I was blessed because for only 50 cents, I could take a bus to Lane Field and watch the San Diego Padres of the Pacific Coast League. I didn’t realize it at the time, but now I’m kind of awestruck that it just happened after World War II that the historic Major League owner Bill Veeck was busy integrating the Cleveland team with black players who had excelled in the Negro Leagues, and San Diego was Cleveland’s highest minor league farm affiliate. Veeck brought Larry Doby—the Jackie Robinson of the American League—Minnie Minoso, Luke Easter, and other greats to San Diego to prepare them for his Cleveland team. So I grew up watching the early great black players that way.

Carl N. Morris is professor at the Harvard Faculty of Arts and Sciences, Department of Statistics. Morris was editor of the Journal of the American Statistical Association (1983-1985) and executive editor of Statistical Science (1989-1991). He is a Fellow of the ASA, Institute of Mathematical Statistics, and Royal Statistical Society and an elected member of the International Statistical Institute. Morris has done pioneering work in the theory of statistics as applied to sports and competition, especially in baseball and tennis.

Jim Albert: What sports and teams do you currently follow?

Carl Morris: Over the years that’s depended on where I’ve lived, but as I live in Boston now, I especially follow the Red Sox in baseball and the New England Patriots in football. Mainly, I track all of the major American pro sports teams, and also the tennis tour.

Jim Albert: Let’s talk a little bit about your early exposure to statistics. When I was growing up, I was exposed to statistics and randomness through my baseball card collection and through baseball simulation games like Strat-O-Matic Baseball and All Star Baseball. And I followed the Phillies and idolized a few Phillies players. Did you have specific sports statistics memories as a child? Did you have heroes in sports?

Carl Morris: I especially enjoyed All Star Baseball. That’s the game that used spinners, right?

Jim Albert: The spinners were an integral part of All Star Baseball.

Carl Morris: I liked to make up my own player spinner disks, in addition to the ones that came with the set. That, with card and dice games, got me thinking about randomness. The better team loses sometimes, even when you’re doing the spinning yourself and you can see that your team should have a clear advantage. Did you ask me about my favorite players?

Jim Albert: Yes, who were your sports heroes?

Carl Morris: I loved sports statistics back then, before knowing anything about modern statistics. Ralph Kiner was hitting more home runs than anybody, so I enjoyed keeping track of his home runs. Lou Gehrig, whom I only read about because he had died, stood out as someone to be admired immensely.

Jim Albert: Frederick Mosteller was a very famous statistician who had a keen interest in sports. Are you familiar with Mosteller’s interest in sports? Were there particular papers Mosteller wrote that you liked? Why did he work on these sports problems?

Frederick Mosteller

Carl Morris: I never knew why, but he loved sports since his Carnegie Mellon college days and he loved statistics about everything. I read several of his sports statistics papers about baseball and football scores. One paper showed that the better team often wouldn’t win a World Series, and with Bernoulli trials, how the probability of winning the series is easily calculated. His papers tended to be more about the data than deep theory.
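To illustrate the World Series calculation mentioned above, here is a minimal sketch. It assumes that games are independent Bernoulli trials with a fixed win probability `p` for the better team, and that the series is first to four wins. The function name `series_win_probability` is made up for this illustration and does not come from Mosteller's papers.

```python
from math import comb


def series_win_probability(p, wins_needed=4):
    """Probability of winning a first-to-`wins_needed` series.

    Assumes independent games, each won with probability p
    (a Bernoulli-trials simplification, not a full model of baseball).
    """
    q = 1.0 - p
    # The series ends with this team's final win. Before that game the
    # opponent has won k games (k = 0 .. wins_needed - 1) in any order.
    return sum(
        comb(wins_needed - 1 + k, k) * p**wins_needed * q**k
        for k in range(wins_needed)
    )


if __name__ == "__main__":
    for p in (0.50, 0.55, 0.60):
        prob = series_win_probability(p)
        print(f"per-game p = {p:.2f}: series win probability = {prob:.3f}")
```

Under these assumptions, a team that wins 55% of its games against its opponent takes the series only about 61% of the time, so the better team loses the series fairly often.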

Fred started a small sports seminar during my first two years at Harvard with Bernie Rosner, Hal Stern, Lisa Burdick, me, and several others. I regret that we didn’t wind up publishing anything about that as a group.

Jim Albert: Were there other early statisticians who worked on sports problems?

Carl Morris: The first time I heard of a profession called “statistics,” I thought the entire field was Allan Roth, the famous Brooklyn Dodger statistician. I thought that was the only statistics job, and for a while, I yearned to have that job someday. Later, I realized that wanting to be a statistician was like wanting to be a movie star: you either made it big or you starved. So I gave up that dream in high school, not realizing there was another avenue into statistics. But it’s been funny to think of statisticians as being like movie stars, because that hasn’t been our image.

Jim Albert: I believe there was a 1954 LIFE magazine article titled “Goodbye to Some Old Baseball Ideas” that described Branch Rickey’s use of Allan Roth as a statistical analyst for his team.

Carl Morris: Another early person we just have to mention was an operations researcher, George Lindsey, who in the late 1950s developed baseball’s first “Markov matrix,” the 3 by 8 table of expected runs for an inning’s remainder for each of the 24 batting situations. That was among his three pioneering baseball papers in academic journals, 1959, 1961, and 1963 (see Further Reading).

Jim Albert: Let’s move on to talk about your use of sports examples in your papers. The most famous example is the Efron and Morris batting average example from the 1975 JASA paper that’s still being used in modern texts to illustrate Bayesian hierarchical modeling. There is a new book by Parent and Rivot on Bayesian modeling where a single appendix is devoted entirely to an analysis of that data set.

Parent and Rivot write that “Efron and Morris brilliantly exemplified using an interesting sports data set that the James-Stein estimator does perform better at predicting subsequent performance and the observed averages.” Can we talk about the history of that data set? How did Brad and you collect the data? Why did you think it would be helpful for illustrating empirical Bayes modeling? Why did you ensure that everyone had 45 at bats in that data set?

Carl Morris: Okay. Brad and I were classmates as Caltech undergrads and again as Stanford PhD students. In the spring of 1970, some years after our Stanford graduations, we talked one evening outside the statistics department at Stanford and decided to write a paper together. What should it be about? Brad suggested, “Let’s work on Stein’s estimator.” Because so few understood it back then, and because we both admired Charles Stein so much for his genius and his humanity, we chose this topic, hoping we could honor him by showing that his estimator could work well with real data.

Stein already had proved remarkable theorems about the dominance of his shrinkage estimators over the sample mean vector, but there also needed to be a really convincing applied example. For that, we chose baseball batting average data because we not only could use the batting averages of the players early in the season, but because we also later could observe how those batters fared for the season’s remainder—a much longer period of time.

To really test Stein’s procedure, we also needed to know that at least one batter was very different (much better, in this case) from the group norm. That was the great Roberto Clemente. He had batted 45 times on the weekend when data became available, so our sample chose him and all the other batters who also had batted 45 times through the same weekend, a total of 18 players. Stein’s estimator required approximate normality (satisfied by a sufficient sample size for each batter), and it was designed only to work for equal variances—meaning, in this case, that all batting averages had to be based on the same sample size, 45 at bats. Finding all such players required reading microfilms of The New York Times sports pages, as there was no Internet and no records were kept for mid-season data. So that’s how we got the 18 players. Stein’s shrinkage estimator had a total square error that was several times smaller for predicting the season’s remainder batting averages than did the usual separate vector of unbiased estimates.
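The shrinkage idea described above can be sketched in a few lines. This is a simplified illustration, not the analysis from the Efron and Morris paper. It assumes a positive-part James-Stein estimator that shrinks toward the grand mean, and it uses a plain binomial approximation for the common variance, skipping the variance-stabilizing transformation a careful analysis would typically apply. The hit counts are hypothetical and are not the 1970 data.

```python
def james_stein_toward_mean(hits, at_bats):
    """Shrink equal-sample-size batting averages toward their grand mean.

    hits    -- list of hit counts, one per player (hypothetical inputs)
    at_bats -- common number of at bats for every player (equal variances)
    Returns (shrunken_estimates, shrinkage_factor).
    """
    k = len(hits)
    if k < 4:
        raise ValueError("shrinking toward the grand mean needs at least 4 players")

    raw = [h / at_bats for h in hits]
    grand_mean = sum(raw) / k

    # Assumed common sampling variance of one batting average
    # (binomial approximation evaluated at the grand mean).
    sigma2 = grand_mean * (1.0 - grand_mean) / at_bats

    spread = sum((y - grand_mean) ** 2 for y in raw)
    if spread == 0.0:
        return raw, 0.0  # all averages identical; nothing to shrink

    # Positive-part James-Stein factor: 1 keeps the raw averages,
    # 0 pulls every player all the way to the grand mean.
    factor = max(0.0, 1.0 - (k - 3) * sigma2 / spread)
    shrunken = [grand_mean + factor * (y - grand_mean) for y in raw]
    return shrunken, factor


if __name__ == "__main__":
    # Hypothetical hit counts for ten players with 45 at bats each.
    hypothetical_hits = [18, 17, 16, 15, 14, 13, 12, 11, 10, 7]
    estimates, factor = james_stein_toward_mean(hypothetical_hits, 45)
    print(f"shrinkage factor: {factor:.3f}")
    for h, est in zip(hypothetical_hits, estimates):
        print(f"raw {h / 45:.3f} -> shrunken {est:.3f}")
```

The equal-sample-size requirement mentioned in the interview shows up here as the single `at_bats` argument: one common variance is what lets one shrinkage factor apply to every player.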

Jim Albert: Later in your parametric empirical Bayes JASA paper (1983), you use the career trajectory of Ty Cobb’s batting average to demonstrate the use of a hierarchical regression model. Again, why did you use a baseball example there? And how does this example go beyond what you did for the Efron and Morris batting averages?

Carl Morris: I did that baseball example because it helps immensely to think about statistics in terms of real examples, and because thinking about real data in any field raises interesting statistical issues that often apply in other contexts, too. In this case, my Harvard medical school colleagues were concerned with ranking hospitals based on surgical success rates, and that’s the same problem as ranking baseball batters—hospitals also have “batting averages.”

The interesting question lurking behind the Ty Cobb analysis was “has there ever been a ‘true’ 400 hitter?” There have been about 13 batters in modern baseball who have hit 400 (a batting average of .400) over a full season, but maybe they had been “true” .380 hitters who were lucky enough to bat a standard deviation or so above .380. A true 400 hitter differs from one with an observed seasonal .400 batting average, because “true” refers to the unobservable probability of getting a hit, averaged over an entire season. That was the reason I used Cobb’s batting data—because it seemed that if anyone ever at some point had been a true 400 hitter, it was Cobb.
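The “standard deviation or so” remark can be checked with a short calculation. The sketch below assumes a hypothetical season of 500 at bats (the interview gives no figure), independent at bats with a fixed hit probability, and a normal approximation to the binomial. The helper names are invented for this illustration.

```python
import math


def batting_average_sd(true_p, at_bats):
    """Standard deviation of an observed batting average under a binomial model."""
    return math.sqrt(true_p * (1.0 - true_p) / at_bats)


def prob_average_at_least(true_p, at_bats, target):
    """Normal-approximation probability that the observed average reaches `target`."""
    z = (target - true_p) / batting_average_sd(true_p, at_bats)
    # Upper tail of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    at_bats = 500  # assumed season length, for illustration only
    sd = batting_average_sd(0.380, at_bats)
    z = (0.400 - 0.380) / sd
    print(f"SD of a season average for a true .380 hitter: {sd:.4f}")
    print(f".400 lies {z:.2f} standard deviations above .380")
    print(f"approximate chance of hitting .400 or better: "
          f"{prob_average_at_least(0.380, at_bats, 0.400):.2f}")
```

With these assumed numbers, the standard deviation is a little over 20 points of batting average, so an observed .400 season sits slightly less than one standard deviation above a true .380.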

The Cobb analysis also was empowered by theories I’d been thinking hard about, including 1) the unequal variance problem (remember that the variances had to be equal to use Stein’s estimator), 2) an underlying regression for the 24 annual random effects (Cobb played for 24 years), and 3) putting variances on the shrunken estimates, which were needed to answer the question. In statistics, it’s not enough just to estimate a quantity; the estimate’s accuracy also must be provided.

So I developed a two-level hierarchical model, like those of several Efron-Morris papers, to frame the question of whether Cobb ever had been a true 400 hitter. And the answer was that in the year when Cobb was most likely to have been a true 400 hitter, there was only a 34% chance he actually was. But there was an 88% chance that in some unspecified year during his career, we can’t say which year, he actually had been a true 400 hitter. So probably there has been such a hitter, with Rogers Hornsby being the only other likely possibility.

Jim Albert: This reminds me of another famous paper by Stephen J. Gould that discusses the disappearance of the 400 batting average. His explanation for this disappearance is that the variability of batting averages has decreased over time. Players are getting better, which means they are more similar in their batting abilities.

Carl Morris: Good point. I agree. Also, hitting has become tougher. Relief pitching has gotten much better since the last of the .400 hitters because managers stopped leaving the starting pitchers in games after they began to pitch poorly. Also, starting pitchers, now knowing they aren’t expected to pitch complete games, can pitch at full throttle.

Jim Albert: Let’s return to our discussion about the early pioneers of statistical thinking in baseball. The person everyone thinks about is Bill James, and the world got acquainted with Bill James through his Baseball Abstracts in the 1980s. I understand you were one of the early readers of his abstracts.

Carl Morris: Can I tell a story about my first contact with Bill James? In 1976, uncharacteristically for me, I browsed through some ads in an issue of The Sporting News. A four-line ad in the back caught my eye, offered by someone who claimed he had some new ways of looking at fielding statistics. After much mulling, I finally sent him $2.00, the first and only time ever for me to respond to such an ad. A week later, a mimeographed 75-page pamphlet arrived from Kansas, written by Bill James. It was the first year he’d ever written his Abstract. With it was a little handwritten note that I’ve kept and treasured. It said, “Dear Mr. Morris, I hope you enjoy this article, but if you don’t, I’ll gladly refund your $2.00. Well, not gladly, but I will refund it. -Bill James.” The rest is history, of course. I was one of about 70 people to get that first issue.

I’ve loved Bill James for his humility and his willingness to try bold new things. And Jim Albert, there’s a lot in you like that, too. There have been questions about whether academic types should be doing sports analyses, but as a professor you’ve gone out and done just that. Bill James also opened up shop and started his pioneering work in the basement of his office somewhere in Kansas.

Jim Albert: I believe Bill James was a watchman for a factory in Kansas and he spent his evening hours in the factory with baseball books. “60 Minutes” had a nice segment devoted to Bill James titled “Stat Man.”

Jim Albert: Andrew Gelman, in a recent Baseball Prospectus article, said that Bill James’ writings were very influential in getting him interested in statistics. Gelman has said that James uses a number of important statistical principles in his writing, such as the focus on a baseball question rather than methodology. Do you think of Bill James as a statistician?

Carl Morris: Bill James has keen instincts for data and for statistical inference. He’s a virtuoso sports statistician and a terrific expositor. I don’t know whether academic statisticians consider focus on the question to be a statistical principle, but it needs to be if one is working with real data and is passionate about learning from it.

It’s very important in teaching to have good data sets, data sets that people are familiar with and for which the relevant questions are apparent. I use health data and sports data for that reason, so I can think about the connection between application and theory. In that way, not only does theory enhance the application, but knowledge of the application suggests the appropriate theory, and raises better questions. I’m sure you’ve noticed that, too.

You also need stories, as my colleague Joe Blitzstein emphasizes. He’s been teaching the most popular first course in probability that we’ve ever had at Harvard. He tells stories to illustrate the ideas of probability theory. That makes for a big part of his course’s success.

Jim Albert: The book and movie that popularized statistics in baseball is Moneyball. You were mentioned in the Bill James chapter of that book. What do you think about the Moneyball revolution in major league baseball and its acceptance in other sports?

Carl Morris: Well, Moneyball has had a huge effect on the acceptance of statistical thinking in baseball, and in other sports, too. It also has affected the number of students interested in statistics and has increased statistics course enrollments. Of course, computers and the Internet also have made these advances timely and possible by giving us the ability to collect the data efficiently and to maintain it. Early baseball pioneers, like George Lindsey and others, didn’t have these tools and so their work didn’t attract and stimulate large audiences.

Jim Albert: I believe baseball is the most statistical of sports in terms of its access and use of data. The field of sabermetrics has advanced quite a bit since the 1980s, especially with the collection and analysis of new kinds of fielding and pitching data. Are there still interesting baseball problems to solve?

Carl Morris: Plenty, I think, especially in this era of Big Data. There now are multiple sensors in every MLB stadium that keep track of the direction, speed, and spin of every pitched, batted, and thrown ball.

Fielding historically has been the hardest to measure, but such electronic data, combined with other kinds of electronically collected data, will lead to substantial new progress in measuring fielding talent.

Beyond that, there’s also more low hanging fruit that remains for baseball analytics, such as forecasting performance and determining optimal in-game strategies for batter and relief pitcher substitutions. It’s not clear if most teams recognize the potential benefits. Perhaps most baseball team owners and managers have considered the possible benefits of an analytics group for their teams, but even so, nearly half of the Major League teams have yet to assemble an analytics group.

Jim Albert: There is a new book by Ben Baumer and Andrew Zimbalist in which they talk about the current state of analytics in sports. They mention a number of interesting problems that have not been addressed in sports, including baseball. For example, we measure individual performance, but I don’t think we quite understand how players work together. For example, defense in baseball is a collective effort of the defense and the pitcher.

Carl Morris: Good point. Teams should consider assessing a player’s cooperative skills. In baseball, fielders must cooperate with each other, as when throwing to the right base and on relays. Other team sports depend more than baseball on cooperation. In American football, nobody accomplishes much without the help of the 10 other players.

Jim Albert: When a running touchdown is scored, the running back gets all the credit for the score. But this touchdown is a combined effort of the offensive line that is creating the hole and the running back.

Carl Morris: Yes, and it also depends on the players who are faking and threatening to pass. They all tie together.

Improved analytic strategies that can help team performances most mainly separate into those that benefit field managers and those that benefit general managers. Field managers, who are limited in games to the available personnel, can be helped by finding optimal tactics like what plays to call and when to make player substitutions. General managers, who are expected to acquire and evaluate players and decide on salaries, can benefit especially from improved statistical forecasts of how well players are likely to play in the future.

Jim Albert: The media shows pictures of locations of pitches, but I don’t think we understand as well the sequencing of pitches that makes a pitcher successful. The recently elected Hall of Fame inductee Greg Maddux was successful partly because of how he sequenced his pitches and chose them in particular circumstances. I’m not sure we understand that process that well.

Carl Morris: And, of course, pitching choices depend on the batter’s strengths and weaknesses. There’s much that game theory can add, too, via randomized strategies. Optimal strategies would blend with Bayesian considerations, which depend on past player and team predilections, and also with game theoretic randomized minimax strategies.
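As a toy illustration of a randomized minimax strategy, the sketch below solves a two-by-two zero-sum game between a batter guessing a pitch type and a pitcher choosing one. The payoff numbers are hypothetical hit probabilities invented for this example; real pitch selection involves far more options and context.

```python
def solve_2x2_zero_sum(a, b, c, d):
    """Mixed-strategy solution of the zero-sum game [[a, b], [c, d]].

    Rows belong to the maximizer (here, the batter's guess) and columns
    to the minimizer (the pitcher's choice). Returns (p, q, value) where
    p is the probability of row 1 and q the probability of column 1.
    """
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:
        raise ValueError("saddle point: a pure strategy is already optimal")

    denom = a - b - c + d
    p = (d - c) / denom            # makes the pitcher indifferent between columns
    q = (d - b) / denom            # makes the batter indifferent between rows
    value = (a * d - b * c) / denom
    return p, q, value


if __name__ == "__main__":
    # Hypothetical hit probabilities:
    #                      pitcher throws fastball   pitcher throws curve
    # batter sits fastball          0.40                    0.20
    # batter sits curve             0.25                    0.35
    p, q, value = solve_2x2_zero_sum(0.40, 0.20, 0.25, 0.35)
    print(f"batter sits on the fastball with probability {p:.3f}")
    print(f"pitcher throws the fastball with probability {q:.3f}")
    print(f"expected hit probability when both play optimally: {value:.3f}")
```

In this made-up game, neither player can do better by becoming predictable, which is the sense in which randomizing is optimal. A Bayesian refinement would replace the fixed payoffs with estimates informed by each player's past tendencies.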

Jim Albert: Let’s switch sports and talk about tennis. Both of us have a passion for baseball and tennis. What are some of the interesting statistical problems you’ve considered in tennis?

Carl Morris: Jim Albert, you’re a tennis player, too, right?

Jim Albert: Right. I’ve been playing tennis for a while.

Carl Morris: Me too, almost all of my life. There’s much they haven’t done yet in tennis, and this is true of other sports, too. They collect more data now, but the media doesn’t do anything much different with the data, except look at it in more microscopic (disaggregated) ways. When I was much younger, I wanted to do something with tennis, but there were no data. I wondered, “How can one do something statistically interesting with tennis when there are no data?” Then I realized the obvious answer was to make it possible to collect data. So I developed a box score to track the statistics of serves and of receiving, long points, etc. An early application was to the King/Riggs match back when it was played in 1973. I developed a simple way to collect detailed data. I’ve used it to keep data for many matches, at all levels of play. I still plan to make it available for players via an electronic scorekeeper worn on players’ wrists that keeps score and retains point-by-point data to be analyzed.

Jim Albert: One interesting aspect of tennis is the tiebreaker, which is how a set is completed when the game score is six games all. The player who wins seven points wins the set. Announcers typically provide player records in tiebreakers with the implication that there is clutch ability in tennis. Do you believe some tennis players like Rafael Nadal or Roger Federer possess clutch ability?

Roger Federer

Carl Morris: I do, Jim Albert. I’m not so sure it can be shown easily in the data because those players are very likely to win anyway. For sure, great players let up a bit. If they’re sailing along, leading five games to one, they don’t knock themselves out to win the next game, but they try their hardest in the clutch. I don’t know if clutch ability can be separated from the level of effort. You as a player must have some thoughts about that. As a player, I know what goes through my own mind at clutch times, and it’s not always what I’d wish for.

Jim Albert: I have had enough experiences in USTA tennis to believe I have the ability to choke in clutch situations. But that is different than the clutch performance of professional players. Generally, there is much interest in the role of momentum or streaky behavior in sports. Are there particular sports that you think exhibit true streaky patterns that would be different from what you would see in coin flipping?

Carl Morris: Yes, I believe in streakiness in baseball and tennis. Hal Stern and I once wrote a discussion of a paper by Christian Albright that appeared in JASA in 1993. Albright thought he had shown that, in baseball, the length of hitting streaks was no more than expected at random. However, when we carefully reanalyzed it, there was just enough bias in logistic regression, combined with the sample sizes that prevailed in Albright’s analysis, to offset the amount of streakiness that would be expected. So I believe that if Albright were to redo his analysis with bias-corrected logistic regression, his data actually would prove that streakiness does exist. Bill James has said it’s very hard to prove streakiness in sports, but that’s different from disproving it. That’s a separate issue, about whether there’s enough power in the data. I also believe streaks occur in tennis, and I’ve seen good evidence of that in professional tennis data.

Jim Albert: In college basketball, one sees big momentum shifts in scoring during games. Obviously something is going on there. Also, I believe you see more interesting streaky behavior in individual sports such as horseshoes or bowling.

Carl Morris: Well, what’s the famous paper about shooting percentage in basketball and the hot hand question?

Jim Albert: That was the Tversky and Gilovich paper that appeared in CHANCE in 1989.

Carl Morris: People, when they learn that I’m involved in sports statistics, ask me about hot hands and streaks more often than anything else.

Jim Albert: The Tversky and Gilovich CHANCE article and the rebuttal CHANCE article by Larkey, Smith, and Kadane were good discussions of the hot hand effect in basketball.

Other sports like soccer and volleyball appear to be relatively unsophisticated with respect to statistical thinking. Why is that? Do you think statistical thinking will become more important in these other sports?

Carl Morris: I sure do. Soccer, ice hockey, and basketball have similar scoring structures, except there are relatively few scores in soccer and hockey. Still, any progress with Big Data in basketball will help with modeling of Big Data in these other two sports. Kirk Goldsberry of ESPN with Luke Bornn on our faculty and a team of our PhD statistics students have made some remarkable progress in analyzing high frequency NBA data. With relatively few goals scored in hockey and soccer and those being shared by so many players, any one player’s statistics are very imprecise for an individual game, and not that precise even when aggregated over a season. Andrew Thomas of Carnegie Mellon has tracked the puck’s progress as it moves through 15 hockey zones, but that’s about the limit for manual data gathering. Spatial/temporal data, where players and the puck are located several times per second, are bound to provide new insights, at least when experts figure out how to analyze them usefully. A societal benefit is that any progress with such Big Data in sports is likely to carry over to benefit applications to other fields.

Jim Albert: I have seen some new work in ice hockey, and it’s pretty exciting to see what can be learned through statistical analyses.

Can we talk more about statisticians who work on sports problems? Some statistics colleagues are reluctant to write sports papers because they feel these papers will not count toward tenure and decisions about promotion. Would you encourage statisticians to work on sports problems?

Carl Morris: Most academically trained statisticians who have analyzed sports data have done so mainly because they love doing it, not because it helps their careers. That could be changing now that sports data challenge us to deal with Big Data. Also, sports teams are becoming employers, although in a limited way, for those with strong analytic skills. The sabermetrics revolution and Moneyball have helped inspire professional sports teams to get with analytics. So have the Internet and some popular sports analytics conferences like MIT’s annual Sloan Conference and the biennial New England Symposium on Statistics in Sports (NESSIS) at Harvard. We also have quality journals for sports, especially the journal you’ve been editing, the Journal of Quantitative Analysis in Sports (JQAS), now under ASA auspices. The upcoming Journal of Sports Analytics aims relatively more at addressing the issues of decisionmakers in sports.

Jim Albert: I personally see sports as a way of communicating statistics to the public. One reason I was motivated to write Curve Ball (with Jay Bennett) is that we felt like there was an opportunity to really explain statistical thinking at a relatively simple level in sports.

Carl Morris: Yes, Brad Efron and I experienced that by using batting averages to demonstrate how Stein’s method actually worked with real data. That one single example, combined with knowing there were theorems behind it, dramatically changed people’s willingness to accept Stein’s shrinkage procedures. It made it clear then that very substantial improvements over the standard use of separate estimates were possible.

Jim Albert: On a related subject, college students are often either afraid of or bored with statistics when the material is taught in traditional ways. Do you think constructing an introduction to statistics course with a sports focus would help some students overcome their aversion to, or anxiety about, learning statistics?

Carl Morris: Well, when Hal Stern and I once team-taught an introductory summer course at Harvard, there was a lot of initial enthusiasm because we promised an emphasis on sports applications, but I couldn’t tell ultimately whether that helped students understand statistical ideas better. In other undergraduate courses since then, I’ve used sports data for perhaps a quarter of my examples. That also attracted certain students, but certain other students weren’t happy with so many sports examples. So, you can overdo sports—unless the course is explicitly advertised that way, as you’ve done in some of your courses, Jim.

For those who are especially interested in sports, sports data can help a lot. Actually, any application a student appreciates and that shows how statistical thinking works is a strong motivator for learning. It really facilitates learning if we show students how statistics helps them understand topics they care about.

Jim Albert: In the current age of Big Data, there are challenges and opportunities in working with large data sets. Are there interesting Big Data sets in sports? And what can we learn from these data?

Carl Morris: As you said, we talked about Big Data with respect to the spatial-temporal data that’s being collected by several professional sports leagues, and about fielding data in baseball. I think the answer is yes, but what can we learn? Investigations are embryonic now and it’ll take a few years to know what really can be learned.

Jim Albert: You have mentioned that some of your colleagues at Harvard have worked on sports. I understand at Harvard you currently have a pretty active sports analytics group. How did this group get started?

Carl Morris: You’re referring to our undergraduate sports club HSAC, an acronym for Harvard Sports Analysis Collective. HSAC started in 2006 when Rohit Acharya, an econ[omics] concentrator from Winthrop House, came to my office to say that Harvard really should have an undergraduate sports group and he wanted to start one. He thought then that it would center on using regression methods to analyze all kinds of sports, but of course many other statistical tools also have been used. I’d also been thinking about the need for a sports club and I encouraged him. Rohit designed the original agenda, recruited the original members, and served as HSAC’s first president. I’ve been HSAC’s faculty advisor for all of its eight years. Some of HSAC’s current and past leaders have moved on to analytics careers with professional sports teams, and the club keeps growing.

Jim Albert: Can you mention briefly some of the sports and the problems that this sports analytics group is considering?

Carl Morris: Readers can answer this best by looking on the Internet at the HSAC website. It’s always current with many posts and recent activities. The bulk of the articles have concerned professional and college football, basketball, and baseball in the United States. There also have been articles about soccer, hockey, tennis, and foreign sports, but less frequently. Recent analytic issues have included team rankings, player evaluations, salary predictions, injuries, draft analyses, and commentaries on published sports analyses. The students support each other with data sets and by sharing ideas in weekly meetings.

Jim Albert, you’ve attended some of the biennial NESSIS conferences that Scott Evans and Mark Glickman have organized and led at Harvard; the fourth was held in September 2013.

Would-be analysts have to struggle to get the data they really need. Most teams want analysts to collect the needed data and do something really interesting before they hire an analyst. People are always asking me how to get a job doing sports statistics. Well, it ain’t easy. Bill James told me that himself, even though he has had a really good job for years with the Red Sox.

Jim Albert: In JQAS, I am seeing a greater variety of sports and sports questions being considered in the submitted papers. Recently, we have received papers on fly-fishing, fantasy football, track cycling, and scramble golf tournaments. Also there are exciting new types of data being collected. ShotLink data, collected by the PGA Tour, record the actual location of every shot during a PGA round. In the past, one measured putting performance by the number of putts in a round. Now one knows the length, inclination, and break of each putt, and we can obtain much better measures of putting performance. The opportunities for statisticians working with these new sports data sets are remarkable.

Carl Morris: Another question is whether you can get a paying job.

Jim Albert: Doing the work is one thing, and being paid for it is another thing.

Carl Morris: You make an interesting point. The PGA recently offered a cash prize for the best analysis of ShotLink data. My understanding is that they didn’t get many submissions, even though there was an award for the best prediction formula and the data were made available.

Jim Albert: As you said, the winner of this contest received a cash prize, not a job.

Carl Morris: Some of our HSAC students, while undergraduates, have gotten internships with pro teams. Most of those have been asked to research innovative statistical projects. Some interns have gone on to take full-time jobs as analysts for professional teams, helping their team’s GMs and other leaders with management decisions. I’ve talked with one frequently who is now a team’s director of analytics. In his position, he’s often trying to find and employ highly qualified statisticians and computer scientists as interns and full-time employees to do sports analyses. Increasingly, professional sports teams are building such analytics groups—this is a growth period.

Years ago, when I was a PhD student at Stanford, I could identify strategies I thought might win an extra 3–5 games per season for the San Francisco Giants by improving some inefficient game strategies they were using. If one could do that today, it would be worth millions. Baseball players who, over a season, can win that many extra games above replacement-level players typically make $10–15 million annually above the replacement-level salary. Unfortunately, teams have yet to pay as much for the same benefit when it comes from analytic work.

Jim Albert: I do think sports provide a wonderful environment for learning statistical methods. Sports data sets are readily available, and I believe students can learn much about statistical reasoning using them. These experiences working with sports data can eventually lead to jobs outside of sports.

Carl Morris: Oh, yes. Some HSAC members who have listed on their vitae authorship of a published sports article have told me their non-sports job interviewers wanted to talk with them about their article and that led to their getting hired.

Jim Albert: Are there some additional topics we should discuss relative to sports and statistics?

Carl Morris: I thought you might be interested in the connection between thinking about sports data and how that leads us to models and analyses that translate to other fields.

Jim Albert: One reason why I like to work in sports like baseball is because I understand baseball so well. I can see if estimates in statistical models line up with my baseball intuition. When statistical conclusions don’t line up with my intuition, then that leads me to question my modeling assumptions and methods.

Carl Morris: Yes, exactly! That’s how it ought to be. In your case, you know a lot about baseball. We statisticians do best and learn most when we know a lot about how the data were generated and the field of study. This also can happen when we work as a team with interdisciplinary colleagues who have those understandings. The excitement for me and for many doing statistics is just seeing how great our subject is, partly for the generality of its theories and partly for what we can learn from modeling real data.

I find working on real applications helps me think more about better theory, too, because it challenges us to not sweep real issues under the rug. With real data, we pay more attention to the assumptions we make, whether they are valid, and how we can develop better methods and models that reflect the realities of the data and the science we’re investigating.

Jim Albert: That is a very good summary. Why should statisticians work with sports data? One reason is that it is a good illustration of how a statistician should be very familiar with the applied discipline.

Carl Morris: Yes, it’s easier to understand sports data for sports we’ve played and to think more creatively statistically. I know this, because there are some sports, like ice hockey and soccer, that I don’t understand very well, even now. If I were to analyze data from those sports, it would help me to have played them.

Jim Albert: Since I am a tennis player, I understand the importance of strategy. Much of tennis is exploiting your opponent’s weaknesses. And the selection and location of shots are important.

Carl Morris: Do you love doubles?

Jim Albert: Doubles is very different from singles in tennis. One challenging aspect of doubles is playing effectively with your partner. This includes covering the court, moving to net together, and other things.

Carl Morris: Definitely. Wayne Bryan, father of identical twins Bob and Mike, the world’s best doubles team, says, “Singles is checkers, doubles is chess.” That sums it up for me. I’ve loved doubles even more than singles because of the teamwork and the strategic chess-like aspects of doubles. Of course you’ve got to have a partner who understands this, too, and who moves with you. It’s hard to find such partners, but when you do, doubles is a fantastic game.

Jim Albert: Right. For example, if both you and your partner go after every ball, that usually creates conflict. You need to have one person be designated as the one who runs around the court, such as going back for a lob. One needs to be very aware of one’s partner’s position on the court.

Carl Morris: Let me raise one final specific topic. Statisticians who understand a sport well can use that to learn about how people interpret and use statistics and probability in sports. Such opportunities arise when we hear, read, or watch discussions about sports analyses and get the thoughts of team leaders, members of the media, and of fans. This gives us a lens into the thinking processes of bright, motivated people who are untrained statistically about an inferential situation we understand. We get to see the smart things they say and their statistical flaws. Such people often raise interesting questions that can be analyzed in principled ways and that sometimes give us new ideas to analyze.

Further Reading

Mosteller, Frederick. 1952. The World Series competition. Journal of the American Statistical Association 47(259):355–380.

Mosteller, Frederick. 1997. Lessons from sports statistics. The American Statistician 51(4):305–310.

Rickey, Branch. 1954. Goodbye to some old baseball ideas. LIFE, Aug. 2.

Lindsey, George R. 1959. Statistical data useful for the operation of a baseball team. Operations Research 7(2):197–207.

Lindsey, George R. 1961. The progress of the score during a baseball game. Journal of the American Statistical Association 56(295):703–728.

Lindsey, George R. 1963. An investigation of strategies in baseball. Operations Research 11(4):477–501.

Parent, Eric, and Etienne Rivot. 2012. Introduction to hierarchical Bayesian modeling for ecological data. CRC Press.

Efron, Bradley, and Carl Morris. 1975. Data analysis using Stein’s estimator and its generalizations. Journal of the American Statistical Association 70(350):311–319.

Morris, Carl N. 1983. Parametric empirical Bayes inference: Theory and applications. Journal of the American Statistical Association 78(381):47–55.

Gould, Stephen Jay. 1988. Trends as changes in variance: A new slant on progress and directionality in evolution. Journal of Paleontology 319–329.

James, Bill. 1984. The Bill James baseball abstract, 1984. New York: Ballantine Books.

Gelman, Andrew. 2011. A statistician rereads Bill James. Baseball Prospectus.

Baumer, Benjamin, and Andrew Zimbalist. 2014. The sabermetric revolution: Assessing the growth of analytics in baseball. University of Pennsylvania Press.

Albright, S. Christian. 1993. A statistical analysis of hitting streaks in baseball. Journal of the American Statistical Association 88(424):1175–1183.

Stern, Hal S., and Carl N. Morris. 1993. Comment. Journal of the American Statistical Association 88(424):1189–1194.

Tversky, Amos, and Thomas Gilovich. 1989. The ‘hot hand’: Statistical reality or cognitive illusion? CHANCE 2(4):31–34.

Larkey, Patrick D., Richard A. Smith, and Joseph B. Kadane. 1989. It’s okay to believe in the ‘hot hand.’ CHANCE 2(4):22–30.

Lewis, Michael. 2004. Moneyball: The art of winning an unfair game. W. W. Norton & Company.

Goldsberry, Kirk. 2014. Databall. Grantland Feb. 6.
