Take Me Out to the Ball Game


When I learned the theme of this issue was “Sports and Statistics,” I’ll admit I felt some trepidation. Although many of my students dream about jobs as sports statisticians, and others became interested in statistics because of a love for sports, the arena, the field, and the court are, for the most part, outside my areas of knowledge. Yet I know there is a thriving community of sports statisticians, and I know sports and statistics are a natural pair. So, in the spirit of expanding my horizons, I decided to dive in and see what the sports world is making of the Big Data revolution.

The first thing I noted is that there is a real distinction between what one might naïvely call “traditional” sports statistics (the collection of data on players and teams and their analysis; think Moneyball) and the realm of Big Data sports. The former is definitely classical data analysis, without the trappings we have come to associate with Big Data. When fans or team statisticians collect player statistics and analyze them, they do so after the fact, whereas a hallmark of Big Data is immediacy and evolution in real time. Traditional sports statistics are also quite limited in scope; in baseball, you have measures such as “runs batted in” and home runs (or singles, or doubles, etc.), which are discrete metrics that summarize a player’s performance. By contrast, the “new” sports statistics might derive from monitors that capture every move of every player over the course of a game. Big Data, as I learned, even makes an appearance in the food that gets served at stadium restaurants.

Clearly, there is much more here than the fantasy football or basketball leagues that so many participate in as a pastime, although these also have been affected by the revolution. As I read on more than one website, “Big Sports is Big Business, and Big Business means Big Data.”

Big Data for the sports world seems to have two major intertwining directions. One revolves around enhancing the fan experience, while the other focuses on improving a team’s ability to win games. These are not altogether unrelated, of course, in the sense that the fan will have a better experience if her team wins! But the enhancement of fan experience is more immediate and tangible—actually catering to the needs and desires of the fans in the stands or at the concessions and stadium restaurants.

For example, the New England Patriots have put WiFi in their stadium so fans watching the game can more easily upload photos and post comments to social media sites, access information about wait lines for the bathrooms, and even order food from their seats. The idea is to combine some of the conveniences of home viewing with the live experience. Our wired world is what makes this possible.

Also in the vein of enhancing the experience of the fan, there are many websites dedicated to sports analytics, which often contain large databases of player, team, and tournament statistics. IBM’s SlamTracker is designed specifically for the Wimbledon tennis tournament and emphasizes performance indicators, whereas the Wimbledon Social Command Centre, as the name implies, is devoted to social aspects of the match—fan and player conversation and sentiment analysis of discussion topics.

Brooks Baseball is a website that gathers data on baseball players. Each player has a “card” that contains detailed information about his pitches (for pitchers) and hits (for other players). You can ask for tables or graphs of data, and these can be sliced and diced in many ways. You can even specify what type of error bars you want on the plots you produce! There is a wealth of data here, which even a non-baseball fan such as I can appreciate. Indeed, I began to see some of the attractiveness of this pursuit and had to pull myself away from experimenting with all the various options for data presentation and analysis.

The National Basketball Association (NBA) has recently paired with a company called STATS LLC to install the SportVU tracking technology in every NBA arena. According to the website, the system consists of six cameras, three on each half-court, and software developed by STATS that tracks every player and the ball 25 times per second—a truly staggering amount of data collected over the course of a game.
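
To get a feel for that scale, a back-of-the-envelope sketch helps. The 25 captures per second come from the description above; the other inputs are assumptions made here for illustration and are not taken from the STATS website: 11 tracked objects (10 players on the court plus the ball) and 48 minutes of regulation game-clock time, ignoring stoppages and overtime.

```python
# Rough count of position samples in one NBA game under stated assumptions.
SAMPLES_PER_SECOND = 25   # from the SportVU description: 25 captures per second
TRACKED_OBJECTS = 11      # assumption: 10 players on the court plus the ball
REGULATION_MINUTES = 48   # assumption: game-clock time only, no stoppages or overtime


def position_samples_per_game(
    samples_per_second: int = SAMPLES_PER_SECOND,
    tracked_objects: int = TRACKED_OBJECTS,
    minutes: int = REGULATION_MINUTES,
) -> int:
    """Return the number of (object, instant) position samples in one game."""
    return samples_per_second * tracked_objects * minutes * 60


if __name__ == "__main__":
    print(f"{position_samples_per_game():,} position samples per game")
```

Under these assumptions the count comes to 792,000 position samples for a single game, before any derived summaries are computed. Because the cameras presumably keep running while the game clock is stopped, the true figure is likely higher.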

The data are analyzed, although apparently not in real time, and many types of summaries are produced. These include information about rebounding opportunities, speed and distance, shooting efficiency, passes, and much more; measures can be output for players or teams, for regular season or playoff games. Currently, only data for the 2013–2014 season seem to be available.

Again, as with the Brooks Baseball site, any “data geek” will find plenty here to latch on to and explore. Major League Soccer has a similar system in place at all of that league’s stadiums.

During the 2012 Olympics, NBC collected massive amounts of data on viewing habits from more than 50,000 participants in its “Billion Dollar Research Lab.” This helped the network determine that, contrary to expectations, live streaming of events did not detract from prime-time viewing. Based on that information, collected and analyzed in near real time, NBC changed how it handled major events, including the closing ceremony, which was live-streamed (the opening ceremony had not been).

The ability to make such decisions, based on data collected while the Olympics were still going on, was important for the network, which had received criticism for delaying its broadcasts of events to coincide with American prime-time television hours. According to an article in The New York Times, NBC counted 83 million comments related to the Olympics on social media platforms such as Twitter. Not unexpectedly, most of the social media action took place during prime time, and much of it consisted of complaints about the broadcast delays.

Another intersection between Big Data and fans is through fantasy leagues, such as fantasy football. Before researching this column, I had some notion that fantasy football is popular and that data collection and analysis play a significant role in helping participants build up their teams. I had no idea, however, of the scale of any of it! There is more than one company aimed at the fantasy football market, and massive aggregations of player data are a large part of their product. For the fantasy league participant, the critical information goes beyond simple summary statistics to player interactions: How does player X perform against players with particular characteristics? How well does player Y score under these circumstances? When players X and Y have faced off in some way, what was the result? This view adds a layer of complexity and data richness.

The ways in which Big Data is enhancing player and team performance are myriad. Players can attach wearable devices to their uniforms during practices or, sometimes, games—the use of data-collection devices during competition seems to vary from sport to sport.

For the America’s Cup sailing competition, boats are equipped with sensors that collect data on wind and other conditions prior to the start time, but no data collection equipment is allowed on board during the race (it has to be dumped overboard in waterproof packages once the race begins).

In NASCAR racing, by contrast, data can be recorded during the competition, but cannot be analyzed in real time. In the IndyCar Series, to take yet another example, sensors collect massive quantities of data (5 gigabytes per lap for 85 laps), which may be analyzed on the spot and then used to change racing strategy midstream.
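
The IndyCar figures quoted above imply a simple total, sketched here. Both inputs are the numbers in the text; the function name is hypothetical and introduced only for illustration.

```python
# Total data volume implied by the per-lap figure quoted in the text.
GB_PER_LAP = 5   # gigabytes collected per lap, as quoted above
LAPS = 85        # number of laps, as quoted above


def total_race_data_gb(gb_per_lap: int = GB_PER_LAP, laps: int = LAPS) -> int:
    """Return the total gigabytes collected over a race."""
    return gb_per_lap * laps


if __name__ == "__main__":
    print(f"{total_race_data_gb()} GB over the race")
```

That works out to 425 gigabytes over a single race, all of which teams are allowed to analyze while the cars are still on the track.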

Wearable technology such as fitness monitors is a boon to team physicians and coaches, as well as to the players. With these wearable devices, coaches can monitor heart rate, speed, hydration levels, and more, thereby helping to ensure players’ health during practice or a game.

The German football (soccer) club TSG Hoffenheim has teamed up with a firm that specializes in business analytics to incorporate the real-time collection and analysis of data into their training schedule. They put multiple trackers on each player, as well as on the goals and posts, to monitor every movement on the field. The data are analyzed as they come in and improvements can then be made “on the fly.”

Adidas has a product called miCoach, which it labels “an interactive personal coaching and training system.” According to the product website, miCoach also provides real-time coaching and performance tracking; it can be used for team or individual sports. miCoach products are on the ground at the World Cup in Brazil this year, as well; teams from Germany, Mexico, Argentina, and Japan are using one of the team training systems. And a “smart ball,” complete with integrated sensors and Bluetooth, is coming on the market; its goal (pun intended) is to help players improve their kicking.

Sticking with the World Cup, I was intrigued to read an article in the magazine Foreign Policy about the data presence of the rosters of various teams. The article discussed how, because of the “Big Sports Big Data” nexus, a plethora of statistical information is available for almost all the players appearing in this year’s World Cup. The pervasiveness and, more critically, the easy accessibility of data are, of course, a relatively new phenomenon, and the article speculated about the effect this might have on how coaches and players approach the game.

This was an interesting insight, but what struck me most was that this data wealth is not evenly spread. Iranian players, for instance, are going into the tournament almost as statistical ciphers: most of the Iranian team is domestically based and data on their performance are not collected. On the U.S. team, meanwhile, one player (a young man who usually plays for a German youth squad) has not logged many minutes on the field, and hence data on his performance are similarly sparse.

The World Cup offers more Big Data entertainment for those who are so inclined. Google Trends has a site that explores searches related to the beautiful game, which numbered close to 600 million as of June 19, 2014. The page summarizes which countries are garnering the most attention, which players are the subject of the most searches (who is “trending” on Google), and even what questions are the most frequently asked by fans in different countries. Oddly enough, a popular question from Mexico is, “When did Paul the octopus die?” (Paul the octopus, you may recall, was credited with predicting the outcomes of several matches in the 2010 World Cup; see Optimism and the Occult Octopus: Favorites Lose, Underdogs Triumph, and Spain Finally Wins the World Cup. By the way, he died October 26, 2010.)

Interestingly, but not at all surprisingly to anyone who has been following the Big Data trend and its frequent backlash, some voices can be heard against the reliance on data in sports. Andy Flower, a cricket coach in England, has been chided for depending so much on data analysis that he ignored the spirit of the game and the players, leading to an embarrassing (the implication is “avoidable”) defeat against Australia. Tim Wigmore, writing at ESPN, put it thus: “Cricket is an art, not a science. … Flower’s reign, for the most part, showed the virtues of using [data] smartly. But cricket data is affected by the unpredictability of human beings and so constantly fluctuates. Data is emphatically not a substitute for intuition and flair. … By the last embers of Flower’s rule, England seemed not empowered by data, but inhibited by it, as instinct, spontaneity, and joy seeped from their cricket …”

Maybe the team would have lost anyway; maybe Big Data is a scapegoat. Of course, one never knows. As with much of the hype around Big Data, I think it is useful to make a distinction between what is claimed and what is realistic to expect. The Moneyball approach, while not Big Data, offered a glimpse of what can be gained by paying attention to data and carefully analyzing them. But, as we’ve also seen elsewhere in this column, data, big or small, are not a panacea. Since leagues, teams, and individual players are going to continue to collect performance data in ever-increasing quantity and quality, it behooves the statistical community to take an active role in ensuring both appropriate use and a sensible balance between data-driven and spirit-driven decisions. Statisticians can work closely with coaches and team managers to help make this happen, as is the nature of much collaborative work in our discipline. A challenge for statisticians taking this route is that teams jealously guard their data and analysis methods to keep a competitive edge, an obvious barrier for those who are interested in publishing.

A major opportunity for the two worlds, sports and statistics, to interact is through analytics conferences. There are three regular conferences: the annual MIT Sloan Sports Analytics Conference, held in Boston since 2007; the New England Symposium on Statistics in Sports (NESSIS), held every other year at Harvard University; and MathSport International.

MIT Sloan is a student-run conference that has become one of the major venues for the discussion of sports analytics in the United States. In addition to students, the conference attracts representatives from many of the major leagues, as well as data analysts. Since 2010, the conference also has featured a “research paper track” that emphasizes innovative analysis of sports-related data. The conference continues to expand; whereas the first conferences were held on the MIT campus, it has moved to Boston’s convention centers in recent years to accommodate the growing interest. Activities and data analysis competitions sponsored by the leagues also have become part of the two-day affair. Panels in the 2014 conference ranged over a wide variety of topics, from training the next generation of sports data analysts, to visualization of sports data (led by none other than Edward Tufte), to wearable technologies, and much more.

MathSport International is not specifically an analytics conference, casting a somewhat wider net to include tournament design, econometrics, and mathematics education and sports. By contrast, the NESSIS conference has more of an academic focus, a gathering of statisticians with an interest in sports applications; the next such conference will be in 2015.

As Big Data continues to permeate the world of sports, the need for quantitative expertise and data analysis skill also will grow. Many teams already employ groups of statisticians to help them make sense of the massive quantities of data now routinely generated over the course of a game, race, or competition. While these jobs may be coveted and relatively scarce now, keep in mind it’s a growing field. One of the star players for the Boston Celtics, Rajon Rondo, employs his own statistician to help him analyze his performance data, as do many other players of all sports.

I’ll end by pointing to the example of Drew Cannon, a young statistics graduate from Duke University who was hired last year by the Boston Celtics to be their statistician. Admittedly, the Celtics are probably forward-thinking on this front—they’ve had a data analyst named Mike Zarren, now an assistant general manager, on staff since 2003. And the fact that so much was written about the hiring of Cannon—he even has his own Wikipedia page—indicates such jobs are still the exception.

Where a handful of teams lead and find success from their decisions, however, others are bound to see the benefits of incorporating performance metrics and using them to formalize what coaches and players already instinctively know from their experience. Almost everything I read as research for this column leads me to believe the market for statisticians with a knowledge of sports will continue to expand. So, for my students—and others—whose dream job is that of sports statistician, the world of Big Sports will offer many opportunities for employment and personal fulfillment. Seek them out!

Further Reading

Asay, M. 2014. How Big Data fails to make big plays in sports. ReadWrite.

Brousell, L. 2014. 8 ways big data and analytics will change sports. CIO.

Harris, D. 2014. Sports is big business, which means it’s fertile ground for Big Data. Gigaom.

Medeiros, J. 2014. The winning formula: data analytics has become the latest tool keeping football teams one step ahead. Wired.

Wigmore, T. 2014. The perils of data-driven cricket. ESPNcricinfo.

About the Author

Nicole Lazar earned her PhD from The University of Chicago. She is a professor in the department of statistics at the University of Georgia, and her research interests include the statistical analysis of neuroimaging data, empirical likelihood and other likelihood methods, and data visualization. She also is an associate editor of The American Statistician and The Annals of Applied Statistics and the author of The Statistical Analysis of Functional MRI Data.

In The Big Picture, Nicole Lazar discusses the many facets of handling, analyzing, and making sense of complex and large data sets. If you have questions or comments about the column, please contact Lazar at nlazar@stat.uga.edu.
