COVID-19 Data in the Classroom
COVID-19 has undoubtedly generated a plethora of data: daily counts of tests, positive results, hospitalizations, and deaths; findings from scientific studies of the virus and its treatment; and insights into the pandemic’s impact on the economy, education, and mental health, among other areas.
The value of engaging statistics and data science students with these data seems obvious, but the decision of whether to bring COVID-19 data into the statistics and data science classrooms is not simple.
On one hand, the answer might seem like an obvious yes: what better way to engage students with real data than with data about the pandemic that has turned all of our lives upside down? On the other hand, at the time of writing this column, we are still in the midst of a pandemic that has taken many lives. That might be the very reason not to require students to engage with data about an event that may have taken the life of a loved one, caused economic hardship, or affected their lives in some other way that causes stress and grief.
The goal of this column is not to provide a definitive answer on whether educators should bring COVID-19 data into their classrooms, but to showcase a few approaches for doing so responsibly and to highlight resources that might help educators decide whether to do so in the first place.
DataFest: COVID-19 Virtual Data Challenge
The American Statistical Association (ASA) DataFest is a data analysis competition where teams of up to five students each analyze a real and complex data set over the course of one weekend.
In 2019, DataFest was held at more than 40 locations in the United States and internationally, with more than 2,000 students participating in the event. In a typical DataFest, a surprise data set is revealed to participants at a kick-off event on Friday afternoon, and students work throughout the weekend to analyze the data and derive insights. On Sunday afternoon, groups present their work to a panel of judges made up of instructors and statistics and data science professionals in industry.
By the end of the DataFest weekend, students have not only gained experience in analyzing real data; they have also practiced presentation skills, while connecting with other students, faculty members, and industry professionals.
In March 2020, as many colleges and universities transitioned to a remote format, the ASA DataFest steering committee considered alternatives for this year’s competition. The goal was to adapt DataFest to the new remote environment, while still maintaining the parts of the event that make it an inviting and valuable experience for students with a wide range of data analysis experience. The 2020 DataFest was held as a virtual challenge where students worked in teams to explore an impact of the COVID-19 global pandemic. Given the variety of potential topics, part of what made this year’s challenge unique was that it involved participants finding a data set for their analyses.
DataFest events were held in April through June, a time when data and modeling about the direct health outcomes of the pandemic were rapidly changing and unreliable (see Why It’s So Freaking Hard to Make a Good COVID-19 Model on FiveThirtyEight.com). Building models and drawing reliable conclusions about infection, mortality, or recovery rates would require participants to understand the nuances and limitations of the COVID-19 health data at a level that was not likely to be feasible in the short span of the DataFest competition. Therefore, participants were advised to “tell us about something affected by the COVID-19 pandemic other than its direct health outcomes” to discourage them from presenting conclusions that could be misleading or harmful.
A few suggested analysis questions included:
- How has the pandemic affected the airline industry and what are some potential downstream effects of this, other than economic strain on the industry?
- As a student, how would you quantify the effect of the pandemic on your education?
- With shelter-in-place/lockdown orders, many workers have started working from home, which requires internet access. How prepared was the nation/your local area for this shift?
- How has the spread of the pandemic affected people’s opinions about government tracking and privacy?
- What is the effect of the social distancing/shelter-in-place/lockdown recommendations and policies on pollution?
- How can we quantify the potential effects on nutrition and general health of the public, outside of those affected by the virus?
- How are refugees affected by COVID-19?
Too Much Direction?
When we suggested these potential analysis questions to students, we were worried that we might be giving too much direction and curbing their creativity. Fortunately, this was not the case. Students who participated in the event came up with a wide variety of questions on their own, including these analysis foci from the winning teams that might be inspirational for educators wanting to bring COVID-19 data into their classrooms.
- Societal impacts of the COVID-19 pandemic on education in the United States: analysis of data from the U.S. Census Bureau’s Household Pulse Survey, examining the availability of devices and internet in households with children in public or private schools in the U.S. over a period of four weeks, April 23–May 26, 2020. (The Data Quails, University of Edinburgh)
- Relationship between dengue fever outbreak and lockdown: investigation of whether the dengue fever outbreak in Singapore, which coincided with the Circuit Breaker (Singapore’s COVID-19 lockdown measures), could be attributed to the Circuit Breaker, or alternatively if the Circuit Breaker had worsened the dengue fever outbreak. (Team lemonchocolatecheesecake, University of Edinburgh)
- Dreams in the time of COVID-19: exploration of Google search trends, as well as sentiment analysis of tweets, related to people having vivid dreams during the COVID-19 outbreak. (Apoorv Jha, Duke University)
- How research priorities shift as COVID-19 progresses: exploration of the data set provided as part of Kaggle’s COVID-19 Open Research Data Set Challenge (CORD-19), suggesting that research focus shifted from finding a cure to preventive measures for containing COVID-19. (Team N&N, Duke University)
- Purchasing behavior via Amazon and Google Trends: analysis of purchasing behavior data based on Amazon prices and Google Trends. (Team Maskman, UCLA)
- Driving during quarantine: investigation of traffic data to evaluate the effectiveness of the call for social distancing in Toronto, measured by the decrease in the number of people driving in residential areas of the city. (Team Shirley Eva, University of Toronto)
The projects come from the DataFest events at the University of Edinburgh, Duke University, UCLA, and the University of Toronto (see Further Reading for links to event webpages with the student presentations and the data sets they put together for their analyses). The variety of foci in these projects is a testament to the feasibility of engaging students with COVID-19 data without the need for epidemiological modeling expertise.
In addition, a majority of the teams worked with data provided openly by governments, suggesting that featuring COVID-19-related data in classes might also be a good way to expose students to open government data sets.
Using COVID-19 Data in the Classroom
At the May 2020 Electronic Conference on Teaching Statistics (eCOTS), Laura Le, Kari Lock Morgan, and Lucy McGowan spoke on the panel “Engaging Students during the COVID-19” about using data related to the pandemic in the classroom.
One of their primary messages was that the pedagogy should be “trauma-informed” because the pandemic may have affected students directly. By taking a trauma-informed approach, instructors can create a classroom environment where students feel safe discussing the subject and can reduce the risk of re-traumatizing students affected by the pandemic.
The panelists shared practical ways instructors can use a trauma-informed approach when discussing these data in class:
- Take an anonymous poll, asking students whether they want to talk about data related to the pandemic in class. If the data will be used multiple times in a semester, it is good to repeat the poll to get point-in-time feedback, since students’ feelings may change as the situation evolves.
- Indicate in the syllabus when data about the pandemic will be used, so students know when to expect the topic to be discussed in class.
- Create an alternative assignment or discussion prompt for students who do not wish to discuss the pandemic.
- If the course is designed for a more specialized audience, such as biostatistics or graduate students, consider addressing the fact that the topic is sensitive but also an important area of research. This is also an opportunity to talk about strategies for maintaining a healthy relationship with emotions when doing research on sensitive topics.
- As with this year’s DataFest, the analysis examples can focus on societal impacts of the pandemic other than direct health outcomes.
The panelists also suggested that instructors be honest about their experiences in working with COVID-19 data and, where appropriate, provide a disclaimer that they are not experts in epidemiology and infectious diseases, have not done exhaustive literature reviews, and cannot vouch for everyone’s models and predictions.
Activity: Visualizing the Effects of the Pandemic
The following activity for a statistics or data science course uses data related to the pandemic. The primary goals are for students to understand how to create effective data visualizations and to consider the ethical issues that arise when visualizing data that are sensitive and regularly changing. This activity is largely inspired by these data analysis exercises: Dangerous Numbers? Teaching About Data and Statistics Using the Coronavirus Outbreak; Visualizing COVID-19; and Cumulative Deaths from COVID-19.
Although the data in these three activities deal primarily with direct health outcomes, the activity can be done using data about other societal impacts of the pandemic (see Resources for Teaching for examples of other data sets related to the pandemic).
Government organizations, news media, and other public outlets have used numerous data visualizations to help the general public understand the pandemic better (e.g., the “flattening the curve” plots). This has resulted in a vast collection of examples that can help students think about how visualizations are used to effectively (and sometimes ineffectively) communicate insights from complex data to the public. Students can develop their statistical literacy as they apply what they are learning in a real-world context that is current and relevant.
Part 1: Evaluate an Existing Visualization
The Guidelines for Assessment and Instruction in Statistics Education (GAISE) state that one goal for introductory statistics courses is for students to be able to “demonstrate an awareness of ethical issues associated with sound statistical practice,” so this activity begins by focusing on the principles of data ethics and ethical data visualizations. The textbook Modern Data Science with R has a chapter dedicated to ethical data science practices and professional ethics that provides a foundation for discussing the ethical considerations of working with sensitive data.
More specific to data visualizations, the post “Ethical Data Viz” on the Teach Data Science blog includes recommendations for creating ethical data visualizations and examples of how visualizations can go wrong and convey misleading information. In addition, the post “Ten Considerations Before You Create Another Chart About COVID-19” introduces a set of criteria to consider when creating visualizations specific to the COVID-19 pandemic. After a preliminary discussion of data ethics, students evaluate a visualization that conveys information about an impact of the pandemic.
Using these resources as a foundation, students can write or discuss their responses to:
Question 1. What is the topic of the visualization? What is its primary message?
Question 2. In what ways is it effective? In what ways is it ineffective or potentially misleading?
Question 3 (if examining an interactive visualization). What are the benefits of displaying the data using an interactive visualization? What are the limitations?
Question 4. How can you improve the existing visualization or display the data in a new way?
You can either have the students find a visualization they want to examine or provide one for them. For data visualizations to try, see Resources for Teaching.
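To make Question 2 concrete, it can help to show students one common way a chart can mislead: ranking places by raw counts rather than by population-adjusted rates. The short Python sketch below is only an illustration and is not taken from the resources cited above. The regions, counts, and populations are made up, and the helper names `rate_per_100k` and `ranked` are hypothetical.

```python
# Hypothetical regions with made-up numbers, for illustration only.
regions = [
    {"name": "Region A", "count": 5000, "population": 10_000_000},
    {"name": "Region B", "count": 1200, "population": 800_000},
    {"name": "Region C", "count": 3000, "population": 2_500_000},
]


def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 residents."""
    return count / population * 100_000


def ranked(rows, key):
    """Return region names sorted from largest to smallest by the given key."""
    return [row["name"] for row in sorted(rows, key=key, reverse=True)]


# Ranking by raw count puts the most populous region first...
by_count = ranked(regions, key=lambda r: r["count"])

# ...while ranking by rate per 100,000 can reverse the order.
by_rate = ranked(
    regions, key=lambda r: rate_per_100k(r["count"], r["population"])
)

print("Ranked by raw count:     ", by_count)
print("Ranked by rate per 100k: ", by_rate)
```

With these made-up numbers the two rankings are exact opposites, which gives students a starting point for discussing which quantity a given visualization shows and whether that choice suits its message.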
Part 2: Your Turn!
Applying the principles from Part 1, students can create their own visualizations or replicate and improve existing ones. They also can write narratives that include descriptions of the data, the primary messages, interesting insights from the visualizations, and ideas to improve them further.
Students can then present their visualizations and narratives or share them on a class-wide platform, such as a discussion forum in the course’s learning management system. This can involve sharing the final visualizations only or the entire process, from getting data to tidying it to preparing it for the visualization, and the steps (code) for creating the visualization.
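As a rough sketch of what sharing "the entire process" might look like, the Python example below reshapes a small wide-format table (one column per date) into tidy records and then smooths one series with a trailing average before plotting. Everything in it is an assumption made for illustration: the table contents are made up, the function names `wide_to_long` and `rolling_mean` are hypothetical, and it uses only the standard library, whereas a course would likely use its own data-analysis tools.

```python
import csv
import io

# Made-up wide-format table: one row per region, one column per date.
RAW_CSV = """region,2020-04-01,2020-04-02,2020-04-03,2020-04-04
Region A,10,14,9,15
Region B,3,5,4,8
"""


def wide_to_long(text):
    """Reshape wide rows into tidy (region, date, value) records."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        region = row.pop("region")
        for date, value in row.items():
            records.append({"region": region, "date": date, "value": int(value)})
    return records


def rolling_mean(values, window):
    """Trailing mean; early positions average over the values seen so far."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


tidy = wide_to_long(RAW_CSV)

# Pull out one region's series and smooth it with a 3-day trailing window.
series_a = [r["value"] for r in tidy if r["region"] == "Region A"]
smoothed_a = rolling_mean(series_a, window=3)

for record, smooth in zip(tidy[: len(series_a)], smoothed_a):
    print(record["date"], record["value"], round(smooth, 2))
```

Sharing intermediate steps like these alongside the final chart lets classmates see choices, such as the window length or how early dates are handled, that the finished visualization hides.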
The gallery of student work provides another opportunity for discussion about the decisions they made while completing the assignment and the challenges of working with complex real-world data.
Resources for Teaching
These resources provide data and class activities related to the pandemic; feel free to contribute your own.
Teaching Statistics During the COVID-19 Health Crisis.
covid19-r: collection of analyses, packages, visualizations of COVID-19 data in R.
I Eat Data Science for Breakfast: Pandemic 2020 edition.
Dangerous Numbers? Teaching About Data and Statistics Using the Coronavirus Outbreak.
Cumulative Deaths from COVID-19.
Further Reading
Abuelezam, Nadia N. 2020. Teaching Public Health Will Never Be the Same.
Baumer, Benjamin S., Kaplan, Daniel T., and Horton, Nicholas J. 2017. Modern Data Science with R. CRC Press.
Carver, Robert, et al. 2016. Guidelines for Assessment and Instruction in Statistics Education (GAISE) College Report 2016.
Hardin, Jo. 2020. Ethical Data Viz. Teach Data Science.
Koerth, Maggie, et al. 2020. Why It’s So Freaking Hard to Make a Good COVID-19 Model. FiveThirtyEight.
Makulec, Amanda. 2020. Ten Considerations Before You Create Another Chart About COVID-19. Nightingale.
About the Authors
Mine Çetinkaya-Rundel is a senior lecturer at the University of Edinburgh, associate professor of the practice at Duke University, and data scientist and professional educator at RStudio. Her work focuses on innovation in statistics and data science pedagogy, with an emphasis on computing, reproducible research, student-centered learning, and open-source education, as well as pedagogical approaches for enhancing retention of women and under-represented minorities in STEM. She works on the OpenIntro project, whose mission is to make educational products that are free and transparent and that lower barriers to education. She also organizes the ASA DataFest.
Maria Tackett is an assistant professor of the practice in the department of statistical science at Duke University. Her research focuses on using technology and active learning techniques to enhance student learning and motivation in large undergraduate statistics courses. Tackett earned a PhD in statistics from the University of Virginia and worked in industry as a statistician at Capital One.
In Taking a Chance in the Classroom, column editors Mine Çetinkaya-Rundel and Maria Tackett focus on pedagogical approaches to communicating the fundamental ideas of statistical thinking in a classroom using data sets from CHANCE and elsewhere.