How Can We Estimate the Death Toll from COVID-19?

Visual Revelations

This research is collaborative in every respect and the order of authorship is alphabetical. We would like to thank Joseph Bernstein for most of the introduction, which he conveyed in a series of emails, and Stephen Stigler for the ideas of the second section, as well as the copy of Figures 1 and 2. This essay would not exist without their help and we are indeed grateful. We also wish to single out Bruce Abramson for his thoughts, and especially his critical reading of an earlier draft, which corrected a serious error.

Introduction

How can we estimate the number of deaths caused by COVID-19? Assessing causality is never easy, and in this instance, there are some additional difficulties because of a new rule from the National Center for Health Statistics that says “COVID-19 should be reported on the death certificate for all decedents where the disease caused or is assumed to have caused or contributed to the death” (emphasis the authors’). What does “assumed to have contributed” mean?

Nobody would quibble over citing COVID-19 as the cause of death for the demise of a previously healthy 42-year-old nurse who has respiratory failure (measurable from blood oxygen and CO2 levels) with pulmonary inflammation seen on X-rays after she worked in a COVID-19 ICU without a mask. That (happily) is not the typical story, however. In an overwhelming majority of deaths, it is less clear who deserves the COVID-19 label.

A more-common story might be that of an 88-year-old who lives in a nursing home, uses supplemental oxygen, develops a urinary tract infection, dwindles, is taken to the emergency room, and for whatever reason, dies. He may be listed as a “COVID-19 death” even though he may not have been tested, let alone tested positive.

We have found that to reach the best answer to a question, we must sometimes go a long way out of the way and then come back a short distance correctly. This is one of those times. Let us begin with the city of London during September of the year 1665, at the very start of what was to become modern epidemiology.

Estimating the Death Toll from the Plague

To monitor burials in the city of London, London’s Worshipful Company of Parish Clerks gathered weekly mortality statistics. These reports were dubbed the London Bills of Mortality and were instituted toward the latter half of the 16th century. Over time, their content grew to include baptisms and the causes of the deaths reported. The statistics they contained found many purposes. A principal use was to provide Londoners with an early warning of the arrival of any pestilence or plague that had begun to establish itself, but there were other, more-surprising uses.1

In 1662, three years before the great plague was to descend on London, the haberdasher John Graunt collected the London Bills and drew, in his words, Natural and Political Observations from them. In doing so, he became known as the world’s first demographer and epidemiologist.

A weekly report for one week of September in the year 1665 is shown as Figure 1. Note that of the 8,297 people buried that week, 86% (or 7,165) of the deaths were attributed to the plague. A distant second place, with 309 deaths, was attributed to “Feaver.”

Figure 1. Diseases and casualties for one week of September 1665.

Chicago statistician/historian Stephen Stigler collected the data from all of the 52 weeks of 1665 and drew the graph in Figure 2.

Figure 2. The 97,306 deaths in London during 1665, shown by week. Red (top) line shows total deaths; blue (lower) line shows the 68,596 burials designated “Plague.”

There are four points of interest:

1. Week 32 ended on August 1. That was the day that Isaac Newton left Cambridge, which had closed because of the outbreak of the plague, for his ancestral home Woolsthorpe-by-Colsterworth, where he started what has come to be called his annus mirabilis and launched the beginning of modern physics.

2. The total number of deaths was 97,306, of which 68,596 were attributed directly to the plague. The difference of 28,710 was apparently due to some other cause.

3. The population of London at that time was about 400,000, so in that year, roughly 25% of London’s population perished. Because of this, 1665 has also been called annus horribilis.

4. The usual fatality rate in London during that time period was about 300 deaths/week, or about 15,000/year. This suggests that about half of the “other” deaths are really from the plague (or plague-related). The insight of estimating the true effect of an epidemic by contrasting the death rates during the time of the epidemic with death rates during normal times forms the basis of our approach to studying COVID-19’s effect. This approach to estimating the causal effect of an epidemic is now sometimes called an “excess deaths analysis.”
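The arithmetic behind points 2 and 4 can be laid out as a short Python sketch. The figures are the ones quoted above; the variable names and the `excess_deaths` helper are ours, introduced only for illustration, and the baseline uses 300 deaths/week times 52 weeks rather than the rounded 15,000.

```python
# Figures quoted in the text for London, 1665.
total_deaths = 97_306        # all burials recorded in the Bills
plague_deaths = 68_596       # burials designated "Plague"
usual_weekly_deaths = 300    # approximate normal weekly toll
weeks = 52


def excess_deaths(observed, baseline):
    """Deaths beyond what a normal period would have produced."""
    return observed - baseline


baseline = usual_weekly_deaths * weeks           # expected deaths in a normal year
other_deaths = total_deaths - plague_deaths      # burials not labeled "Plague"
excess_total = excess_deaths(total_deaths, baseline)
hidden_plague = excess_deaths(other_deaths, baseline)
hidden_share = hidden_plague / other_deaths

print(f"Baseline for a normal year: {baseline:,}")
print(f"Deaths not labeled plague:  {other_deaths:,}")
print(f"Excess deaths overall:      {excess_total:,}")
print(f"'Other' deaths above usual: {hidden_plague:,} ({hidden_share:.0%})")
```

With these inputs, a little under half of the “other” deaths exceed the usual toll, which is the sense in which about half of them appear to be plague-related.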

This excursion to a time more than 350 years distant thus provides us with an approach to answering the question of “How can we estimate the death toll from COVID-19?”

Estimating the Death Toll from COVID-19

It is useful to think of the COVID-19 pandemic as an experimental treatment. Our goal is to measure the causal effect of that treatment on the lives of people in the United States. The treatment group will be the U.S. population from February 1 through April 30, 2020.

We must also calculate the death rate of a control group whose members are just like those in the treatment group but are not exposed to COVID-19. We chose the U.S. population for the same three months during the time period 2014 through 2018 for the control group. The dependent variable is the total number of people who died.

Because the seriousness of the effects of the COVID-19 virus differs markedly by the age of its victims, we stratified the sample by age. Shown in Table 1 are the total numbers of deaths recorded in the U.S. for the control group (2014 through 2018). On the right side are the parallel figures for 2020.2

Table 1—Number of U.S. Deaths, February 1–April 30

The numbers for each year are not strictly comparable because 2016 and 2020 are leap years and each contains one additional day. Calculating the daily death count adjusts for this and yields the results shown in Table 2. Table 2 has been augmented by the mean number of daily deaths for each age category. We could have used this mean as an estimate of what is usual and thus as what we would expect for 2020, which would have paralleled Stigler’s analysis of the London plague data, but doing so would have missed the time trend of increasing deaths. Instead, we fit the data with an expanded model that had an age effect based on the mean deaths for each age group and a second term based on the regression of age-adjusted death rates against year. This approach accommodates the yearly changes associated with population growth and aging.
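A minimal Python sketch of these two steps follows. It reflects one plausible reading of the model described above (an additive age effect plus a single linear year trend fitted to the age-adjusted daily deaths), not necessarily the exact specification we used. The function names are hypothetical, and the numbers in the demonstration are illustrative placeholders, not values from Table 2.

```python
from datetime import date


def days_in_window(year):
    """Days from February 1 through April 30, inclusive (90 in leap years)."""
    return (date(year, 4, 30) - date(year, 2, 1)).days + 1


def predict_daily_deaths(years, daily_by_age, target_year):
    """Predict daily deaths per age group for target_year.

    Assumed model: age effect (mean daily deaths per age group) plus a
    common linear trend, estimated by regressing the age-adjusted values
    (daily deaths minus the age-group mean) on year.
    """
    mean_year = sum(years) / len(years)
    centered = [y - mean_year for y in years]
    age_effect = {age: sum(v) / len(v) for age, v in daily_by_age.items()}

    # Pooled least-squares slope; both variables are centered,
    # so the fitted intercept is zero.
    num = 0.0
    den = 0.0
    for age, values in daily_by_age.items():
        for x, v in zip(centered, values):
            num += x * (v - age_effect[age])
            den += x * x
    slope = num / den

    shift = slope * (target_year - mean_year)
    return {age: age_effect[age] + shift for age in daily_by_age}


# Illustrative placeholder data only (NOT the values in Table 2).
years = [2014, 2015, 2016, 2017, 2018]
daily_by_age = {
    "younger": [100, 102, 104, 106, 108],
    "older": [1000, 1010, 1020, 1030, 1040],
}

predicted = predict_daily_deaths(years, daily_by_age, 2020)
print(days_in_window(2019), days_in_window(2020))
print(predicted)
```

Dividing each year's count by `days_in_window(year)` puts leap and non-leap years on the same footing before the model is fit.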

Table 2—Number of U.S. Deaths per Day, February 1–April 30

On applying this model, derived from the five years of data from 2014–2018, to estimate the number of daily deaths in 2020, we found the model predicted 7,819 deaths/day, and the actual daily rate was 8,262. The difference of 443 deaths/day between these two (what we label as the “Residual”) is what we ascribe to the net causal effect of the COVID-19 pandemic.

A closer look at the Residual column shows that there were fewer deaths among the young (younger than 25) than expected, but an excess of deaths among older people, with those 55 and older accounting for 497 additional deaths per day (6% of the daily number of 2020 deaths in February through April).
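The figures in the last two paragraphs can be checked with a few lines of Python. The inputs are the numbers quoted in the text; the variable names are ours. Two quantities are derived here rather than quoted: the total over the 90-day window, and the net residual for everyone younger than 55, which assumes the age-group residuals sum to the overall residual.

```python
predicted_daily = 7_819    # model prediction for 2020, deaths/day
observed_daily = 8_262     # actual 2020 rate, deaths/day
window_days = 90           # February 1 through April 30, 2020
older_excess = 497         # daily excess among those 55 and older

residual = observed_daily - predicted_daily       # net daily excess
excess_over_window = residual * window_days       # derived, not quoted in the text
older_share = older_excess / observed_daily       # share of all daily deaths
under_55_net = residual - older_excess            # derived; assumes residuals add up

print(f"Residual: {residual} deaths/day")
print(f"Excess over the window: {excess_over_window:,}")
print(f"55-and-older excess as share of daily deaths: {older_share:.1%}")
print(f"Net residual for those under 55: {under_55_net} deaths/day")
```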

Discussion and Conclusions

Every day, the news updates the inexorably mounting numbers of U.S. deaths attributed to the COVID-19 virus (as this was written, 98 days since the first COVID-19 death was recorded in the U.S., the toll had reached about 85,000). What is never provided is parallel information that would help us process this dreadful statistic; specifically, compared to what?

In previous years, there were typically about 7,700 deaths a day in the U.S. Thus, we would ordinarily expect about 755,000 deaths in an average 98-day span (a little more than three months); 85,000 deaths represents about 11% of that number. Is the current total death rate from all causes 11% higher than what we would expect? Is it greater than that, suggesting that there are additional COVID-19 deaths not being recorded as such? Or is it less than that, suggesting that some deaths are recorded as COVID-19 but were not caused by COVID-19?
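The “compared to what?” calculation can be sketched in Python. The daily rate, the 98 days, and the reported toll come from the text; the `compare_to_reported` helper and the observed totals passed to it are hypothetical, included only to show how the three questions above map onto a comparison.

```python
usual_daily_deaths = 7_700   # typical U.S. deaths per day in previous years
days = 98                    # days since the first recorded U.S. COVID-19 death
reported_covid = 85_000      # reported toll at the time of writing

expected_total = usual_daily_deaths * days
reported_share = reported_covid / expected_total


def compare_to_reported(observed_total, expected_total, reported_covid):
    """Compare all-cause excess deaths with the reported COVID-19 toll."""
    excess = observed_total - expected_total
    if excess > reported_covid:
        return "excess exceeds reported toll: some COVID-19 deaths may be unrecorded"
    if excess < reported_covid:
        return "excess below reported toll: some recorded deaths may have other causes"
    return "excess matches reported toll"


print(f"Expected deaths in {days} days: {expected_total:,}")
print(f"Reported toll as share of expected: {reported_share:.0%}")

# Hypothetical observed all-cause totals, for illustration only.
print(compare_to_reported(850_000, expected_total, reported_covid))
print(compare_to_reported(800_000, expected_total, reported_covid))
```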

The goal of this research was to try to provide a more-rigorous answer to the question, “Compared to what?” We expanded the data by stratifying by age groups because all available intelligence suggested that the effects of the COVID-19 virus are dramatically different depending on the person’s age.

The column labeled Residuals in Table 3 is the estimate of the causal effect of the “treatment” (being in year 2020) relative to the “control” (the five years from 2014 to 2018). What makes the treatment year different from the control years? The obvious answer to that question is the COVID-19 pandemic, but that is not the only thing. The stay-at-home isolation that began in some affected states (California, New York, Washington, New Jersey) at the end of March 2020 also vastly reduced the number of motor vehicles on the road. This may have reduced the number of traffic fatalities during this period (complete data are not available yet), and the usual number of traffic deaths is large enough to have some influence on the totals. For example, the age group with the most daily traffic fatalities is drivers between the ages of 45 and 54, and the daily traffic fatality rate for them is about 30. This could account for some of the shortfall in daily deaths seen in 2020 relative to the model’s prediction.

Table 3—Model-based Estimate of 2020 Daily Death Rate from Daily Deaths in 2014 through 2018 where the Residuals are the Excess Number of Deaths in 2020

Our estimates are based solely on the number of people who died during these three months when compared with the number of deaths during the same time frame in the five-year period 2014–2018. Deaths are generally accurately tallied and the estimate we calculated is not based on anyone’s classification of cause. Our estimate, like the one shown in Figure 2 of the number of deaths ascribed to the plague, is a somewhat-indirect measure of COVID-19’s effect, but it is free of the sorts of uncertainties of diagnosis described in the introduction.

A shortcoming of inferences made from our results is the time span of the data we had available. The deaths are averaged over the three months February through April, but very few COVID-19 deaths were reported in the first half of that time period, so it is very likely that for at least 45 (perhaps 60) of the 90 days, there was essentially no difference between what has occurred to date in 2020 and the estimate of 2020 deaths based on the five control years. Hence, what we present as the effect of the pandemic is the excess of deaths in April averaged over the three-month period.
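To see how much the averaging dilutes the estimate, the following Python sketch spreads the same total excess over only the days in which it plausibly occurred. The 443 deaths/day and the 90-day window are from our analysis; the `concentrated_rate` helper is hypothetical, and the 45- and 30-day active periods correspond to the "45 (perhaps 60)" quiet days mentioned above.

```python
def concentrated_rate(avg_excess_per_day, window_days, active_days):
    """Daily excess if the whole window's excess fell within active_days."""
    return avg_excess_per_day * window_days / active_days


avg_excess = 443    # residual averaged over the full window, deaths/day
window_days = 90    # February 1 through April 30, 2020

# If the first 45 days showed no excess, the remaining 45 carry all of it.
rate_45 = concentrated_rate(avg_excess, window_days, 45)
# If the first 60 days showed no excess, the remaining 30 carry all of it.
rate_30 = concentrated_rate(avg_excess, window_days, 30)

print(f"Excess concentrated in 45 days: {rate_45:.0f} deaths/day")
print(f"Excess concentrated in 30 days: {rate_30:.0f} deaths/day")
```

Under these assumptions, the daily excess during the active period would be two to three times the averaged figure.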

If the increase of COVID-19 deaths over time was linear, using the average over that time period would still be a fair representation, but if, as we expect for a pandemic, there was exponential growth, the average would yield a distorted representation. Yet, these were the data we had available at this time, so we used them to illustrate this method of estimation.3 Our goal was to provide a timely example of how to assess the deadliness of the COVID-19 pandemic in a way that is unaffected by possibly idiosyncratic diagnoses. We look forward to what is uncovered as it is applied to richer data sets in the future.

Further Reading

Arbuthnot, J. 1710. An argument for Divine Providence taken from the Constant Regularity in the Births of Both Sexes. Philosophical Transactions of the Royal Society 27, 186–190. London: The Royal Society.

Graunt, J. 1662. Natural and Political Observations on the Bills of Mortality. London: John Martyn and James Allestry.

Footnotes

1. In 1710, Dr. John Arbuthnot used the number and sex of christenings listed at the bottom of the Bills to prove the existence of God and, in the process, invented modern hypothesis testing.
2. The 2014–2018 numbers are from the CDC data set described at https://wonder.cdc.gov/wonder/help/ucd.html#. The 2020 numbers are from https://data.cdc.gov/NCHS/Provisional-Death-Counts-for-Coronavirus-Disease-C/hc4f-j6nb/data.
3. After we completed the study, more recent data became available, so we repeated the analysis with May’s data (as available on June 12) and confirmed both the size of the effects and the conclusion that the excess deaths were primarily visited upon those 65 years of age and older.

About the Authors

Debra M. Boka is a doctoral student in sociology at the University of California, Irvine. She holds a master’s degree in demographic and social analysis.

Howard Wainer is a statistician and author who has written this column since 1990. His latest book is A History of Data Visualization and Graphic Communication (with Michael Friendly), which will be published by Harvard University Press early in 2021.

Visual Revelations covers many topics, but generally focuses on two principal themes: graphical display and history. Howard Wainer, column editor, encourages using this column as an outlet for popular statistical discourse. If you have questions or comments about the column, contact Wainer at wainerhoward@gmail.com.
