## Book Reviews 33.1

*Bayesian Cost-Effectiveness Analysis of Medical Treatments*

###### Elías Moreno, Francisco José Vázquez-Polo, and Miguel Angel Negrín-Hernández

**Hardcover:** 284 pages

**Publisher:** Chapman and Hall/CRC Press, 2019

**ISBN-13:** 978-1138731738

Health technology assessments typically use cost-effectiveness analyses, which compare health interventions/treatments based on both cost and effectiveness (ability to improve health). Cost-effectiveness analyses help policymakers decide how to allocate limited healthcare resources and select the treatments/interventions that maximize the health outcomes obtainable within a limited budget. They are commonly used in health care, where it is difficult to put a monetary value on outcomes and cost and effectiveness are not measured in commensurate terms, but where outcomes themselves can be counted and compared.

According to the preface, this book provides the basis of statistical decision theory and methods required for cost-effectiveness analysis, and is aimed at students of health economics with a solid background in statistics, as well as applied statisticians, researchers in health economics, and health managers. That statement is, without any doubt, appropriate for researchers in theoretical health economics and possibly correct for advanced graduate students consulting it along with other books covering the same topic. However, the statement may be questioned by some applied statisticians and health managers, given the highly academic style of the presentation of the topic.

In any case, the part of the statement that should not be underestimated is “with a solid background in statistics,” to which it would be good to add “and in probability.”

The book is organized in six chapters. The first one provides a fair introduction to health economics with a focus on the conventional tools typically used for performing cost-effectiveness analyses, namely the principal indices used for combining cost and effectiveness, such as the incremental net benefit index (and the associated incremental cost-effectiveness ratio).
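For readers unfamiliar with these two indices, their standard textbook definitions can be sketched as follows. The notation is an assumption made here for illustration and is not necessarily the book's: $\Delta c$ and $\Delta e$ denote the differences in mean cost and mean effectiveness between a new treatment and its comparator, and $R$ is the amount the decision-maker is willing to pay per unit of effectiveness.

$$
\mathrm{ICER} = \frac{\Delta c}{\Delta e},
\qquad
\mathrm{INB}(R) = R\,\Delta e - \Delta c .
$$

Under this notation, the new treatment is preferred at a given $R$ when $\mathrm{INB}(R) > 0$, which, when $\Delta e > 0$, is equivalent to $\mathrm{ICER} < R$.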

The second chapter is essentially a short introduction to statistical inference theory through general sections about maximum likelihood and Bayesian estimators. In particular, that second chapter contains a section, critical for the last two chapters of the book, about Bayesian model selection with a particular emphasis on intrinsic priors.

The third chapter introduces the main concepts of statistical decision theory: ordering reward, utility function (including the axioms of existence), and optimal decision-making based on information extracted from observations.

These chapters, half of the book, just introduce standard health economics and basic statistical concepts. In particular, the second and third chapters contain very few references to cost-effectiveness evaluations and all the provided examples are artificial. These two chapters can thus be skipped by readers already familiar with statistical inference and decision theory.

The second half of the book is clearly its core. It is composed of three chapters that finally present three key topics for cost-effectiveness analyses: finding the optimal treatment given homogeneous data, heterogeneous data (e.g., data collected from different centers), and grouped data (e.g., including patient covariates in the associated cost and effectiveness characterizations using linear regression models), respectively. Optimal treatments are obtained by considering two utility functions for the net benefit.

The first utility function compares the net benefit in expectation, while the second compares the net benefit in probability. Once the utility functions are defined, multiple sub-sections cover different statistical models, considering different probability distributions for cost and effectiveness measures. These multiple variations unnecessarily slow down the reading flow. Cost-effectiveness analyses considering heterogeneous data or sub-group patient analyses make intensive use of Bayesian model selection for identifying clusters and for selecting significant covariates, respectively.
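To make the contrast between the two criteria concrete, here is a minimal sketch. It is not the book's implementation: the function names, the willingness-to-pay value `wtp`, and the toy draws are all hypothetical, and the draws merely stand in for posterior predictive samples of effectiveness and cost under each treatment.

```python
import numpy as np


def net_benefit(effect, cost, wtp):
    """Net benefit z = wtp * effectiveness - cost, computed draw by draw."""
    return wtp * np.asarray(effect, dtype=float) - np.asarray(cost, dtype=float)


def prefer_in_expectation(z1, z2):
    """Return 1 if treatment 1 has the larger mean net benefit, else 2."""
    return 1 if np.mean(z1) >= np.mean(z2) else 2


def prefer_in_probability(z1, z2):
    """Return 1 if P(z1 > z2) >= 1/2, else 2.

    The probability is estimated by comparing every draw of z1 with
    every draw of z2, which assumes the two sets of draws are independent.
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    p = np.mean(z1[:, None] > z2[None, :])
    return 1 if p >= 0.5 else 2


# Hypothetical draws: treatment 2 is usually slightly better,
# but occasionally much worse (one expensive, ineffective outcome).
z1 = net_benefit([2, 2, 2, 2], [10, 10, 10, 10], wtp=10)
z2 = net_benefit([2, 2, 2, 1], [8, 8, 8, 30], wtp=10)

print(prefer_in_expectation(z1, z2))
print(prefer_in_probability(z1, z2))
```

On these toy draws the two criteria disagree: the comparison in expectation favors treatment 1 (mean net benefit 10 against 4), while the comparison in probability favors treatment 2, which beats treatment 1 in most pairs of draws. Skewed cost distributions are the typical reason for such disagreements.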

A large part of these last two chapters is devoted to the use of Bayesian model selection techniques by defining intrinsic priors and obtaining associated posterior distributions.

Although the title emphasizes the Bayesian orientation of the book, the frequentist solutions are also partially considered and compared to the Bayesian counterpart solutions whenever possible. Unfortunately, the book lacks real applications to be useful as an introduction. The two chapters providing the overview of statistical and decision sciences are relatively well-written and satisfactorily complete, but too disconnected from the main topic. Moreover, it is clear that the material covered in the book is quite heavily influenced by the specialized research of the authors.

The book would have benefited from some additional comparisons with alternatively developed solutions in the literature.

Finally, it is a little frustrating that no practical implementations are presented in, or associated with, the book to facilitate the use of the described solutions, although the authors indicate that their code can be obtained on request.

Having said that, the book could be used as parallel reading for a course using another introductory book, like G. Baio’s *Bayesian Methods in Health Economics* (CRC Press, 2013). In any case, this book is interesting and worth reading by researchers in health economics.

*The 9 Pitfalls of Data Science*

###### Gary Smith and Jay Cordes

**Hardcover:** 272 pages

**Publisher:** Oxford University Press (September 2019)

**ISBN-13:** 978-0198844396

I received *The 9 Pitfalls of Data Science* by Gary Smith—who has written a significant number of general interest books about personal investment, statistics, and artificial intelligence—and Jay Cordes, before a train trip to Salzburg, Austria, which gave me the perfect window to peruse it. This short book contains a lot of anecdotes and what I would qualify as small talk about job experiences and colleagues’ idiosyncrasies.

More to the point, it reads as a sequence of examples of bad or misused statistics, as many general-interest books on statistics do, but with little to say about how to spot such misuses. Its title (which made me realize that “The 9 pitfalls of …” is a rather common element in book titles), however, started a (short) conversation with a fellow passenger who wanted to know whether the job opportunities in data science were better in Germany than in Austria. While I had no clue about this important question, I do not think the book would have helped, either.

Chapter I, “Using bad data,” provides examples of truncated or cherry-picked data, often associated with poor graphics; it covers only one-dimensional outcomes and relies on very U.S.-centric examples.

Chapter II, “Data before theory,” highlights spurious correlations and post hoc predictions and criticizes data mining; some of the examples are quite standard.

Chapter III, “Worshipping maths,” sounds like the perfect opposite of the previous chapter; it discusses the fact that all models are wrong but some may be more wrong than others. It also provides examples of over-fitting, *p*-value hacking, and regression applied to longitudinal data. The underlying message seems to be that (math) assumptions are handy and helpful, but not always realistic.

Chapter IV, “Worshipping computers,” is about this new golden calf and contains rather standard material about trusting computer output because it is from a machine. However, the book falls somewhat foul of the same mistake by trusting a Monte Carlo simulation of a shortfall probability for retirees, since Monte Carlo also depends on a model. Computer simulations may be fine for bingo nights or poker tournaments, but are much more uncertain for complex decisions like retirement investments.

This chapter is also missing (until Chapter IX, at least) a warning about the biasing aspects in constructing recidivism prediction models, which were, for instance, pointed out in *Weapons of Math Destruction*, which I reviewed in this column a while ago. The chapter also mentions adversarial attacks, if not GANs.

Chapter V, “Torturing data,” mentions famous cheaters like Wansink of the bottomless bowl and pizza papers, and contains more about *p*-hacking and reproducibility.

Chapter VI, “Fooling yourself,” is a rather weak chapter. Apart from describing Ioannidis’s take on Theranos’s lack of scientific backing, it spends quite a lot of space on stories about poker gains in the unregulated era of online poker. This chapter includes boasts of significant gains that might be from compulsive gamblers playing their family savings, which is not particularly praiseworthy. It also offers a personal entry about Brazilian jiu-jitsu, which takes us somewhat far from data science.

Chapter VII, “Correlation vs. causation,” predictably mentions Judea Pearl (whose *Book of Why* I found impossible to finish after reading one rant too many about statisticians being unable to get causality right, especially after discussing the book with Andrew Gelman). I did not find much to gather from this chapter, which could have delved into deep learning and its use in avoiding over-fitting. The first example of this chapter is more about confusing conditionals (what is conditional on what?) than turning causation around.

Chapter VIII, “Regression to the mean,” sees Galton’s quincunx reappearing after Pearl’s book, which is where I learned that the device was indeed intended for that purpose: illustrating regression to the mean. While the attractive fallacy is worth pointing out, much worse abuses of regression could be presented. *CHANCE*’s Howard Wainer also makes an appearance, along with SAT scores.

Chapter IX, “Doing harm,” does engage with the issue that predicting social features like recidivism by (black box) software is highly worrying (and just plain wrong), if only because of this black box nature. The story predictably moves to games of chess and gomoku (Go), including the appropriate comment that this does not say much about real (data) problems. I also found a timely word of warning about DNA testing containing very little information about ancestry, if only because of the company’s limited and biased database.

With further calls for data privacy and a rather useless entry about North Korea, Chapter X, “The Great Recession,” discusses the subprime scandal (as in Ian Stewart’s book, reviewed in this very column) and contains a set of (mostly superfluous) equations from Samuelson’s paper (supposed to scare or to impress the reader?). This leads to the rather obvious result that the expected concave utility of a weighted average of iid, positive, random variables is maximal when all weights are equal, a result that is criticized by laughing at the assumption of iid-ness in the case of mortgages. One could also laugh at those who bought exotic derivatives whose constructions they could not understand.
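The equal-weights result can be checked in a few lines. The following is only a sketch under assumed notation ($u$ concave, $X_1,\dots,X_n$ iid, weights $w_i \ge 0$ summing to one, $\sigma$ ranging over the $n!$ permutations of the indices), not a transcription of the equations reproduced in the book:

$$
\begin{aligned}
\mathbb{E}\Big[u\Big(\sum_i w_i X_i\Big)\Big]
&= \frac{1}{n!}\sum_{\sigma}\mathbb{E}\Big[u\Big(\sum_i w_{\sigma(i)} X_i\Big)\Big]
&& \text{(iid, hence exchangeable)}\\
&\le \mathbb{E}\Big[u\Big(\sum_i \Big\{\frac{1}{n!}\sum_{\sigma} w_{\sigma(i)}\Big\} X_i\Big)\Big]
&& \text{(concavity of } u\text{)}\\
&= \mathbb{E}\Big[u\Big(\frac{1}{n}\sum_i X_i\Big)\Big],
\end{aligned}
$$

since averaging $w_{\sigma(i)}$ over all permutations gives $1/n$ for every $i$. The first equality is where the iid assumption does all the work, which is precisely the step that fails for correlated mortgages.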

The (short) chapter keeps going through all the (a posteriori) obvious ingredients for a financial disaster to link them to most of the nine pitfalls, except for the second one, which is about placing data before theory, because there were no data there—only theory with no connection to reality. This final chapter is rather enjoyable, even if it comes after the facts.

*Prime Suspects*

###### Andrew Granville, Jennifer Granville, and Robert Lewis

**Softcover:** 232 pages

**Publisher:** Princeton University Press (August 2019)

**ISBN-13:** 978-0691149158

I was contacted by Princeton University Press to comment on this comic book or graphic novel, whose full title is *Prime Suspects: The Anatomy of Integers and Permutations*, by Andrew Granville (mathematician) and Jennifer Granville (writer), and drawn by Robert Lewis. I am not a big fan of graphic book approaches to mathematical matters, even less so than to statistical notions (*Logicomix*, which was reviewed earlier in this column, being an exception for its historical perspective and nice drawing style), and this book did nothing to change my perspective.

First, the plot is mostly a pretense at introducing number theory concepts, and I found it hard to follow the shallow story for more than a few pages. The [dark math?] plot is that forensic math detectives are looking at murders that connect prime integers and permutations.

The ensuing NCIS-style investigation gives the authors an opportunity to skim through the whole community of number theorists, plus a few other mathematicians, who appear as more or less central characters, including illusory ones like Nicolas Bourbaki. Alexander Grothendieck makes an appearance as a recluse and clairvoyant hermit, although in real life, he did not live in a cave in the Pyrénées mountains.

Second, I did not (and neither did Andrew Gelman, who was sitting in my office when the book arrived) particularly enjoy the drawings, page composition, or colors of this graphic novel, especially because I found the characters drawn quite inconsistently from one strip to the next, to the point of being unrecognizable, and, if it matters, hardly resembling their real-world inspirations (as seen in the portrait of Persi Diaconis).

To be blunter, the drawings look both ugly and conventional; I do not find much of a characteristic style to them. To contemplate what Jacques Tardi, François Schuiten, or José Muñoz could have achieved with the same material … (or even Edmond Baudoin, who drew the strips for the graphic novels he recently coauthored with the mathematician Cédric Villani).

The graphic novel (with a prime 181 pages) has a preface with explanations about the true mathematicians behind the characters, from Carl Friedrich Gauß to Terry Tao, and, of course, about the mathematical theory explaining the analogies between the prime and cycle frequencies that are (not so) behind the story. I do find this preface much more interesting and readable (with a surprise appearance of Kingman’s coalescent!).

The preface proves somewhat self-defeating in that so much has to be explained on the side to make sense to more than a few mathematician readers, because the links between the story, the characters, and the background are heavily loaded with “obscure references.” In the end, those mathematicians may well prove to be the core readership of this book.

There is also a bit of a Gödel-Escher-and-Bach flavor to the book in that a music piece by Robert Schneider called “Rêverie in Prime Time Signature” is included, and an Escher-style infinite stairway appears on one page, not far from what looks like the Vittorio Emanuele gallery in Milano. (On the other hand, I am puzzled by the footnote on p. 208 that “I should clarify that selecting a random permutation and a random prime, as described, can be done easily, quickly, and correctly.” This may be connected to the fact that the description of Bach’s [not Johann Sebastian!] algorithm provided therein is incomplete.)

#### About the Author

Christian Robert is a professor of statistics at both the Université Paris-Dauphine PSL and University of Warwick, and a senior member of the Institut Universitaire de France. He has authored eight books and more than 150 papers on applied probability, Bayesian statistics, and simulation methods. Currently deputy editor of *Biometrika*, Robert also served as co-editor of the *Journal of the Royal Statistical Society Series B* and as associate editor for most major statistical journals. He is a Fellow of the Institute of Mathematical Statistics, American Statistical Association, and International Society for Bayesian Analysis, and an IMS Medallion Lecturer.

**Book Reviews** is written by Christian Robert, an author of eight statistical volumes. If you are interested in submitting a book for review, contact Robert at *xian@ceremade.dauphine.fr*.