Book Reviews 33.3

Essentials of Probability Theory for Statisticians

Michael A. Proschan and Pamela A. Shaw

Hardcover: 328 pages
Year: 2016
Publisher: Chapman and Hall/CRC Press
ISBN-13: 9781498704199

On a quarantined if sunny Sunday morning, I read this 2016 book that CRC Press had sent to me quite a while ago for review. Before moving to serious matters, let me provide my customary evaluation of the cover. I have trouble getting the point of the “face on Mars” being adopted as the cover of a book about probability theory (rather than a book about, say, pareidolia). There is a brief paragraph (p. 37) on post-facto probability calculations, stating how meaningless (indeed!) the question is of the probability of this shadow appearing on a Viking Orbiter picture by “chance,” but this is so marginal that I would have preferred to see any other figure from the book.

The book aims to cover the probability essentials for dealing with graduate-level statistics—in particular, convergence, conditioning, and the paradoxes that result from non-rigorous approaches to probability. This range completely fits my own prerequisites for the statistics students in my classes and, of course, involves recourse to (Lebesgue) measure theory. It is a goal I find both commendable and comforting, since my past experience with exchange students left me with the feeling that rigorous probability theory had mostly been scrapped from graduate programs.

While the book is not extremely formal, it provides a proper rationale for the essential need of measure theory to handle the complexities of statistical analysis and, in particular, of asymptotics. It thus relies as much as possible on examples that stem from or relate to statistics, even though most examples may appear standard to senior readers—for instance, the consistency of the sample median or a weak version of the Glivenko-Cantelli theorem.

The final chapters are dedicated to applications (from a probabilist’s perspective!) that emerged from statistical problems. I feel these final chapters were somewhat stretched compared with what they could have been, such as the multiple motivations of the conditional expectation, but this simply makes for more material.

If I had to teach this material to students, however, I would certainly rely on this book, especially because of the repeated appearances of the quincunx for motivating non-normal limits. (A typo near the presentation of Fatou’s lemma missed the dominating measure, and I did not notice the Riemann notation dx being extended to the measure in a formal manner.)
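For the record, and since the typo turns on the missing dominating measure, Fatou’s lemma in its standard form, with the measure μ restored, reads (for nonnegative measurable functions f_n):

```latex
\int \liminf_{n\to\infty} f_n \,\mathrm{d}\mu
\;\le\;
\liminf_{n\to\infty} \int f_n \,\mathrm{d}\mu .
```

Dropping the dμ on either side is exactly the kind of slip that the extension from the Riemann dx notation is meant to guard against.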

A Computational Approach to Statistical Learning

Taylor Arnold, Michael Kane, and Bryan W. Lewis

Softcover: 362 pages
Year: 2018
Publisher: Chapman and Hall/CRC Press
ISBN-13: 9781138046375

I also read this book at breakfast time over a few sunny mornings at home, and overall, I found it to be much more computational than statistical. The authors delve quite thoroughly into the construction of standard learning procedures, including home-made R codes that obviously help in understanding the nitty-gritty of these procedures—what they call try and tell. What is missing is that the statistical meaning and uncertainty of these procedures remain barely touched on by the book. This is not uncommon in the machine-learning literature, where prediction error on testing data often appears to be the ultimate goal, but it is not so traditionally statistical.

The authors introduce their work as a (computational?) supplement to Hastie’s, Tibshirani’s, and Friedman’s Elements of Statistical Learning, although I would find it quite hard to either squeeze both books into one semester or dedicate two semesters to the topic, especially at the undergraduate level.

Each chapter includes an extended analysis of a specific data set, and this is a true asset of the book, despite its sometimes over-reaching in selling the predictive power of the procedures. Printing extensive R scripts may prove tiresome in the long run, at least to me, but this may simply be a generational gap. The learning models are also mostly unidimensional, as in the chapter on linear smoothers, which, in my humble opinion, holds an excessive profusion of methods. The chapter on neural networks has a fairly intuitive introduction that should nicely reach fresh readers, although encountering the handwritten digit data at this point sent me back to the late 1980s, when my wife was working on automatic character recognition. However, I found the visualization of the learning weights for character classification, hinting at their shape (p. 254), most helpful and alluring.

Among the things I feel are missing from this book are a life-line on the meaning of a statistical model beyond prediction, and some minimal attention to misspecification, uncertainty, and variability, especially when reaching outside the range of the learning data and when returning regression outputs with significance stars. Also missing are a discussion of assessment tools, such as the distance used in the objective function (which, for instance, lacks scale invariance when adding errors on the regression coefficients), a warning against the unprincipled multiplication of calibration parameters, and some asymptotics, with at least one remark on the information loss due to splitting the data into chunks and some (asymptotic) substance behind the use of “consistent.” One has to wait until a single mention on page 319 to see “data quality issues” brought up at all.

While the methodology is defended by algebraic and calculus arguments, there is very little about the probability theory side, which explains in retrospect why the authors consider that the students need “be familiar with the concepts of expectation, bias and variance”—and only that. Providing only a few paragraphs on the Bayesian approach does more harm than good, especially with so little background in probability and statistics.

The book possibly contains the most unusual introduction to the linear model I can remember reading: coefficients as derivatives… followed by a detailed coverage of matrix inversion and singular value decomposition. (That would not seem like the #1 priority, were I to give such a course.)
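To unpack that “coefficients as derivatives” view for readers who have not seen it, here is a minimal sketch of my own (not the authors’ R code, and in Python rather than R): in a linear model, each fitted coefficient is the partial derivative of the prediction with respect to its covariate, and the least-squares solution can be obtained through the singular value decomposition the book then covers.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta = np.array([2.0, -1.0, 0.5])
y = X @ beta + rng.normal(scale=0.1, size=100)

# Least squares through the SVD: beta_hat = V diag(1/s) U^T y
U, s, Vt = np.linalg.svd(X, full_matrices=False)
beta_hat = Vt.T @ ((U.T @ y) / s)

# "Coefficients as derivatives": perturbing covariate j by eps
# changes the prediction x @ beta_hat by eps * beta_hat[j].
x = np.zeros(3)
eps = 1e-6
x_eps = x.copy()
x_eps[0] += eps
deriv = ((x_eps @ beta_hat) - (x @ beta_hat)) / eps
print(np.allclose(deriv, beta_hat[0]))
```

The finite-difference quotient recovers the first coefficient exactly (up to floating-point error), which is the whole point of the derivative interpretation.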

The book would have benefited from better attention paid to copyediting or proofreading. The inevitable typo “the the” is found on page 37. A less-common typo is spelling Jensen’s inequality as “Jenson’s inequality,” both in the text (p. 157) and in the index, augmented by a repetition of the same formula in (6.8) and (6.9). A “stwart” (p. 179) made me search awhile for this unknown verb. Another typo appears in the Nadaraya-Watson kernel regression, where the bandwidth h suddenly turns into n (I had to check twice because of my poor eyesight). There is an unusual use of the word partition, where the sets in a partition are themselves called partitions. Similarly, there is a fluctuating use of dots for products in dimension one, including a form of ⊗ for a matrix product (in equation (8.25)), followed on the next page by the notation for the Hadamard product.
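For readers tripped up by the same h-versus-n confusion, a minimal sketch of the Nadaraya-Watson estimator (my own illustration, in Python rather than the book’s R), where h is the bandwidth and n merely the sample size, which plays no role in the kernel weights:

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Kernel regression estimate at x0, Gaussian kernel, bandwidth h."""
    w = np.exp(-0.5 * ((x0 - x) / h) ** 2)  # weights depend on h, not n
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(1)
n = 200                                     # sample size
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=n)

# Near x0 = 0.25, the true regression function sin(2*pi*x) equals 1.
est = nadaraya_watson(0.25, x, y, h=0.05)
print(est)
```

Swapping h for n in the weights would change the estimator entirely, hence the double-take.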

I also suspect the matrix K in (8.68) is missing 1’s or I am missing the point, since K denotes the number of kernels on the next page (just after a picture of the Eiffel Tower).

The book includes a surprisingly high number of references for an undergraduate textbook, with authors sometimes cited with full name and sometimes only with last name. There are also arXiv references and technical reports that do not belong in a book at this level.

The final, and most pedantic, counter-fact to mention is that Conan Doyle wrote more novels “that do not include his character Sherlock Holmes” than novels that do include Sherlock.

About the Author

Christian Robert is a professor of statistics at both the Université Paris-Dauphine PSL and University of Warwick, and a senior member of the Institut Universitaire de France. He has authored eight books and more than 150 papers on applied probability, Bayesian statistics, and simulation methods. Currently deputy editor of Biometrika, Robert also served as co-editor of the Journal of the Royal Statistical Society Series B and as associate editor for most major statistical journals. He is a Fellow of the Institute of Mathematical Statistics, American Statistical Association, and International Society for Bayesian Analysis, and an IMS Medallion Lecturer.

Christian Robert

Book Reviews is written by Christian Robert, an author of eight statistical volumes. If you are interested in submitting a book for review, contact Robert at
