Interview with Steve Fienberg
Behseta: Steve, let’s start, somewhat unusually, with the textbook Beginning Statistics, which you co-authored with Fred Mosteller and Robert Rourke.
Steve: A great place to begin for reasons I will explain.
Behseta: I have it on my shelf. What’s the story behind it?
Steve: So first of all, good news. That book is reappearing after 30 years. I can even show you the cover. I added the notes on the back cover about three weeks ago, and it will appear as a Dover paperback reprint at some point later this fall. And I’m very, very pleased. I wish the layout could have been redone because there’s a very long story behind “the book,” but keeping it in print so that others can use it has been my real objective. And I think Fred would be very pleased to see the book still in print, and Bob would have been too.
Behseta: Did Mosteller invite you to join him on this?
Steve: Mosteller and Rourke were long-time collaborators, going back to the late 1950s, when Fred got involved in developing statistical materials for high school. They were on a committee together and they wrote considerable material for it—this was the precursor to Probability with Statistical Applications, which was the book they ultimately produced with George Thomas. So their collaboration went way back. Fred also worked on a lot of that material linked to his lectures for the introductory statistics course he did as part of the television program known as “Continental Classroom,” which, as somebody described it recently, was the first statistics MOOC, long before people considered doing online courses.
Thousands and thousands of people watched Fred in the early mornings teach statistics on TV, and that was their introduction to the field. Fred and Bob had worked together on the book with George Thomas, and there were two or three versions of the book, including one that was specific for “Continental Classroom.” Later they worked together on a book called Sturdy Statistics, which was an effort to do nonparametrics (so sturdy in the sense of robust), for people who didn’t want to do nonparametric statistics with all of the mathematics. It was for a second course in statistics.
Behseta: We don’t know Rourke.
Steve: Bob was a high-school teacher at a small, private academy. Originally a Canadian, I think born in Kingston, Bob taught at small, private schools and was involved with a lot of these efforts, having previously coauthored a number of high-school math textbooks. What he brought to the collaborations with Fred was an understanding of what you had to do to make something appealing to students at that real introductory level. Anyhow, their book was Probability with Statistical Applications, and it was heavy on the elementary probability; Fred’s notion was we had to do a statistics book. And maybe we’ll do probability, but that was not the goal of the book. Fred and Bob got started on this and then Bob got sick, and the project got bogged down. They had drafts of—I’m trying to remember how many chapters—maybe a half dozen chapters, some quite polished. The early chapters were EDA (the exploratory data analysis).
So, in this sense, they were quite far along when I joined the project. But then the more heavy-duty statistical stuff had not really been fleshed out. Fred asked me to join the project because Bob wasn’t really able to do any writing. You need to understand that Fred was always somebody who talked about getting projects done. Getting me involved was his notion of how this book would get done. I’m not sure I helped move it along fast enough, but the goal was to get the project done. In the end, we did write a probability chapter, but it came after all the EDA stuff. They had pieces for the next chapters that I helped polish up, but what I contributed most was laying out the early drafts of the regression and analysis of variance chapters. The challenge was to do these topics for students who didn’t have a computer, but to write the material as if they did. If you read the preface, that was always our goal. Of course, when I tested the draft chapters, I did it with students who had access to computing, for example using MINITAB. But the idea in the book was that they should be able to read computer output, but not necessarily create it.
Slavkovic: So who do you think this book is now appropriate for; where could it be nicely used?
Steve: I said there was a long story behind the book. Fred was an editor for Addison-Wesley. The in-house editor at the time was a man named Roger Drumm. He and Fred developed the statistics series for Addison-Wesley. Roger was very good. He had also worked with Fred on the regression book with Tukey, which was a real second course of a much higher level, and Tukey’s EDA book. There was a promise that we would get color and the big page formats—you know, all the fancy stuff they do for very good textbooks these days to market them. About a month before we submitted the manuscript, Roger left the company, or at least shifted to a different position in the company, and the new editor who took over didn’t think very much of what we had done.
We still had a contract with Addison-Wesley, and they weren’t going to void it because Fred was important to them in other ways. But suddenly they weren’t about to do anything special for us. So color disappeared, and all of the fancy things we had talked about disappeared, and the book got to a reduced page size, and they fought with us about figures because they were supposed to redraw the figures and now they wanted us to do the work, and so on and so forth. In the end, just getting it published was a big deal. It was interesting because ours was, in many ways, the first elementary statistics book that did EDA for people who’d never seen statistics, but Addison-Wesley never promoted it as a textbook.
In fact, the same year they published our book, they published two other old-fashioned intro statistics books. And they wouldn’t even show ours to instructors around the country. It was just bizarre. I used it a few times. Did I use it with you when you took the undergrad intro statistics course at CMU?
Slavkovic: I don’t think so.
Steve: I guess we were already using Moore and McCabe at that point. Well, what you didn’t know is that I was really using my book. Because that’s how I thought about my lectures, using the examples, and using the pedagogical ideas that I had been taught by Mosteller and Rourke. Bob used to push this idea called PGP, which you can now see in the book if you go back and look. You begin with the particular, that is you introduce the topic with an example. It might be a simple numerical one, although I always like to use real data for everything. But then you do the concept in a more general form. For example, you write down the algebraic formula for a t-test, although you did the t-test already in the particular example without the formula or the idea behind it.
Then when you have the general structure, you put it to use again in another particular example where it has some payoff so that the student comes away with a sense that what they just learned has some value. We tried to use PGP throughout our book, even long after Bob stopped working. He died before we were done. But that was also the part of the philosophy I took into the statistics classroom. Many modern textbooks use the PGP approach, although typically not as a rule, and teachers don’t necessarily follow the practice in their classrooms either.
Behseta: You recently edited two books on Mosteller. One is a selection of his papers, and the other is his autobiography. What’s the story behind the autobiography? Was it lost and somehow discovered again?
Steve: When I was at York University in the early 1990s, we had a project to produce a volume of selected papers that Fred wrote. This was to be an adjunct to a project that produced a volume called A Statistical Model, which was supposed to be in honor of Fred’s 65th birthday. I think we made it just before he turned 70—so not quite for his 65th birthday. But the idea was that we would follow that up and get a selection of his papers into print together. David Hoaglin and I got started on this when I was still academic vice president at York, and I brought the project back to CMU. We slowly re-keyed each of the selected articles into LaTeX, and then had to clean them up.
We also had a grand plan to add introductions to the selected papers. But it took a long, long time to just get the papers themselves into final form. We were in a wrap-up mode some 10 years or so ago and David and I went to see Fred in his house to discuss how to bring the project to closure. He had been very, very sick and was just going through some rehab. We went out for lunch together and he pulled us aside afterward and he said, “I have a request. There’s this manuscript I have and it hasn’t been published. Can I entrust you to get it done?” This was clearly something to which David and I couldn’t say no. So we took it. It was typescript. Not even a computer typescript, but real typescript, initially.
Fred’s assistant, Cleo, with whom he had worked starting from about 1950, actually learned WordPerfect in order to re-key it for us, producing this new version from which we could work. The backstory was that Fred had been asked to write an autobiography as part of a series that the Sloan Foundation was publishing on scientists. He couldn’t get himself to finish it because he was too busy doing projects. Every time he got closer to the end, to about the time he was writing, he was too close to what he was doing. So many of the early chapters were quite polished, but as the book got further along chronologically, the chapters got sparser and sparser. He had done most of the writing in the 1980s, and virtually nothing in the intervening decade or so prior to giving the manuscript to us.
The draft Fred gave us was sitting in a plastic basket. We immediately made several copies. I left one with David. We gave one to Cleo, who produced the typescript. I was going from there to Montauk, Long Island, for my annual visit with Judy Tanur. I took a copy out to Long Island and I showed it to her and Judy just thought it was great—she had worked with Fred going back 30 years or more to Statistics by Example and Statistics, A Guide to the Unknown. In the end, the three of us agreed we would somehow look after it. The first task was to get a publisher. I called up the people from Sloan and they said no, we’re not publishing these anymore. So that was bad. Then I went to my editor at Springer, John Kimmel, and I said to John: “You know how important Fred has been for statistics.” Springer had published A Statistical Model. The Selected Papers were almost done at this point and we were going to put those into print as a Springer book. I said to John: “Springer really needs to do the autobiography; this is what people really want to read.”
We sent John a copy and he read it and said, “Well, it needs to be edited.” He said things like, “This chapter is a little too long, can’t you do more here?” But he was supportive. On my next trip to Montauk, Judy and I sat with the chapters about his early life and began to whittle them down. We took everything up through his undergraduate career at Carnegie Tech and condensed it. Because that’s what John had thought probably was a little long. But we also added afterwords (epilogs) to each of the first six chapters.
It’s a very unusual autobiography. It really doesn’t begin at the beginning. It starts with descriptions of six big collaborative projects. Fred was a master of big projects. He could organize 20, 30, or even more people to do something together. Each of these six chapters talks about a project or a major application, for example, the Federalist Papers, the project he did with David Wallace, or the critique of the Kinsey Report, which he did with Bill Cochran and John Tukey. Another deals with the National Halothane study, which was a major NRC study, where the statisticians included Mosteller, Tukey, Lincoln Moses, and John Gilbert. By the way, John collaborated with me on one of my first papers; he made the first physical model of the two by two contingency table.
Slavkovic: We’ll get back to that.
Steve: He made it out of wires. I made my own with wires and string. At any rate, the Halothane Study also involved major efforts by several graduate students, most notably Yvonne Bishop, who spent a year working out at the Center for Advanced Study in the Behavioral Sciences at Stanford, running log-linear model contingency table analyses. Fred described each of these projects in the early chapters of the book. And for those, we simply said, “They’re terrific, minor edits.” What we did was we added a postscript to each of those, saying here’s how to think about this project 20 years or 30 years later. Here’s how this work changed a whole chunk of the field. I did a lot of those and then David edited them—David’s a fabulous editor and collaborator. He leaves no word untouched.
We condensed the middle chapters and then we had to figure out how to end the autobiography, since Fred stopped writing in the late 1980s. That took some time and we consulted a number of others. This led to the final chapter, in which we got contributions from other people and tried to fashion it together to reflect on how Fred would have wanted his post-1990 activities to be described. Sadly, the month the Selected Papers came out, Fred died. I got to see him in the hospital and I delivered a Xerox version of what was coming out, in a binder, because I was afraid he wasn’t going to see it. And he died about a week later.
But we hadn’t really got the autobiography properly polished yet. And as a consequence of a series of sessions we did in Fred’s honor, we were able to pull together pictures from lots of people. We have this fabulous collection of pictures, and his daughter Gail did a lot to gather some. And we also reached out to people. They included baby pictures, which went into the chapter about his birth and childhood. And we have pictures of him as a student when he got to Princeton. I don’t think we had a Carnegie Tech one. There are wedding pictures, there are pictures taken at his summer home on Cape Cod. We didn’t have a lot of pictures of each of us with Fred, but we located one of each of us that went into the last chapter. David had worked with him for decades on projects, Judy’s interactions with Fred went back even further, and he was my advisor and mentor.
Slavkovic: I’m sure you loved the book, the entire book, and the six projects in particular but if somebody tells you: okay, I have time right now to read one chapter, what would be the chapter that you would recommend?
Steve: You can’t do that. If you want to know what made Fred great, you read the first six chapters. Because they tell you about the projects. And it isn’t just the results of what he did. It was how he organized the projects and how he got the work done. Take the Federalist Papers project. Fred describes how he was visiting the University of Chicago and how he and David decided to do a project to show that Bayes could really be done with real data, because nobody at the time was doing Bayesian analyses on major practical problems—it was just too difficult. That’s what I tell somebody to read. But then if you want to know about Fred, you’ve got to read the rest.
There are some classic paragraphs here and there that are so revealing. I said something about getting the project done. There’s a little paragraph that talks about—his parents split when he was in school. His father ran road construction crews in West Virginia, and Fred spent some summers working for his father there. And of course, when the weather was good, you had to get the project done because the rains would come and screw everything up. So there’s this little paragraph where we learn how Fred learned to get the project done.
Slavkovic: That’s great. You spoke about Fred and—on many other occasions—you spoke about your other mentors and collaborators, as well as different book projects. For me, personally, when I was at CMU, one of the early books I read on statistics that was helpful was DeGroot’s Probability and Statistics that Larry Wasserman was using for a class he was teaching. This brings me to your early days at CMU and your collaborations with DeGroot. We wanted to know how exciting those days were. You also co-authored Statistics and the Law with DeGroot and Jay Kadane at that time. So how did that book come to be? It’s really a collection of articles, right?
Steve: Yes, but you have to go way back! I first met Morrie in 1970. So I was two years old, as a statistician. I was a junior faculty member at The University of Chicago. Morrie got his PhD from Chicago under Jimmie Savage, and he went from Chicago to Carnegie Tech in 1957 before there was a statistics department. Jimmie was notoriously tough on his students and some of them never finished. But Morrie was one of the really good ones and he submitted his thesis a year or so after moving to Pittsburgh.
As I said, I met Morrie in 1970. I had already been versed in Bayesian thinking, a little bit by Fred, although Fred didn’t push. Actually much more by Howard Raiffa and Bob Schlaifer, who were at the Harvard Business School, and whose weekly seminar I would attend. John Pratt was part of that seminar as well. The presentations and discussions, especially the latter, were really formative activities for me because these were statisticians who were real subjective Bayesians and they did both theory and actual applications.
Behseta: Is this the Schlaifer from the Decision Theory book?
Steve: Yes. Schlaifer was self-taught as a statistician. And basically he and Raiffa carried out this systematic research program in the late 1950s where they set out to recreate exponential families from a Bayesian perspective. They invented conjugate priors (they invented the name, although the idea had been around in different forms earlier). They created pre-posterior analyses. Their book is just a fabulous treatment of Bayesian tools, although it had the world’s most God-awful notation, which Howard Raiffa has happily defended in the interview I did with him for Statistical Science. He remembered exactly what the notation was there for, but I never got very excited by it.
Anyhow, back to Morrie. By the time I met him, I was into Bayesian thinking, and I was at a regional IMS meeting, in North Carolina. I got to know Morrie, as we were drinking with some others in a bar. We seemed to meet and drink in bars often over the years. At any rate we became friendly. We were both, along with Jay Kadane, part of this seminar that Arnold Zellner ran on Bayesian econometrics and statistics. It was a very small group, initially, of deeply committed Bayesians. It included Arnold and some other people from the business school at Chicago, and Bruce Hill and Bill Erickson from Michigan. Both George Box and George Tiao were involved, as were Seymour Geisser and Jim Press. Jimmie Savage came and spoke at one of the seminars. Morrie, Jay, and I interacted a lot at those meetings. Jay, at the time, wasn’t yet at CMU. He was between Yale and CMU at the Center for Naval Analysis.
A bit later on, Morrie became editor of Theory and Methods for JASA. He had previously been Book Review editor. When he became Theory and Methods editor, succeeding Brad Efron, I became Book Review editor. Then Bob Ferber, who was Coordinating and Applications editor, stepped down and I succeeded him, and Jim Press became Book Review editor. There we were, three Bayesians running JASA! I was in Minnesota already at this time, so we spent more time in bars. And not just talking about the politics and world affairs, but we were also dealing with how to run the journal and how to bypass all the efforts by the ASA Board to control and change the content.
I was the first editor who oversaw the production office, which moved from the editor’s office into ASA. ASA wanted to run it and I wanted to oversee the activities so that the editors could do their jobs. The editors had a very different point of view about how to get things done, as you may now understand. Editors know what they want, and somehow things never get done quite the way they want when others control the process.
Morrie, Jay, and I were interacting in these two spheres. In 1978, I had been offered a job at another institution, which never quite worked out. And Morrie and Jay knew. We were in a bar one night and they said, “You should come to CMU.” I was ready. I had already got myself psyched up about the possibility of moving because you shouldn’t interview for something if you’re not ready to take the job. I interviewed at Carnegie Mellon and I spent two hours with Dick Cyert, then CMU president, who was Morrie’s collaborator. Dick thought his job was to talk me into coming to CMU, which he did. After I got there, Morrie and I began our collaboration on probability forecasting. Actually, it started before I got there as a result of a conversation we had at the first Valencia meeting in 1979.
Behseta: Was there a statistics department?
Steve: Morrie and Dick Cyert pushed to create the department in 1966, right about the time Carnegie Tech and Mellon Institute merged. It was about the same time as the university added a bunch of other units: SUPA, which became Heinz College, and the College of Humanities and Social Sciences, which grew out of Margaret Morrison Carnegie College for women. And statistics was created as a university-wide department, reporting at first to Dick, who was dean of GSIA, the business school. Then Dick became president in the early 1970s and statistics reported to him as president. Morrie was the first head. They did make an effort to hire outside, but that’s always hard, and they finally prevailed on Morrie to become head. Jay became head succeeding Morrie when he came in the early 1970s.
Slavkovic: Jay was just telling me this morning about his role and needing to report to different deans and the president, directly.
Steve: Exactly. So what happened was Dick got busy, and he finally said, “Why don’t you report to the provost?” The provost was a nice guy, but he didn’t want Jay to report to him. Finally, in about 1978, the provost said to Jay, “Pick a college; we’re not going to continue this arrangement.” Jay sent a memo to all of the deans, inviting them to apply for the position. Very typical Jay. We had several good offers. The best one came from the College of Humanities and Social Sciences. So that’s where we ended up.
Behseta: Typically, statistics departments are not housed in social sciences and humanities!
Steve: There are a few similar settings for statistics departments at other universities, but it’s not common. More typically, statistics is with math and in physical sciences or computer science. But what was very clear was that we had the mandate to continue to work with everybody in the university. When I became head, I used to tweak the dean of H&SS because we didn’t agree to be part of his college; we just agreed to have him as our dean. So I referred to him as the dean of humanities and social science and statistics. But then we officially became part of the college.
Behseta: So because of that formation, since day one the statistics department at CMU must have been Bayesian.
Steve: It was the Bayesian department in the sense that Morrie and then Jay were Bayesians, but initially there were others involved. Don Gaver who was here did some Bayesian things, but others were frequentists. John Lehoczky came in 1969, but John wasn’t really a Bayesian.
Slavkovic: So did the bar outings convince you to become Bayesian, or was there something else?
Steve: No, no. Bayesians go to bars, is the way I think you have to think about it. Or Bayesians have good times!
Slavkovic: So what did convince you to become Bayesian? Then we’ll come back to the book we wanted to ask you about.
Steve: I was really convinced by Raiffa and Schlaifer, and that seminar they ran. I did it simultaneously with my interactions with Fred. One of my first projects was actually something that you would now call empirical Bayes. But even though Fred had done the Federalist Papers work, which was fully Bayesian, he didn’t push Bayes. When I got to Chicago, I taught Bayesian classes, and I interacted with the business school people. But going back to Harvard, I was part of the Raiffa and Schlaifer seminar and that was formative. I remember there was a basic statistics class that Jerry Klotz, who later went to Wisconsin, taught out of Lehmann.
Behseta: The Testing Statistical Hypotheses book?
Steve: Yes, that was the only Lehmann book at the time. Partway through the course, I said to Jerry Klotz, “But what about Bayesian methods?” And Jerry said to me, “If you want to be a Bayesian, you go to the business school. I don’t do that.” So I went to the business school because I wanted to know what it was. Of course, I’d heard the word before from Don Fraser, but he wasn’t Bayesian. He had his own label for it, but it was really a form of Fisherian fiducial inference.
Anyhow, we never got to the book.
Steve: Morrie and Jay and I were all interested in legal applications, and we thought it would be a neat way to collaborate. We spent a lot of time chatting about the framing of statistical testimony and decided that getting contributions from people who really did statistical testimony as experts would be of interest to others. That’s what we then organized. Morrie and I worked on research projects separately. Jay and I wrote together at the time as well.
Behseta: And it’s an interesting book because it reads like a collection of case studies with comments and rejoinders and the sort of back and forth commentaries that you would find in technical journals.
Steve: Well, and also that you see in trials.
Behseta: That’s right! It has the legal feel!
Steve: In fact, part of what we tried to do—not overly successful—was actually have some of that. Our idea was that if we could identify a couple of good cases and get the experts from the two sides and let them describe things—because the two sides always tell a different story—and therefore their experts do different analyses and reach different conclusions, even though the data are the same.
Behseta: You have a keen sense of appreciation for the history of statistics as a discipline. And you’ve written about it and published around that theme. Where does that come from? In your class lectures, you were always referring to historical occurrences.
Steve: It’s an interesting question.
Slavkovic: We’ll give you a few minutes to think about it. We sat in your course. We had a seminar touching on the history of statistics.
Steve: It’s a funny thing. When you’re a graduate student, you think something done two years ago is old, and therefore you don’t begin doing things by going back and reading what other people wrote a long time ago. One of the things I read as a graduate student was this lovely little book by Jack Good called Estimation of Probabilities. It was actually a lot of what Jack had done over the 1950s and early 1960s in a slightly different form. But if you read the early chapters of that, it takes you back into the earlier contributions of people doing Bayesian-like analyses. For example, there’s Johnson’s Postulate. By reading Good, I began to have some sense that I should pay attention to that work.
I got interested through a series of different problems in understanding where statistical ideas came from. Iterative proportional fitting, which Yvonne Bishop brought into contingency tables, I knew came from Deming and Stephen in 1940. Now, that was ancient history to a grad student working in the 1960s. I went and read their papers. I began to read some of the other older literature on contingency tables as well.
When I got to Chicago, I fell under the influence of Bill Kruskal, who was this unbelievable scholar. He would constantly direct me to things in the literature that were from prior eras in one form or another. For example, I did this paper on the draft lottery, with Bill’s encouragement. The analyses were trivial in many senses. But the paper was not, because with Bill’s urging I went back and looked at the history of lotteries. Every time I thought I had some of it done, Bill would go into the library at night and find these old books and journals that he thought I should read and reference. By the time I had completed that project, I was already doing my own historical digging and paying attention to people who wrote about historical topics. And it was fun, but there wasn’t a lot of reward for doing that. You don’t get tenure for writing history papers unless you’re in the history department. Then you’ve got to write ones that historians care about. So it took a while for my interests in the history of statistics to emerge. Steve Stigler has been a very close friend for a long time, and he was into writing about our history in a pretty serious way, starting in the 1970s.
I found Steve’s papers really fascinating. He and I did lots of interaction on the origin of contingency table analyses over the years. Although, you can never beat Steve Stigler. He always finds something earlier. For example, I thought I had systematically traced the work on contingency tables back to Pearson and Yule. In 2002, I co-organized a journal issue in honor of a statistician who had done some seminal work on log-linear models and quasi-symmetry, Henri Caussinus. Steve did a paper for that on the early history of contingency tables with examples from Galton and others that I wouldn’t have found in a million years. But I confess that I got into a lot of statistical history inspired by Steve. There was a sudden spate of books on the history of statistics in the 1980s, including one by Steve. I wrote a review essay drawing upon all of these historical accounts, but then I also went back to a substantial amount of original material as part of the effort. Not the way Steve does; Steve is really a professional. I’m an amateur historian.
My review essay characterizing the work in these books is, in fact, my own take on a history of statistics. It’s shorter and it’s much more readable than many, but it’s never more readable than Stigler. And it has a different emphasis.
I followed up on that essay in different ways. The project I did with Margo Anderson in the 1990s, which is about census taking, took us back to the first census, for a variety of reasons. Margo is the historian of the census. And much of what I know about the history of census taking I learned from her. She has lots of source material, and through Margo I’ve dabbled in the history of the census, on topics like estimation and adjustment, and on the measurement of race in the census.
My paper, “When Did Bayesian Inference Become Bayesian,” which was one of my labors of love, started in a bar in Ann Arbor, Michigan. Margo and I were on the ICPSR council, and we were attending a council meeting. While we were waiting for others to join in on a dinner, we were having drinks and she turned to me and asked, “When did people start to talk about Bayesian analysis?” I said, “I don’t know.” We talked about it a little bit. I knew that when I was a student people talked about Bayesian analyses, but I also knew that way earlier most statisticians talked in terms of inverse probability, a term attributed to Laplace. I said, “I’ll have to work on it.” So I went back and I pulled Jimmie Savage’s 1954 book off my shelf, and much to my surprise it didn’t mention the word Bayesian at all. Well, that’s pretty strange, I said to myself, because shortly thereafter, there’s Raiffa and Schlaifer’s book, which is full of the word Bayesian. Then I did a JSTOR search, and gradually I began to do a back and forth with Steve Stigler, and then with Jack Good and others. What started as a simple question in a bar turned out to be a 40-page paper several years later. Being an amateur historian is fun.
Behseta: Which takes us to the two books that Sesa and I really love! One is The Analysis of Cross-Classified Categorical Data. When I took the course with you, that was my textbook and I loved it and I learned a lot from it. And I was looking at it in anticipation of this interview the other day, and it’s pretty fresh.
Steve: I wrote it after I had worked on Bishop, Fienberg, and Holland, which we finally published in 1975. There is a very long story behind that.
Slavkovic: We will get to that, too.
Steve: Bishop, Fienberg, and Holland was an amalgam of things that grew out of collaborations, but it had a lot of contributing authors. Darrell Bock of The University of Chicago got me to co-teach a short course on categorical data analysis, just after Bishop, Fienberg, and Holland appeared. I wrote some lecture notes for that course. They were pretty successful. Darrell had written his own book on categorical data analysis, which had a very different approach to the topic. Very good, but quite different. We did the short course again the next year and I expanded my lecture notes.
I really wanted to rewrite Bishop, Fienberg, Holland in the spirit of these notes. I tried to convince Yvonne and Paul to do this. I even circulated an outline of what I thought we could do, totally redoing the book. Yvonne wanted to redo it, but totally differently than I did. Paul’s interests were different yet again, and he had somewhat different ideas of how to redo it. After a while, it became very clear that we would never, ever agree on a plan. In fact, we couldn’t even agree on how to do a second edition with relatively small changes because we each wanted to do the small changes very differently. Finally, I said to them, “I’ve got these notes. Let me do more with them and we’ll leave our book alone.” I developed new examples, and my goal was essentially to prepare a book from which I could teach to statisticians and nonstatisticians.
In a sense, ACCD allows someone to use the same material, but at two different levels. After the book appeared in 1977, I quickly realized it wasn’t as complete as I wanted. So I did a second edition about three years later, and I’m still working now on the third edition. I actually have material. There will be a third edition if I survive. Basically, the course I taught for both of you was not just what was in the book, but also in the notes for the third edition. I have all of the notes, but I need something like a summer where I’m not working on other projects.
Behseta: Does that mean you plan to include something on graphical models and causal analysis? I mean, you already had a chapter on causal models, right?
Steve: Yes, there is a chapter on that topic, but it’s the wrong chapter.
Slavkovic: What would be the right one?
Steve: Well, the second edition of the book came out just as Darroch, Lauritzen, and Speed published their 1980 paper on graphical models. Adding that material and what followed from it is essential, and that does lead to so-called causal models for categorical data using DAGs. Replacing the existing chapter on that topic is crucial as well. I probably wouldn’t make the book Bayesian, however. I will have something Bayesian in it, but that would be a different book.
Slavkovic: So back to the Bishop, Fienberg, and Holland book. I hope when re-doing the book for new versions, there will be no plans to exclude the tetrahedron! Both Sam and I remember you bringing the 3D model of the tetrahedron and trying to explain the surface of independence. All of us were totally perplexed about what was going on.
Steve: But look, you got it!
Slavkovic: Exactly! So we’ll get back to that. But did you find it useful to bring some of those visual tools for teaching? Do you think they help students?
Steve: It helped me. I confess I realized it didn’t really help everybody; many students wondered why I taught that material. As an undergraduate, I was brought up thinking about mathematics and statistics geometrically. That’s one of the things I learned from Don Fraser. I remembered Don lecturing and projecting into a lower dimensional space, using his hands, in order to do his regression estimate. Don taught geometrically. At least as important were the courses I took from H.M.S. Coxeter as an undergraduate. That’s where I learned about barycentric coordinates. His book, Introduction to Geometry, was my textbook in a third-year geometry course Coxeter taught. And that’s what I brought to bear in my thesis work on the two by two contingency tables.
It’s from Coxeter and related kinds of “old-fashioned” geometry that I recognized what the surfaces were. We now know that the algebraic geometers call the surface of independence the Segre variety. It also has this great representation, as Miron Straf was reminding me last night. When I gave my job talk at The University of Chicago in January 1968, he asked me whether the surface of independence was a minimal surface in a physics sense? He reminded me that I didn’t answer him. I said that’s because I didn’t know the answer. But Miron was, of course, correct. Because if you think about Kullback-Leibler information and ask about what’s the minimizer, under independence, you get the surface of independence in my paper on the two by two table.
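One way to read the Kullback-Leibler remark can be checked numerically. The following is a minimal sketch, not the construction from the paper: it assumes the cells of a two by two table are ordered (p11, p12, p21, p22), uses a made-up example table, and the helper names `kl`, `independent_table`, and `margins` are hypothetical. It searches a grid of tables on the surface of independence (those with p11·p22 = p12·p21) for the one closest to p in the sense of D(p‖q), and compares the result with the table built from the margins of p.

```python
import math

def kl(p, q):
    """Kullback-Leibler information D(p || q) for two 2x2 tables given as flat lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def independent_table(r, c):
    """A point on the surface of independence with row margin r and column margin c."""
    return [r * c, r * (1 - c), (1 - r) * c, (1 - r) * (1 - c)]

def margins(p):
    """First-row and first-column margins of a table (p11, p12, p21, p22)."""
    p11, p12, p21, p22 = p
    return p11 + p12, p11 + p21

# Hypothetical table, a point inside the tetrahedron (cells sum to 1).
p = [0.4, 0.1, 0.2, 0.3]

# Candidate minimizer: the independent table with the same margins as p.
r, c = margins(p)
q_star = independent_table(r, c)

# Brute-force search over a grid of independent tables.
grid = [i / 100 for i in range(1, 100)]
best = min((kl(p, independent_table(a, b)), a, b) for a in grid for b in grid)

print("margins of p:", r, c)
print("grid minimizer (row, column margins):", best[1], best[2])
print("cross-product difference of q_star:", q_star[0] * q_star[3] - q_star[1] * q_star[2])
```

Under these assumptions the grid search lands on the margins of p, which matches the standard result that the closest independent table in this sense is the product of the margins.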
Slavkovic: I think you have influenced many students with geometry. I remember a few years ago when I gave a seminar at Ohio State and visited with Elly Kaizar, she had a little 3D model of the tetrahedron with the surface of independence. A few years ago, I was asked to do a 15-minute presentation at a welcoming session for the prospective students at Penn State. Among other things, I brought a little 3D tetrahedron that my student Vishesh Karwa made out of a set of basic pencils and some rope and I was talking about geometry and algebraic statistics.
Steve: The funny thing about it was that it was really John Gilbert who made the first physical model. I was in the stat department at Harvard one day when he came in with this coat-hanger and copper-wire model. But he didn’t know what the mathematics behind it was. He had independence in his thinking too.
Slavkovic: And not the other surfaces.
Steve: Right. But he couldn’t say why it was independence. John was great. By the way, he was a Chicago PhD student who never got a PhD. They were really tough on students there in the 1950s.
Behseta: Did he work with Savage?
Steve: I don’t remember with whom John worked, but they were all tough on students back in those days. And it wasn’t just true of the faculty at Chicago. We try to be much better about getting students through their thesis work these days. We don’t succeed always, but we try.
Slavkovic: For me, personally, I eventually learned about the mathematics of the surface of independence, and that was my link to data privacy, too. I actually always wondered and I would never have believed asking this: How and why did you get involved in data privacy and confidentiality research?
Steve: There were two pieces to this work. When I was department head at CMU, Diane Lambert was a junior faculty member working with George Duncan on a little project that turned into a pair of papers, one in JASA and one in JBES, the business and economics statistics journal. They were basically taking a decision theoretic approach to confidentiality and trying to reel in the structure to what people said they were doing about confidentiality protection in the statistics agencies. Those were the first statistical papers I read on the topic. I wasn’t involved in the research, but because Diane was up for promotion (or reappointment and then promotion), I actually read the papers really carefully at some point, and then I filed the ideas away for future use.
When I was at York University, Denise Lievesley was helping organize a conference in Dublin on confidentiality and she asked me to do an overview of statistical approaches to the topic. The conference was heavily dominated by government statisticians. In fact, maybe two-thirds. There were also several lawyers because they dealt with the laws governing what the statistics agencies do about confidentiality. And then there were a handful of statisticians. George Duncan came, and there were a few statisticians from England besides Denise. I spent the better part of a year gathering up what I could find on the topic. We didn’t have Google in those days so it wasn’t very easy. There were the Tore Dalenius papers. I knew Tore because he had written an article on data protection of subjects for one of the early issues of CHANCE.
But he also wrote this really important paper with Reiss, who was a computer scientist, on data swapping. When I was done with the review, I said to myself, “This is a real gold mine.” I had thought about that when I read Diane and George’s papers in the 1980s, but didn’t follow through. I reread their papers as part of this exercise and it was pretty prominent in what I had to say, because they were being statistical and most of the literature on confidentiality was ad hoc. I asked myself, “What does a good statistician do with a topic like this?” He or she brings order to ad hoc ideas and gives them structure. The published version of the review that appeared in the Journal of Official Statistics a couple of years later was a slightly more polished version of my Dublin presentation. I had already embarked on a follow-up project with Udi Makov.
During my second year at York, Udi was visiting as a faculty member. He and I originally met at the first Valencia conference. I was actually a discussant of Udi’s paper, which was an outgrowth of his thesis work. We became friends and, in anticipation of his arrival in Toronto, he said we should work on a project together. I told him about the work on confidentiality and shared a draft of my paper with him. I said this is a gold mine, let’s just find something and we’ll work on it. It was during that year that we started our intruder modeling work. Of course, like everything else, the problem we chose to tackle was harder than it looked to begin with. We worked on it for another year or so and I got a grant from the Census Bureau to develop it a little more.
Russ Steele was an undergrad in one of my classes back at CMU at the time, and we got him to work on the project in the summer. This led to Russ’ senior thesis. Every time I turned around to do another version of the confidentiality protection problem, there were all these other people saying they were doing it statistically, but I didn’t think they were. I tried to entice young statisticians like you into doing some of this work because there were all these unsolved problems that needed attention.
Slavkovic: I wanted to bring one other thing up in relation to privacy. I had a recent exchange with Cynthia Dwork about the more recent collaborations between cryptographers and statisticians. And she says (I’m quoting her about your role in getting us together and working together on this problem), “To Steve’s great credit, he saw something new was happening and he welcomed us [cryptographers] to his world, rather than trying to exclude us. I will always admire this.” And so what Sam and I were wondering is …
Steve: You should have had that quote yesterday. That was better than the one about …
Slavkovic: The “paranoia” and “ad hoc”? Oh, but I thought that would get people going, which it did. So what do you look for when you start a new collaboration, a new project, or when you encourage these new interdisciplinary projects—you just said privacy looked like a gold mine. But in general, in these new collaborations and new projects, what do you look for?
Steve: I’ve said in other settings that I had mentors from whom I learned to do this. Don Fraser was a mentor in one sense, but that was very technical. Fred Mosteller and Bill Kruskal were mentors in a very different sense. Paul Meier, also, especially in connection with statistics and the law. They were Renaissance men in the sense that they were interested in everything.
Slavkovic: Like you!
Steve: Well, but you learn from people. And to me, one of the great things about being a statistician was that if I saw something that was interesting, more often than not there would be some statistical aspect to it. More often than not within a body of work arising in a substantive area, there would be some problem that I could formulate in a way that nobody else had, or reformulate so that it took on a slightly different form. In a sense, I do technical things in my interdisciplinary work but I don’t view the technical things I do as my most important accomplishments. I view formulating problems as my best contribution to these enterprises. Being able to see beyond what looks like a fog and say, “Hey, there might be something there; if we only tried to look at it in a different way, we could make something out of it.” In some ways, one of my neatest collaborations is one that nobody in the statistics department knows anything about, or almost nothing, Cognitive Aspects of Survey Methodology. That grew out of a little seminar that I participated in related to the redesign of the National Crime Survey just after I arrived at CMU. It was Al Biderman’s idea, and he brought several top researchers together to discuss what cognitive psychology had to say about questionnaire design but he didn’t know how to make them really talk to each other.
The psychologists were really neat people. Beth Loftus and I got along really well and she wrote a paper inspired by the seminar. Endel Tulving was another psychologist who was there. He taught a course at the University of Toronto which I took as an undergrad on research methodology in psychology. I didn’t know what he really did at the time, because your teachers don’t always teach about their research, they teach what the class is about. Learning to talk with people like this, and not stop because I speak log-linear models and they speak some language in psychology is what makes interdisciplinary work so much fun. If you can focus on substantive problems that they have and you find interesting, it’s really very easy. What may be hard for some people is to do this interdisciplinary work in lots of different areas because you often need to acquire deep subject matter knowledge.
But it’s really not hard to do. Back in 2003 when Cynthia was giving this talk at CMU, she had a half-baked idea. That’s unfair. She had an 80 percent baked idea, maybe even a higher percent. In fact it was a great idea but it didn’t deal with the privacy problem as I saw it. What was clear was that Cynthia and her collaborators had technical tools at their disposal that were really super, and we would ignore them at our peril. One of the things that Fred and Bill indirectly taught me is you give credit to everybody for what they do. You will never be hurt by giving other people credit. Cynthia and the cryptographers were doing neat things. We as statisticians helped them to make them neater. They polished their ideas up and generated a new literature on privacy with important results for statisticians. Their professional world is a slightly different world from ours, and they still can’t deal with the problems that motivated me initially. But their work is of fundamental importance to the research on confidentiality and I, and students who work with me, are trying to use their ideas to solve statistical problems. And Cynthia’s a good friend and we dine and drink together.
Behseta: So what are you working on these days? What’s next?
Steve: There’s privacy. Sesa and I have a project, and in fact we have a paper that we have to finish writing in the next couple of weeks which draws on the work of one of my PhD students, Fei Yu, and which takes something that she and I did with Caroline Uhler on privacy protection in GWAS studies. Fei’s got neat results and he is going to have a great PhD thesis.
I’m also running this joint Living Analytics Research Centre with people in Singapore. It involves very substantial digital data sets that come from commercial partners on which we’re doing different kinds of machine learning style analyses and network modeling. As I explained in my lecture yesterday, there are totally new problems in this work waiting to be solved, like the design and analysis of experiments in networked environments. And I’m sure that when we get to the end of the project, we still won’t have solved all of the statistical problems but we will have made progress. I also have the revision of my book if I ever get to it. And finally I have the project with Judy Tanur that we have …
Slavkovic: The book?
Steve: The book! We really have something like six chapters, including a pair on the history of sample surveys and on the history of experimental design. And those ones are up to date, despite what she said. It’s how to finish it that we don’t quite know how to do. And we’ll have to do a lot of new writing. We may not have to do new research.
Behseta: Maybe for the benefit of our readers, you could say a few words about the book.
Steve: Judy and I started to collaborate when she was writing a review of a set of social science research things (quantitative social science) for a special volume that NSF was pulling together. Maybe it was SSRC-related. This was in the late 1970s and we had already become friends. And she invited me to come to their cottage in Montauk at the tip of Long Island. We planned this summer trip that began with a visit to my brother in Tennessee. This was with Joyce, the kids, and our dog Princess. We showed up with the dog at the Tanurs’ cottage. They also had dogs, and that began a summer tradition which is much more punctuated now, but we still visit. We don’t have a dog anymore, but they have two.
In Montauk, we would go up on the upper deck and Judy and I would work. We also swam and had a good time with our families and dogs. Originally it was my helping Judy with the NSF survey paper. Then we branched out and began to collaborate more directly. We talked about the history of surveys and where ideas came from and where they should go. At some point, I explained the parallels between surveys and experiments. This was something I knew about in informal ways but when I started to do some background reading, I realized it was never properly captured in the statistics literature. We started to write papers on the topic. We got a grant and we wrote more. And we actually wrote six chapters, and we published a half dozen papers as well as a nice one on cognitive aspects of survey modeling, with Beth Loftus. In a joint paper in Science we brought the two ideas together and pointed out that the key thing was to test variants in survey questionnaires totally differently than the survey people did. They did split samples with interviewers dealing with only one of the variants. We noted that by having the variants of survey questionnaires done within an interviewer, we could control for the variability much better. Components of variance is an experimental design idea, and we were importing it into survey design. We’ve never won that battle with the government sample survey folks, by the way. But some people do what we suggest. We worked on the book through the mid 1990s and then I got sucked off into other things like Fred’s autobiography. That got in the way.
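The components-of-variance argument can be illustrated with a small calculation. This is a sketch under simplifying assumptions that are not stated in the interview: each response is a variant mean plus an additive interviewer effect (variance `sigma_int2`) plus independent respondent noise (variance `sigma_err2`), with no interviewer-by-variant interaction and equal workloads. The function names and the numbers are hypothetical.

```python
def var_split_sample(sigma_int2, sigma_err2, k, n):
    """Variance of (mean A - mean B) when each interviewer asks only one variant.

    k interviewers per variant, n respondents per interviewer. The interviewer
    effect does not cancel, so it contributes sigma_int2 / k to each variant mean.
    """
    return 2 * (sigma_int2 / k + sigma_err2 / (k * n))

def var_within_interviewer(sigma_err2, k, n):
    """Variance of (mean A - mean B) when every interviewer asks both variants.

    Same total workload: 2k interviewers, each with n/2 respondents per variant.
    The additive interviewer effect cancels in each within-interviewer difference.
    """
    return 2 * sigma_err2 / (k * n)

# Hypothetical variance components and workloads.
split = var_split_sample(sigma_int2=0.5, sigma_err2=4.0, k=10, n=20)
within = var_within_interviewer(sigma_err2=4.0, k=10, n=20)

print("split-sample variance:      ", split)
print("within-interviewer variance:", within)
```

With the same number of interviews, the within-interviewer comparison removes the interviewer component from the variance of the estimated difference between variants, which is the sense in which the variability is better controlled in this simplified model.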
Slavkovic: Yesterday, there were already orders for this book. So when is it supposed to come out?
Steve: Everything takes longer than you think it should. I don’t have Fred’s ability to “get the job done.” But I do get them done at some point. I hope this will happen with the book with Judy, too.
Behseta: Where do you see our discipline is heading? What is the future for statistics?
Steve: Well, you just have to read the newspapers. You have to read what my colleagues are writing. Chad Schaeffer yesterday had a full-page piece in the Pittsburgh Post Gazette on statistics and Big Data. Joel [Greenhouse] had a piece in the Huffington Post a week or so ago about the uses of statistics. The action these days is, in many ways, Big Data and data science. People who ignore these developments and ignore the emphasis on them emanating from computer science, physics, and bioinformatics, ignore them at their peril. Statistics departments that try to close those folks out will lose out in the long run. Opening the profession up to collaborations on Big Data and data science is similar to our welcoming Cynthia and her really smart colleagues into what we do on confidentiality and privacy protection. It makes what we do better. We as statisticians have a lot to offer the Big Data movement, but so do others.
When I do Big Data, I also do little data. I teach contingency tables, not by starting with the National Long Term Care Survey and six waves each involving 200 variables; I start it with a two by two table. It’s back to that philosophy that I told you about (PGP) that I learned from Mosteller, Rourke, and my collaboration with them. You start with the simplest possible way of understanding a problem, and then you learn from that and generalize and scale up to Big Data. What we have to learn to do from our computer science friends is generalize to handle big problems, not just the little problems.
Slavkovic: What does that mean for training of a new generation of statisticians? What do we need to do to prep them?
Steve: We need to get their attention first. When I got to Minnesota, I had a very interesting experience. I was the first chair of applied statistics and had all of these young colleagues over in St. Paul. The older statisticians were in Minneapolis (with some young people; they weren’t all old). When we wanted to introduce courses that were much more methodological, they said, well, we’ll have to add the new courses onto the existing requirements. I said, “That’s crazy.” You’ll add this on and you’ll add on another one and then the students won’t get to research until their sixth year instead of their fourth year.
We had a year-long sequence in multivariate analysis because that’s what one of the faculty members did and he taught it every year. I’m all for teaching multivariate analysis, but we often think that we need our students to know the union of what all the faculty know and not the intersection, before we have them do something new. That’s clearly stupid; you can’t do that. We’ve got to streamline what we teach, but we don’t want to lose it all, either. After all, where will they learn about how the concepts of experimental design and survey design fit together in the context of real-world problems if we no longer teach either subject? As we bring in new tools and other ways of thinking (different kinds of computation, because it’s a very different computational world), other topics have to get set aside. But our goal should be not to lose them totally. What if somebody says we don’t need the Rao-Blackwell theorem anymore because I don’t use it for Big Data? It would be a mistake not to teach about it. And by the way, Rao-Blackwell is important in Big Data settings.
Or suppose someone says, we don’t need the Pearson chi-square statistic anymore, because it’s based on the wrong asymptotics, so we won’t teach it. My response is that, if you don’t teach that topic, how the hell are you ever going to get to the variant of it that you need for Big Data? There’s a constant tension between retaining the older ideas, which may serve as the basis for new statistical research, and at the same time keeping up with the machine learning people and the cryptographers who are out ahead of us in many dimensions (that’s a pun). These are very smart people. That’s why I want them as my colleagues and collaborators. They make terrific researchers and they’re going a mile a minute.
We as statisticians want to do things that they do, but we want to root our work in the theory and structure in which we’ve been trained. If we can bring statistical rigor to those Big Data problems, we will also change the nature of what the machine learning researchers do. The future is Big Data at one level, but it’s Big Data infused with the richness that the hundreds of years of statistical methodology and theory brings to the table.
Slavkovic: I am sure you’ve been asked this question so …
Steve: And it’s your job to follow through on what I have just said because Big Data is a young person’s pursuit.
Slavkovic: But what is Big Data?
Steve: It doesn’t matter. Because everybody else thinks that Big Data is what they do and that’s what we need to train our students to feel comfortable doing. If that’s what brings students in the door, if we as statisticians can turn the students onto interesting problems and tell them that the statistical work they are doing is Big Data, then it doesn’t matter if we can define Big Data, right?
I used to think that Big Data in statistics was what we collected in censuses. I told this to my friend Ralph Roskies who is a physicist and co-directs the Pittsburgh Supercomputing Center. When I explained how big the U.S. decennial census database was, he said, “I want to know how many gigabytes.” I said, “It’s not gigabytes, it’s megabytes.” He said, “So it doesn’t generate a large data set.” I said, “It’s large in a different sense. And we try to combine census data with data from other sources.”
Does the National Long Term Care survey generate Big Data? Well, you know, there are hundreds of questions; that’s large. And six survey waves, so the dimensionality of individual level data is hundreds to the power of six, if you think longitudinally. In this sense, statisticians and others have been doing big for a long time. But computer scientists and physicists know how to do a lot of things that we don’t know how to do. And they can compute faster than I can think. We need to have our students learn how to do that, but also learn how to think statistically.
Slavkovic: Learning the basics of algorithms is important these days.
Slavkovic: Besides Big Data, is there anything else that comes to mind that would be prominent in any way? Because we are valuable now, I mean statisticians are valuable now, and it would be good if we can stay that way.
Steve: The thing I marvel at in some ways in my environment at CMU is that if we were to have twice as many people in the department, everybody would be just as busy interacting with others around the university, doing collaborative research, working on real problems. Delivering on real problems, to me, remains a focus of what I do and where I see the profession going. Look, every once in a while, I do mathematical statistics, or something that I call that. I even publish in the Annals of Statistics when I have good collaborators who can help me get our work published there. But that’s not my forte. It’s getting that kind of theoretical work together with the applications. And that’s the future of the field. For me, it is also the future of IMS, which has mathematical in its title. I am afraid ASA blew it when it came to many of these developments over the past 15 years. That’s when our CMU collaborations with computer scientists began to gel into a separate unit that later became the machine learning department.
CHANCE is an ASA journal and I’m happy to put my views on record. ASA could have been a leader by reaching out to the machine learning community as it was beginning to grow and take shape. Instead, ASA turned its back on them. Not everybody. There were great people from the statistics community who saw this as an opportunity and began to do that work that is now called machine learning, but very few. ASA had all sorts of directions that board members thought the organization should move in, and they undercut what I would have done in this crucial domain. ASA and IMS should have co-sponsored the big machine learning and data mining conferences.
We should have been there at the outset. ASA failed to do that. You could almost forgive IMS for failing to do it because it didn’t look like there were lots of theorems in machine learning. That by the way wasn’t true, and IMS actually reached out in some ways. With the creation of the Annals of Applied Statistics, we jumped over the barrier dividing statistics and machine learning and we published and continue to publish research that really sits at that interface. You don’t see a lot of this in ASA journals. You see a lot of it around here at JSM, or at least more than before. But ASA hasn’t changed. IMS is actually changing, little by little, with AOAS as a great bulkhead. ASA belatedly decided to co-sponsor a data mining journal, but it’s the wrong one.
ASA should have said to the people who were starting the Journal of Machine Learning Research, “We want to provide you with the resources to turn this into a journal to change the field of statistics.” And so that journal has its own organization. ICML, a machine learning conference, has its own organization. NIPS, another one, has its own organization. It’s too late to make these part of our statistical enterprise in the way that really would have changed ASA. I don’t think it’s too late for IMS. And it is not too late for statisticians to embrace Big Data and data science in ways that will enhance the field of statistics.
Behseta: Steve, thank you very much for this wonderful interview.
Slavkovic: Thank you, Steve!
Steve: It was fun to chat, and I’m especially pleased that this will appear in CHANCE. For many years, it was my baby, but now it has grown up and matured. It’s especially gratifying to see both of you involved with it today.