Happiness and Causal Inference
My old and very dear friend Henry Braun describes a statistician as someone who’s pretty good with numbers but hasn’t got the personality to be an accountant. I like the ambiguity of the description, vaguely reminiscent of a sign next to a new housing development near me, “Never so much for so little.”
Although ambiguity has an honored place in humor, it is less suitable within science. I believe that although some ambiguity is irreducible, some could be avoided if we could just teach other scientists to think more like statisticians. Let me provide one illustration.
Issues of causality have haunted human thinkers for centuries, with the modern view usually ascribed to David Hume. Statisticians Ronald Fisher and Jerzy Neyman began to offer new insights into the topic in the 1920s, but the last 40 years, beginning with Don Rubin’s unlikely-sourced 1974 paper, have witnessed an explosion in clarity and explicitness on the connections between science and causal inference.
A significant event in statisticians’ modern exploration of this ancient topic was Paul Holland’s comprehensive 1986 paper, “Statistics and Causal Inference,” which laid out the foundations of what he referred to as “Rubin’s model for causal inference.”
A key idea in this model is that finding the cause of an effect is a task of insuperable difficulty, and so science can make itself most valuable by measuring the effects of causes.
What is the effect of a cause? It is the difference between what happens if some unit is exposed to some treatment vs. what would have been the result had it not been. This latter condition is a counterfactual and hence impossible to observe.
Stated in a more general way, the causal effect is the difference between the actual outcome and some unobserved potential outcome. Counterfactuals can never be observed; hence, for an individual unit, we can never calculate the size of a causal effect directly. What we can do is calculate the average causal effect for a group of units.
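For readers who prefer symbols, the definitions above can be written in the potential-outcomes notation commonly used with Rubin’s model. The symbols are notational choices made here for illustration, not taken from the column: Y_i(1) and Y_i(0) are unit i’s outcomes under treatment and control, W_i is 1 if unit i is treated and 0 otherwise, and N is the number of units.

```latex
% Unit-level causal effect: only one of the two terms is ever observed.
\tau_i = Y_i(1) - Y_i(0)

% What we actually observe for unit i.
Y_i^{\mathrm{obs}} = W_i\,Y_i(1) + (1 - W_i)\,Y_i(0)

% Average causal effect over the N units.
\bar{\tau} = \frac{1}{N}\sum_{i=1}^{N}\bigl[\,Y_i(1) - Y_i(0)\,\bigr]
```

The second line is the formal version of the counterfactual problem: whichever value W_i takes, one of the two potential outcomes drops out of what we can see.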
This can credibly be done through randomization. If we divide a group of units randomly into a treatment group and a control, it is credible to believe that, because there is nothing special about being in the control group, the result we observe in the control group is what we would have observed had the treatment group been enrolled in the control condition. Thus the difference between the treatment and the control outcomes is a measure of the size of the average causal effect of the treatment (relative to the control condition).
The randomization is the key to making this a credible conclusion, because it provides the expectation of balance with respect to all other (potentially confounding) factors, known or unknown, measured or unmeasured. Randomization is the only tool that guarantees this. But, for randomization to be possible, we must be able to assign either treatment or control to any particular unit.
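A small simulation may make the point about balance concrete. It is only a sketch under invented assumptions: the function name compare_designs, the hidden factor, and every number in it are hypothetical and chosen purely for illustration. It builds units whose outcome depends on a factor nobody measures, then compares a randomized split with one in which units select themselves into treatment.

```python
import random
import statistics

def compare_designs(n=10_000, true_effect=2.0, seed=0):
    """Contrast random assignment with self-selection on simulated units."""
    rng = random.Random(seed)
    # A hidden factor that drives the outcome but is never recorded.
    hidden = [rng.gauss(0, 1) for _ in range(n)]
    y0 = [5 + 3 * h + rng.gauss(0, 1) for h in hidden]  # outcome if untreated
    y1 = [y + true_effect for y in y0]                   # outcome if treated

    def difference_in_means(treated):
        t = [y1[i] for i in range(n) if i in treated]
        c = [y0[i] for i in range(n) if i not in treated]
        return statistics.mean(t) - statistics.mean(c)

    def hidden_gap(treated):
        t = [hidden[i] for i in range(n) if i in treated]
        c = [hidden[i] for i in range(n) if i not in treated]
        return statistics.mean(t) - statistics.mean(c)

    # Design 1: shuffle the units and treat the first half.
    order = list(range(n))
    rng.shuffle(order)
    randomized = set(order[: n // 2])

    # Design 2: units with a high hidden factor choose treatment themselves.
    self_selected = {i for i in range(n) if hidden[i] > 0}

    return {
        "randomized_estimate": difference_in_means(randomized),
        "randomized_hidden_gap": hidden_gap(randomized),
        "self_selected_estimate": difference_in_means(self_selected),
        "self_selected_hidden_gap": hidden_gap(self_selected),
    }

results = compare_designs()
for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

Under these assumptions the randomized split should leave the hidden factor nearly balanced and return an estimate close to the effect built into the simulation, while the self-selected split should overstate it, because the hidden factor differs sharply between the two groups.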
Thus is derived Rubin’s bumper sticker–worthy conclusion that there can be “no causation without manipulation.” This simple result has important consequences. It means that some variables, like gender or race, cannot be fruitfully thought of as causal, since we cannot randomly assign them. Thus the statement “she is short because she is a woman” is causally meaningless, for to measure the effect of being a woman we would have to know how tall she would have been had she been a man. The heroic assumptions required for such a conclusion remove it from the realm of empirical discussion.
Although the spread of Rubin and Holland’s ideas has been broad within the statistical community, their diffusion through much of the social sciences, where they are especially relevant, has been disappointingly slow.
The one exception to this is in economics, where making valid causal inferences has crucial importance. One goal of this note is to help speed that diffusion by showing how these ideas can illuminate a vexing issue in science: assessing the direction of the causal arrow. Or, more precisely, how we can measure the size of the causal effect in each direction.
This issue arose most recently in an article titled “Increasing Adiposity: Consequence or Cause of Overeating?” in the Journal of the American Medical Association that proposed a theory of obesity that turned the dominant theory on its head. Specifically, the authors argued there is evidence people eat too much because they are fat, in addition to being fat because they eat too much. Obviously, measuring the relative size of the effects of the two plausible causes is of enormous practical consequence. I will leave a careful discussion of how we might do that to a later account. Today, let us tackle a different manifestation of the same problem, because it has some subtler aspects worth illuminating: happiness.
Happiness: Its Causes and Consequences
There is an extensive literature surrounding the relationship between a human sense of well-being (what I will call ‘happiness’) and successful performance on some cognitive task (say school grades or exam scores). Some suggest happy students do better in school (i.e., the effect of being happy is higher grades); others point out that when someone does well, it pleases them and they are happier (i.e., the effect of doing well is increased happiness). How are we to disentangle this chicken-and-egg problem?
Before we tackle this, let’s drop back a little and describe the state of the art, as much as I could discern it, in the happiness literature. Alexandra Robbins, in her book The Overachievers: The Secret Lives of Driven Kids, claims the rigors associated with high performance often generate unhappiness. To achieve the goal of making our children happier, this ‘finding’ has led to the suggestion that academic standards should be relaxed. The existence of this suggestion[ii], and the fact that it is being seriously considered, lifts the subject of the direction of the causal arrow (as well as the size of the causal effect) out of the realm of the theoretical into that of the immediately practical.
Despite this pop-psychology belief, the empirical evidence actually shows a positive relationship between happiness and performance. How credible is this evidence? This is hard for me to judge since much of it appears in scientific journals like the Journal of Happiness Studies or Education Research International, with which I am completely unfamiliar.
I did note a fair number of cross-sectional studies that show a positive relationship between happiness and school success (see Gilman & Huebner, 2006, and Verkuyten & Thijs, 2002, in Further Reading). But they often carry a caveat akin to “As with any study based on correlational evidence, care must be taken in the interpretation and generalization of the findings.” Specifically, the nature of the evidence does not support a causal link between examined variables even if one, in truth, does exist. “Additional research would therefore be warranted to further investigate additional relational dimensions between and among the variables explored in the present study.”
What sort of care? Happily, the authors help us with an elaboration: “A larger sample with a more even distribution of gender and race could also stand to strengthen the findings as would a sample of participants from beyond the Midwestern United States and from larger universities.”
Is the character of the sample the only problem? P. D. Quinn and A. L. Duckworth, in a 2007 poster presentation titled “Happiness and Academic Achievement: Evidence for Reciprocal Causality,” pointed out that the causal questions of interest would be better explored “in a prospective, longitudinal study[iii],” which they did.
In their study, they measured the happiness of a sample of students (along with some covariates) and then recorded the students’ school grades, and then returned a year later and did it again.
They report, “Participants reporting higher well-being were more likely to earn higher final grades” and “students earning higher grades tended to go on to experience higher well-being.” They conclude, “The findings suggest the relationship between well-being and academic performance may be reciprocally causal.”
Trying to draw longitudinal inferences from cross-sectional data is a task of great difficulty. For example, I once constructed a theory of language development from a detailed set of observations I made while on a walking tour of south Florida. I noted that most people, when they were young, primarily spoke Spanish. But, when people were old, they usually spoke Yiddish. I tested this theory by noting that the adolescents who worked in local stores spoke mostly Spanish, but a little Yiddish. You could see the linguistic shift happening. It is easy to see that the results obtained from a longitudinal study are less likely to suffer from the same artifacts as a cross-sectional one. But the fact that a longitudinal study’s causal conclusions suffer from fewer possible fatal flaws than those of a cross-sectional study does not mean such conclusions are credible.
Something more is needed. We turn to Rubin’s model for help. Let us start with the idea that when people do well, they are happier than if they do badly. Simultaneously, it does not seem farfetched to believe happier people will do better. This matches Quinn and Duckworth’s causal conclusion. But the important question is quantitative, not qualitative.
How can we design an experiment in which the treatments can be randomly assigned? Suppose we take a sample of students and randomly divide them in half, say into groups A and B. We now measure their happiness using whatever instruments are generally favored. Next, we administer an exam to them and subtract 15 points from the scores of all students in group A while simultaneously adding 15 points to those in group B (it is easy to see generalizations of this in which the size of the treatment is modified in some systematic way, but that isn’t important right now). Now we re-measure their happiness.
My suspicion is that those students who did worse than they expected will be unhappier than they were originally, and those who did much better will be happier. The amount of change in happiness is the causal effect of the treatment, relative to the control. Had the fuller experiment been done, we could easily calculate the functional relationship between the number of points added and the change in happiness. This ends Stage 1 of the experiment.
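Stage 1 can be mocked up in a few lines. This is a sketch with made-up numbers, not an analysis of real data: the function name stage_one and the response parameter (happiness units gained per reported point) are hypothetical and exist only to generate data for the illustration. A real study would estimate that response rather than assume it.

```python
import random
import statistics

def stage_one(n=2_000, shift=15, response=0.2, seed=1):
    """Randomly shift reported exam scores, then re-measure happiness."""
    rng = random.Random(seed)
    students = list(range(n))
    rng.shuffle(students)
    group_a = set(students[: n // 2])   # Group A has `shift` points subtracted

    changes_a, changes_b = [], []
    for s in range(n):
        in_a = s in group_a
        adjustment = -shift if in_a else shift
        # Change in happiness: reaction to scoring above or below
        # expectation, plus noise unrelated to the treatment.
        change = response * adjustment + rng.gauss(0, 2)
        (changes_a if in_a else changes_b).append(change)

    # Average causal effect on happiness of +shift relative to -shift.
    return statistics.mean(changes_b) - statistics.mean(changes_a)

effect = stage_one()
print(f"Estimated effect of the 30-point swing on happiness: {effect:.2f}")
```

Because assignment to A or B is random, the difference between the two groups’ average change in happiness estimates the effect of the score manipulation and nothing else. Varying shift across several values would trace out the functional relationship mentioned above.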
Stage 2: We now have two groups of students whose happiness was randomly assigned, so we can re-administer a parallel form of the same test. We calculate the difference in the scores from the first administration (the actual score, not the modified one) to the second. The size of that difference is a measure of the causal effect of happiness on academic performance. The Stage 2 portion of the experiment, where the assignment probabilities depend on the outcomes of the first stage, is usually called a sequentially randomized or “split plot” design. The ratio of the size of the two causal effects tells us the relative influence of the two treatments. An outcome from such an experiment could yield conclusions like “improved performance has 10 times the effect on happiness that a similar increase in happiness has on performance.[iv]”
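The two stages and the final ratio can be sketched together. Everything numeric here is invented, and two assumptions should be stated loudly. First, the simulation assumes the score manipulation influences retest performance only through happiness. Second, because the two effects are measured in different units, the sketch converts both to standard-deviation units using assumed population spreads (sd_score and sd_happy); the column does not say how the effects should be put on a common scale, so that choice is mine.

```python
import random
import statistics

def two_stage(n=20_000, shift=15, happy_per_point=0.2,
              points_per_happy=0.18, sd_score=15.0, sd_happy=5.0, seed=2):
    """Simulate both stages and compare the two effects in SD units.

    Every numeric default is a hypothetical value chosen for illustration.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    group_a = set(order[: n // 2])        # Group A has `shift` points subtracted

    d_happy = {"A": [], "B": []}
    d_score = {"A": [], "B": []}
    for s in range(n):
        g = "A" if s in group_a else "B"
        adjustment = -shift if g == "A" else shift
        # Stage 1: happiness shifts with the surprise in the reported score.
        dh = happy_per_point * adjustment + rng.gauss(0, 2)
        # Stage 2: retest score minus the actual (unmodified) first score.
        ds = points_per_happy * dh + rng.gauss(0, 5)
        d_happy[g].append(dh)
        d_score[g].append(ds)

    happy_gap = statistics.mean(d_happy["B"]) - statistics.mean(d_happy["A"])
    score_gap = statistics.mean(d_score["B"]) - statistics.mean(d_score["A"])

    per_point = happy_gap / (2 * shift)   # happiness units per exam point
    per_happy = score_gap / happy_gap     # exam points per happiness unit
    # Common footing: SDs of the outcome per SD of the cause.
    score_on_happy = per_point * sd_score / sd_happy
    happy_on_score = per_happy * sd_happy / sd_score
    return score_on_happy, happy_on_score, score_on_happy / happy_on_score

score_on_happy, happy_on_score, ratio = two_stage()
print(f"performance -> happiness: {score_on_happy:.2f} SD per SD")
print(f"happiness -> performance: {happy_on_score:.3f} SD per SD")
print(f"ratio of the two effects: {ratio:.1f}")
```

The Stage 2 effect is small relative to its noise, so the ratio is estimated far less precisely than either Stage 1 quantity. That is why the sketch uses a large sample; a real design would need to budget for the same problem.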
Conclusions
Even a cursory reading of the happiness literature reveals the kind of conclusions researchers would like to make. Typically, Zahra Salehi, Akbar Afarinesh Khak, and Shahram Alam in a 2013 European Journal of Experimental Biology article tell us, “Results showed that in addition to the positive and significant correlation between happiness and academic achievement of university students, happiness could also explain 13% of changes of academic achievement.”
You can feel the authors’ desire to be causal, and they come very close to making a causal claim; certainly a lay interpretation of “explain” would have it that way. But the character of the published studies militates against the sorts of causal interpretations that seem to be yearned for. Most were observational studies, and the rest might be called “some data I found lying in the street.”
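One way to see why “explain” carries no causal weight: in a simple linear regression, the proportion of variance “explained” is the squared correlation, and it is the same whichever variable is treated as the outcome. The sketch below uses simulated data with invented numbers (the variable names and the 0.4 coefficient are hypothetical) to show that symmetry.

```python
import random

def r_squared(x, y):
    """R^2 from the least-squares line predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    return 1 - ss_res / syy

rng = random.Random(3)
# Simulated scores; the direction used to generate them is arbitrary.
happiness = [rng.gauss(0, 1) for _ in range(500)]
grades = [0.4 * h + rng.gauss(0, 1) for h in happiness]

print(r_squared(happiness, grades))   # grades "explained" by happiness
print(r_squared(grades, happiness))   # happiness "explained" by grades
```

The two printed values agree up to floating-point rounding, so a figure such as 13% says nothing about which way the causal arrow points. For scale, 13% of variance corresponds to a correlation of about 0.36 in absolute value.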
But I bring good news. Through the use of Rubin’s model, we can design true experimental studies that can provide answers to the questions we want to ask. Moreover, the very act of precisely describing the real or hypothetical randomized experiment needed to measure the causal effects of interest greatly clarifies what causal question is being asked. The bad news is that such studies are not as easy or inexpensive as picking up old data off the street and summarizing them. But if making causal inferences correctly were easy, everyone would do it.[v]
Further Reading
Fisher, R. A. 1925. Statistical methods for research workers. Oliver & Boyd: Edinburgh.
Gilman, R., and E. S. Huebner. 2006. Characteristics of adolescents who report very high life satisfaction. Journal of Youth and Adolescence 35(3):311–319.
Holland, P. W. 1986. Statistics and causal inference. Journal of the American Statistical Association 81:945–970.
Hume, D. 1740. A treatise of human nature.
Ludwig, D. S., and M. I. Friedman. 2014. Increasing adiposity: Consequence or cause of overeating? Journal of the American Medical Association. Published online May 16, 2014. doi:10.1001/jama.2014.4133
Neyman, J. 1923. On the application of probability theory to agricultural experiments. Translation of excerpts by D. Dabrowska and T. Speed. Statistical Science 5(1990):462–472.
Quinn, P. D., and A. L. Duckworth. 2007. Happiness and academic achievement: Evidence for reciprocal causality. Poster session presented at the meeting of the Association for Psychological Science, Washington, DC.
Robbins, A. 2006. The overachievers: The secret lives of driven kids (1st ed.). New York: Hyperion.
Rubin, D. B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology 66:688–701.
Rubin, D. B. 1975. Bayesian inference for causality: The importance of randomization. In Social Statistics Section, Proceedings of the American Statistical Association. 233–239.
Rubin, D. B. 1978. Bayesian inference for causal effects: The role of randomization. The Annals of Statistics 7:34–58.
Rubin, D. B. 2005. Causal inference using potential outcomes: Design, modeling, decisions. 2004 Fisher Lecture. Journal of the American Statistical Association 100:322–331.
Verkuyten, M., and J. Thijs. 2002. School satisfaction of elementary school children: The role of performance, peer relations, ethnicity, and gender. Social Indicators Research 59(2):203–228.
Walker, C. O., T. D. Winn, and R. M. Lutjens. 2008. Examining relationships between academic and social achievement goals and routes to happiness. Education Research International (2012), Article ID 643438.
Waterman, A. S. 1993. Two conceptions of happiness: Contrasts of personal expressiveness (eudaimonia) and hedonic enjoyment. Journal of Personality and Social Psychology 64(4):678–691.
Salehi, Z., A. A. Khak, and S. Alam. 2013. Correlation between the five-factor model of personality-happiness and the academic achievement of physical education students. European Journal of Experimental Biology 3(6):422–426.
[i] My gratitude to Henry Braun, Scott Evans, and Don Rubin for encouragement and many helpful comments and clarifications.
[ii] This notion falls into the category of “rapid idea.” Rapid ideas are those that only make sense if you say them fast.
[iii] The value of a longitudinal study harkens to Hume’s famous criteria for causality. A key one is that a cause must come before an effect. Without gathering longitudinal data, we cannot know the order. But this is a necessary condition for causality, not a sufficient one.
[iv] Intuitively this surely makes sense, for if you don’t know the answer to a question, being happier isn’t going to change your knowledge base.
[v] Precision is important. Note the treatment in Stage 1 is not higher performance vs. lower, but rather higher scores than expected vs. lower scores than expected. A subtle, but important, distinction.
About the Author
Howard Wainer is currently distinguished research scientist at the National Board of Medical Examiners. He has won numerous awards and is a Fellow of the American Statistical Association and the American Educational Research Association. His interests include the use of graphical methods for data analysis and communication, robust statistical methodology, and the development and application of generalizations of item response theory. He has published more than 20 books; his latest is Medical Illuminations: Using Evidence, Visualization, and Statistical Thinking to Improve Healthcare (Oxford University Press, 2014).
Visual Revelations covers many topics, but generally focuses on two principal themes: graphical display and history. Howard Wainer, column editor, encourages using this column as an outlet for popular statistical discourse. If you have questions or comments about the column, please contact Wainer at hwainer@nbme.org.