## Preparing Students for a Data-centric World

It is the beginning of the fall 2015 semester here at the University of Georgia. One of us is teaching an introductory statistics course for honors students. On the first day, the class is discussing the importance of statistical literacy. When asked whether they realize how much data they generate each day, even each hour, most of these young university students admit that they have never thought about it. This generation of “digital natives,” more connected to technology than any before it, is surprisingly unaware of how much data surrounds them and how much of these data they generate themselves. When we first started talking about it, their lack of awareness was shocking. But is it, in fact, so surprising?

Even with all the national efforts in recent years to integrate statistics standards into school curricula, K–12 students still receive limited exposure to data analysis and statistical reasoning. Most students leave high school without being data-savvy, and only once in college do they realize that they need to become statistically literate and learn to reason with data. As many a statistics professor has lamented, “How can I expect to help students develop, in 15 weeks, the skills to reason statistically and to explore data, when students in traditional mathematics have all of their schooling, from kindergarten through 12th grade, to become mathematical thinkers?” What we are really asking is, “How do we help students become data-savvy?”

At some level, the answer is obvious. As with mathematics, we begin by exposing students to data starting in kindergarten, introducing them to data thinking and visualization. One of us has a fourth-grader. Anecdotal evidence from that fourth-grader and his friends, gathered since kindergarten, suggests that young children become excited by conducting surveys about themselves and trying to answer statistical questions they develop with the help of their teachers. Why? Because statistics taps into a desire that even young children have to better understand our world.

These children were born at about the same time as Twitter and the explosion of social media (Facebook, Snapchat, Instagram). For them, perhaps even more than for current undergraduates, living in a data stream, with all that it entails, might seem completely normal. As educators, we should be able to take advantage of this and help this next wave of young people become truly skilled at thinking about and using data on any scale.

Furthermore, statistical topics integrate naturally with many of the basic mathematical skills taught in the earlier grades, such as counting, measurement, and proportional reasoning. Young students are generally comfortable with computers, which allows them to explore data on their own.

With the implementation in the United States of the Common Core State Standards for Mathematics, which include a considerable amount of statistics in grades 6–12, we have the opportunity to motivate young students to explore the vast amount of data that surrounds them and that they help generate. Using technology, young students can develop database-management and programming skills, use simulation both for modeling and for building a conceptual understanding of inference, and thereby come to appreciate the importance of data in their everyday lives.

#### What Prevents More Integration of Statistics in K–12?

Integration of statistics at the K–12 levels is hindered, first, because schools have not yet developed a culture in which statistics and working with data are seen as an important part of the curriculum, particularly in mathematics, where the statistics standards currently reside. Many schools are slow to change: tradition still holds that calculus is the summit of school mathematics, but in today’s society, for the majority of students, statistics needs to be the peak instead.

Second, most school-level teachers are themselves inadequately prepared to teach statistics and to work with data, and as a result many lack confidence in delivering a data-driven curriculum. Moreover, until school-level assessments include a significant number of data-analysis items, schools and teachers may have little motivation to make statistical reasoning a classroom priority. University programs that prepare mathematics teachers must therefore make statistics training a priority. Unfortunately, most teacher-educators have not themselves been trained to prepare future teachers in statistics.

The emergence of graduate programs in statistics education, as distinct from mathematics education, is a sign that this may be changing, but such programs are still scarce. Fortunately, a current priority of the American Statistical Association (ASA) is to support teacher preparation in statistics. To that end, the ASA published *Statistical Education of Teachers (SET)* in April 2015.

*SET* outlines the type of statistical understanding that all teachers (school-level and higher-education alike) need to be able to deliver a curriculum that allows students to become data-savvy. The recommendations of this document are targeted at the preparation of school-level teachers, but how we train our teachers also affects the way we train our statistics majors (undergraduate and graduate) at the post-secondary level. This requires that we, as a profession, reconsider the university statistics curriculum.

In a recent article, George Cobb argues that simply “tweaking” what we currently do is not enough. Rather, he suggests that we need to essentially tear down the entire undergraduate curriculum and rebuild it from scratch to meet the demands of the data-centric world. In rethinking the post-secondary statistics curriculum, we need to give students more exposure to Big Data: data that are messy and unstructured. Students need to learn how to manage, organize, and analyze these data, as well as how to use simulation and visualization. Such a program should continue the statistics curriculum that students began at the school level.

Communication skills (both written and oral) are a necessary part of this rethinking as well. The challenge is to balance the more-traditional mathematical approach to teaching statistics with the statistical-practice skills needed to be data-savvy.

Critical to students’ success in becoming data-savvy is collaboration between K–12 and post-secondary education. What happens at the K–12 level is driven, in large part, by what is perceived as the desired curriculum at the post-secondary level. If the statistics curriculum delivered to undergraduate and graduate students remains traditional, K–12 will follow suit. Universities, therefore, must take the lead in setting priorities for the statistical training of our students.

The content that universities deliver, and how they deliver it, is driven in turn, in some part, by what employers want, so there is a system of feedback loops between the job market, the universities, and the K–12 schools. In the Big Data or analytics realm, businesses and industry seek graduates who are data-literate; who know how to handle and visualize messy, large, unstructured data sets; and who have good communication skills. This is a lot to achieve in four years of undergraduate study, but, if the foundations are set before students arrive at college or university, it will be easier to achieve.

There are models currently in place for delivering a school-level course that allows students to develop skills for becoming those data-savvy individuals. Rob Gould at UCLA has developed one such program.

Gould’s “Introduction to Data Science” (IDS) is a yearlong course for high-school students that introduces them to fundamental concepts in working with data. Students who successfully complete the course validate the Algebra II requirement for the University of California and California State University systems. IDS is a product of Mobilize, an NSF-funded partnership between the Los Angeles Unified School District (LAUSD) and three units at the University of California, Los Angeles (UCLA): the Department of Statistics, the Center for Embedded Networked Sensing, and Center X of the Graduate School of Education and Information Studies.

Mobilize lessons are based on participatory sensing, a data-collection paradigm in which students use smartphones and other mobile devices to collect data about themselves, their schools, and their communities. IDS challenges students to develop both computational and statistical thinking skills by learning to ask statistical questions, collect and examine data gathered through participatory sensing and other methods, analyze data using R (through RStudio and the mosaic and MobilizR packages), and interpret and present their results in written and oral reports.

IDS is designed to precede a “formal” statistical inference course such as AP Statistics. It solidifies students’ notions of informal inference and causality, develops skills in exploratory data analysis, and teaches students to solve probability problems by designing and coding simulations. Because data are a big part of our lives and culture, IDS students also examine the role that data play in privacy, social networking, and other areas of contemporary life. Data in IDS are often messy and come from a variety of sources, including large national databases and sensors (both human and machine). Students learn to merge data sets, recode variables, and write scripts that enhance scientific reproducibility.
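To give a flavor of what solving a probability problem by coding a simulation can look like, here is a minimal sketch of a classic exercise: estimating the chance that at least two students in a class share a birthday, then checking the estimate against the exact answer. IDS itself uses R with the mosaic and MobilizR packages; this Python version is only an illustration, not IDS course material. The function names are hypothetical, and the model assumes 365 equally likely birthdays with no leap years.

```python
import random


def simulate_shared_birthday(class_size: int, trials: int = 10_000, seed: int = 1) -> float:
    """Estimate P(at least two people share a birthday) by simulation.

    Hypothetical helper for illustration; assumes 365 equally likely
    birthdays and ignores leap years.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(365) for _ in range(class_size)]
        # A shared birthday exists exactly when the set of distinct
        # birthdays is smaller than the class.
        if len(set(birthdays)) < class_size:
            hits += 1
    return hits / trials


def exact_shared_birthday(class_size: int) -> float:
    """Exact probability under the same assumptions, for comparison."""
    p_all_distinct = 1.0
    for k in range(class_size):
        p_all_distinct *= (365 - k) / 365
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    n = 25
    print(f"simulated: {simulate_shared_birthday(n):.3f}")
    print(f"exact:     {exact_shared_birthday(n):.3f}")
```

Rerunning the simulation with more trials, or with different seeds, and watching the estimate settle near the exact value is one way students might build the informal-inference intuition the course aims for.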

The 2014–15 academic year was the first pilot year for IDS. It was taught by 10 teachers at 10 high schools to approximately 365 students in LAUSD. With roughly 650,000 students, LAUSD is the second-largest district in the country, and it serves a predominantly Latino and African-American student body with a large percentage of English learners. While Gould and his team are still processing evaluation data from the pilot year, they have heard anecdotally from some students that this is the first math course they have understood, and from others that it is the first course they have passed on the first try.

Teachers have reported that not a single student has asked them to explain why the material is relevant to the “real world,” and that the course has created a fundamental change for the better in some students’ attitudes and beliefs about science and mathematics.

IDS is currently expanding to a new cohort of 24 teachers from 22 schools with the hope that IDS will be offered at all high schools in LAUSD within two academic years.

While IDS is just one program, it exemplifies the kind of creative thinking that we, as a profession, may need to embrace to introduce schoolchildren to the field we all love and appreciate. The pervasiveness of data, the societal importance of having citizens who are not data-naïve (and, hence, easily swayed by false statistical reasoning), and the natural pull that even young children feel toward the statistical way of thinking together point to directions in which we can work. Maybe when this year’s fourth-graders take introductory statistics in college, they will not be surprised to learn how much data they themselves generate in their daily lives!

#### Further Reading

Bargagliotti, A., and C. Franklin. 2015. The statistical education of teachers: Preparing teachers to teach statistics. *CHANCE* 28.3.

Cobb, G. 2015. Mere renovation is too little too late: We need to rethink the undergraduate curriculum from the ground up. To appear in *The American Statistician*.

Franklin, C., A. Bargagliotti, C. Case, G. Kader, R. Scheaffer, and D. Spangler. 2015. *Statistical education of teachers*. Alexandria, VA: American Statistical Association.

Usiskin, Z. 2015. The relationships between statistics and other subjects in the K–12 curriculum. *CHANCE* 28.3.

#### About the Authors

Nicole Lazar earned her PhD from the University of Chicago and is a professor in the department of statistics at the University of Georgia. Her research interests include the statistical analysis of neuroimaging data, empirical likelihood and other likelihood methods, and data visualization. She also is an associate editor of *The American Statistician* and *The Annals of Applied Statistics*, and the author of *The Statistical Analysis of Functional MRI Data*.

Christine Franklin is the Lothar Tresp Honoratus Honors Professor in the department of statistics at the University of Georgia and a fellow of the American Statistical Association. She was the chair of the *SET* writing team. Her main research interests include statistics at the school level (K–12). She was a 2015 U.S. Fulbright Scholar in New Zealand, focusing on statistics in the school-level curriculum and teacher preparation.

In **The Big Picture**, Nicole Lazar discusses the many facets of handling, analyzing, and making sense of complex and large data sets. If you have questions or comments about the column, please contact Lazar at *nlazar@stat.uga.edu*.