Special Issue on Astrostatistics
Like most areas of science, astronomy has benefited from a tremendous increase in the amount of available data gathered in recent decades, from both ground- and space-based instruments. For example, the Sloan Digital Sky Survey (SDSS) commenced in 2000, at a time when studies were typically conducted with galaxy samples of sizes on the order of 1,000. By 2008, SDSS had produced a map of 930,000 galaxies, with high-resolution spectral information for each. By the time of its 15th data release in 2017, SDSS had a catalog of more than 2.5 million galaxies. The Large Synoptic Survey Telescope (LSST), currently under construction in Chile, is projected to yield a catalog of 10^10 (10 billion) galaxies by the time its 10-year survey ends in 2032.
Data gathered about astronomical objects take a range of forms, but at their basic level, they consist of intensity measurements, often observed at a range of different wavelengths, at different spatial positions, and at various points in time. It is the variability across objects, wavelengths, and space and time that encodes invaluable information regarding the formation, evolution, and current state of the universe and its components. Analyses are complicated, however, by observational limitations, including measurement error, contamination, and the inherent limitations of our Earth-centered frame of reference.
The role and participation of statisticians in astronomy have increased along with the sizes of the data sets and the complexity of the inference challenges, but further collaboration is needed to address the many open problems. A core group of statisticians is committed to this effort; they are working to make astronomical data analysis known to the broader statistical community and helping to bridge any language barriers that may exist.
It has been our experience that the perceived difficulty of the subject matter is exaggerated. We typically involve undergraduate and graduate students, with no background in astronomy, in fruitful projects.
With this in mind, this special issue of CHANCE seeks to provide some context for common themes in astrostatistics, the types of data encountered, and the important role that statistical methods can play. Methods that are well established in statistics may be little known or little used in astronomy. Astronomical problems also present unique challenges that force the development of new methods, or at least push existing methods in interesting directions.
In this issue, Green, Mintz, Xu, and Cisewski-Kehe discuss one of the recurring challenges of working with modern astronomical data—namely, that the data in their raw form are complex and not amenable to classical statistical analysis. Astronomers have often used ad hoc compression of these data, extracting low-dimensional features that are believed to encode important information, but the potential loss of information is great. Ideas from topological data analysis, a new and growing area of study in statistics, have great promise for data-driven approaches to extracting important information from data such as the large-scale structure of the universe.
Politsch and Croft describe the challenges and potential of working with the Lyman-alpha forest, another complex data set that informs our understanding of the large-scale structure of the universe. Lyman-alpha forest data are particularly interesting because of the innovative approach for collecting them. The light from distant quasars provides access to some aspects of the intergalactic medium, which can then be used to infer properties of the distribution of gas in regions that would otherwise be inaccessible.
Although modern astronomical data sets are massive, individual observations are often of low quality. Freeman provides an overview of the challenges of using low-resolution photometry to estimate a fundamental property of a celestial object: its distance from us. On cosmological scales, distance is a proxy for time, so any study of the evolution of the universe relies on accurate distance measures, typically quantified via the redshift.
Eadie considers a similarly fundamental problem—that of estimating the mass of gravitationally bound dynamical systems like galaxies. Such estimates are crucial to understanding the nature of dark matter. How the positions and velocities of stars (called tracers) are used to constrain a system’s mass illustrates a range of inference approaches in astronomy: These observations have a complex, physically motivated relationship with the quantity of interest. Estimation in such situations presents unique challenges, and has motivated the development of a range of novel techniques.
A chief benefit of increased participation of statisticians in astronomy is the cross-pollination of ideas across fields: Statisticians work in a wide range of domains, of course, and experts in the development of statistical methods are always on the lookout for new areas of application for classic and novel approaches. Segal and Segal apply the Patient Rule Induction Method (PRIM), a lesser-known but potentially useful tool for supervised learning, to a classification problem in astronomy that involves identifying open clusters of stars using Gaia data.
Perhaps one of the most exciting recent developments in astronomy is the discovery of a large number of exoplanets orbiting stars in the Milky Way, and statistical tools play a crucial role in making these discoveries. Feigelson provides an overview of the challenges of searching for the signatures of exoplanets in noisy time series, and discusses how classical time series models and modern machine learning can be combined to develop improved approaches to this important problem.
Corliss demonstrates the potential of modern approaches to clustering in the classification of supernovae—the explosive deaths of stars—based on their observed time series. This work demonstrates that clustering, especially with data such as these, requires careful consideration of the choice of similarity measure.
With this issue as a starting point, there are many avenues for getting involved in astrostatistics.
- The American Statistical Association has an Astrostatistics Special Interest Group that welcomes and encourages the participation of statisticians and astronomers, including students, postdocs, researchers, faculty, and members of industry or government.
- The Astrostatistics and Astroinformatics Portal (ASAIP) is a central site that compiles information about the field relevant to astronomers, computer scientists, and statisticians. It includes information about papers, meetings, and other resources for both new and active researchers in astrostatistics.
- The Cosmostatistics Initiative (COIN) is an international and interdisciplinary community focused on developing stronger ties between the various fields related to astrostatistics. COIN also organizes “Residence Programs” where a small group of researchers in astronomy, computer science, cosmology, and statistics meets for a week in various destinations around the world to focus on solving several problems in the field while forming closer interdisciplinary ties. The Residence Programs have been quite productive and are a great way to make connections with those interested in astrostatistics.
- The International Astrostatistics Association (IAA) seeks to bring together researchers from around the world who are interested in advancing the field of astrostatistics.
Involvement in any of these organizations can be a great way to learn more about astrostatistics. We hope many of you will join us in the exciting pursuit of exploring our universe.