DMS-EPSRC Collaborative Research: Advancing Statistical Foundations and Frontiers from and for Emerging Astronomical Data Challenges

Lead Research Organisation: Imperial College London
Department Name: Mathematics

Abstract

Statistical theory and methods play a fundamental role in scientific discovery and advancement, including in modern astronomy, where data are collected on increasingly massive scales and with more varieties and complexity. New technology and instrumentation are spawning a diverse array of emerging data types and data analytic challenges, which in turn require and inspire ever more innovative statistical methods and theories. This proposal is guided by the dual aims of advancing statistical foundations and frontiers, motivated by astronomical problems and providing principled data analytic solutions to challenges in astronomy. The CHASC International Center for Astrostatistics has an extensive track record in accomplishing both tasks. This NSF-EPSRC project leverages CHASC's track record to make progress in several new projects. Fitting sophisticated astrophysical models to complex data that were collected with high-tech instruments, for example, often involves a sequence of statistical analyses. Several UK-led projects center on developing new statistical methods that properly account for errors and carry uncertainty forward within such sequences of analyses. Additional US-led work will focus on developing theoretical properties of novel statistical estimation procedures to address data-analytic challenges associated with solar flares and X-ray observations. Other US-led projects involve fast and automatic detection of astronomical objects such as galaxies from 2D or even 4D data. The PIs will develop statistical theory and methods in the context of these projects, building statistical foundations and pushing the frontiers of statistics forward for broad impact that will extend well beyond astrostatistics. The PIs plan to offer effective methods and algorithms for tackling emerging challenges in astronomy, with the aspiration of promoting such principled data-analytic methods among researchers in astronomy. 
The provision of free software via the CHASC GitHub Software Library will enable the distribution and impact of the proposed methods and algorithms.

Publications

 
Description StatML CDT: Modern Statistics and Statistical Machine Learning
Amount £6,202,023 (GBP)
Funding ID 2740612 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 09/2022 
End 09/2026
 
Description CHASC International Center for Astrostatistics 
Organisation Harvard University
Country United States 
Sector Academic/University 
PI Contribution The PI is a member of the CHASC International Center for Astrostatistics. Since its founding at Harvard in 1997, CHASC has established an impressive track record both for developing new statistical methods to solve challenging problems in astrophysics and for leveraging these problems to devise new general purpose statistical theory, methods and computational techniques. CHASC actively engages diverse groups of postgraduate students in its interdisciplinary research; many of these students have gone on to successful academic careers. CHASC is devoted to promoting sound statistical practice among scientists to derive insight and knowledge, and to helping statisticians develop better scientific understanding and insight. The CHASC Center also provides a worldwide forum for exchanges on challenging data problems in astronomy and for disseminating the methods developed by the Center, including free software. Several CHASC packages have been incorporated into CIAO, the primary data analysis environment used by X-ray astronomers. Other software packages are distributed via the CHASC GitHub Astrostatistics Software Library. In addition, CHASC organizes many sessions at meetings of statisticians to highlight newly developed methods of general interest, and at meetings of astronomers, sometimes to convey new methods and other times with a more basic educational emphasis. Van Dyk (PI, Imperial) was the original founder of CHASC and has been its overall leader since. He is an international leader in astrostatistics with specific expertise in Bayesian methodology, multi-level models, and EM-like and MCMC algorithms. He has published extensively in leading astrophysical and statistical journals.
Collaborator Contribution Meng (PI, Harvard) has extensive research experience in statistical modeling, fitting, and improvement, in developing methods for complex incomplete data, and in the interplay of inferential perspectives for complex model fitting and estimation. He is also a leading voice in promoting principled data scientific and statistical methods, especially in his role as the Founding Editor-in-Chief for Harvard Data Science Review, a fast-expanding international forum for perspectives, education, and research in data science. Lee (PI, Davis) has deep expertise in image processing, large scale computations, spatio-temporal modeling, time series and change point problems. He publishes frequently in leading statistics and engineering outlets. He and van Dyk are two of the three co-founders of the ASA Astrostatistics Interest Group. Chen is an alumna of the Center where she led a major effort on the calibration project, and has been involved in multiple collaborative astrostatistics projects since joining the University of Michigan. The astronomers (Kashyap and Siemiginowska, Center for Astrophysics | Harvard & Smithsonian) are leading experts in the analysis of high energy astrophysics data. They have vast experience with instrument calibration and astronomical software analysis systems, and are deeply involved with the development of methods, algorithms, and publicly available software for Chandra data. In consultation with these experts the statistics PIs (Meng, van Dyk, Lee, and Chen) oversee the entire project, with the PI in the lead institute (Meng) as the overall coordinator and convener. Research is carried out simultaneously at the four institutions in all stages of the project.
Impact Fan et al. (2023), as well as the preprint Meyer et al. (2023+, arXiv:2207.09327), which is under review at ApJ.
 
Description CHASC International Center for Astrostatistics 
Organisation Smithsonian Astrophysical Observatory
Country United States 
Sector Public 
PI Contribution The PI is a member of the CHASC International Center for Astrostatistics. Since its founding at Harvard in 1997, CHASC has established an impressive track record both for developing new statistical methods to solve challenging problems in astrophysics and for leveraging these problems to devise new general purpose statistical theory, methods and computational techniques. CHASC actively engages diverse groups of postgraduate students in its interdisciplinary research; many of these students have gone on to successful academic careers. CHASC is devoted to promoting sound statistical practice among scientists to derive insight and knowledge, and to helping statisticians develop better scientific understanding and insight. The CHASC Center also provides a worldwide forum for exchanges on challenging data problems in astronomy and for disseminating the methods developed by the Center, including free software. Several CHASC packages have been incorporated into CIAO, the primary data analysis environment used by X-ray astronomers. Other software packages are distributed via the CHASC GitHub Astrostatistics Software Library. In addition, CHASC organizes many sessions at meetings of statisticians to highlight newly developed methods of general interest, and at meetings of astronomers, sometimes to convey new methods and other times with a more basic educational emphasis. Van Dyk (PI, Imperial) was the original founder of CHASC and has been its overall leader since. He is an international leader in astrostatistics with specific expertise in Bayesian methodology, multi-level models, and EM-like and MCMC algorithms. He has published extensively in leading astrophysical and statistical journals.
Collaborator Contribution Meng (PI, Harvard) has extensive research experience in statistical modeling, fitting, and improvement, in developing methods for complex incomplete data, and in the interplay of inferential perspectives for complex model fitting and estimation. He is also a leading voice in promoting principled data scientific and statistical methods, especially in his role as the Founding Editor-in-Chief for Harvard Data Science Review, a fast-expanding international forum for perspectives, education, and research in data science. Lee (PI, Davis) has deep expertise in image processing, large scale computations, spatio-temporal modeling, time series and change point problems. He publishes frequently in leading statistics and engineering outlets. He and van Dyk are two of the three co-founders of the ASA Astrostatistics Interest Group. Chen is an alumna of the Center where she led a major effort on the calibration project, and has been involved in multiple collaborative astrostatistics projects since joining the University of Michigan. The astronomers (Kashyap and Siemiginowska, Center for Astrophysics | Harvard & Smithsonian) are leading experts in the analysis of high energy astrophysics data. They have vast experience with instrument calibration and astronomical software analysis systems, and are deeply involved with the development of methods, algorithms, and publicly available software for Chandra data. In consultation with these experts the statistics PIs (Meng, van Dyk, Lee, and Chen) oversee the entire project, with the PI in the lead institute (Meng) as the overall coordinator and convener. Research is carried out simultaneously at the four institutions in all stages of the project.
Impact Fan et al. (2023), as well as the preprint Meyer et al. (2023+, arXiv:2207.09327), which is under review at ApJ.
 
Description CHASC International Center for Astrostatistics 
Organisation University of California, Davis
Country United States 
Sector Academic/University 
PI Contribution The PI is a member of the CHASC International Center for Astrostatistics. Since its founding at Harvard in 1997, CHASC has established an impressive track record both for developing new statistical methods to solve challenging problems in astrophysics and for leveraging these problems to devise new general purpose statistical theory, methods and computational techniques. CHASC actively engages diverse groups of postgraduate students in its interdisciplinary research; many of these students have gone on to successful academic careers. CHASC is devoted to promoting sound statistical practice among scientists to derive insight and knowledge, and to helping statisticians develop better scientific understanding and insight. The CHASC Center also provides a worldwide forum for exchanges on challenging data problems in astronomy and for disseminating the methods developed by the Center, including free software. Several CHASC packages have been incorporated into CIAO, the primary data analysis environment used by X-ray astronomers. Other software packages are distributed via the CHASC GitHub Astrostatistics Software Library. In addition, CHASC organizes many sessions at meetings of statisticians to highlight newly developed methods of general interest, and at meetings of astronomers, sometimes to convey new methods and other times with a more basic educational emphasis. Van Dyk (PI, Imperial) was the original founder of CHASC and has been its overall leader since. He is an international leader in astrostatistics with specific expertise in Bayesian methodology, multi-level models, and EM-like and MCMC algorithms. He has published extensively in leading astrophysical and statistical journals.
Collaborator Contribution Meng (PI, Harvard) has extensive research experience in statistical modeling, fitting, and improvement, in developing methods for complex incomplete data, and in the interplay of inferential perspectives for complex model fitting and estimation. He is also a leading voice in promoting principled data scientific and statistical methods, especially in his role as the Founding Editor-in-Chief for Harvard Data Science Review, a fast-expanding international forum for perspectives, education, and research in data science. Lee (PI, Davis) has deep expertise in image processing, large scale computations, spatio-temporal modeling, time series and change point problems. He publishes frequently in leading statistics and engineering outlets. He and van Dyk are two of the three co-founders of the ASA Astrostatistics Interest Group. Chen is an alumna of the Center where she led a major effort on the calibration project, and has been involved in multiple collaborative astrostatistics projects since joining the University of Michigan. The astronomers (Kashyap and Siemiginowska, Center for Astrophysics | Harvard & Smithsonian) are leading experts in the analysis of high energy astrophysics data. They have vast experience with instrument calibration and astronomical software analysis systems, and are deeply involved with the development of methods, algorithms, and publicly available software for Chandra data. In consultation with these experts the statistics PIs (Meng, van Dyk, Lee, and Chen) oversee the entire project, with the PI in the lead institute (Meng) as the overall coordinator and convener. Research is carried out simultaneously at the four institutions in all stages of the project.
Impact Fan et al. (2023), as well as the preprint Meyer et al. (2023+, arXiv:2207.09327), which is under review at ApJ.
 
Description CHASC International Center for Astrostatistics 
Organisation University of Michigan
Country United States 
Sector Academic/University 
PI Contribution The PI is a member of the CHASC International Center for Astrostatistics. Since its founding at Harvard in 1997, CHASC has established an impressive track record both for developing new statistical methods to solve challenging problems in astrophysics and for leveraging these problems to devise new general purpose statistical theory, methods and computational techniques. CHASC actively engages diverse groups of postgraduate students in its interdisciplinary research; many of these students have gone on to successful academic careers. CHASC is devoted to promoting sound statistical practice among scientists to derive insight and knowledge, and to helping statisticians develop better scientific understanding and insight. The CHASC Center also provides a worldwide forum for exchanges on challenging data problems in astronomy and for disseminating the methods developed by the Center, including free software. Several CHASC packages have been incorporated into CIAO, the primary data analysis environment used by X-ray astronomers. Other software packages are distributed via the CHASC GitHub Astrostatistics Software Library. In addition, CHASC organizes many sessions at meetings of statisticians to highlight newly developed methods of general interest, and at meetings of astronomers, sometimes to convey new methods and other times with a more basic educational emphasis. Van Dyk (PI, Imperial) was the original founder of CHASC and has been its overall leader since. He is an international leader in astrostatistics with specific expertise in Bayesian methodology, multi-level models, and EM-like and MCMC algorithms. He has published extensively in leading astrophysical and statistical journals.
Collaborator Contribution Meng (PI, Harvard) has extensive research experience in statistical modeling, fitting, and improvement, in developing methods for complex incomplete data, and in the interplay of inferential perspectives for complex model fitting and estimation. He is also a leading voice in promoting principled data scientific and statistical methods, especially in his role as the Founding Editor-in-Chief for Harvard Data Science Review, a fast-expanding international forum for perspectives, education, and research in data science. Lee (PI, Davis) has deep expertise in image processing, large scale computations, spatio-temporal modeling, time series and change point problems. He publishes frequently in leading statistics and engineering outlets. He and van Dyk are two of the three co-founders of the ASA Astrostatistics Interest Group. Chen is an alumna of the Center where she led a major effort on the calibration project, and has been involved in multiple collaborative astrostatistics projects since joining the University of Michigan. The astronomers (Kashyap and Siemiginowska, Center for Astrophysics | Harvard & Smithsonian) are leading experts in the analysis of high energy astrophysics data. They have vast experience with instrument calibration and astronomical software analysis systems, and are deeply involved with the development of methods, algorithms, and publicly available software for Chandra data. In consultation with these experts the statistics PIs (Meng, van Dyk, Lee, and Chen) oversee the entire project, with the PI in the lead institute (Meng) as the overall coordinator and convener. Research is carried out simultaneously at the four institutions in all stages of the project.
Impact Fan et al. (2023), as well as the preprint Meyer et al. (2023+, arXiv:2207.09327), which is under review at ApJ.
 
Title SRGonG 
Description Data from high-energy observations are usually obtained as lists of photon events. A common analysis task for such data is to identify whether diffuse emission exists, and to estimate its surface brightness, even in the presence of point sources that may be superposed. We have developed a novel nonparametric event list segmentation algorithm to divide up the field of view into distinct emission components. We use photon location data directly, without binning them into an image. We first construct a graph from the Voronoi tessellation of the observed photon locations and then grow segments using a new adaptation of seeded region growing that we call Seeded Region Growing on Graph, after which the overall method is named SRGonG. Starting with a set of seed locations, this results in an oversegmented data set, which SRGonG then coalesces using a greedy algorithm where adjacent segments are merged to minimize a model comparison statistic; we use the Bayesian Information Criterion. 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Using SRGonG we are able to identify point-like and diffuse extended sources in the data with equal facility. We validate SRGonG using simulations, demonstrating that it is capable of discerning irregularly shaped low-surface-brightness emission structures as well as point-like sources with strengths comparable to that seen in typical X-ray data. We demonstrate SRGonG's use on the Chandra data of the Antennae galaxies and show that it segments the complex structures appropriately. 
URL https://iopscience.iop.org/article/10.3847/1538-3881/aca478
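
The seeded-region-growing step described in the SRGonG entry above can be illustrated with a minimal sketch. This is not the SRGonG software or its API: every function name here is hypothetical, a symmetric k-nearest-neighbour graph stands in for the Voronoi-tessellation adjacency, the per-photon intensity is approximated by the inverse squared mean neighbour distance rather than by Voronoi cell area, and the subsequent greedy BIC-based merging of segments is not shown.

```python
import heapq
import math
import random


def knn_graph(points, k=6):
    """Symmetric k-nearest-neighbour graph (assumed stand-in for Voronoi adjacency)."""
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        # Rank all other points by distance to point i and keep the k closest.
        nearest = sorted((j for j in range(n) if j != i),
                         key=lambda j: math.dist(points[i], points[j]))[:k]
        for j in nearest:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def log_intensity(points, adj):
    """Crude per-photon log-intensity proxy: -log of squared mean neighbour distance."""
    out = []
    for i, nbrs in enumerate(adj):
        mean_d = sum(math.dist(points[i], points[j]) for j in nbrs) / len(nbrs)
        out.append(-math.log(mean_d ** 2 + 1e-12))
    return out


def seeded_region_growing(adj, values, seeds):
    """Grow one segment per seed over the graph; returns a label per node (-1 if unreached)."""
    labels = [-1] * len(adj)
    totals = [0.0] * len(seeds)  # running sum of values in each segment
    counts = [0] * len(seeds)    # number of nodes in each segment
    heap = []

    def push_neighbours(node, seg):
        # Queue unlabelled neighbours, prioritised by closeness to the segment mean.
        mean = totals[seg] / counts[seg]
        for nb in adj[node]:
            if labels[nb] == -1:
                heapq.heappush(heap, (abs(values[nb] - mean), nb, seg))

    for seg, s in enumerate(seeds):
        labels[s] = seg
        totals[seg] += values[s]
        counts[seg] += 1
    for seg, s in enumerate(seeds):
        push_neighbours(s, seg)

    while heap:
        _, node, seg = heapq.heappop(heap)
        if labels[node] != -1:
            continue  # already claimed by a better-matching segment
        labels[node] = seg
        totals[seg] += values[node]
        counts[seg] += 1
        push_neighbours(node, seg)
    return labels


# Simulated event list: a uniform background plus one compact source.
random.seed(0)
background = [(random.uniform(0, 10), random.uniform(0, 10)) for _ in range(150)]
source = [(random.gauss(5, 0.15), random.gauss(5, 0.15)) for _ in range(60)]
points = background + source

adj = knn_graph(points)
values = log_intensity(points, adj)
seeds = [0, len(points) - 1]  # one background photon, one source photon
labels = seeded_region_growing(adj, values, seeds)
print({seg: labels.count(seg) for seg in sorted(set(labels))})
```

In this sketch the priority queue always assigns next the unlabelled photon whose intensity proxy is closest to the current mean of an adjacent segment, which is the usual seeded-region-growing rule carried over from pixel grids to a graph. With many seeds this would yield an oversegmentation of the kind the entry describes as the input to the merging stage.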
 
Description CHASC Seminar 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Professor van Dyk co-organises the CHASC Seminar with programme partners at UC Davis, Harvard, and the Harvard-Smithsonian Center for Astrophysics. The seminar is run monthly or fortnightly. Speakers present new state-of-the-art statistical methods of interest to astronomers and/or statistical challenges arising in astronomy with the aim of sparking new collaborations and methodological development. The seminar is held in person with video links and attracts participants from the University of California, Davis, Imperial College London, Harvard University, the Harvard-Smithsonian Center for Astrophysics, University of Crete, Cambridge University, Simon Fraser University, University of Toronto, and other leading academic centres.
Year(s) Of Engagement Activity 2022,2023
URL https://hea-www.harvard.edu/astrostat/CHASC_2223/
 
Description JSM 2022 Discussion 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Professor van Dyk was the invited discussant at a technical session on "Open Problems in Astrophysics" at the Joint Statistical Meetings held in Washington DC in August 2022. The JSM is the largest gathering of professional statisticians in the world and attracts researchers and practitioners working in academia, government, and industry. The session itself was well attended and sparked a lively discussion, giving Professor van Dyk an opportunity to discuss ongoing research funded by this project.
Year(s) Of Engagement Activity 2022
URL https://ww2.amstat.org/meetings/jsm/2022/onlineprogram/ActivityDetails.cfm?SessionID=223148
 
Description RISE-CHASC Workshop: August 2 and 3, 2022
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The RISE-CHASC Workshop was held at the Harvard-Smithsonian Center for Astrophysics. Hosting more than 850 scientists, engineers, and support staff, the CfA is among the largest astronomical research institutes in the world. These researchers are the primary users of the methods developed under this project. The workshop provided us with an excellent opportunity to disseminate recently developed methodology to end-users. Presentations with extended question-and-answer sessions and open discussion were held on a range of topics funded by this programme (e.g., Modelling populations of X-ray sources, New methods for estimating cosmological parameters, Bayesian astrophysical image analysis: extended sources and boundaries, Bayesian source detection, Machine-learning based source classification, Flare detection, Non-parametric image segmentation, etc.). A range of follow-up discussions indicated interest in our work and possible/likely uptake of our research outputs.
Year(s) Of Engagement Activity 2022
URL https://hea-www.harvard.edu/AstroStat/CHASC_2122/workshop.html