Leading the Next Generation of Data-Driven Discoveries

Lead Research Organisation: Lancaster University
Department Name: Physics

Abstract

Extracting accurate information from the modern data flood without sacrificing the ability to discover the unexpected is a core data challenge in many fields. A cross-disciplinary approach presents major opportunities to advance the state of the art in each. This project focuses on bringing together the state of the art in two fields: astrophysics, and humanitarian Earth observation.

In astrophysics, the proliferation of data that will be available from upcoming large sky surveys presents significant challenges and opportunities. Accurate classification of this data (for example, whether a galaxy contains a buried, feeding supermassive black hole, or whether a new point of light in a galaxy is a supernova, and what kind) will enable major advancement in our understanding of how the Universe has evolved and will continue to evolve, and how galaxies such as the Milky Way form and grow within it.

Following a natural disaster, responders need to know where roads are blocked, where buildings are damaged, and where survivors are sheltering. When satellites look down at the Earth instead of up at the heavens, they capture data that shares some qualities with astrophysical images but differs from them in many complementary ways. The driving need, however, is fundamentally the same. For instance, detecting the important changes before and after a storm or earthquake is similar to identifying the type of exploding star newly observed in a distant galaxy. In this example, astrophysical and humanitarian analysis have lessons to teach each other: astrophysics has advanced methods of adjusting its algorithms to compensate for differences in observing conditions such as atmospheric turbulence, while humanitarian "Earth Observation" algorithms have tested promising methods of identifying critical changes based on only one before-and-after data point. There are many other examples of the potential for symbiosis, and this project will capitalise on them by working on both fields simultaneously in a cross-pollinating environment.

Generally speaking, this research project will develop new tools for efficient, accurate data classification and labelling in these new data regimes where images and other data points arrive rapidly and are of varying quality and origin. Both fields make use of machine learning algorithms and combine machine classification with expert and high-quality crowd data labels. The project will test new techniques developed in each field on the other, using tool-specific expertise and combined domain knowledge to make new discoveries. By extracting insights from each regime and advancing their most effective tools, this work will enable us to understand the changing skies and the changing Earth in a way that provides real benefit (e.g. increased resilience, decreased recovery time, saving of lives) to distressed populations, maximising impact both near and far.

The specific scientific topics this project will address cover some of the most pressing humanitarian needs across the globe and some of the most fundamental open questions about the Universe.

Planned Impact

This project employs a cross-disciplinary approach to advance the state of the art in Astrophysics and humanitarian Earth Observation. As such, it has the potential to impact multiple groups. These include:

- academics in many disciplines, from astronomy to zoology;
- members of the public, both those directly affected by humanitarian crises and those who join in the crowdsourcing aid efforts (these include members of UK society and those living in developing nations);
- stakeholders from industry seeking to use the most advanced algorithms in commercial settings, particularly in Earth Observation; and
- policy- and decision-makers seeking to improve the effectiveness of humanitarian responses and resilience to future disasters.

Owing to the nature of this research and its potential to provide substantial innovations in a number of areas and applications, broad and deep impact is expected. Examples of potential impact (many based on past impacts of precursor research to this project) include:

- substantial and quantifiable improvements in response times for large-scale relief efforts following a major natural disaster such as an earthquake, hurricane, or flood;
- more rapid economic recovery in disaster-stricken regions as a result of better decision-making, enabled by the improved situational awareness that the project's disaster maps provide;
- improved resilience to future disasters in at-risk areas across the globe as a result of highly accurate and statistically robust predictive maps generated by the project;
- large cumulative economic savings as a result of new local- and regional-level policies following detailed assessment of humanitarian deployment impacts and lessons learned;
- economic value provided by application of software and techniques developed by the project to Earth Observation commercial projects in agriculture, infrastructure logistics and anti-poaching;
- improved conservation responses as a result of advances in change-detection algorithms applied to wider ecological disciplines;
- increased learning and societal awareness of research topics and issues as a result of participation in the citizen science projects deployed during this project; and
- more accurate diagnostic tools for biomedical imaging resulting from application of software and techniques developed during this project to other research.

The software developed during this project will be released under an open-source licence to maximise the project's long-term impact.

Note: the countries selected in the Beneficiary Countries section of this proposal are those which have been directly impacted by previous deployments of the humanitarian response project led by the PI and which will also be involved in this project. The list of developing nations that could potentially be impacted is much larger; the actual list will depend on where and when natural disasters and humanitarian crises occur over the full term of this project.

Publications

Bartlett O (2023) Noise reduction in single-shot images using an auto-encoder in Monthly Notices of the Royal Astronomical Society

Brivio R (2022) GRB 080928 afterglow imaging and spectro-polarimetry in Astronomy & Astrophysics

Garland I (2023) The most luminous, merger-free AGNs show only marginal correlation with bar presence in Monthly Notices of the Royal Astronomical Society

Glikman E (2023) A Candidate Dual QSO at Cosmic Noon in The Astrophysical Journal Letters

Géron T (2021) Galaxy Zoo: stronger bars facilitate quenching in star-forming galaxies in Monthly Notices of the Royal Astronomical Society

Géron T (2023) Galaxy Zoo: kinematics of strongly and weakly barred galaxies in Monthly Notices of the Royal Astronomical Society

 
Description: Training machine learning algorithms to predict which buildings are damaged, and how severely, often relies on very precise data. Immediately following a disaster, however, such data is often not yet available, even though the machine predictions are urgently needed. One finding of this project is that it is still possible to train a machine to make reasonable predictions when the available information is imperfect (e.g. when only the position of a building with a damage label is known, not its exact outline on a satellite image). The algorithms are less accurate when the available training information is less detailed, but the resulting predictions can still be accurate enough for initial assessments by responders on the ground.
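
As an illustration only, the idea of training from positions rather than outlines can be sketched as turning point labels into a coarse training mask. This is not the project's actual method: the function names, the fixed square footprint, the `half_size` parameter and the `IGNORE` marker below are all assumptions made for this sketch.

```python
from typing import List, Tuple

IGNORE = -1  # hypothetical marker for pixels that carry no label


def points_to_weak_mask(
    height: int,
    width: int,
    points: List[Tuple[int, int, int]],
    half_size: int = 1,
) -> List[List[int]]:
    """Build a coarse label mask from (row, col, damage_class) point labels.

    Each point is expanded to a square of side 2 * half_size + 1, clipped to
    the image bounds, as a stand-in for the unknown building outline.
    If two squares overlap, the later point overwrites the earlier one.
    """
    mask = [[IGNORE] * width for _ in range(height)]
    for row, col, damage in points:
        r0, r1 = max(0, row - half_size), min(height, row + half_size + 1)
        c0, c1 = max(0, col - half_size), min(width, col + half_size + 1)
        for r in range(r0, r1):
            for c in range(c0, c1):
                mask[r][c] = damage
    return mask


def labelled_fraction(mask: List[List[int]]) -> float:
    """Return the fraction of pixels that received any label."""
    total = sum(len(row) for row in mask)
    labelled = sum(1 for row in mask for value in row if value != IGNORE)
    return labelled / total if total else 0.0


# Two hypothetical buildings on a 5 x 5 tile: damage class 2 and class 1.
mask = points_to_weak_mask(5, 5, [(0, 0, 2), (3, 3, 1)])
print(labelled_fraction(mask))
```

A model trained on such a mask would typically skip the `IGNORE` pixels when computing its loss, so only the approximate building footprints contribute. The footprint size trades label coverage against the risk of labelling pixels that lie outside the real building.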

There is strong interest within the Zooniverse's online volunteer community in contributing time and effort to online disaster mapping projects. Some of that interest arises locally to a particular disaster, but the majority of the Zooniverse crowd contributions come from volunteers across the globe. Their contributions are highly accurate and useful. Designing a crowdsourcing project to focus that interest into useful outputs for responders on the ground takes advance planning, and there are often trade-offs between the needs of different stakeholders (e.g. the responders who need quick information that is "good enough" versus the computer scientists who would prefer highly detailed crowd contributions which require considerable time to input). Data providers, e.g. satellite imagery providers, can best assist distributed online disaster mapping efforts by making their data open via an accessible, easy-to-use interface with an open API.

Exploitation Route: This project is ongoing, and the outcomes so far might be useful to those wishing to train agile algorithms on incomplete and/or imperfect labels to make rapid predictions relevant to humanitarian aid organisations.
Sectors: Aerospace, Defence and Marine; Environment; Security and Diplomacy; Other