Human-Machine Classification for Astrophysical Projects

Lead Research Organisation: University of Oxford
Department Name: Oxford Physics

Abstract

A major challenge for twenty-first century science is learning to deal with the large datasets that are rapidly produced by a wide variety of modern surveys and experiments. This is especially true for astrophysicists, who not only deal with large and diverse datasets but often have to make rapid decisions about which objects to target with future observations. Two separate sets of solutions have been proposed. The first relies on advances in computer vision and machine learning to automate the process of astronomical classification, but much of the recent progress in these fields relies on the availability of large training sets of already classified data, something that is often difficult to supply. Moreover, the datasets are frequently so large that even very accurate computer classification will leave an enormous dataset to be sifted through.

The other solution is citizen science: collaboration with volunteers to work through large datasets. This has been enormously successful for a wide variety of astronomical problems, from the discovery of extra-solar planets to galaxy classification and space weather forecasting. However, the size of datasets expected from new astronomical surveys will overwhelm even the largest and most enthusiastic volunteer base.

This proposal aims to demonstrate a flexible system which can combine these two methods. It builds on the successful Zooniverse platform, which has been responsible for many of the most successful citizen science data analysis projects. We will:

1. Produce more efficient citizen science projects by being smart about assigning tasks to particular volunteers. At present, for example, images which are to be classified are shown randomly to volunteers instead of allowing experts to review more difficult cases. Our preliminary research shows this may increase our efficiency by a factor of ten.
2. Include machine and human classifiers together in the same project. As volunteers work their way through a dataset, machines can learn from them. This allows an increasing proportion of the dataset to be automatically processed, reducing the burden on the volunteers.
3. Combine both of these new facilities, allowing us to sort through data in a new way. We will establish a hierarchy of classification tasks, so that volunteers look for the most common types of object first, before moving on to rarer objects. This will allow a cycle of human and machine classification to rapidly search through large and diverse datasets and, critically, will allow us to search for categories of interest that develop during the classification process.
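The three aims above can be illustrated with a minimal Python sketch. It is not the project's implementation: the function names, the skill weights, and both thresholds are hypothetical choices made for illustration. Subjects that a machine classifier scores confidently are retired automatically, subjects with a confident skill-weighted volunteer consensus are retired on the human label, and the remainder are queued hardest-first so that they can be routed to more experienced volunteers.

```python
from collections import defaultdict


def weighted_vote(votes, skill, default_weight=0.5):
    """Combine volunteer votes, weighting each by an assumed skill score.

    votes: list of (volunteer_id, label) pairs.
    skill: dict mapping volunteer_id to a weight in (0, 1] (hypothetical).
    Returns (winning_label, share_of_total_weight).
    """
    totals = defaultdict(float)
    for volunteer, label in votes:
        totals[label] += skill.get(volunteer, default_weight)
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())


def triage(subjects, machine_scores, votes_by_subject, skill,
           machine_threshold=0.95, human_threshold=0.8):
    """Split subjects into retired ones and a queue for further human review.

    machine_scores: dict mapping subject to (label, confidence).
    Both thresholds are illustrative assumptions, not values from the project.
    """
    retired = {}
    queue = []
    for subject in subjects:
        label, confidence = machine_scores[subject]
        if confidence >= machine_threshold:
            # The machine is confident: no volunteer effort is needed.
            retired[subject] = (label, "machine")
            continue
        votes = votes_by_subject.get(subject, [])
        if votes:
            human_label, share = weighted_vote(votes, skill)
            if share >= human_threshold:
                # Volunteers agree strongly enough to retire the subject.
                retired[subject] = (human_label, "human")
                continue
        queue.append(subject)
    # Least confident machine scores first, so hard cases are seen early.
    queue.sort(key=lambda s: machine_scores[s][1])
    return retired, queue
```

In a full cycle, the labels retired by volunteers would be fed back as training data for the machine classifier, so that the share of subjects retired automatically grows over time.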

For this demonstration project, we will run an example which makes use of all of these features in a real astronomical survey. This will allow us to demonstrate and measure the efficiencies achieved by these improvements, as well as producing valuable science in its own right. As the Zooniverse platform already supports projects across many disciplines, these tools will be made available for use by a wide range of scientists and researchers, accelerating their progress and making the time invested by more than 1.3 million volunteers more useful.

Planned Impact

Citizen science through the Zooniverse has already proven to be a transformational way of engaging the public in science. Since the projects' beginnings in 2007, more than a million people have used our platform to make a real contribution to science. Participants have explored galaxies, worked with ecologists in Antarctica and the Serengeti, and uncovered hidden texts in historical archives. By engaging with the researchers who are leading the projects they are participating in, these volunteers gain a real sense of ownership over the research process; our studies show that Zooniverse volunteers are overwhelmingly more likely to engage with scientific content after encountering our projects. This is especially important because research shows that, having begun, volunteers are equally likely to go on to substantial participation whatever their initial educational level; rather than finding an audience already excited about science, the Zooniverse is creating a new crowd of hungry participants in research.

Nor is this impact limited to the participants themselves. Our projects have featured in museums around the world, and play a regular starring role in the BBC's Stargazing Live series, which reaches an audience of millions. The projects are heavily engaged on social media, with one - Planet Hunters - being amongst the most visited science pages on Facebook. Our volunteers enjoy communicating with their friends and colleagues about their scientific adventures, making them powerful advocates for the scientific process.

We have recently redeveloped our core platform to make it easier for people to build projects, and are already seeing the adoption of citizen science by new audiences. Partnerships with Cancer Research UK, who used our platform to build science-filled games, and with the Natural History Museum demonstrate the uses to which our software can be put. Collaboration with Microsoft Research, and with researchers at Google, informs our understanding of how participants behave in these projects, and how we can do better. Companies like Imperative Space are adapting Zooniverse's astronomy projects for use in the classroom, giving schools a taste of cutting-edge science, and our platform also supports school-led experiments at CERN.

The platform can also be used for more than research. A recent partnership with an NGO, Rescue Global, and the Earth observation company Planet Labs allowed rescuers to quickly generate new maps of settlements in Nepal following the tragic earthquake there. The work contained in this proposal - which aims at efficient, rapid classification - will be key in enabling us to expand this disaster relief work for future crises.

Zooniverse is a project whose primary goal is to aid science. However, uniquely, at its core is the need to engage a very large company of volunteers, and a methodology which allows for long-lasting and effective transformation in attitudes to science. It is effective science, and highly exciting public engagement.

Publications

Kostov V (2020) TOI-1338: TESS' First Transiting Circumbinary Planet in The Astronomical Journal

Boyajian T (2018) The First Post-Kepler Brightness Dips of KIC 8462852 in The Astrophysical Journal

Smethurst R (2018) SDSS-IV MaNGA: the different quenching histories of fast and slow rotators in Monthly Notices of the Royal Astronomical Society

Ralph N (2019) Radio Galaxy Zoo: Unsupervised Clustering of Convolutionally Auto-encoded Radio-astronomical Images in Publications of the Astronomical Society of the Pacific

Kapinska A (2017) Radio Galaxy Zoo: A Search for Hybrid Morphology Radio Galaxies in The Astronomical Journal

Boyajian T (2016) Planet Hunters IX. KIC 8462852 - where's the flux? in Monthly Notices of the Royal Astronomical Society

Fortson L (2018) Optimizing the Human-Machine Partnership with Zooniverse in arXiv e-prints

Mahabal A (2019) Machine Learning for the Zwicky Transient Facility in Publications of the Astronomical Society of the Pacific

Robertson B (2017) Large Synoptic Survey Telescope Galaxies Science Roadmap in arXiv e-prints

Beck M (2018) Integrating human and machine intelligence in galaxy morphology classification tasks in Monthly Notices of the Royal Astronomical Society

Walmsley M (2019) Identification of low surface brightness tidal features in galaxies using convolutional neural networks in Monthly Notices of the Royal Astronomical Society

Smethurst R (2017) Galaxy Zoo: the interplay of quenching mechanisms in the group environment? in Monthly Notices of the Royal Astronomical Society

Hart R (2017) Galaxy Zoo: star formation versus spiral arm number in Monthly Notices of the Royal Astronomical Society

Kruk S (2018) Galaxy Zoo: secular evolution of barred galaxies from structural decomposition of multiband images in Monthly Notices of the Royal Astronomical Society

Simmons B (2017) Galaxy Zoo: quantitative visual morphological classifications for 48 000 galaxies from CANDELS in Monthly Notices of the Royal Astronomical Society

Walmsley M (2020) Galaxy Zoo: probabilistic morphology through Bayesian CNNs and active learning in Monthly Notices of the Royal Astronomical Society

Willett K (2017) Galaxy Zoo: morphological classifications for 120 000 galaxies in HST legacy imaging in Monthly Notices of the Royal Astronomical Society

Kruk S (2017) Galaxy Zoo: finding offset discs and bars in SDSS galaxies? in Monthly Notices of the Royal Astronomical Society

Walmsley M (2023) Galaxy Zoo DESI: Detailed morphology measurements for 8.7M galaxies in the DESI Legacy Imaging Surveys in Monthly Notices of the Royal Astronomical Society

Wright D (2017) A transient search using combined human and machine classifications in Monthly Notices of the Royal Astronomical Society

 
Description We were able to build and deploy a sophisticated system that incorporates human and machine classifications in a single pipeline; this means faster and more reliable classifications of a variety of astronomical objects are now possible. In particular, our system for detecting transient objects such as supernovae now includes both a trained neural network and human classifications via a Zooniverse project; the combination is more accurate than either alone. More than forty Zooniverse projects are using the infrastructure which was provided as part of this grant.
Exploitation Route Our software is open source and we encourage researchers to make use of the Zooniverse platform wherever possible. 1715 Labs, our spin-out company, is making use of this code.
Sectors Aerospace, Defence and Marine; Agriculture, Food and Drink; Education; Environment; Healthcare; Culture, Heritage, Museums and Collections; Pharmaceuticals and Medical Biotechnology; Retail

 
Description There are a variety of impacts from the improvements to the Zooniverse platform supported by this award. Firstly, the increased participation and wider range of projects offered mean that greater numbers of people are taking part, which we know from prior research leads to a greater likelihood of participating in other research-related activities. Secondly, we have spun out the for-profit company 1715 Labs, which is commercialising our software. Finally, we have had an indirect impact on policy on Antarctic fishing through our work with the Penguin Watch team.
First Year Of Impact 2018
Sector Other
Impact Types Cultural, Societal, Economic, Policy & public services

 
Description Crowdsourcing and Machine Learning for Disaster Relief and Resilience
Amount £212,000 (GBP)
Funding ID ST/S00307X/1 
Organisation Science and Technology Facilities Council (STFC)
Sector Public
Country United Kingdom
Start 04/2019 
End 03/2020
 
Description Searching for Serendipity with Hybrid Machine Learning and Citizen Science
Amount $380,000 (USD)
Organisation Alfred P. Sloan Foundation 
Sector Charity/Non Profit
Country United States
Start 08/2021 
End 07/2023
 
Title Galaxy Zoo DECALS 
Description This is an algorithmic classification of 300000 galaxies. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact N/A 
URL http://data.galaxyzoo.org
 
Description Google 
Organisation Google
Country United States 
Sector Private 
PI Contribution We are deploying aspects of the Zooniverse platform on Google Cloud, developing ways to use TensorFlow and other tools for machine learning.
Collaborator Contribution Providing funding to support the use of Google Cloud tools, including our infrastructure.
Impact Open source code available via the Zooniverse repo.
Start Year 2018
 
Description Microsoft Research 
Organisation Microsoft Research
Country Global 
Sector Private 
PI Contribution We are working with Microsoft Research to develop models which encourage volunteers to remain active on the Zooniverse platform. Through this collaboration, we are building tools which allow for interventions such as messaging when volunteers appear to be flagging.
Collaborator Contribution The MSR team are developing the statistical models which predict user behaviour, and which can be used to direct interventions on the platform.
Impact Code available via the Zooniverse open source repository.
Start Year 2018
 
Description Partnership with Crick Institute 
Organisation Francis Crick Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution We are working on a combined human/machine classification scheme for generic high-resolution microscopy data which will make use of ConSciCom's understanding of user communities in medical and clinical settings.
Collaborator Contribution Crick are providing data and expertise on machine learning for a suite of such projects.
Impact Projects are currently in beta.
Start Year 2016
 
Title NERO 
Description NERO is the software built on top of the Zooniverse API which provides task assignment and allocation for projects. It is released under an open source license. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Impact The tool has already made possible our partnership with Stargazing Live.
URL https://github.com/zooniverse/nero
 
Company Name 1715 LABS LIMITED 
Description 1715 Labs seeks to make commercial use of the Zooniverse platform, bringing the techniques for crowd management and crowdsourcing we have developed into the commercial realm. 
Year Established 2018 
Impact N/A
Website https://www.1715labs.com/
 
Description Adler Kavli Talks: Zooniverse/LSST 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Developed, wrote and presented script for planetarium show, which was built by the Adler Planetarium to fit script, on Zooniverse and the promise of LSST. Live to audience of 500 in two showings in Chicago, broadcast live to 35 other planetaria. Visualizations produced available to be used in other planetaria.
Year(s) Of Engagement Activity 2019
 
Description Agile Rabbit 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Talk and discussion on serendipity to art/tech group in Exeter
Year(s) Of Engagement Activity 2021
 
Description Ashford Astronomical Society 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Talk on serendipity.
Year(s) Of Engagement Activity 2021
 
Description Cafe Scientifique Didcot 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Talk to Didcot Cafe Scientifique online
Year(s) Of Engagement Activity 2021
 
Description Chipping Norton Astronomical Society 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Talk to society
Year(s) Of Engagement Activity 2021
 
Description Chris Lintott Talk at Bluedot Festival 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Talk about history of citizen science at large festival.
Year(s) Of Engagement Activity 2018
 
Description Chris Lintott Talk at Latitude Festival 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Talk given as part of Latitude Festival - headlining smaller stage.
Year(s) Of Engagement Activity 2018
 
Description Crowd and Cosmos Talk (Oxford Library) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Talk to library users about research; reached an audience who would normally not attend talks in Oxford
Year(s) Of Engagement Activity 2020
 
Description Dark Skies Festival Marlborough 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Talk on modern astrophysics.
Year(s) Of Engagement Activity 2021
 
Description Dark Sky Wales 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Talk on serendipity
Year(s) Of Engagement Activity 2021
 
Description Norwich Science Festival 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Discussion at science festival, released as podcast.
Year(s) Of Engagement Activity 2021
 
Description Open House Leeds, hosted by tech/design company. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact Talk to industry/design meet up.
Year(s) Of Engagement Activity 2020
 
Description Public Talk (Orpington) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Talk at Orpington Astronomical Society
Year(s) Of Engagement Activity 2019
 
Description School Public Talk (Canterbury) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Talk hosted by school to pupils and their guests.
Year(s) Of Engagement Activity 2019
 
Description Talk (Oxford: IF Festival) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Launch for the Crowd and Cosmos book.
Year(s) Of Engagement Activity 2019
 
Description Talk (Thomas Hardye School) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Talk for school community.
Year(s) Of Engagement Activity 2020
 
Description Talk at Google 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Talk at Google.
Year(s) Of Engagement Activity 2019
 
Description Talk: Hammersmith Apollo 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Talk as part of event at Hammersmith Apollo on discovering the unexpected.
Year(s) Of Engagement Activity 2016
 
Description Twitter 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Lintott's Twitter feed covers items of interest in citizen science, and now has more than 25,000 followers.
Year(s) Of Engagement Activity 2017
URL http://twitter.com/chrislintott
 
Description World Space Week 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Talk on modern astrophysics
Year(s) Of Engagement Activity 2021