Cambridge Astronomical Survey Unit (CASU): Filling the Astronomical Data Lake (2020-2024)

Lead Research Organisation: University of Cambridge
Department Name: Institute of Astronomy

Abstract

Observational survey astronomy is powering discovery, with the UK leading and benefiting from its investment, both financial and intellectual, in state-of-the-art observational facilities. The last decade has seen the delivery of comprehensive imaging of both the northern and southern hemispheres in the near infrared using WFCAM and VISTA, with the UK leading these survey initiatives. These are providing key insights across a wide range of astrophysics. The next decade sees the arrival of large scale spectroscopic surveys to probe the key populations (be they galaxies, stars or asteroids) revealed by the imaging. The UK is well poised to lead discovery through its involvement in, and definition of, the specific surveys to be carried out on facilities such as WEAVE, 4MOST and MOONS.

The 2020s have the potential to be the decade of "total" survey astronomy, enabling profound insights into astrophysics at all scales through the combination of comprehensive imaging and large scale spectroscopic data. Realising the scientific potential of these facilities and surveys requires expertise and systems that can optimally extract information from the data. Here the UK has a substantial lead with the Cambridge Astronomical Survey Unit (CASU) and its proven ability to provide cost effective data systems to the UK and wider communities.

CASU have been at the forefront of survey astronomy, both pioneering techniques to optimally extract knowledge from survey data and taking a proactive role in exploiting this information to produce world-leading research. This synergy and feedback between data processing and science delivery has been repeatedly demonstrated to be essential in ensuring delivery of the best possible science data products for exploitation by the widest community of UK and European astronomers. In the last decade, CASU-generated science data products from VISTA, WFCAM and VST imaging and Gaia-ESO VLT spectroscopy have supported world-class research programmes across almost every UK institute involved in astrophysics. CASU are filling the astronomical data lake, the vital data resource which the community is able to mine and combine with other multi-wavelength data (e.g. Euclid, PLATO) to discover rare and unique objects for further detailed study by facilities such as the ELT or the JWST.

The role of CASU has been acknowledged in the wider context, with ESO relying on CASU to provide the science data products from its wide range of public surveys currently running on the ESO survey facilities.

The CASU design philosophy is to allow the evolution of an optimal ergonomic solution to this avalanche of data, through access to Petabyte scale data storage systems and expert pipeline processing systems. Continued development, maintenance and operation of the CASU processing and analysis pipelines will ensure that the UK community is well positioned to exploit rapidly the science data from these key new survey facilities. In addition, CASU expertise will be essential in meeting the challenges inherent in applying Machine Learning assisted discovery to these data, and will provide a potential resource for the UK when developing the UK-LSST partnership.

This grant proposal builds on the tremendous advances already made by CASU and requests funding for the period 2020-2024 for the following activities:
- Design and development of the data management and analysis systems for the next generation of ESO / ING wide field massively multiplexed optical spectrographs WEAVE, MOONS and 4MOST;
- Further development, maintenance, operation and user support of the science and analysis pipelines for the UK-led VISTA large scale surveys;
- Operational support, pipeline processing and further development of the science and analysis pipelines for the UK-led ESO VST public surveys;
- Deployment of advanced data interfaces to the CASU science data, enabling machine learning assisted mining of the data.

Planned Impact

The University of Cambridge has one of the most successful programmes for encouraging knowledge transfer and resulting societal impact between University departments and industry both in the United Kingdom and elsewhere. CASU's approach to the search for impact opportunities has been guided by the mechanisms the University has in place to facilitate this.

CASU continues to be involved in the transfer of image analysis and data handling systems to the medical domain, and in particular image processing applied to oncology. This exchange has significant potential both to increase the effectiveness of clinical health care and to enhance the quality of life of those with cancer, with better targeted therapeutic treatments leading to improved outcomes.

The partnership of CASU staff with the University of Cambridge's Department of Oncology and Cambridge Institute, Cancer Research UK, has continued, with involvement in the Cancer Research UK Grand Challenge initiative. The IMAXT (Imaging and Molecular Annotation of Xenografts and Tumors) project is a £20M Cancer Research UK Grand Challenge project, led from the Cambridge Institute. The IoA participates in this (Walton as co-I leading the IMAXT data analysis system development), with CASU expertise being leveraged to develop and deploy the image analysis and data handling system required to generate the segmented, registered image catalogues for the range of cutting edge imaging technologies employed in the project. These include Serial Two Photon Tomography, Imaging Mass Cytometry and MERFISH data. In combination these allow annotated maps of cancer tumours in 3-D at the sub-cellular level, in which the gene and protein makeup of every cell is described. Linked to large scale breast cancer trials, this is enabling a better understanding of disease and treatment pathways. The CASU expertise lies in the transfer of image analysis techniques to the medical data, and in the associated methods for handling and combining the multi-modal data (for instance, the challenges of registering the data sets at the micron level). Within IMAXT, a significant processing infrastructure has been deployed, along with an associated science platform.

Locally, CASU is providing ad-hoc data expertise to the STFC Cambridge Centre for Doctoral Training (CDT) PhD programme, and this will be extended to the other STFC PhD CDT networks, especially once access to and manipulation of the science data is simplified with the deployment of the CASU Science Data Access Hub, as noted in Section 10.4.3.

As noted in the Pathways to Impact plan, CASU provides material to support the wider IoA Outreach programme, including the new IoA/KICC outreach programme aiming to increase STEM subject take up in schools across Cambridgeshire, Norfolk, Suffolk and Peterborough. CASU is active in supporting the public understanding of science activities undertaken more generally at the IoA and the University of Cambridge. CASU provides a range of high quality processed images of the sky which are used as high impact visual material in outreach activities such as the successful series of one-day conferences for schools, each day in turn targeting KS2, KS3, KS4, KS5 and secondary school teachers.

Further details are contained within the Pathways to Impact document.
