Deep Learning for Astronomically Big Data

Lead Research Organisation: University of Manchester
Department Name: Physics and Astronomy

Abstract

The soft real-time constraints on Square Kilometre Array (SKA) image formation mean that the science data processor (SDP) responsible for creating image data products will require a processing power of ~0.5 ExaFLOPS and will produce approximately 1 PB of data products per day. To reach astronomers, these data products will then be shipped around the world over a fibre network to the SKA regional data centres.
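To give a sense of what 1 PB per day means for the fibre network, the short Python sketch below converts that volume into a sustained data rate. It assumes a decimal petabyte (10^15 bytes) and perfectly even delivery over 24 hours; both are simplifying assumptions made here, not figures from the SKA design.

```python
# Assumption: 1 PB = 1e15 bytes (decimal prefix), delivered evenly over a day.
PETABYTE_BYTES = 1e15
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_gbit_per_s(bytes_per_day: float) -> float:
    """Convert a daily data volume in bytes into a sustained rate in Gbit/s."""
    bits_per_second = bytes_per_day * 8 / SECONDS_PER_DAY
    return bits_per_second / 1e9


rate = sustained_rate_gbit_per_s(1 * PETABYTE_BYTES)
print(f"Sustained rate for 1 PB/day: {rate:.1f} Gbit/s")
```

Under these assumptions the rate comes out at a little over 90 Gbit/s, sustained around the clock.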

Imaging for radio interferometers relies on a processing model that inverts datasets from their native Fourier measurement basis to form an image. For the SKA these individual image cubes are very large (0.25 PB on average) and will each contain tens to hundreds of thousands of different astronomical sources. For scientific exploitation, it will be necessary to automatically identify the objects in these data and classify them. Machine learning approaches for such an operation have started to be considered across a range of fields in astrophysics. For classification of previously identified objects with a range of measured and catalogued features, random forest classification is very popular; however, in radio astronomy convolutional neural networks have started to emerge as a potential mechanism for classifying objects directly within the image data, in parallel with identification. Applications of this sort are still in their infancy in astrophysics, and it remains unclear how these methods will be applied to datasets with volumes equivalent to those from the SKA. Scaling machine learning approaches to deal with SKA-size image cubes will be a key big data challenge for SKA regional centres around the world.
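The Fourier inversion described above can be illustrated with a toy example. The sketch below is not the SKA or LOFAR imaging pipeline; it uses plain NumPy FFTs on a small grid, a single hypothetical point source, and a randomly chosen set of sampled Fourier cells standing in for real uv coverage.

```python
import numpy as np

n = 64                                  # image size in pixels (illustrative)
sky = np.zeros((n, n))
sky[20, 30] = 1.0                       # one hypothetical point source

# Forward model: an interferometer samples the Fourier transform of the sky.
visibilities = np.fft.fft2(sky)

# Hypothetical uv coverage: start from ~30% of Fourier cells chosen at random,
# then make the mask symmetric under (u, v) -> (-u, -v) so the image is real.
rng = np.random.default_rng(0)
mask = rng.random((n, n)) < 0.3
mask |= np.roll(mask[::-1, ::-1], 1, axis=(0, 1))

# Inversion: transform the sampled Fourier data back to the image plane.
dirty = np.fft.ifft2(visibilities * mask).real
peak = tuple(int(i) for i in np.unravel_index(np.argmax(dirty), dirty.shape))
print("Brightest pixel in the dirty image:", peak)
```

Because only part of the Fourier plane is sampled, the recovered image is the true sky convolved with a point spread function, so the source appears at the correct position but surrounded by sidelobes.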

In addition, the regional centres have a further potential advantage for advanced image analysis. Not bound by the real-time processing constraints of the SKA SDP, machine learning approaches that incorporate image formation within their processing model could provide significantly enhanced outputs. Fourier image formation by its nature imposes characteristics on the output image, for example through the weighting applied to the Fourier components, and these characteristics will bias any classification based on output image products alone. Incorporating the Fourier data directly into a deep learning approach would provide additional information that could improve classification.
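As a rough illustration of how Fourier component weighting shapes the output image, the sketch below compares two common weighting schemes, natural and uniform, on a hypothetical gridded sampling density. The Gaussian density, grid size, and helper names (`psf`, `mean_r2`) are assumptions made for this example only.

```python
import numpy as np

n = 64
u = np.fft.fftfreq(n) * n               # integer uv cell indices
uu, vv = np.meshgrid(u, u, indexing="ij")
r2 = uu**2 + vv**2                      # squared uv radius of each cell

# Hypothetical sampling density: many visibilities on short baselines.
counts = np.round(20.0 * np.exp(-r2 / (2 * 8.0**2)))

natural = counts                        # weight each cell by its sample count
uniform = (counts > 0).astype(float)    # weight every sampled cell equally


def psf(weights):
    """Point spread function for a set of uv weights, normalised to peak 1."""
    beam = np.fft.ifft2(weights).real
    return beam / beam[0, 0]


def mean_r2(weights):
    """Weighted mean squared uv radius, a proxy for effective resolution."""
    return float((weights * r2).sum() / weights.sum())


psf_natural, psf_uniform = psf(natural), psf(uniform)
print("natural mean r^2:", mean_r2(natural))
print("uniform mean r^2:", mean_r2(uniform))
```

The same measured data yield different point spread functions under the two schemes, so a classifier trained only on images inherits whichever weighting choice was made upstream.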

For this project, data from the LOFAR telescope will be used as the closest available analogue to those from the SKA. The project will use deep data from the GOODS-N field obtained as part of the LOFAR Magnetism Key Science Project (MKSP). These data cover a deep survey field and will provide a large volume of data on the same region of sky.
