Exploration of machine learning for pre-emptive scheduling in single-instruction multiple-data mega-kernel designs

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

This project explores whether machine learning can make parallel computing units such as graphics processing units (GPUs) more efficient. GPUs have become powerful parallel processors, but exploiting this power is often difficult. Currently, GPUs are mostly used for algorithms that are easy to parallelize. However, I believe that more algorithms can benefit from the power of GPUs if we offer new approaches to GPU programming and more intelligent task scheduling strategies.

In the past I have researched novel frameworks that allow the parallelisation of computational methods that are usually hard to translate for execution on parallel hardware like GPUs. For this I used a parallel programming concept called Mega-Kernel. The essence of this concept is that a few computing units out of several thousand are responsible for scheduling computational tasks with different priorities. The Mega-Kernel concept has been demonstrated to provide a powerful extension of conventional kernel-based program execution management that can deliver significant performance enhancement from single instruction multiple data (SIMD) approaches. However, to date the queue optimisation capabilities at the core of the approach use static rule-based decision processes and, in particular, do not provide optimal hardware utilization or automatic intelligent preemptive scheduling. There are many real-world applications for which performance could potentially be transformed by more dynamically adaptive scheduling capabilities, and the SIMD architecture itself provides an opportunity to realise this.

In this project we will explore statistical machine learning at the parallel hardware level to automatically predict the priorities of tasks in complex real-world data analysis problems such as the reconstruction and motion correction of n-dimensional motion-corrupted medical image data. In particular, we will explore the real-time capabilities of machine-learning-supported, preemptively scheduled reconstruction for the direct integration of motion correction into the scan process of fetal magnetic resonance data. Motion correction is the only way to provide comprehensive investigation of all fetal organs at high resolution. However, the currently used algorithmic pipeline is unidirectional, slow and consists of error-prone post-scan processing steps. The lack of interactivity currently makes manual corrections or integration into the scan process impossible. Automatically prioritised preemptive scheduling of the algorithm's tasks on commodity hardware like GPUs will likely provide a way to introduce real-time capabilities for such extremely complex computing methods.
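
As a rough illustration of the scheduling idea described above, the following Python sketch orders pending tasks by a predicted priority score instead of a fixed rule. It is a CPU-side toy under stated assumptions, not the Mega-Kernel implementation: the class `PriorityScheduler`, the function `linear_priority`, the task names and the feature vectors are all hypothetical, and the "learned" predictor is replaced by a fixed linear score. Running tasks are not interrupted here; a newly submitted urgent task simply overtakes queued work at the next task boundary.

```python
import heapq
import itertools


class PriorityScheduler:
    """Toy priority queue: the task with the smallest score runs first."""

    def __init__(self, predict_priority):
        # predict_priority stands in for a trained model (assumption).
        self._predict = predict_priority
        self._heap = []
        # Monotonic counter breaks ties so equal scores keep FIFO order.
        self._counter = itertools.count()

    def submit(self, name, features):
        """Score a task from its features and add it to the queue."""
        score = self._predict(features)
        heapq.heappush(self._heap, (score, next(self._counter), name))

    def run_next(self):
        """Remove and return the name of the most urgent pending task."""
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name


def linear_priority(features, weights=(1.0, -2.0)):
    """Hypothetical predictor: a fixed linear score over two features."""
    return sum(w * f for w, f in zip(weights, features))


def demo_order():
    sched = PriorityScheduler(linear_priority)
    sched.submit("reconstruct_slice", (3.0, 0.0))  # score 3.0
    sched.submit("register_slice", (1.0, 0.0))     # score 1.0
    first = sched.run_next()
    # A user correction arrives while work is still queued; its score is
    # lower, so it overtakes the pending reconstruction task.
    sched.submit("user_correction", (0.0, 1.0))    # score -2.0
    return [first, sched.run_next(), sched.run_next()]
```

In this toy setting, replacing `linear_priority` with a model trained on observed task behaviour is the step that would turn static rules into learned priorities.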

Within the 14 months of this project it will be feasible to explore whether machine learning can predict the hardware utilisation of the tasks of a complex example algorithm such as motion correction with potential corrective user input. If successful, the results of this project are likely to introduce a new paradigm in high-performance computing and will contribute to a paradigm shift in medical image acquisition of moving objects.

Planned Impact

This project focuses on developing new software technologies for high-performance computing (HPC) with applications to medical image analysis.
The most immediate impact is expected through uptake of the developed fundamental methods by the HPC community and by researchers in medical image analysis. The PI will advance this uptake through direct collaboration with experts in HPC (Imperial College, MPI Saarbrücken) and medical image acquisition (King's College London), and is currently exploring the potential for collaboration with other research groups (academic visits, for example, to Vienna University of Technology, Austria, the HPC facility in Nottingham, UK, and the University Hospital Leipzig, Germany).

Dynamic scheduling for GPUs opens up a great number of possibilities. Exploring the integration of machine learning into these techniques can pave the way for new parallel programming models, following the idea that algorithms can be written more easily by defining work entities and their relations rather than execution steps. With the possibility to control execution states at runtime, we can provide an intelligent task manager for GPUs. We will also explore the application areas that can be tackled with advanced GPU scheduling strategies, including motion correction in n-dimensional magnetic resonance imaging (MRI), which involves research on computational intelligence and real-time scheduling for time-critical processes.

At the same time we will work on the remaining issues that hamper the straightforward mapping of complex algorithms to GPU execution. We will enable learned task priorities that completely hide task prioritisation from the programmer, allowing an arbitrarily small granularity of priority levels. With the combination of all these efforts, we will make GPUs a more accessible device for the development of complex parallel algorithms.

The chosen example application will contribute significantly towards the development of better and more objective imaging biomarkers in medical imaging. Such markers are crucial for early or differential diagnosis. In addition, they allow clinicians to reach informed decisions and viewpoints on potential treatments and therapies. Therefore, clinicians in medical imaging and fetal medicine will also be direct beneficiaries of this research.

Our novel fast and interactive motion correction methods will likely lead to novel workflows for screening, diagnostics and image-guided intervention in prenatal clinical practice. Together with our clinical partners we will work out new workflows that build on the power of intelligently scheduled parallel algorithms. In the long run our methods are generalizable and can be transferred to other hard-to-parallelise algorithms that require interactive prioritised input.

It is also very likely that the medical imaging technology industry and the GPU technology industry will be highly interested in the proposed research. I will therefore maintain my existing connections to these industries and discuss research results with them and with potential new contacts continuously throughout the project. IP arising from this will be protected for commercialization (e.g. via patents) and exploited together with Imperial Innovations (which coordinates the activities of technology transfer, company incubation and investment for Imperial College London).

This project will also have an indirect impact on the next generation of computer scientists. Students at Imperial will indirectly benefit from the project by being close to the most recent state of the art in HPC and parallel algorithm development. The PI is a dedicated lecturer and is keen to pass on his knowledge of these techniques to the next generation. Expertise in GPU programming and HPC is an important feature of this project. Parallel algorithm development requires special training, and most Computer Science curricula now take this into account.

Description: We have tackled GPU acceleration issues for complex motion compensation and semantic image understanding problems in medical imaging applications.

Methods developed during this project also addressed the limited capture range of optimization-based 2D/3D image registration methods and their requirement for high-quality initializations, which can significantly degrade the performance of 3D image reconstruction and motion compensation pipelines. Challenging clinical imaging scenarios that contain significant subject motion, such as fetal in-utero imaging, complicate the 3D image and volume reconstruction process. In this paper we present a learning-based image registration method capable of predicting 3D rigid transformations of arbitrarily oriented 2D image slices with respect to a learned canonical atlas co-ordinate system. Only image slice intensity information is used to perform registration and canonical alignment; no spatial transform initialization is required. To find image transformations we utilize a Convolutional Neural Network (CNN) architecture to learn the regression function capable of mapping 2D image slices to the 3D canonical atlas space. We extensively evaluate the effectiveness of our approach quantitatively on simulated Magnetic Resonance Imaging (MRI) fetal brain imagery with synthetic motion and further demonstrate qualitative results on real fetal MRI data, where our method is integrated into a full reconstruction and motion compensation pipeline. Our learning-based registration achieves an average spatial prediction error of 7 mm on simulated data and produces qualitatively improved reconstructions for heavily moving fetuses with gestational ages of approximately 20 weeks. Our model provides a general and computationally efficient solution to the 2D-3D registration initialization problem and is suitable for real-time scenarios.

We also solved a fundamental problem in intensity-based 2D/3D registration, which concerns the limited capture range and need for very good initialization of state-of-the-art image registration methods. We propose a regression approach that learns to predict rotations and translations of arbitrary 2D image slices from 3D volumes with respect to a learned canonical atlas co-ordinate system. To this end, we utilize Convolutional Neural Networks (CNNs) to learn the highly complex regression function that maps 2D image slices into their correct position and orientation in 3D space. Our approach is attractive in challenging imaging scenarios, where significant subject motion complicates the reconstruction of 3D volumes from 2D slice data. We extensively evaluate the effectiveness of our approach quantitatively on simulated MRI brain data with extreme random motion. We further demonstrate qualitative results on fetal MRI, where our method is integrated into a full reconstruction and motion compensation pipeline. With our CNN regression approach we obtain an average prediction error of 7 mm on simulated data, and convincing reconstruction quality of images of very young fetuses where previous methods fail. We further discuss applications to Computed Tomography and X-ray projections. Our approach is a general solution to the 2D/3D initialization problem. It is computationally efficient, with prediction times per slice of a few milliseconds, making it suitable for real-time scenarios.
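
To make the role of the predicted rotation and translation concrete, the following Python sketch shows how a set of rigid parameters places a point of a 2D slice in 3D atlas space. It is a minimal illustration under assumptions, not the published method: the Euler-angle parameterisation, the function names `rotation_matrix` and `slice_to_atlas`, and the example values are hypothetical, and the network that would predict the parameters is not modelled.

```python
import math


def rotation_matrix(rx, ry, rz):
    """Return the 3x3 rotation R = Rz * Ry * Rx for Euler angles in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def slice_to_atlas(u, v, rotation, translation):
    """Map the in-plane slice point (u, v, 0) to atlas coordinates.

    Applies x' = R x + t, where R and t would come from the regression
    model (here they are supplied by hand).
    """
    p = (u, v, 0.0)
    return tuple(
        sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Example: a slice rotated by 90 degrees about the z axis and shifted.
R = rotation_matrix(0.0, 0.0, math.pi / 2)
point = slice_to_atlas(1.0, 0.0, R, (10.0, 0.0, 5.0))
```

Applying the same transform to every pixel of a slice gives its position in the common 3D space, which is what a subsequent reconstruction step needs as an initialization.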

Furthermore, our GPU acceleration techniques led to improved deep learning approaches. For example, convolutional neural nets have consistently outperformed previous methods on challenging tasks such as dense, semantic segmentation. However, the various proposed networks perform differently, with behaviour largely influenced by architectural choices and training settings. This paper explores Ensembles of Multiple Models and Architectures (EMMA) for robust performance through aggregation of predictions from a wide range of methods. The approach reduces the influence of the meta-parameters of individual models and the risk of overfitting the configuration to a particular database. EMMA can be seen as an unbiased, generic deep learning model which is shown to yield excellent performance, winning the first position in the BRATS 2017 competition among 50+ participating teams.
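
The aggregation of predictions mentioned above can be sketched as follows. This is only an assumed form of aggregation (a plain average of per-voxel class probabilities followed by an argmax); the text does not state the exact rule used in EMMA, and the function names and numbers below are hypothetical.

```python
def ensemble_average(predictions):
    """Average class probabilities over models.

    predictions[m][v][c] is the probability that model m assigns
    to class c at voxel v.
    """
    n_models = len(predictions)
    n_voxels = len(predictions[0])
    n_classes = len(predictions[0][0])
    return [
        [sum(model[v][c] for model in predictions) / n_models
         for c in range(n_classes)]
        for v in range(n_voxels)
    ]


def argmax_labels(probabilities):
    """Pick the most probable class index for every voxel."""
    return [max(range(len(p)), key=p.__getitem__) for p in probabilities]


# Three hypothetical models, two voxels, two classes.
model_a = [[0.9, 0.1], [0.4, 0.6]]
model_b = [[0.7, 0.3], [0.2, 0.8]]
model_c = [[0.2, 0.8], [0.6, 0.4]]
labels = argmax_labels(ensemble_average([model_a, model_b, model_c]))
```

In this example `model_c` disagrees with the other two on both voxels, and averaging outvotes it, which is the sense in which an ensemble dampens the quirks of any single configuration.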
Exploitation Route: We have provided finished software as open source on https://github.com/. It can be used as a basis for developing further and more specialised software solutions.
Sectors: Healthcare

URL: http://bernhard-kainz.com/
 
Description: TU-Vienna
Organisation: Vienna University of Technology
Country: Austria
Sector: Academic/University
PI Contribution: We worked together on the publication 'Placenta maps: in utero placental health assessment of the human fetus'
Collaborator Contribution: Visualisation UI implementation
Impact: Placenta maps: in utero placental health assessment of the human fetus
Start Year: 2017
 
Title: PVR software
Description: This tool implements a novel method for the correction of motion artifacts as acquired in fetal Magnetic Resonance Imaging (MRI) scans of the whole uterus. In contrast to current slice-to-volume registration (SVR) methods, which require an inflexible enclosure of a single investigated organ, the proposed patch-to-volume reconstruction (PVR) approach is able to reconstruct a large field of view of non-rigidly deforming structures. It relaxes rigid motion assumptions by introducing a defined amount of redundant information that is addressed with parallelized patch-wise optimization and automatic outlier rejection. We further describe and provide an efficient parallel implementation of PVR that allows its execution within reasonable time on commercially available graphics processing units (GPUs), enabling its use in clinical practice. We evaluate PVR's computational overhead compared to standard methods and observe an improvement in reconstruction accuracy of approximately 30% in the presence of affine motion artifacts compared to conventional SVR in synthetic experiments. Furthermore, we have verified our method qualitatively and quantitatively on real fetal MRI data subject to maternal breathing and sudden fetal movements. We evaluate peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), and cross correlation (CC) with respect to the originally acquired data and provide a method for visual inspection of reconstruction uncertainty. With these experiments we demonstrate successful application of PVR motion compensation to the whole uterus, the human fetus, and the human placenta.
Type Of Technology: Software
Year Produced: 2017
Open Source License: Yes
Impact: Used globally for motion correction in prenatal MRI
URL: https://github.com/bkainz/fetalReconstruction
 
Description: Imaging with Modulated/Incomplete Data 2016
Form Of Engagement Activity: A talk or presentation
Part Of Official Scheme: No
Geographic Reach: International
Primary Audience: Professional Practitioners
Results and Impact: 50 mathematicians and computer scientists attended this workshop. I was invited to give one of the keynote talks there.
Year(s) Of Engagement Activity: 2016
URL: http://imsc.uni-graz.at/mobis/imaging16/index.html