Health and Bioscience IDEAS - Imaging, Data structures, gEnetics and Analytical Strategies

Lead Research Organisation: University College London
Department Name: Institute of Neurology

Abstract

We will develop a training program for researchers at universities, hospitals, and businesses that helps them make better use of data from medical images in their research. These images are generated by many different machines, such as microscopes, magnetic resonance imaging (MRI) scanners, and computed tomography (CT) scanners. These technologies can not only produce maps of the human body's structure (from single cells to large organs), but can also show how the body is functioning, such as how blood is being delivered to different parts of the body or where abnormal proteins that could cause Alzheimer's disease are present. These images have played a major part in many recent scientific discoveries across a range of medical fields. As a result, imaging data is being used more often in research, and the amount of imaging data available for researchers to work with and ask new scientific questions of is also growing rapidly. This means that researchers from a diverse set of backgrounds now have the opportunity to work with imaging data and bring their own expertise and perspective to answering new questions about various diseases. However, this imaging data is often complex and difficult to work with, and specialist training is required to understand how to handle, process, and obtain accurate and reliable information from these images. This training is often not part of most scientists' standard curriculum, so it can be an entry barrier for many scientists who want to get started with research involving imaging.

We are proposing a series of short courses that will cover many different aspects of working with medical imaging data. Most importantly, as security of the sensitive personal data contained in imaging becomes a greater concern, we will focus on teaching attendees about best practices in data management and protection. Some of these courses will be given as intensive face-to-face workshops, while others will be online courses where users set their own pace. The courses will cater to a wide range of backgrounds, from newcomers who are just getting started with imaging data to those who wish to do more advanced, complex analysis. We will also develop some courses that will allow a smaller set of individuals to train staff at their own institutions in how to work with this imaging data. We have experts in all of these aspects who will develop the lectures and exercises for the courses, and we will hire a course leader who will combine this content into a cohesive set of courses about medical imaging. Feedback will be sought from those attending the courses so as to assess their effectiveness, tailor them to the needs of the community, and improve them. By the end of the two years, we will make this project self-sufficient; the resulting courses will continue to be delivered to new researchers, with a charge that is both fair and good value to attendees throughout the UK, while also being financially sustainable from the university's perspective. During the project we will create and nurture a network that starts with fostering interaction between attendees during and after the course, creating a community of these "imaging literate" researchers throughout the UK. We hope this will lead to a range of collaborations and critical mass for exciting and large projects.

Technical Summary

Imaging and genetic analysis have dramatically transformed medicine and bioscience in a short period of time. This transformation has been so fast that a substantial knowledge gap has developed between most scientists/clinicians and the computational experts. While open-source methods for data structuring and analysis can provide an equilibrating force, the expertise to apply these responsibly and effectively is lacking. Researchers must also balance open science mandates with data protection law, creating complex challenges surrounding collecting, managing, analysing and sharing imaging data, particularly when integrating with genetic data. Our consortium will produce a series of face-to-face and virtual workshops, as well as self-directed online courses and tools, to close the knowledge gap for clinical and imaging science researchers in the UK and beyond. This content will cover handling imaging and genetic data, data protection, data sharing, and code sharing, with hands-on activities (such as hackathons) to help attendees develop skills while addressing their own research. We will generate a core curriculum that will subsequently be tailored to fit the skill levels and research domains of the attendees. We will target this training at researchers who are based at universities, hospitals, and the burgeoning UK biotech industry. A large majority of these researchers will be clinicians, basic scientists, epidemiologists, and medical physicists, among others. At the completion of this grant, we will have established a course that produces good results for researchers, and thus good feedback and reputation, and we will continue to offer this content at a price that represents good value for potential attendees while maintaining financial stability and support from the university.

Publications

 
Title Improvements to software packages to allow online demos on the cloud. 
Description We modified two repositories of widely used disease progression modelling packages in Python to allow users from around the world to learn about these techniques interactively without needing to install them on their computers. This allows users who are just getting started with these techniques to learn the concepts without the distraction of setting up their own development environment. We applied this to two techniques: the Event Based Model (https://github.com/HealthBioscienceIDEAS/kde_ebm) and Subtype and Stage Inference (SuStaIn - https://github.com/HealthBioscienceIDEAS/pySuStaIn).
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact These online demos use Binder (http://mybinder.org) to launch a virtual computer on the cloud with all necessary software installed and web access to the online notebook that explains how to run the code, so all that users need to learn these methods is a decent internet connection. The latter adaptation was used in a workshop run by the creators of SuStaIn to demonstrate how to use the package; a video of the event is available at https://youtu.be/xzs4TL9_kZw
URL https://github.com/HealthBioscienceIDEAS/pySuStaIn
 
Title Interactive demo of data harmonisation
Description For a workshop hosted by the DEMON network, we created an interactive hands-on demo that attendees could use themselves without the need to install any software on their machine. This was done using Binder (https://mybinder.org), which creates a virtual machine on the cloud based on a GitHub repository, such that all the required software is installed on the virtual machine and users can access a teaching notebook in Python or R that goes through the concepts.
Type Of Material Computer model/algorithm 
Year Produced 2021 
Provided To Others? Yes  
Impact Users and attendees can learn how to use these data harmonisation strategies before grappling with the various challenges that might come from installing the software on their own machines.
URL https://github.com/HealthBioscienceIDEAS/demon-imaging-harmonisation
 
Title Online demonstration of reproducible medical imaging pipelines 
Description As part of the UCL Medical Image Computing Summer School, we created a demo to illustrate the power of modern imaging data management platforms with reproducible medical imaging pipelines. We created a project that involved individuals storing imaging data in a web-based image data management platform called XNAT. We created artificial identifying information for each of the medical image sets and helped people understand the importance of de-identifying the underlying metadata in the image's data structure. We then created Docker images that ensured reproducible imaging analysis and showed individuals how to run them all through XNAT, providing a much simpler interface for users to process data than the command line that is commonly required. Finally, we asked individuals to do an extra de-identification step, mainly reducing the facial features of an individual, and asked them to investigate whether the results from the same imaging analysis were changed. The documentation for this project is available, as are the containers, but we were not able to release the entire repository because at the time it relied on UCL-specific infrastructure. We are in the process of refining the repository so that in the future, individuals will be able to try it on their own infrastructure.
Type Of Material Data handling & control 
Year Produced 2021 
Provided To Others? No  
Impact Feedback we got from the attendees indicated that they had a better understanding of why imaging data management systems would be important for their research and a better understanding of the compute resources that are required for some analyses. 
URL https://healthbioscienceideas.github.io/MedICSS-Project-Repro-Pipelines/
 
Description Carpentries' sMRI lesson material collaboration 
Organisation Centre for Addiction and Mental Health (CAMH)
Country Canada 
Sector Hospitals 
PI Contribution Up to now, this has been only a point of contact and planning how our efforts could be combined to deliver more courses around neuroimaging. They already have two courses at a usable level, though still in beta: - Introduction to dMRI (https://carpentries-incubator.github.io/SDC-BIDS-dMRI/), and - Structural MRI (Pre)processing and Neuroimaging Analysis (https://carpentries-incubator.github.io/SDC-BIDS-sMRI/)
Collaborator Contribution Provided ideas about making the notes more sustainable and easier to maintain.
Impact Obtained first-hand experience of similar training material following the Carpentries style.
Start Year 2021
 
Description Partnership around XNAT workshop organisation. 
Organisation Institute of Cancer Research UK
Country United Kingdom 
Sector Academic/University 
PI Contribution Three of our team members were part of the organising committee for an international workshop for users and developers of an image data management platform called XNAT. We met weekly for a six-month build-up before the workshop took place to prepare the workshop agenda. Our group identified and approached speakers to participate in the workshop, and we were responsible for the online platform that was used to host a majority of the workshop's content. We assisted with the administration of the event on the day, as well as chairing a session and providing a poster for the virtual poster session. We are continuing this relationship by developing an in-person workshop for 2022.
Collaborator Contribution The main partners were also part of the steering committee. Washington University provided experts to speak at the workshop, organised sessions and handled registration. ICR and NCITA also contributed expertise and talks for the workshop, set up the virtual poster session and developed some of the promotional and communications material for the meeting. All of the partners will be a part of organising the workshop in 2022.
Impact Hosting the 2021 XNAT International workshop
Start Year 2021
 
Description Partnership around XNAT workshop organisation. 
Organisation Washington University in St Louis
Department School of Medicine
Country United States 
Sector Academic/University 
PI Contribution Three of our team members were part of the organising committee for an international workshop for users and developers of an image data management platform called XNAT. We met weekly for a six-month build-up before the workshop took place to prepare the workshop agenda. Our group identified and approached speakers to participate in the workshop, and we were responsible for the online platform that was used to host a majority of the workshop's content. We assisted with the administration of the event on the day, as well as chairing a session and providing a poster for the virtual poster session. We are continuing this relationship by developing an in-person workshop for 2022.
Collaborator Contribution The main partners were also part of the steering committee. Washington University provided experts to speak at the workshop, organised sessions and handled registration. ICR and NCITA also contributed expertise and talks for the workshop, set up the virtual poster session and developed some of the promotional and communications material for the meeting. All of the partners will be a part of organising the workshop in 2022.
Impact Hosting the 2021 XNAT International workshop
Start Year 2021
 
Description Delivered invited lecture and weeklong project for UCL Medical Image Computing Summer School 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact MedICSS (UCL Medical Image Computing Summer School) is an annual summer school organised and run by the Centre for Medical Image Computing (CMIC) and Wellcome-EPSRC Centre for Interventional and Surgical Sciences (WEISS). It is intended for any individuals, regardless of their career stage, who are interested in medical image computing for their research.
Multiple members of the IDEAS team contributed lectures to the program that cover essential elements of imaging that we wish to highlight as part of this project:
1. David Cash gave a talk entitled "Medical Imaging at scale". It covered strategies for handling, storing, and processing larger medical imaging datasets. It also helped attendees understand where open science and data protection have shared goals and where there are potential conflicts.
2. Jamie McClelland gave a talk entitled "Medical Image Registration: A Brief Introduction" - which provided an overview of image registration, the fundamental elements that make up a medical image registration framework and how these elements differ depending on the application, and how to validate the results of the registration after it has been performed.
3. Andre Altmann gave a talk entitled "Combining genetics and imaging to better understand brain disorders" which looked at ways of combining imaging biomarkers with genetic analyses to better understand the genetic factors driving the prevalence of image-derived phenotypes.

In addition to the lectures, there were also week-long projects, where students worked in teams on an assigned task for two hours on each of the first four days, followed by a presentation of their findings on the final day. Andre Altmann led a project on Imaging Genetics, which gave project members a hands-on introduction to imaging genetics analyses, using both more classic case-control approaches and multivariate methods. David Cash and Haroon Chughtai, a member of the Research Software Development Group who works closely with IDEAS, created a project that covered the principles of data management and the use of reproducibility-friendly technology, such as Docker or Singularity containers, to run the analysis. We also examined how the reproducibility of results is affected by removing the facial features (known as "de-facing") from a brain MRI image, a data protection strategy often implemented by researchers. This is because these facial features can potentially identify an individual, although currently only in very controlled circumstances. However, these de-facing algorithms are not completely benign; they can have unintended consequences for the popular neuroimaging pipelines that are often used to analyse the data. So the project attendees ran image analysis before and after de-facing to see how much de-facing affected the results, thus getting an appreciation for the balance between data protection and reproducibility.

IDEAS will be heavily involved in this year's UCL Medical Image Computing Summer School, with two of its members, Kris Thielemans and Maria Tziraki, on the three-member organising committee, as well as contributing lectures and projects for the attendees.
Year(s) Of Engagement Activity 2021
URL https://medicss.cs.ucl.ac.uk/
 
Description Email list for announcing courses 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact We have created a JISC-hosted email list at HEALTHBIOSCIENCE-IDEAS-NEWS@JISCMAIL.AC.UK to handle announcements for upcoming courses of interest to our subscribers. Subscription links are available on the website and social media channels. Currently there are 23 subscribers, and we will ramp up engagement on this as more courses come online.
Year(s) Of Engagement Activity 2021
URL https://www.jiscmail.ac.uk/HEALTHBIOSCIENCE-IDEAS-NEWS
 
Description Health and Bioscience IDEAS website 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We have created a website for the project, which is where courses will be announced and information about the project can be found. This has been established using the GitHub Pages functionality within GitHub. This provides free hosting of the website, and it allows the content to be edited through simple interaction with plain text Markdown files. We also use it to demonstrate some of the practices around software development in research that we wish to promote, like the use of a source code versioning system and effective continuous integration testing to ensure that the website remains functional after edits are deployed.
Year(s) Of Engagement Activity 2021
URL https://healthbioscienceideas.github.io/
 
Description Invited lecture for early career researchers in artificial intelligence for Alzheimer's disease.
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We were invited by early career researchers from the DEMON network (https://demondementia.com/) to give a talk/tutorial as part of their online workshop entitled "ECR DEMON workshop: combining imaging data using machine learning". We provided an interactive hands-on lecture on data harmonisation - the need for harmonising across different MRI scanners, and strategies for achieving harmonisation.
Year(s) Of Engagement Activity 2021
URL https://healthbioscienceideas.github.io/demon-imaging-harmonisation/
 
Description Twitter handle for the project. 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact We have set up a project Twitter handle to promote awareness of the project and the courses that we will run, as well as relevant efforts hosted by other groups.
Year(s) Of Engagement Activity 2021
URL https://twitter.com/HealthBioIdeas
 
Description XNAT workshop 2021
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The workshop was attended by over 200 participants from 19 countries (United Kingdom, United States, Australia, Colombia, Germany, Netherlands, Japan, Canada, France, Italy, Spain, China, Greece, Switzerland, Portugal, Belgium, Ukraine, Singapore, India). Attendees came from many disciplines: developers of the platform, researchers who use it to manage and analyse their data, principal investigators who rely on it for their clinical research studies, and development operations specialists whose job it is to install and maintain instances at their institutions. The format was extremely interactive. While there were a few keynote speakers, there were also round table discussions with expert speakers and town hall forums where attendees were allowed to have their say on the current state of the platform and its direction in the future. The online Q&A and chat functions were heavily used and provided for lots of discussion during the sessions. There was also a virtual poster hall and poster session using the online spatial conferencing platform Spatial Chat.

The videos of the talks from the sessions, as well as the posters from the online poster session, are still available to view at the workshop URL. Many of these recorded elements are there to provide training and advice to potential, new and active users of the system, of which the United Kingdom represents the second largest group of users, just behind the United States.
Year(s) Of Engagement Activity 2021
URL https://wiki.xnat.org/workshop-2021/