Metalearning for Health AI
Lead Research Organisation: University of Oxford
Abstract
The development of new AI methods for "learning how to learn" is in its infancy; that is, methods for improving the learning process involved in training very complex models on huge healthcare datasets. Healthcare AI is characterised by access to very large, mostly "unlabelled" datasets, in the sense that it is infeasible to ask a domain expert to label the data. This contrasts with traditional areas of AI (typically imaging), in which labels can be obtained by, for example, crowdsourcing or other means of engaging non-experts. The complexity of healthcare time series is such that often the only labels that exist are "hard end-points", such as the eventual death of a patient. This lack of labels arises because labelling clinical time series (upwards of 100-variate) is not at all trivial, even for a clinical expert. For example, clinical collaborators find it difficult to characterise the many possible manifestations of disease trajectory, because of the considerable inter- and intra-patient variability that exists in this challenging domain. Further exacerbating the labelling problem, a particular application may often focus on a disease or a patient group for which very little data have been collected. The fact that a patient may be healing, or attempting to respond to treatment, adds yet further complexity to the labelling problem. Existing methods of unsupervised / semi-supervised / active learning typically perform poorly as a result of the above limitations.
While transfer learning has proven successful in many imaging problems (because conventional deep learning methods can boost performance significantly on a small target task by having access to data from a larger imaging task), it typically fails in the non-imaging healthcare AI domain, due to the complexity and variability of patient time-series data.
This theme of research within the doctoral programme will propose novel "oracle-free" methods of active learning, recognising the clinical reality that labels are either entirely absent or that, at best, a clinician can only give an opinion on time-series data from a very small subset of patients, and typically for a small subset of the total number of variables available. Building on our past experience of curriculum learning (ICML 2020, in which committees of "student" complex networks are overseen by "teacher" networks), we will propose student-teacher-principal approaches. In this hierarchy, students learn the required clinical task, while teachers learn transferable summaries of latent representations of the students. (This will initially use the massively-multivariate state of the student networks as input to the teachers, from which latent representations of the students will be learned.) The "principal" is the highest level of the hierarchy, and will oversee the transfer of these latent summaries between healthcare tasks, with the aim of allowing us to exploit the value of large publicly-available and patient-confidential datasets in synchrony. This is a critical new direction for Health AI, where typically every new dataset involves "starting again" ab initio.
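The student-teacher-principal hierarchy described above can be pictured with a deliberately crude sketch. Everything in it is an assumption made for illustration and is not the project's method: the "teacher" is reduced to averaging student state vectors, the "principal" to a cosine-similarity comparison of those summaries, and the task names and numbers are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def teacher_summary(student_states: List[Vector]) -> Vector:
    """Teacher stand-in (assumption): average the students' state vectors."""
    n = len(student_states)
    return [sum(column) / n for column in zip(*student_states)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two summaries; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def principal_match(summaries: Dict[str, Vector], target: str) -> str:
    """Principal stand-in (assumption): name the other task whose teacher
    summary is most similar to the target task's summary."""
    others = (task for task in summaries if task != target)
    return max(others, key=lambda task: cosine(summaries[target], summaries[task]))


# Hypothetical student states for three tasks (two students per task).
states = {
    "task_a": [[1.0, 0.0], [1.0, 0.2]],
    "task_b": [[0.9, 0.1], [1.1, 0.1]],
    "task_c": [[0.0, 1.0], [0.1, 1.0]],
}
summaries = {task: teacher_summary(s) for task, s in states.items()}
closest = principal_match(summaries, "task_a")
```

In a real system the teachers would be learned models over the students' high-dimensional state, and the principal's notion of "similar dynamics" would itself be learned; the sketch only shows how information flows up the three levels.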
With such new methods, tasks with similar dynamics (not necessarily from analogous physiological conditions or acquisition modalities) can be identified automatically, and the principal will encourage learning across teachers trained on tasks of similar dynamics.
Secondly, datasets are owned by hospital networks; while we have gained access to data on a hospital-by-hospital basis, this is often only appropriate for constructing proof-of-principle models. For AI to make a real impact at national scale, technologies must be developed for operating on data in a federated manner, transmitting models / model updates between a "hub" and the various "spokes" throughout collaborations.
This project falls within the EPSRC Digital Economies, Healthcare Technologies, and ICT research areas.
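The hub-and-spoke exchange of model updates described above can be illustrated with a minimal sketch of federated averaging, a widely used technique that is swapped in here as an example; the abstract does not name a specific algorithm. The toy linear model, the learning rate, and the two "hospital" datasets are all hypothetical.

```python
from typing import List, Sequence, Tuple

Weights = List[float]                    # [slope, intercept] of a toy linear model
Dataset = Sequence[Tuple[float, float]]  # (x, y) pairs held privately by one spoke


def local_update(weights: Weights, data: Dataset, lr: float = 0.1) -> Weights:
    """Spoke side: one gradient step on mean squared error, using local data only."""
    w, b = weights
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return [w - lr * grad_w, b - lr * grad_b]


def federated_average(updates: Sequence[Weights], sizes: Sequence[int]) -> Weights:
    """Hub side: average the spokes' weights, weighted by local sample count."""
    total = sum(sizes)
    return [
        sum(u[i] * n for u, n in zip(updates, sizes)) / total
        for i in range(len(updates[0]))
    ]


def run_rounds(spokes: Sequence[Dataset], rounds: int) -> Weights:
    """Alternate local updates and hub averaging; raw data never leaves a spoke."""
    weights: Weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in spokes]
        weights = federated_average(updates, [len(d) for d in spokes])
    return weights


# Hypothetical spokes: two "hospitals" whose data both follow y = 2x + 1.
hospital_a = [(-1.0, -1.0), (0.0, 1.0), (1.0, 3.0)]
hospital_b = [(-0.5, 0.0), (0.5, 2.0), (1.0, 3.0)]
model = run_rounds([hospital_a, hospital_b], rounds=200)
```

Only the model weights cross the hub-spoke boundary in this sketch. A production system would also need secure transport, and possibly privacy protections on the updates themselves, none of which is shown here.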
Planned Impact
In the same way that bioinformatics has transformed genomic research and clinical practice, health data science will have a dramatic and lasting impact upon the broader fields of medical research, population health, and healthcare delivery. The beneficiaries of the proposed training programme, and of the research that it delivers and enables, will include academia, industry, healthcare, and the broader UK economy.
Academia: Graduates of the training programme will be well placed to start their post-doctoral careers in leading academic institutions, engaging in high-impact multi-disciplinary research, helping to build training and research capacity, and sharing their experience within the wider academic community.
Industry: Partner organisations will benefit from close collaboration with leading researchers, from the joint exploration of research priorities, and from the commercialisation of arising intellectual property. Other organisations will benefit from the availability of highly-qualified graduates with skills in big health data analytics.
Healthcare: Healthcare organisations and patients will benefit from the results of enabled and accelerated health research, leading to new treatments and technologies, and an improved ability to identify and evaluate potential improvements in practice through the analysis of real-world health data.
Economy: The life sciences sector is a key component of the UK economy. The programme will provide partner companies with direct access to leading-edge research. Graduates of the programme will be well-qualified to contribute to economic growth - supporting health research and the development of new products and services - and will be able to inform policy and decision making at organisational, regional, and national levels.
Organisations
| People | ORCID iD |
|---|---|
| Jacob Armstrong (Student) | |
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/S02428X/1 | | | 31/03/2019 | 29/09/2027 | |
| 2279625 | Studentship | EP/S02428X/1 | 30/09/2019 | 06/01/2024 | Jacob Armstrong |
| EP/W524311/1 | | | 30/09/2022 | 29/09/2028 | |
| 2279625 | Studentship | EP/W524311/1 | 30/09/2019 | 06/01/2024 | Jacob Armstrong |