Infection-AID: AI assisted genomic profiling to inform the Diagnosis, personalised treatment and control of infections
Lead Research Organisation:
London School of Hygiene and Tropical Medicine
Department Name: Infectious and Tropical Diseases
Abstract
Characterising the genetic code ("genome") of an organism can reveal its ability to survive, its tolerance of drugs and treatments, and its likely geographical source. Researchers can investigate the genome of an organism, and its important mutations (genome "spelling mistakes"), by applying sequencing technologies to its DNA. Cost-effective and rapid sequencing technologies are now being rolled out in hospitals and clinics to identify important mutations, and thereby to help prevent disease, diagnose patients and personalise their treatment. Genome sequencing has become an important diagnostic tool in infectious disease settings, including to identify the microorganisms causing infections ("pathogens") and their resistance to drugs, and to track outbreaks. Such knowledge is revolutionising clinical decision making, public health surveillance and infection control. This was demonstrated during the COVID-19 pandemic, when rapid sequencing of SARS-CoV-2 genomes assisted the detection of clinically important mutations (e.g., those of the Omicron variant) and helped track their geographical spread ("transmission patterns"). To assist the analysis of the large datasets arising from pathogen sequencing, it is important to identify key mutations linked to (severe) patient outcomes, drug resistance, likely geographical source, and other important "barcoding" information that can provide a "profile" of the pathogen underlying any infection. Computer software tools have been developed (e.g., our TB-Profiler and Malaria-Profiler software) that can rapidly analyse sequence data to provide such pathogen profiles, for easy interpretation by medical doctors and infection control specialists.
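At its core, catalogue-based profiling matches a sample's observed mutations against a curated list of known resistance mutations. The minimal Python sketch below illustrates that idea. It assumes a hypothetical in-memory catalogue (`CATALOGUE`) and a hypothetical `Variant` record, and it does not reproduce the TB-Profiler or Malaria-Profiler catalogues, file formats or interfaces. The two catalogue entries are well-known M. tuberculosis resistance mutations (rpoB S450L for rifampicin, katG S315T for isoniazid) and are included only for illustration.

```python
from dataclasses import dataclass

# Hypothetical, heavily simplified catalogue: (gene, protein change) -> drug.
# NOT the TB-Profiler catalogue or its file format; illustration only.
CATALOGUE: dict[tuple[str, str], str] = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
}


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record: gene name plus protein-level change."""
    gene: str
    change: str  # e.g. "S450L"


def profile(variants: list[Variant]) -> dict[str, list[str]]:
    """Map each drug to the catalogued mutations found in one sample."""
    report: dict[str, list[str]] = {}
    for v in variants:
        drug = CATALOGUE.get((v.gene, v.change))
        if drug is not None:  # variants absent from the catalogue are ignored
            report.setdefault(drug, []).append(f"{v.gene} {v.change}")
    return report


if __name__ == "__main__":
    # gyrA E21Q is simply a variant that is not in this toy catalogue.
    sample = [Variant("rpoB", "S450L"), Variant("gyrA", "E21Q")]
    print(profile(sample))  # {'rifampicin': ['rpoB S450L']}
```

A real profiler also has to handle nucleotide-level and promoter changes, allele frequencies and lineage markers. The dictionary lookup shows only the central step of matching observed mutations to a curated list.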
With the increasing use of sequencing technologies in hospitals and clinics, there is a need for Artificial Intelligence (AI) computational methods to analyse the resulting "big data" in real time. These methods are needed to update the lists of barcoding genetic mutations and to identify whether a pathogen genome is related to those previously sequenced (i.e., whether it is being transmitted). We have previously applied AI methods to identify known and novel genetic mutations linked to drug resistance and transmission. We have also created computing repositories (e.g., TB-ML) where the underlying software can be stored, allowing comparisons between statistical models and AI approaches. Our proposed project will integrate these AI-based tools into our profiling software to reveal drug resistance mutations and transmission patterns, and to generate informative reports for clinical and infection control decision making. We will work within established collaborations involving the UK Health Security Agency and health ministries in Asia (Bangladesh, Philippines, Thailand, Vietnam), which routinely use sequencing technologies to inform clinical diagnosis. Through these collaborations, we will attempt to implement the resulting AI software in the UK and in overseas settings endemic for infectious diseases. We will initially focus on three infectious diseases of high global burden (tuberculosis, malaria and Klebsiella infections), with the potential to extend the work to other infections. All sequence data and software developed will be made publicly accessible for use by other biomedical researchers and healthcare stakeholders. Ultimately, the implementation of such AI-based tools will reduce the burden of infectious diseases, leading to healthier populations and associated economic benefits.
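A common way to ask whether a new genome is related to those previously sequenced is to compute pairwise single-nucleotide polymorphism (SNP) distances and link samples that fall within a threshold. The sketch below is a generic illustration of that technique, not the project's AI method. It assumes pre-aligned consensus sequences of equal length, and the function names `snp_distance` and `transmission_clusters` are hypothetical. For M. tuberculosis, thresholds of roughly 5 to 12 SNPs are often quoted, but any cut-off is an assumption that depends on the pathogen and setting.

```python
from itertools import combinations


def snp_distance(a: str, b: str) -> int:
    """Count differing positions between two aligned sequences, skipping N and gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1 for x, y in zip(a, b)
        if x != y and x not in "N-" and y not in "N-"
    )


def transmission_clusters(seqs: dict[str, str], threshold: int) -> list[set[str]]:
    """Single-linkage clusters: two samples are joined if within `threshold` SNPs."""
    parent = {name: name for name in seqs}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)  # merge the two clusters

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    # Only multi-sample groups are putative transmission clusters.
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    # Toy aligned sequences, not project data.
    seqs = {
        "s1": "ACGTACGTAC",
        "s2": "ACGTACGTAA",
        "s3": "TTTTACGTAC",
        "s4": "ACGAACGTAA",
    }
    clusters = transmission_clusters(seqs, threshold=1)
    print(sorted(sorted(g) for g in clusters))  # [['s1', 's2', 's4']]
```

Single linkage means that two samples can share a cluster through an intermediate sample even when they differ from each other by more than the threshold. Here s1 and s4 differ by 2 SNPs but are linked through s2.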
Organisations
- London School of Hygiene and Tropical Medicine (Lead Research Organisation)
- Public Health England (Collaboration)
- Ministry of Public Health (Collaboration)
- Research Institute for Tropical Medicine (Project Partner)
- Ministry of Public Health, Thailand (Project Partner)
- International Centre for Diarrhoeal Disease Research (ICDDR,B) (Project Partner)
- National Institute of Nutrition (Project Partner)
Publications
- Asghar M (2024). Exploring the Antimicrobial Resistance Profile of Salmonella typhi and Its Clinical Burden. Antibiotics (Basel, Switzerland).
- Billows N (2024). Large-scale statistical analysis of Mycobacterium tuberculosis genome sequences identifies compensatory mutations associated with multi-drug resistance. Scientific Reports.
- Elias R (2025). Dissemination of arr-2 and arr-3 is associated with class 1 integrons in Klebsiella pneumoniae clinical isolates from Portugal. Medical Microbiology and Immunology.
- Higgins M (2024). New reference genomes to distinguish the sympatric malaria parasites, Plasmodium ovale curtisi and Plasmodium ovale wallikeri. Scientific Reports.
- Khan MF (2024). Exploring optimal drug targets through subtractive proteomics analysis and pangenomic insights for tailored drug design in tuberculosis. Scientific Reports.
| Description | We have developed methods to systematically download and analyse sequence data across infectious diseases (e.g., malaria, TB) and integrate these into AI models that predict key clinical and epidemiological insights, such as drug resistance, geographic origin, strain types, and transmission dynamics. These models, along with the key predictive mutations they identify, are currently undergoing validation through additional sequencing efforts, with three manuscripts in preparation. The malaria AI model is being incorporated into UKHSA workflows, while the TB model is being implemented within Thailand's health systems. |
| Exploitation Route | The AI models, along with the underlying software and data, are being made accessible to the research community. We plan to seek follow-up funding to expand implementations to additional countries and pathogens. As noted, the UKHSA and Thailand Ministry of Public Health are integrating these AI and informatics tools into their systems and can serve as key advocates for future initiatives. |
| Sectors | Digital/Communication/Information Technologies (including Software), Education, Healthcare |
| Description | Generated sequence data have been processed through AI models to characterise pathogen genotypic profiles, including drug resistance and geographic origin. These insights have supported the UKHSA in cryptic malaria investigations and informed clinical decision-making and outbreak investigations within the Thailand Ministry of Public Health. |
| First Year Of Impact | 2024 |
| Sector | Digital/Communication/Information Technologies (including Software), Healthcare |
| Impact Types | Economic, Policy & public services |
| Title | Bioinformatic and AI tools |
| Description | We have established bioinformatic pipelines for all the pathogens considered in this project (e.g., Mycobacterium tuberculosis, Klebsiella, Plasmodium species), which process raw sequences into the variants used in the machine learning models. To assist the application of these models, we have developed Docker containers: self-contained software modules covering data input, processing and output. These allow different machine learning methods and models to be compared across datasets. We propose to share this framework, linked to a scientific publication in preparation. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2024 |
| Provided To Others? | No |
| Impact | The use of Docker containers gives us a framework for sharing computing code and outputs from the implementation of different machine learning methods. |
| Title | Infection genomics datasets |
| Description | We have been automatically downloading sequence data and metadata linked to the pathogens of interest in our project (e.g., Mycobacterium, Plasmodium, Klebsiella) and passing them through our bioinformatic pipelines. This is producing large datasets for each pathogen (e.g., n > 100,000 for M. tuberculosis), which we then use in our machine learning approaches. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2023 |
| Provided To Others? | No |
| Impact | This approach provides growing datasets to inform and validate our machine learning models, which in turn provide insights into mutations linked to drug resistance, strain types and geographical source. The raw data are mostly in the public domain, but by combining them and developing machine learning models, we will make these resources useful to researchers who lack computational expertise but can use them to drive their research. |
| Description | Thailand Ministry of Public Health - Sequence data and informatics |
| Organisation | Ministry of Public Health |
| Country | Thailand |
| Sector | Public |
| PI Contribution | We have developed the bioinformatic pipelines and adapted our informatic tools (e.g., TB-Profiler) for use by the MOPH. |
| Collaborator Contribution | The MOPH are sharing TB sequence and AMR phenotypic data that is being used to update our machine learning models. They are also assessing the mutations being found by our machine learning models, for their biological and potential clinical relevance. |
| Impact | Outputs include: (1) whole genome sequencing data for >1,200 M. tuberculosis isolates to date; (2) TB-Profiler installed at the MOPH and generating outputs in the Thai language. |
| Start Year | 2023 |
| Description | UK Health Security Agency |
| Organisation | Public Health England |
| Country | United Kingdom |
| Sector | Public |
| PI Contribution | We are working with the UKHSA Malaria Reference Laboratory (UKHSA-MRL) to sequence parasite DNA from clinical cases, in order to infer parasite species and drug resistance. These data are being used in our machine learning models. |
| Collaborator Contribution | The UKHSA-MRL are contributing Plasmodium DNA and linked anonymised clinical and parasitology data. |
| Impact | To date, we have accrued sequence data and drug resistance phenotypes from 300 Plasmodium isolates sourced from the UKHSA. Applying our machine learning models to these data, we are detecting mutations linked to geographical source and drug resistance. Follow-up experimental validation of drug resistance mutations by the UKHSA-MRL is ongoing. |
| Start Year | 2023 |
| Description | Workshop on Genomics in Bangkok |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Postgraduate students |
| Results and Impact | 60 researchers attended training on genomic and AI data analysis, which strengthens capacity in genomics-based investigations. |
| Year(s) Of Engagement Activity | 2025 |
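The research materials above describe variant calls from the bioinformatic pipelines being passed to machine learning models that identify mutations linked to drug resistance, with statistical models and AI approaches being compared. As an illustration of a simple statistical baseline of that kind, the sketch below scores each mutation by the log odds ratio of resistance in carriers versus non-carriers, adding 0.5 to every cell of the 2x2 table. It is not the project's model. The sample encoding (one set of mutation identifiers per isolate), the binary resistant/susceptible labels, the mutation names and the function name `log_odds_ratios` are all assumptions for illustration.

```python
import math


def log_odds_ratios(samples: list[set[str]], resistant: list[bool]) -> dict[str, float]:
    """Score each mutation by the log odds ratio of resistance (carriers vs non-carriers).

    0.5 is added to every cell of the 2x2 table (Haldane-Anscombe correction),
    so a mutation seen only in resistant (or only susceptible) isolates still
    gets a finite score.
    """
    if len(samples) != len(resistant):
        raise ValueError("one phenotype label per sample is required")
    scores: dict[str, float] = {}
    for mutation in set().union(*samples):
        carriers = [r for muts, r in zip(samples, resistant) if mutation in muts]
        others = [r for muts, r in zip(samples, resistant) if mutation not in muts]
        a = sum(carriers)          # carriers, resistant
        b = len(carriers) - a      # carriers, susceptible
        c = sum(others)            # non-carriers, resistant
        d = len(others) - c        # non-carriers, susceptible
        scores[mutation] = math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))
    return scores


if __name__ == "__main__":
    # Toy data with hypothetical mutation identifiers, not project data.
    isolates = [{"rpoB_S450L"}, {"rpoB_S450L", "gyrA_E21Q"}, {"gyrA_E21Q"}, set()]
    phenotypes = [True, True, False, False]
    ranked = sorted(log_odds_ratios(isolates, phenotypes).items(),
                    key=lambda kv: kv[1], reverse=True)
    for mutation, score in ranked:
        print(f"{mutation}\t{score:.2f}")  # rpoB_S450L 3.22, then gyrA_E21Q 0.00
```

With real data, such one-mutation-at-a-time scores are confounded by lineage and by co-occurring mutations. This is one reason for using multivariable and machine learning models rather than relying on a univariate screen alone.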
