New advances in insecticide resistance genomics: using Machine Learning to predict resistance phenotype from large-scale genomic data.
Lead Research Organisation:
Liverpool School of Tropical Medicine
Department Name: Vector Biology
Abstract
Malaria is a parasitic tropical disease which kills hundreds of thousands of people every year, predominantly children in Sub-Saharan Africa (SSA). The disease is transmitted by mosquitoes that acquire the parasites after taking a blood meal from an infected person. Elimination of malaria therefore relies on effectively reducing mosquito numbers to break the cycle of transmission. This is primarily achieved through the application of insecticides, either by spraying the walls of houses in which mosquitoes bite or by protecting humans with insecticide-treated bed nets. The documented evolution of resistance to insecticides in mosquitoes that carry malaria is therefore of great concern, and the continued effectiveness of control programmes requires knowledge of the insecticides to which a mosquito population is susceptible. Currently, this is achieved by experimentally exposing mosquitoes to insecticides to directly measure their resistance, but this process is slow and laborious, and not a good indicator of impact. Ideally, it should be possible to screen a mosquito population for key genes involved in resistance, but our understanding of the genetics of insecticide resistance is still limited to a handful of genes.
The scientific community is currently at the advent of an exciting era in genomics where modern genome sequencing capacity is rapidly increasing the scale at which genomic data can be produced. What have been lacking are analytical techniques that can utilise the huge scale of data and integrate all of the information it contains to predict resistance phenotypes. Machine learning is an approach that allows computers to use existing data to "learn" how to analyse new data and use it to make predictions. For example, given a sufficiently large dataset of mosquitoes whose genetics and resistance characteristics are known, machine learning tools can find associations between genetics and resistance, which can then be used to measure resistance using only genetics. Machine learning tools have yet to be applied to the field of insecticide resistance because they require large amounts of data from which to "learn", and the necessary genomic data have been lacking.
We and our collaborators are currently amassing the largest collection to date, for any species, that combines genome-wide sequencing data with measures of insecticide resistance, producing unprecedented amounts of resistance-associated genomic data. We will leverage these data, using machine learning to improve our ability to estimate the insecticide resistance profile of a mosquito using genomic data.
Most importantly, this project will help improve our ability to screen mosquito populations for insecticide resistance and will inform malaria control policy as a result. In collaboration with our partners in SSA who are closely involved with the mosquito control programmes, we will identify areas where our method can be most effectively applied to help improve the control of malaria.
Technical Summary
Malaria prevalence in Sub-Saharan Africa (SSA) has been reduced by 50% since 2000, primarily due to insecticide-based mosquito control measures. However, progress has stagnated in recent years, and has even reversed in some areas, due in part to the rise of insecticide resistance in mosquitoes. High-throughput genetics offers the prospect of accurate and reliable methods for large-scale characterisation of resistance in mosquito populations using routine collections, avoiding the laborious work of phenotypically testing mosquito resistance.
The scientific community is currently at the advent of an exciting era in genomics where whole-genome sequencing can be performed at a rapidly increasing scale. These very large genomic datasets require novel analytical methods to deal with challenges of auto-correlation and over-fitting. Machine learning is a promising approach that has been used extensively to produce powerful classifiers in big data analyses, but it has yet to be applied to the field of vector-borne disease control because the necessary genomic data have been lacking.
Through our involvement in and leadership of the Vector Observatory and Genomics for African Anopheles Resistance Diagnostics projects, we will be whole-genome sequencing thousands of mosquitoes with known insecticide-resistance phenotypes from across SSA. The proposed project will leverage these unprecedented amounts of data to develop genomic predictions of insecticide resistance, and to design low-cost assays that can be used to determine resistance profiles using routine mosquito collections. We will experiment with different machine learning approaches and explore the possibility of combining different types of information, for example combining gene expression and genomic data in a single analysis. We will also work with collaborators in SSA to design use-case scenarios for the implementation of our genetic screens in malaria-endemic areas.
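As a purely illustrative sketch of the kind of analysis described above, and not the project's actual method or data, the following Python example trains a small L2-regularised logistic regression to predict a binary resistance phenotype from simulated genotypes, then checks its accuracy on held-out mosquitoes as a basic guard against over-fitting. Everything here is an assumption made for illustration: the data are synthetic, and the function names, allele frequency (0.3), effect size, number of loci and choice of model are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def simulate(n, n_snps, causal, effect, rng):
    """Simulate genotypes (0/1/2 allele copies) and binary phenotypes.

    Hypothetical setup: only the loci listed in `causal` influence resistance.
    """
    X, y = [], []
    for _ in range(n):
        # Each genotype is the sum of two allele draws with frequency 0.3.
        g = [int(rng.random() < 0.3) + int(rng.random() < 0.3)
             for _ in range(n_snps)]
        logit = sum(effect * (g[j] - 0.6) for j in causal)
        X.append(g)
        y.append(1 if rng.random() < sigmoid(logit) else 0)
    return X, y


def predict(x, w, b):
    # Predicted probability of the resistant phenotype.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit(X, y, l2=0.01, lr=0.3, epochs=400):
    """Full-batch gradient descent for L2-regularised logistic regression."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = predict(xi, w, b) - yi
            gb += err
            for j in range(p):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        # The L2 penalty shrinks weights, limiting over-fitting to noise loci.
        w = [wj - lr * (gwj / n + l2 * wj) for wj, gwj in zip(w, gw)]
    return w, b


def accuracy(X, y, w, b):
    hits = sum((predict(xi, w, b) >= 0.5) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)


rng = random.Random(0)
X, y = simulate(300, 10, causal=(0, 1), effect=3.0, rng=rng)

# Hold out the last 75 mosquitoes; they are never seen during training.
X_train, y_train = X[:225], y[:225]
X_test, y_test = X[225:], y[225:]

w, b = fit(X_train, y_train)
train_acc = accuracy(X_train, y_train, w, b)
test_acc = accuracy(X_test, y_test, w, b)
print(f"train accuracy: {train_acc:.2f}, held-out accuracy: {test_acc:.2f}")
```

A large gap between training and held-out accuracy would indicate over-fitting. Real genomic data pose further difficulties that this toy example ignores, such as correlation between neighbouring loci and far more variants than samples.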
Planned Impact
The main non-academic beneficiaries of this project will be malaria control programme policy makers, communities in malaria-endemic areas and the general public, as described in detail in the Pathways to Impact statement.
Malaria control programme policy makers will benefit from the ability to make more informed decisions, based on improved resistance diagnostics. This will in turn translate into benefits to communities in malaria-endemic areas, who will benefit from the resulting increased efficiency and effectiveness of malaria control. Furthermore, through engagement with the field teams carrying out the mosquito collections, as described in the Pathways to Impact statement, communities will benefit from increased understanding of the importance of insecticide resistance diagnostics in vector control measures.
The general public will be interested in the application of machine learning tools to the benefit of public health. Machine learning and artificial intelligence are predominantly publicised in the context of marketing or playing games such as Chess and Go. High-profile cases of their application to disease control will reinforce understanding of the positive applications of these methods. De-mystifying machine learning through layman-orientated science communications, using malaria control as a concrete example, as described in the Pathways to Impact statement, will also help improve public understanding of these methods.
Publications
Grau-Bové X
(2020)
Evolution of the Insecticide Target Rdl in African Anopheles Is Driven by Interspecific and Interkaryotypic Introgression.
in Molecular biology and evolution
Grau-Bové X
(2021)
Resistance to pirimiphos-methyl in West African Anopheles is spreading via duplication and introgression of the Ace1 locus.
in PLoS genetics
Torres M
(2021)
Protein function prediction for newly sequenced organisms
in Nature Machine Intelligence
Lucas ER
(2021)
A gene expression panel for estimating age in males and females of the sleeping sickness vector Glossina morsitans.
in PLoS neglected tropical diseases
Galeano D
(2022)
Machine learning prediction of side effects for drugs in clinical trials.
in Cell reports methods
Sacramento CQ
(2022)
Unlike Chloroquine, Mefloquine Inhibits SARS-CoV-2 Infection in Physiologically Relevant Cells.
in Viruses
Gliozzo J
(2022)
Heterogeneous data integration methods for patient similarity networks.
in Briefings in bioinformatics
Santos SS
(2022)
Machine learning and network medicine approaches for drug repositioning for COVID-19.
in Patterns (New York, N.Y.)
Casiraghi E
(2023)
A method for comparing multiple imputation techniques: A case study on the U.S. national COVID cohort collaborative.
in Journal of biomedical informatics
Dyer NA
(2023)
Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression.
in bioRxiv : the preprint server for biology
Lucas ER
(2023)
Genome-wide association studies reveal novel loci associated with pyrethroid and organophosphate resistance in Anopheles gambiae and Anopheles coluzzii.
in Nature communications
Paccanaro A
(2023)
LanDis: The Disease Landscape Explorer
Lucas E
(2024)
Copy number variants underlie major selective sweeps in insecticide resistance genes in Anopheles arabiensis
in PLOS Biology
Nagi SC
(2024)
Parallel evolution in mosquito vectors - a duplicated esterase locus is associated with resistance to pirimiphos-methyl in An. gambiae.
in bioRxiv : the preprint server for biology
Nagi SC
(2024)
Parallel Evolution in Mosquito Vectors-A Duplicated Esterase Locus is Associated With Resistance to Pirimiphos-methyl in Anopheles gambiae.
in Molecular biology and evolution
Caniza H
(2024)
LanDis: the disease landscape explorer.
in European journal of human genetics : EJHG
Dyer NA
(2024)
Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele-specific expression.
in Proceedings. Biological sciences
De Siqueira Santos S
(2025)
Host centric drug repurposing for viral diseases
in PLOS Computational Biology
| Description | Genome-based diagnostics for mapping, monitoring and management of insecticide resistance in major African malaria vectors |
| Amount | $2,270,913 (USD) |
| Funding ID | R01AI116811 |
| Organisation | National Institutes of Health (NIH) |
| Department | National Institute of Allergy and Infectious Diseases (NIAID) |
| Sector | Public |
| Country | United States |
| Start | 05/2022 |
| End | 05/2027 |
| Title | Data from: Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression |
| Description | Malaria control relies on insecticides targeting the mosquito vector, but this is increasingly compromised by insecticide resistance, which can be achieved by elevated expression of detoxifying enzymes that metabolize the insecticide. In diploid organisms, gene expression is regulated both in cis, by regulatory sequences on the same chromosome, and by trans-acting factors, affecting both alleles equally. Differing levels of transcription can be caused by mutations in cis-regulatory modules (CRM), but few of these have been identified in mosquitoes. We crossed bendiocarb-resistant and susceptible Anopheles gambiae strains to identify cis-regulated genes that might be responsible for the resistant phenotype using RNAseq, and cis-regulatory module sequences controlling gene expression in insecticide resistance relevant tissues were predicted using machine learning. We found 115 genes showing allele-specific expression in hybrids of insecticide-susceptible and resistant strains, suggesting cis regulation is an important mechanism of gene expression regulation in Anopheles gambiae. The genes showing allele-specific expression included a higher proportion of Anopheles-specific genes and were, on average, younger than genes with balanced allelic expression. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| URL | https://datadryad.org/stash/dataset/doi:10.5061/dryad.3n5tb2rr1 |
| Title | Supplementary Methods and Figures from Mechanisms of transcriptional regulation in Anopheles gambiae revealed by allele specific expression |
| Description | This file contains supplementary figures 1 to 5 and the supplementary methods |
| Type Of Material | Database/Collection of data |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| URL | https://rs.figshare.com/articles/dataset/Supplementary_Methods_and_Figures_from_Mechanisms_of_transc... |
