New advances in insecticide resistance genomics: using Machine Learning to predict resistance phenotype from large-scale genomic data.

Lead Research Organisation: Liverpool School of Tropical Medicine
Department Name: Vector Biology


Malaria is a parasitic tropical disease which kills hundreds of thousands of people every year, predominantly children in Sub-Saharan Africa (SSA). The disease is transmitted by mosquitoes that acquire the parasites after taking a blood meal from an infected person. Elimination of malaria therefore relies on effectively reducing mosquito numbers to break the cycle of transmission. This is primarily achieved through the application of insecticides, either by spraying the walls of houses in which mosquitoes bite or by protecting humans with insecticide-treated bed nets. The documented evolution of resistance to insecticides in mosquitoes that carry malaria is therefore of great concern, and the continued effectiveness of control programmes requires knowledge of the insecticides to which a mosquito population is susceptible. Currently, this is achieved by experimentally exposing mosquitoes to insecticides to directly measure their resistance, but this process is slow and laborious, and not a good indicator of impact. Ideally, it should be possible to screen a mosquito population for key genes involved in resistance, but our understanding of the genetics of insecticide resistance is still limited to a handful of genes.

The scientific community is currently at the advent of an exciting era in genomics where modern genome sequencing capacity is rapidly increasing the scale at which genomic data can be produced. What have been lacking are analytical techniques that can utilise the huge scale of data and integrate all of the information it contains to predict resistance phenotypes. Machine learning is an approach that allows computers to use existing data to "learn" how to analyse new data and use it to make predictions. For example, given a sufficiently large dataset of mosquitoes whose genetics and resistance characteristics are known, machine learning tools can find associations between genetics and resistance, which can then be used to predict resistance from genetic data alone. Machine learning tools have yet to be applied to the field of insecticide resistance because they require large amounts of data from which to "learn", and the necessary genomic data have been lacking.

We and our collaborators are currently amassing the largest collection of any species to date combining both genome-wide sequencing data and measures of insecticide resistance, producing unprecedented amounts of resistance-associated genomic data. We will leverage these data, using machine learning to improve our ability to estimate the insecticide resistance profile of a mosquito using genomic data.

Most importantly, this project will help improve our ability to screen mosquito populations for insecticide resistance and will inform malaria control policy as a result. In collaboration with our partners in SSA who are closely involved with the mosquito control programmes, we will identify areas where our method can be most effectively applied to help improve the control of malaria.

Technical Summary

Malaria prevalence in Sub-Saharan Africa (SSA) has been reduced by 50% since 2000, primarily due to insecticide-based mosquito control measures. However, progress has stagnated in recent years, and has even reversed in some areas, due in part to the rise of insecticide resistance in mosquitoes. High-throughput genetics offers the prospect of accurate and reliable methods for large-scale characterisation of resistance in mosquito populations using routine collections, avoiding the laborious work of phenotypically testing mosquito resistance.

The scientific community is currently at the advent of an exciting era in genomics where whole-genome sequencing can be performed at a rapidly increasing scale. These very large genomic datasets require novel analytical methods to deal with challenges of auto-correlation and over-fitting. Machine learning is a promising approach that has been used extensively to produce powerful classifiers in big data analyses, but it has yet to be applied to the field of vector-borne disease control because the necessary genomic data have been lacking.

Through our involvement in, and leadership of, the Vector Observatory and Genomics for African Anopheles Resistance Diagnostics projects, we will be whole-genome sequencing thousands of mosquitoes with known insecticide-resistance phenotypes from across SSA. The proposed project will leverage these unprecedented amounts of data to develop genomic predictions of insecticide resistance, and to design low-cost assays that can be used to determine resistance profiles using routine mosquito collections. We will experiment with different machine learning approaches and explore the possibility of combining a range of different information, for example combining gene expression and genomic data in a single analysis. We will also work with collaborators in SSA to design use-case scenarios for the implementation of our genetic screens in malaria-endemic areas.
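
As a purely illustrative sketch of what a genomic prediction of resistance could look like, the Python example below fits an L2-penalised logistic regression to synthetic genotypes and checks its accuracy on held-out mosquitoes. It is not the method proposed in this project: the data are simulated, the marker count, the 5% phenotyping error rate, the penalty strength and all function names are assumptions made for the example, and logistic regression stands in for whichever machine learning approaches are eventually chosen. The penalty term and the held-out evaluation are two simple ways of guarding against the over-fitting mentioned above.

```python
import math
import random


def simulate(n_mosquitoes, n_markers, rng):
    """Make synthetic data (hypothetical, not real mosquito data).

    Each genotype is the alternate-allele count (0, 1 or 2) at a marker,
    centred to -1/0/1. Assumption: only markers 0 and 1 affect resistance.
    """
    X, y = [], []
    for _ in range(n_mosquitoes):
        g = [rng.choice((0, 1, 2)) - 1 for _ in range(n_markers)]
        resistant = 1 if g[0] + g[1] >= 0 else 0
        if rng.random() < 0.05:  # assumed 5% phenotyping error
            resistant = 1 - resistant
        X.append(g)
        y.append(resistant)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, g):
    """Predicted probability that a mosquito with genotype g is resistant."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, g)))


def train(X, y, l2=0.01, lr=0.5, epochs=300):
    """Fit L2-penalised logistic regression by batch gradient descent."""
    n, m = len(X), len(X[0])
    weights, bias = [0.0] * m, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * m, 0.0
        for g, label in zip(X, y):
            err = predict_proba(weights, bias, g) - label
            grad_b += err
            for j, x in enumerate(g):
                grad_w[j] += err * x
        # The L2 term shrinks weights, which limits over-fitting.
        weights = [w - lr * (gw / n + l2 * w) for w, gw in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(weights, bias, X, y):
    """Fraction of mosquitoes whose phenotype is predicted correctly."""
    hits = sum((predict_proba(weights, bias, g) >= 0.5) == bool(label)
               for g, label in zip(X, y))
    return hits / len(X)


def run_demo(seed=0):
    rng = random.Random(seed)
    X, y = simulate(400, 10, rng)
    # Hold out a quarter of the mosquitoes to estimate predictive accuracy.
    X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]
    weights, bias = train(X_train, y_train)
    return accuracy(weights, bias, X_test, y_test), weights


if __name__ == "__main__":
    acc, weights = run_demo()
    print(f"held-out accuracy: {acc:.2f}")
    ranked = sorted(range(len(weights)), key=lambda j: -abs(weights[j]))
    print("markers ranked by |weight|:", ranked)
```

In this toy setting the markers with the largest absolute weights are the natural candidates for a low-cost assay; with real whole-genome data the number of markers would far exceed the number of mosquitoes, so stronger regularisation and proper cross-validation would be needed.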

Planned Impact

The main non-academic beneficiaries of this project will be malaria control programme policy makers, communities in malaria-endemic areas and the general public, as described in detail in the Pathways to Impact statement.

Malaria control programme policy makers will benefit from the ability to make more informed decisions, based on improved resistance diagnostics. This will in turn translate into benefits to communities in malaria-endemic areas, who will benefit from the resulting increased efficiency and effectiveness of malaria control. Furthermore, through engagement with the field teams carrying out the mosquito collections, as described in the Pathways to Impact statement, communities will benefit from increased understanding of the importance of insecticide resistance diagnostics in vector control measures.

The general public will be interested in the application of machine learning tools to the benefit of public health. Machine learning and artificial intelligence are predominantly publicised in the context of marketing or playing games such as Chess and Go. High-profile cases of their application to disease control will reinforce understanding of the positive applications of these methods. De-mystifying machine learning through layman-orientated science communications, using malaria control as a concrete example, as described in the Pathways to Impact statement, will also help improve public understanding of these methods.

