Where coronaviruses hide, where novel strains are generated, and how they get to us: Predicting reservoirs, recombination, and geographical hotspots

Lead Research Organisation: University of Liverpool
Department Name: Evolution, Ecology and Behaviour

Abstract

Novel pathogenic coronaviruses, including SARS-CoV and SARS-CoV-2, can arise through genetic recombination between two different coronavirus strains co-infecting an animal host. These viruses circulate in reservoir animal populations before spilling over to humans.
Understanding, monitoring, and mitigating both recombination and spillover requires identifying hosts that are susceptible to each coronavirus and hosts susceptible to multiple coronavirus strains (termed recombination hosts).
However, the majority of coronavirus-host associations, and therefore reservoirs and recombination hosts, remain unidentified. This has led to an underappreciation of the potential scale of novel coronavirus generation and spillover.
Here, we aim to predict all host species that act as SARS-CoV-2 reservoirs and recombination hosts (WP1) by expanding our tried-and-tested machine-learning framework to include avian hosts. This will enable monitoring of SARS-CoV-2 reservoirs during the pandemic, and of hosts in which SARS-CoV-2 could recombine to generate novel pathogenic viruses.
Geographical overlap of host species is a key predictor of between-species viral sharing. By constructing a species-level ecological contact network and integrating it with our framework, we will further refine our predictions. This will enable us to identify geographical hotspots of coronavirus recombination (WP2), and therefore support spatially targeted surveillance and mitigation efforts.
Many coronavirus hosts interact with humans, either naturally (e.g. through geographic or habitat overlap) or because humans use them (e.g. as pets or food). Using geographical data from WP2 and host-species utilisation data from open-access sources, we will estimate the in situ likelihood of spillover from the hosts identified in WP1 (WP3). This will inform policy makers about which species and hotspots to prioritise for spillover mitigation efforts.
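The species-level ecological contact network described for WP2 can be pictured as a weighted graph in which species are nodes and edges record how much their geographical ranges overlap. The sketch below is illustrative only: it assumes each species' range has already been rasterised into a set of grid cells and uses the Jaccard index as the overlap weight. The species names, grid cells and the `min_overlap` threshold are hypothetical, not the project's actual data or method.

```python
from itertools import combinations

# Hypothetical ranges: each species maps to the set of grid cells it occupies.
# (Assumption: ranges have already been rasterised onto a common grid.)
ranges = {
    "Species A": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "Species B": {(1, 1), (1, 2), (2, 1), (2, 2)},
    "Species C": {(5, 5), (5, 6)},
}


def range_overlap(cells_a, cells_b):
    """Jaccard index: shared cells divided by cells occupied by either species."""
    union = cells_a | cells_b
    return len(cells_a & cells_b) / len(union) if union else 0.0


def contact_network(ranges, min_overlap=0.0):
    """Return a weighted edge list {(species_1, species_2): overlap}.

    Only pairs whose overlap exceeds min_overlap (a hypothetical
    parameter) are kept, so non-overlapping species stay unconnected.
    """
    edges = {}
    for a, b in combinations(sorted(ranges), 2):
        weight = range_overlap(ranges[a], ranges[b])
        if weight > min_overlap:
            edges[(a, b)] = weight
    return edges


edges = contact_network(ranges)
print(edges)  # Species A and B share one of seven occupied cells
```

Edge weights of this kind could then be supplied to a host-virus prediction model as additional features; the report does not state how the project itself weights the network.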
 
Description The generation and emergence of three novel respiratory coronaviruses from mammalian reservoirs into human populations in the last 20 years, including one that has achieved pandemic status, suggests that one of the most pressing research questions today is: in which reservoirs could the next novel coronaviruses be generated, and from which might they emerge?

Coronaviruses are a family of RNA viruses that cause a variety of diseases. In humans, these range from mild cold-like illnesses to lethal respiratory tract infections. Seven coronaviruses are known to infect humans: SARS-CoV, MERS-CoV and SARS-CoV-2 cause severe disease, while HKU1, NL63, OC43 and 229E tend to cause milder symptoms in most patients.

Homologous recombination is a natural process that generates new combinations of genetic material, and hence new viral strains, from two similar but non-identical parent strains. This recombination occurs when different strains co-infect the same animal, producing progeny viruses whose genomes contain sequences from each parent strain. Homologous recombination has previously been demonstrated in coronaviruses, and is implicated in the generation of SARS-CoV-2.

The most fundamental requirement for homologous recombination to take place is the co-infection of a single host with multiple coronaviruses. However, our understanding of which hosts are permissive to which coronaviruses (the prerequisite for identifying which hosts are potential sites of this recombination, henceforth termed 'recombination hosts') was extremely limited.

Here, we utilise our similarity-based machine-learning pipeline to address this significant knowledge gap. Our approach predicted associations between coronaviruses and their potential mammalian hosts by integrating three perspectives: (1) genomic features describing different aspects of coronaviruses; (2) ecological, phylogenetic and geospatial traits of potential mammalian hosts; and (3) characteristics of the network that links coronaviruses to their observed hosts.
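One way to picture how three independent perspectives can be combined into a single prediction is to average their per-pair probabilities and apply a cut-off. This is a simplified sketch, not the published pipeline: it assumes each perspective has already produced a probability for every virus-host pair, uses an unweighted mean, and applies the >0.5 mean probability cut-off quoted in the results. The host names and scores are hypothetical.

```python
from statistics import mean

# Hypothetical per-perspective probabilities for each (virus, host) pair.
# Keys: "virus" = viral genomic features, "host" = host traits,
# "network" = features of the observed virus-host network.
perspective_scores = {
    ("SARS-CoV-2", "Host 1"): {"virus": 0.8, "host": 0.7, "network": 0.6},
    ("SARS-CoV-2", "Host 2"): {"virus": 0.4, "host": 0.3, "network": 0.7},
    ("MERS-CoV", "Host 1"): {"virus": 0.9, "host": 0.5, "network": 0.4},
}


def predict_associations(scores, cutoff=0.5):
    """Keep pairs whose mean probability across perspectives exceeds cutoff."""
    predicted = {}
    for pair, by_perspective in scores.items():
        p = mean(by_perspective.values())
        if p > cutoff:
            predicted[pair] = p
    return predicted


for (virus, host), p in predict_associations(perspective_scores).items():
    print(f"{virus} - {host}: mean probability {p:.2f}")
```

An unweighted mean treats the three perspectives as equally informative; a weighted mean or a stacked meta-classifier would be alternatives if one perspective were known to be more reliable.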

Using this pipeline, we showed that the potential scale of novel coronavirus generation in wild and domesticated animals is currently greatly underappreciated. Specifically, at a mean probability cut-off of >0.5, we predicted 11.5-fold more coronavirus-host associations, over 30-fold more potential SARS-CoV-2 recombination hosts, and over 40-fold more host species with four or more different subgenera of coronaviruses than have been observed to date.
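Once observed and predicted associations are pooled, counts such as these reduce to per-host tallies: a recombination host for a focal virus is a host linked to that virus and at least one other coronavirus, and the subgenus metric counts distinct subgenera per host. The sketch below assumes a flat table of (host, virus, subgenus) rows that already contains observed associations plus predictions above the cut-off. The hosts are hypothetical; the subgenus assignments follow standard coronavirus taxonomy.

```python
from collections import defaultdict

# Hypothetical pooled table of (host, coronavirus, subgenus) associations:
# observed records plus model predictions above the probability cut-off.
associations = [
    ("Host 1", "SARS-CoV-2", "Sarbecovirus"),
    ("Host 1", "MERS-CoV", "Merbecovirus"),
    ("Host 1", "HCoV-229E", "Duvinacovirus"),
    ("Host 1", "HCoV-NL63", "Setracovirus"),
    ("Host 2", "SARS-CoV-2", "Sarbecovirus"),
    ("Host 3", "SARS-CoV", "Sarbecovirus"),
    ("Host 3", "SARS-CoV-2", "Sarbecovirus"),
]


def group_by_host(rows, column):
    """Map each host to the distinct values in a column (1 = virus, 2 = subgenus)."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row[0]].add(row[column])
    return grouped


def recombination_hosts(rows, focal_virus):
    """Hosts associated with focal_virus and at least one other coronavirus."""
    viruses = group_by_host(rows, 1)
    return sorted(h for h, vs in viruses.items()
                  if focal_virus in vs and len(vs) >= 2)


def hosts_with_min_subgenera(rows, k=4):
    """Hosts associated with coronaviruses from at least k distinct subgenera."""
    subgenera = group_by_host(rows, 2)
    return sorted(h for h, sg in subgenera.items() if len(sg) >= k)


print(recombination_hosts(associations, "SARS-CoV-2"))  # ['Host 1', 'Host 3']
print(hosts_with_min_subgenera(associations))           # ['Host 1']
```

Host 3 shows that two viruses from the same subgenus still make a recombination host under this definition, even though that host contributes only one subgenus to the second metric.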

We also discussed the large number of candidate hosts in which homologous recombination of coronaviruses could generate novel pathogenic strains, and the extent to which observed data substantially underestimate the range of viruses that could recombine.


Given that coronaviruses frequently undergo homologous recombination when they co-infect a host, and that SARS-CoV-2 is highly infectious to humans, the most immediate threat to public health is recombination of other coronaviruses with SARS-CoV-2. Such recombination could produce further novel viruses combining the infectivity of SARS-CoV-2 with additional pathogenicity from other coronaviruses.

Taking only observed data, there are four non-human mammalian hosts known to associate with SARS-CoV-2 and at least one other coronavirus, and a total of 504 unique interactions between SARS-CoV-2 and other coronaviruses. However, when we add our model's predicted associations, this rises to 126 potential SARS-CoV-2 recombination hosts and 2544 unique interactions; that is, predictions increase the number of recombination hosts 31.5-fold and the number of unique interactions 5.05-fold relative to observed data. Furthermore, we predicted that there are 11.5-fold more general coronavirus-host associations, and over 40-fold more host species with four or more different subgenera of coronaviruses, than have been observed to date. Our results demonstrate that the potential scale of novel coronavirus generation in wild and domesticated animals is greatly underappreciated.
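The fold changes for SARS-CoV-2 recombination follow directly from the stated counts: the total including predictions divided by the observed total. The short check below uses only the numbers given in the text.

```python
# Counts stated in the text: observed only vs observed plus predicted.
observed = {"recombination hosts": 4, "unique interactions": 504}
with_predictions = {"recombination hosts": 126, "unique interactions": 2544}


def fold_increase(before, after):
    """How many times larger the combined count is than the observed count."""
    return after / before


for metric, before in observed.items():
    fold = fold_increase(before, with_predictions[metric])
    print(f"{metric}: {fold:.2f}-fold")
# recombination hosts: 31.50-fold
# unique interactions: 5.05-fold
```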

With the greater understanding of the extent of mammalian host reservoirs and the potential recombination hosts we identify here, a targeted surveillance programme is now possible which would allow for this generation to be observed as it is happening and before a major outbreak. Such information could help inform prevention and mitigation strategies and provide a vital early warning system for future novel coronaviruses.
Exploitation Route With the greater understanding of the extent of mammalian host reservoirs and the potential recombination hosts we identify here, a targeted surveillance programme is now possible which would allow for this generation to be observed as it is happening and before a major outbreak. Such information could help inform prevention and mitigation strategies and provide a vital early warning system for future novel coronaviruses.

To illustrate how our model can be used to anticipate future homologous recombination events, consider the work of Banerjee et al., who bioinformatically identified potential genomic regions of homologous recombination between MERS-CoV and SARS-CoV-2. They highlighted a significant risk that SARS-CoV-2, which is highly transmissible between humans, could acquire the considerably more pathogenic phenotype of MERS-CoV. The work presented here identifies 102 potential recombination hosts (excluding humans and laboratory rodents) of these two viruses. Taken together, our work and that of Banerjee et al. provide evidence both for the possible production of a potentially severe future recombinant coronavirus and for the hosts in which this threat is most likely to be generated. We recommend monitoring for this event.
Sectors Environment, Healthcare

URL https://www.nature.com/articles/s41467-021-21034-5
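The MERS-CoV/SARS-CoV-2 count in the Exploitation Route above is, in essence, the intersection of the two viruses' host sets with humans and laboratory rodents removed. A minimal sketch, assuming host sets that already pool observed and predicted associations; the host names are hypothetical, and the species treated as laboratory rodents are an assumption for illustration.

```python
# Hypothetical host sets per virus (observed plus predicted above the cut-off).
hosts_by_virus = {
    "SARS-CoV-2": {"Homo sapiens", "Mus musculus", "Host A", "Host B", "Host D"},
    "MERS-CoV": {"Homo sapiens", "Mus musculus", "Host B", "Host C", "Host D"},
}

# Species excluded from the count, as in the text: humans and laboratory
# rodents. Which rodents count as "laboratory" species is an assumption here.
EXCLUDED = {"Homo sapiens", "Mus musculus", "Rattus norvegicus"}


def shared_hosts(hosts_by_virus, virus_a, virus_b, excluded=frozenset()):
    """Potential recombination hosts of two viruses: hosts of both, minus exclusions."""
    both = hosts_by_virus[virus_a] & hosts_by_virus[virus_b]
    return sorted(both - set(excluded))


print(shared_hosts(hosts_by_virus, "SARS-CoV-2", "MERS-CoV", EXCLUDED))
# ['Host B', 'Host D']
```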
 
Description 74 news stories and radio interviews have covered this work. Through our collaborators at Species360, we are currently discussing a policy document with the WHO regarding the trade of animal hosts of the WHO R&D Blueprint priority diseases.
First Year Of Impact 2021
Sector Agriculture, Food and Drink, Environment, Healthcare
Impact Types Policy & public services

 
Description Global trade of coronavirus hosts: bringing geographically isolated hosts and viruses together risks novel recombination and spillover to humans
Amount £117,406 (GBP)
Funding ID BB/W00402X/1 
Organisation Biotechnology and Biological Sciences Research Council (BBSRC) 
Sector Public
Country United Kingdom
Start 06/2021 
End 05/2022
 
Description Vector in the machine: How accurately can mosquito transmission of viruses be predicted by machine learning?
Amount £181,248 (GBP)
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 10/2022 
End 09/2026
 
Title Data and code for: "Monkeypox virus shows potential to infect a diverse range of native animal species across Europe, indicating high risk of becoming endemic in the region." 
Description Background: Monkeypox is a zoonotic virus which persists in animal reservoirs and periodically spills over into humans, causing outbreaks. During the current 2022 outbreak, monkeypox virus has persisted via human-human transmission, across all major continents and for longer than any previous record. This unprecedented spread creates the potential for the virus to 'spillback' into local susceptible animal populations. Persistent transmission amongst such animals raises the prospect of monkeypox virus becoming enzootic in new regions. However, the full and specific range of potential animal hosts and reservoirs of monkeypox remains unknown, especially in newly at-risk non-endemic areas. Methods: Here, our pipeline utilises ensembles of classifiers comprising different class balancing techniques and incorporating instance weights, to identify which animal species are potentially susceptible to monkeypox virus. Subsequently, we generate spatial distribution maps to highlight high-risk geographic areas at high resolution. Findings: We show that the number of potentially susceptible species is currently underestimated by 2.4 to 4.3-fold. We show a high density of susceptible wild hosts in Europe. We provide lists of these species, and highlight high-risk hosts for spillback and potential long-term reservoirs, which may enable monkeypox virus to become endemic. 
Type Of Material Computer model/algorithm 
Year Produced 2022 
Provided To Others? Yes  
Impact N/A 
URL https://figshare.com/articles/software/Blagrove_et_al_2022_poxvriuses_data_and_code/20485332
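The monkeypox pipeline above is described as an ensemble of classifiers that differ in their class-balancing technique and that incorporate instance weights. The sketch below illustrates that general idea only, with several stand-ins that are assumptions rather than the authors' choices: a weighted nearest-centroid base classifier, three balancing strategies (random undersampling, random oversampling and class reweighting), a vote-fraction score, and invented two-feature data in which label 1 marks a susceptible species.

```python
import random


def weighted_centroid(rows, weights):
    """Weighted mean of feature vectors."""
    total = sum(weights)
    return [sum(w * r[j] for r, w in zip(rows, weights)) / total
            for j in range(len(rows[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


class NearestCentroid:
    """Binary (0/1) nearest-centroid classifier that honours instance weights."""

    def fit(self, X, y, w):
        self.centroids = {}
        for label in (0, 1):
            idx = [i for i, t in enumerate(y) if t == label]
            self.centroids[label] = weighted_centroid(
                [X[i] for i in idx], [w[i] for i in idx])
        return self

    def predict(self, x):
        return min(self.centroids, key=lambda c: sq_dist(x, self.centroids[c]))


def _split(y):
    """Return (minority_indices, majority_indices)."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    return (pos, neg) if len(pos) <= len(neg) else (neg, pos)


def undersample(X, y, w, rng):
    """Randomly drop majority-class rows until both classes are the same size."""
    minority, majority = _split(y)
    keep = minority + rng.sample(majority, len(minority))
    return [X[i] for i in keep], [y[i] for i in keep], [w[i] for i in keep]


def oversample(X, y, w, rng):
    """Randomly duplicate minority-class rows until both classes are the same size."""
    minority, majority = _split(y)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    keep = majority + minority + extra
    return [X[i] for i in keep], [y[i] for i in keep], [w[i] for i in keep]


def reweight(X, y, w, rng):
    """Rescale instance weights so each class carries the same total weight."""
    totals = {c: sum(wi for wi, t in zip(w, y) if t == c) for c in (0, 1)}
    return X, y, [wi / totals[t] for wi, t in zip(w, y)]


def ensemble_score(X, y, w, x_new, n_members=9, seed=0):
    """Fraction of ensemble members that classify x_new as susceptible (1)."""
    rng = random.Random(seed)
    strategies = [undersample, oversample, reweight]
    votes = 0
    for m in range(n_members):
        Xb, yb, wb = strategies[m % len(strategies)](X, y, w, rng)
        votes += NearestCentroid().fit(Xb, yb, wb).predict(x_new)
    return votes / n_members


# Invented, imbalanced training data: 2 susceptible vs 6 non-susceptible species.
# Hypothetical instance weights down-weight less certain records.
X = [[1.0, 1.0], [0.9, 1.1],
     [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.2, 0.1], [0.1, 0.2], [0.0, 0.2]]
y = [1, 1, 0, 0, 0, 0, 0, 0]
w = [1.0, 0.5, 1.0, 1.0, 1.0, 1.0, 0.5, 1.0]

print(ensemble_score(X, y, w, [0.9, 0.9]))  # 1.0
print(ensemble_score(X, y, w, [0.1, 0.0]))  # 0.0
```

Using the fraction of positive votes as the score keeps the example short; averaging member probabilities, which a probabilistic base learner would allow, is a common alternative.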
 
Title Divide-and-conquer: data and codes 
Description Data and codes associated with: Divide-and-conquer: Wardeh, M., Blagrove, M.S.C., Sharkey, K.J. et al. Divide-and-conquer: machine-learning integrates mammalian and viral traits with network features to predict virus-mammal associations. Nat Commun 12, 3954 (2021). https://doi.org/10.1038/s41467-021-24085-w. Abstract: Our knowledge of viral host ranges remains limited. Completing this picture by identifying unknown hosts of known viruses is an important research aim that can help identify and mitigate zoonotic and animal-disease risks, such as spill-over from animal reservoirs into human populations. To address this knowledge-gap we apply a divide-and-conquer approach which separates viral, mammalian and network features into three unique perspectives, each predicting associations independently to enhance predictive power. Our approach predicts over 20,000 unknown associations between known viruses and susceptible mammalian species, suggesting that current knowledge underestimates the number of associations in wild and semi-domesticated mammals by a factor of 4.3, and the average potential mammalian host-range of viruses by a factor of 3.2. In particular, our results highlight a significant knowledge gap in the wild reservoirs of important zoonotic and domesticated mammals' viruses: specifically, lyssaviruses, bornaviruses and rotaviruses. 
Type Of Material Computer model/algorithm 
Year Produced 2021 
Provided To Others? Yes  
Impact The multi-perspective host-pathogen predictive framework underpins the following research awards: Where coronaviruses hide, where novel strains are generated, and how they get to us: Predicting reservoirs, recombination, and geographical hotspots (NE/W002302/1); Global trade of coronavirus hosts: bringing geographically isolated hosts and viruses together risks novel recombination and spillover to humans (BB/W00402X/1); and Predicting mammalian and avian reservoirs of coronaviruses: identifying current reservoirs and co-infection hosts in which future novel coronavirus could be generated (BBSRC IAA COVID - 168478)
URL https://doi.org/10.6084/m9.figshare.13270304
 
Description Sapienza University of Rome 
Organisation Sapienza University of Rome
Country Italy 
Sector Academic/University 
PI Contribution Exploring the mammalian virome to detect patterns of compatibility between mammal species and viruses at a global scale, identifying eco-biological profiles of viral carriers along the fast-slow continuum of mammalian life-history.
Collaborator Contribution Provided insight into virus-mammal interactions, and into the role of virus traits in the transmission and spill-over of viruses.
Impact Publication: Identifying patterns along the fast-slow continuum of mammalian viral carriers (in prep/under review)
Start Year 2022
 
Description Species360 - Impact of global trade in wildlife on virus spread. 
Organisation IDAs og Berg-Nielsens Studie-og støttefond
Country Denmark 
Sector Charity/Non Profit 
PI Contribution This partnership aims to quantify the impact of global trade in wild animals on the potential spread of emerging infectious zoonoses prioritised by the WHO Research and Development Blueprint Strategy. Our group provided data on the animal species in which these zoonoses have been found to date; on the ecological role of these animals in the transmission of these pathogens (such as reservoir, dead-end and amplifying hosts); and on how these pathogens manifest in the animal host (mortality, morbidity, minor symptoms, or no disease).
Collaborator Contribution Our partners are leading on the analyses to identify geographical patterns of trade in the animals identified above, and the key data gaps that need to be resolved to fully assess risks from international wildlife trade and put this information in the context of other drivers of zoonotic diseases.
Impact NA
Start Year 2020
 
Title Blagrove_et_al_2022_poxvriuses data and code 
Description Data and code for: "Monkeypox virus shows potential to infect a diverse range of native animal species across Europe, indicating high risk of becoming endemic in the region." Authors: Marcus SC Blagrove*1, Jack Pilgrim1, Aurelia Kotsiri1, Melody Hui1, Matthew Baylis1, Maya Wardeh*1,2 * = corresponding authors 1) Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK 2) Department of Mathematical Sciences, University of Liverpool, Liverpool, UK Abstract: Background: Monkeypox is a zoonotic virus which persists in animal reservoirs and periodically spills over into humans, causing outbreaks. During the current 2022 outbreak, monkeypox virus has persisted via human-human transmission, across all major continents and for longer than any previous record. This unprecedented spread creates the potential for the virus to 'spillback' into local susceptible animal populations. Persistent transmission amongst such animals raises the prospect of monkeypox virus becoming enzootic in new regions. However, the full and specific range of potential animal hosts and reservoirs of monkeypox remains unknown, especially in newly at-risk non-endemic areas. Methods: Here, our pipeline utilises ensembles of classifiers comprising different class balancing techniques and incorporating instance weights, to identify which animal species are potentially susceptible to monkeypox virus. Subsequently, we generate spatial distribution maps to highlight high-risk geographic areas at high resolution. Findings: We show that the number of potentially susceptible species is currently underestimated by 2.4 to 4.3-fold. We show a high density of susceptible wild hosts in Europe. We provide lists of these species, and highlight high-risk hosts for spillback and potential long-term reservoirs, which may enable monkeypox virus to become endemic. 
Type Of Technology Software 
Year Produced 2022 
Impact None yet 
URL https://figshare.com/articles/software/Blagrove_et_al_2022_poxvriuses_data_and_code/20485332/2
 
Title Blagrove_et_al_2022_poxvriuses data and code 
Description Data and code for: "Monkeypox virus shows potential to infect a diverse range of native animal species across Europe, indicating high risk of becoming endemic in the region." Authors: Marcus SC Blagrove*1, Jack Pilgrim1, Aurelia Kotsiri1, Melody Hui1, Matthew Baylis1, Maya Wardeh*1,2 * = corresponding authors 1) Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK 2) Department of Mathematical Sciences, University of Liverpool, Liverpool, UK Abstract: Background: Monkeypox is a zoonotic virus which persists in animal reservoirs and periodically spills over into humans, causing outbreaks. During the current 2022 outbreak, monkeypox virus has persisted via human-human transmission, across all major continents and for longer than any previous record. This unprecedented spread creates the potential for the virus to 'spillback' into local susceptible animal populations. Persistent transmission amongst such animals raises the prospect of monkeypox virus becoming enzootic in new regions. However, the full and specific range of potential animal hosts and reservoirs of monkeypox remains unknown, especially in newly at-risk non-endemic areas. Methods: Here, our pipeline utilises ensembles of classifiers comprising different class balancing techniques and incorporating instance weights, to identify which animal species are potentially susceptible to monkeypox virus. Subsequently, we generate spatial distribution maps to highlight high-risk geographic areas at high resolution. Findings: We show that the number of potentially susceptible species is currently underestimated by 2.4 to 4.3-fold. We show a high density of susceptible wild hosts in Europe. We provide lists of these species, and highlight high-risk hosts for spillback and potential long-term reservoirs, which may enable monkeypox virus to become endemic. 
Type Of Technology Software 
Year Produced 2022 
Impact None yet 
URL https://figshare.com/articles/software/Blagrove_et_al_2022_poxvriuses_data_and_code/20485332/1
 
Title Blagrove_et_al_2022_poxvriuses data and code 
Description Data and code for: "Monkeypox virus shows potential to infect a diverse range of native animal species across Europe, indicating high risk of becoming endemic in the region." Authors: Marcus SC Blagrove*1, Jack Pilgrim1, Aurelia Kotsiri1, Melody Hui1, Matthew Baylis1, Maya Wardeh*1,2 * = corresponding authors 1) Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool, Liverpool, UK 2) Department of Mathematical Sciences, University of Liverpool, Liverpool, UK Abstract: Background: Monkeypox is a zoonotic virus which persists in animal reservoirs and periodically spills over into humans, causing outbreaks. During the current 2022 outbreak, monkeypox virus has persisted via human-human transmission, across all major continents and for longer than any previous record. This unprecedented spread creates the potential for the virus to 'spillback' into local susceptible animal populations. Persistent transmission amongst such animals raises the prospect of monkeypox virus becoming enzootic in new regions. However, the full and specific range of potential animal hosts and reservoirs of monkeypox remains unknown, especially in newly at-risk non-endemic areas. Methods: Here, our pipeline utilises ensembles of classifiers comprising different class balancing techniques and incorporating instance weights, to identify which animal species are potentially susceptible to monkeypox virus. Subsequently, we generate spatial distribution maps to highlight high-risk geographic areas at high resolution. Findings: We show that the number of potentially susceptible species is currently underestimated by 2.4 to 4.3-fold. We show a high density of susceptible wild hosts in Europe. We provide lists of these species, and highlight high-risk hosts for spillback and potential long-term reservoirs, which may enable monkeypox virus to become endemic. 
Type Of Technology Software 
Year Produced 2022 
Impact None yet 
URL https://figshare.com/articles/software/Blagrove_et_al_2022_poxvriuses_data_and_code/20485332
 
Description AI used to 'predict the next coronavirus'. 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact BBC Radio 4, Inside Science
Year(s) Of Engagement Activity 2021
URL http://www.bbc.co.uk/news/science-environment-56076716?fbclid=IwAR0AIa3il2XTl6QbrIlP7aCovc79To-tjPIa...
 
Description Potential for New Coronaviruses May Be Greater Than Known. 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact Newspaper report in The New York Times
Year(s) Of Engagement Activity 2021
URL https://www.nytimes.com/2021/02/16/science/Covid-reemerging-viruses.html?fbclid=IwAR0AIa3il2XTl6QbrI...
 
Description Quand l'IA part à la chasse au prochain coronavirus chez les mammifères (When AI goes hunting for the next coronavirus in mammals).
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Article on France 24
Year(s) Of Engagement Activity 2021
URL https://www.france24.com/fr/%C3%A9co-tech/20210218-quand-l-ia-part-%C3%A0-la-chasse-au-prochain-coro...
 
Description Virologists use AI to work on next pandemic outbreak. 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Radio interview on RNZ (Radio New Zealand)
Year(s) Of Engagement Activity 2021
URL https://www.rnz.co.nz/national/programmes/first-up/audio/2018783912/virologists-use-ai-to-work-on-ne...