Dynamic stochastic block modelling for analysing recombination in HIV

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Biological Sciences

Abstract

MRC : Heather Grant : MR/N013166/1

HIV is still a huge burden worldwide, with 1.7 million new infections each year (UNAIDS, 2019). The roll-out of anti-retroviral therapy (ART) has reduced the number of AIDS-related deaths and onward transmissions, but to curb further infections, UNAIDS has set the goals that 95% of people living with HIV should know their status, 95% of those diagnosed should be on treatment, and 95% of those on treatment should be virally suppressed. Characterising the drivers of new infections will help to identify the gaps that remain to be closed.
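Because each target applies only to the people who met the previous one, the three percentages compound. Simple arithmetic on the stated targets (a worked illustration, not a figure quoted from UNAIDS) gives the overall fraction that would be virally suppressed if all three were met:

```latex
\[
  0.95 \times 0.95 \times 0.95 = 0.857375 \approx 86\%
\]
```

That is, roughly 86% of the group covered by the first target would be virally suppressed.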

Comparing viral sequences from different patients can support epidemiological studies. HIV sequence data for the polymerase gene (pol) are routinely collected for drug-resistance testing and, once anonymised and reduced to basic demographic information, can be reused for these purposes. Genetic distance (the number of mutational differences between two viruses) can be used to link closely related viruses: a lower genetic distance suggests that they shared a common ancestor more recently. Mutations are introduced into the HIV genome with each replication cycle, and mutation is said to follow its own 'clock', so that changes build up, on average, in a predictable way over time. Genetic distance and time of sampling can therefore be used to draw links, infer networks and patterns of transmission, and derive other characteristics of the network such as its degree distribution. Tied to demographic information, these insights can inform public health policy. For instance, individuals from groups deemed at high risk might be advised to take pre-exposure prophylaxis (PrEP).
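The linkage step described above can be sketched as follows: compute a pairwise distance between aligned sequences, join every pair that falls at or below a threshold, and summarise the resulting network by its degree distribution. This is a minimal sketch under stated assumptions: it uses an uncorrected p-distance (the proportion of differing sites), whereas analyses of HIV pol data typically use a substitution-model-corrected distance such as Tamura-Nei 93 (TN93); the function names and the toy threshold are hypothetical, and sampling times are not used here.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites that differ, skipping sites with a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    # NaN never satisfies `<= threshold`, so unalignable pairs are never linked.
    return diffs / compared if compared else float("nan")


def build_network(seqs: dict[str, str], threshold: float) -> dict[str, set[str]]:
    """Undirected adjacency sets: link each pair whose distance is <= threshold."""
    adj = {name: set() for name in seqs}
    for u, v in combinations(seqs, 2):
        if p_distance(seqs[u], seqs[v]) <= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_distribution(adj: dict[str, set[str]]) -> dict[int, int]:
    """Map each degree k to the number of sequences with exactly k links."""
    counts: dict[int, int] = {}
    for neighbours in adj.values():
        k = len(neighbours)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))
```

For example, with four toy 10-site sequences and a hypothetical threshold of 0.1 (one difference), `build_network` returns a small chain A-B-C plus an isolated sequence D. Real pol alignments are around a thousand sites or longer, and thresholds are correspondingly much smaller.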

HIV diversity is extremely high, because the virus has been evolving in humans for perhaps a hundred years, long before it was first described. It is classified into major lineages (subtypes) that formed early in its expansion. When an individual is infected with more than one HIV variant, recombination between them can occur, creating a hybrid virus and thus more diversity. Recombination almost certainly also happens between two identical viruses from the same infection, but it is undetectable because the new virus is the same as both parents. When highly divergent viruses recombine (such as those from different subtypes), the event becomes more obvious, as there is enough signal to distinguish the two parental viruses.
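The point about identical versus divergent parents can be illustrated with a toy single-crossover model. This is a simplifying assumption: in HIV, recombination arises from template switching by reverse transcriptase and can involve several breakpoints. The function names are hypothetical, and "distinguishable" here only means the recombinant differs from both parents over the whole sequence.

```python
def recombine(parent_a: str, parent_b: str, breakpoint: int) -> str:
    """Single-crossover recombinant: parent_a before `breakpoint`, parent_b from it onward."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    if not 0 <= breakpoint <= len(parent_a):
        raise ValueError("breakpoint out of range")
    return parent_a[:breakpoint] + parent_b[breakpoint:]


def distinguishable_from_parents(child: str, parent_a: str, parent_b: str) -> bool:
    """True only if the recombinant differs from both parents over the whole sequence."""
    return child != parent_a and child != parent_b


# Identical parents: the recombinant is indistinguishable, so the event leaves no trace.
same = "ACGTACGTAC"
print(distinguishable_from_parents(recombine(same, same, 5), same, same))  # False

# Divergent parents (e.g. different subtypes): the mosaic differs from both.
a, b = "ACGTACGTAC", "TGCATGCATG"
child = recombine(a, b, 5)
print(child, distinguishable_from_parents(child, a, b))  # ACGTAGCATG True
```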

Recombination between divergent viruses breaks apart linkages: one half of the genome might link to the first parental virus and the other half to the second. If the whole sequence were considered in a linkage analysis, no connections would be made, because the recombinant is now sufficiently different from both parents. As HIV moves through the transmission network, it will occasionally become part of a dual infection and may take part in a recombination event. This can happen at any point in time, which makes it harder to spot: after the event, other mutations continue to build up as the molecular clock moves the virus forward.
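The loss of linkage can be made concrete by comparing a whole-sequence distance with distances computed in sliding windows along the alignment. This is a minimal sketch assuming the two parents are known and aligned; in practice they are not, and candidate reference sequences would stand in for them. The function names, window size and step are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need non-empty sequences of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_parent_by_window(query: str, parents: dict[str, str],
                             window: int, step: int) -> list[tuple[int, str]]:
    """For each window start, name the candidate parent closest to the query there."""
    calls = []
    for start in range(0, len(query) - window + 1, step):
        segment = query[start:start + window]
        best = min(parents,
                   key=lambda name: p_distance(segment, parents[name][start:start + window]))
        calls.append((start, best))
    return calls


parent_a = "ACGTACGTACGTACGTACGT"
parent_b = "TGCATGCATGCATGCATGCA"  # differs from parent_a at every site
recombinant = parent_a[:10] + parent_b[10:]

# Whole-sequence view: equally far from both parents, so a tight threshold links neither.
print(p_distance(recombinant, parent_a), p_distance(recombinant, parent_b))  # 0.5 0.5

# Windowed view: the first half links to parent A, the second half to parent B.
print(closest_parent_by_window(recombinant, {"A": parent_a, "B": parent_b}, 5, 5))
# [(0, 'A'), (5, 'A'), (10, 'B'), (15, 'B')]
```

Mutations accumulating after the event would blur these window-level calls, which is one reason older recombination events are harder to detect.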

Dynamic Stochastic Block Modelling is a statistical approach to network data observed over time: each node belongs to a latent group (block), the probability of an edge between two nodes depends on their groups, and group memberships can change between time points. In our case it will be used to find groups, or communities, of similar viruses over time. This approach should better classify HIV diversity and model networks as they change, which makes it well suited to a fast-evolving, recombinogenic virus. Simulation experiments will be carried out to test the principle and validate the approach. Finally, we will apply it to near-full-genome HIV data from Uganda. This research will be undertaken under the supervision of Associate Professor Art Poon in the Department of Pathology and Laboratory Medicine at Western University, Ontario, Canada.
