Scaling up simulation based inference to whole genome data
Lead Research Organisation: University of Bristol
Department Name: Mathematics
Abstract
This research explores the use of simulation-based inference (SBI) methods to infer model parameters from genetic data. SBI techniques circumvent the challenge of intractable likelihoods by relying on a simulator that generates data given parameter values, so the likelihood never has to be evaluated directly. This is particularly useful in population genetics, where the likelihood is often intractable.
Recent advances in SBI leverage neural networks, through methods such as neural likelihood estimation (NLE), to improve the efficiency and quality of inference. However, applying these methods to high-dimensional genetic sequences remains computationally expensive. This project seeks to improve efficiency by incorporating composite likelihoods, an approach that divides genetic sequences into smaller, equal-sized batches. Instead of training a neural network on entire sequences, the model estimates a likelihood for each batch and then combines these estimates to approximate the full likelihood. By using composite likelihoods, this research aims to advance the scalability of neural network-based inference in genetics, making the task of inference more computationally feasible.
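The batching idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the project: the sequence is a toy array of 0/1 sites, `block_log_likelihood` is a hypothetical Bernoulli stand-in for a trained neural likelihood estimator, and the per-batch log-likelihoods are combined by simple summation. In this toy the sites are independent, so the composite value equals the full log-likelihood; for real genetic data neighbouring batches are correlated, and the sum would only approximate it.

```python
import numpy as np


def split_into_blocks(sequence, block_size):
    """Split a 1-D array into equal-sized blocks, dropping any trailing remainder."""
    n_blocks = len(sequence) // block_size
    return sequence[: n_blocks * block_size].reshape(n_blocks, block_size)


def block_log_likelihood(block, theta):
    """Hypothetical stand-in for a trained per-batch likelihood estimator.

    Here: Bernoulli log-likelihood of 0/1 sites with success rate theta.
    """
    k = block.sum()
    n = block.size
    return k * np.log(theta) + (n - k) * np.log1p(-theta)


def composite_log_likelihood(sequence, theta, block_size):
    """Sum the per-block log-likelihoods to form a composite log-likelihood."""
    blocks = split_into_blocks(sequence, block_size)
    return sum(block_log_likelihood(b, theta) for b in blocks)


# Toy data: 1000 independent sites simulated with an assumed true rate of 0.3.
rng = np.random.default_rng(0)
sequence = rng.binomial(1, 0.3, size=1000)

# Evaluate the composite log-likelihood on a parameter grid and pick the maximiser.
grid = np.linspace(0.05, 0.95, 19)
scores = [composite_log_likelihood(sequence, t, block_size=100) for t in grid]
best_theta = grid[int(np.argmax(scores))]
print(best_theta)
```

The design point is that the estimator only ever sees inputs of length `block_size`, so its cost does not grow with the total sequence length; only the number of batches to sum over does.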
By enabling more efficient inference, this method will contribute to a better understanding of population history, including migration patterns, population growth rates and evolutionary relationships between populations. Such insights are valuable in fields like evolutionary biology, conservation genetics and medical genetics, where they inform research on human history, species conservation and genetic diseases.
Organisations
| People | ORCID iD |
|---|---|
| Grace Yan (Student) | |
Studentship Projects
| Project Reference | Relationship | Related To | Start | End | Student Name |
|---|---|---|---|---|---|
| EP/S023569/1 | | | 31/03/2019 | 29/09/2027 | |
| 2879419 | Studentship | EP/S023569/1 | 30/09/2023 | 29/09/2027 | Grace Yan |