Using antibody next generation sequencing to aid antibody engineering

Lead Research Organisation: University of Oxford
Department Name: Interdisciplinary Bioscience DTP


"Antibodies are proteins produced by the immune systems of jawed vertebrates that recognize and eliminate pathogenic molecules from the organism. Their binding malleability arises from their great diversity. The total diversity of the unique human antibody repertoire is estimated to lie in the range of 10^11. The advent of Next Generation Sequencing (NGS) has made it possible to produce snapshots of this diversity. Here we describe our work with a large NGS dataset provided by UCB Pharma comprising 13.5 million heavy and light chains from ~500 individuals. By studying this dataset we aim to establish a set of descriptors that will allow us to formally interrogate the properties of immune repertoires within and across species. Furthermore, UCB Pharma have comprehensive NGS datasets across a number of species, including human, mouse, llama and rabbit. A deeper understanding of inter-species antibody ontogeny would highlight commonalities and differences. Such insight can lead to a definition of what constitutes a 'human' antibody and a better understanding of the mechanics of antibody maturation. Thorough analysis of antibody repertoires can shed light on immune system maturation, paving the way for improved vaccine design and drug discovery. Mapping NGS datasets from immunized subjects to the antibody structure space will interrogate the cascades of the antibody ontogeny process and help to create an antibody maturation model.
BBSRC priorities: Bioscience for Health, Data driven biology, New therapeutic approaches to industrial biotechnology, Systems approaches to the bioscienc"



BB/M011224/1 01/10/2015 30/09/2023
1801486 Studentship BB/M011224/1 01/10/2016 30/09/2020 Aleksandr Kovaltsuk
Description I developed the first database that holds more than a billion antibody sequences generated using next-generation sequencing technologies. In it, I cleaned, numbered and annotated the antibody sequences with metadata. This database is publicly available.
I also developed a tool that identifies and cleans erroneous sequences in antibody next-generation sequencing datasets. The tool is available for download at

In 2020, I published a research paper in a peer-reviewed journal (PLOS Computational Biology). In this paper, we developed a rapid method to structurally characterise immune repertoires in humans. We made this tool open source; it is available on GitHub (
Exploitation Route Researchers from both academia and industry actively use the database that I developed.
Sectors Pharmaceuticals and Medical Biotechnology