
PopSeqle: Software for Population Sequence data to Lower Errors

Lead Research Organisation: Earlham Institute
Department Name: UNLISTED

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

PopSeqle is a fast, user-friendly software tool for performing quality control (QC) checks on population-based sequence data. The project aims to develop software that can perform these checks across multiple individuals and populations, using a population genetic framework to identify potential irregularities in next-generation sequencing (NGS) genome and transcriptome assemblies. The software is being developed in the relatively new programming language Julia. It aims to quantify the signal in multiple sequence alignments that is caused by evolutionary forces and to separate it from the signal caused by sequencing errors.
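A population genetic framework of this kind starts from per-site summary statistics computed across many sequences. The Python sketch below is an illustration only. PopSeqle itself is written in Julia, and the record does not say which statistics it uses.

The sketch assumes the alignment is a list of equal-length strings. It computes per-site nucleotide diversity and flags singleton sites, where a variant is carried by exactly one sequence. Singletons are a common place for sequencing errors to hide, because a genuine rare allele and a single erroneous base call look the same at that site. All function names are hypothetical.

```python
from collections import Counter

VALID_BASES = "ACGT"


def site_column(alignment, i):
    """Unambiguous bases (A/C/G/T) at alignment column i; gaps and Ns are skipped."""
    return [seq[i].upper() for seq in alignment if seq[i].upper() in VALID_BASES]


def per_site_diversity(alignment):
    """Per-site nucleotide diversity: the fraction of sequence pairs that differ at each site."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    diversity = []
    for i in range(length):
        column = site_column(alignment, i)
        n = len(column)
        if n < 2:
            diversity.append(0.0)  # fewer than two called bases: no pairs to compare
            continue
        counts = Counter(column)
        # (n^2 - sum c^2) / 2 differing pairs out of n(n-1)/2 pairs in total
        diversity.append((n * n - sum(c * c for c in counts.values())) / (n * (n - 1)))
    return diversity


def singleton_sites(alignment):
    """Sites where a base other than the most common one is seen in exactly one sequence."""
    sites = []
    for i in range(len(alignment[0])):
        ranked = Counter(site_column(alignment, i)).most_common()
        # ranked[0] is the most common base; any remaining base seen once is a singleton
        if any(count == 1 for _, count in ranked[1:]):
            sites.append(i)
    return sites
```

Consider the alignment `["ACGTA", "ACGTA", "ACGAA", "ACGAT"]`. Here `per_site_diversity` gives 0 at the first three sites, about 0.67 at the fourth and 0.5 at the fifth. `singleton_sites` flags only the fifth site (index 4).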

PopSeqle identifies errors using wavelet transform analyses to locate peaks and valleys in a signal formed by population genetic summary statistics along the sequence. Sliding-window approaches are used to help identify and visualise outlying regions or regions of interest in the data. Other key developments include the ability to produce publication-ready graphics within the software interface, and the flexibility to accept and output different sequence data formats. To ensure that the software is accessible to all users, an intuitive graphical user interface (GUI) is also being developed to allow interactive analysis of data and outputs.
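The combination of sliding windows and a wavelet transform can be sketched roughly as follows. This is a hypothetical illustration, not PopSeqle's algorithm. It works in three steps:

1. Average a per-site statistic in sliding windows.
2. Convolve the window means with a Ricker ("Mexican hat") wavelet at a single scale, one common choice for peak detection.
3. Report local extrema of the response that pass a threshold.

The function names, the wavelet choice and the kernel length of `10 * width + 1` are all assumptions.

```python
import numpy as np


def sliding_window_mean(values, window, step):
    """Average a per-site statistic over windows of `window` sites, moving `step` sites at a time.

    Returns (window_start_positions, window_means) as NumPy arrays.
    """
    values = np.asarray(values, dtype=float)
    starts = np.arange(0, len(values) - window + 1, step)
    means = np.array([values[s:s + window].mean() for s in starts])
    return starts, means


def ricker(points, width):
    """Ricker ('Mexican hat') wavelet sampled at `points` positions centred on zero."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * width) * np.pi ** 0.25)
    return norm * (1.0 - (t / width) ** 2) * np.exp(-(t ** 2) / (2.0 * width ** 2))


def wavelet_extrema(signal, width, threshold):
    """Return (peaks, valleys): indices where the single-scale wavelet response is a
    local maximum above `threshold` or a local minimum below `-threshold`.
    `width` is an integer scale, measured in windows."""
    signal = np.asarray(signal, dtype=float)
    # subtract the mean so the zero padding used by np.convolve matches the average level
    centred = signal - signal.mean()
    points = min(10 * width + 1, len(signal))
    if points % 2 == 0:
        points -= 1  # an odd-length kernel keeps the 'same' output aligned with the input
    response = np.convolve(centred, ricker(points, width), mode="same")
    peaks, valleys = [], []
    for i in range(1, len(response) - 1):
        left, here, right = response[i - 1], response[i], response[i + 1]
        if here > threshold and here >= left and here > right:
            peaks.append(i)
        elif here < -threshold and here <= left and here < right:
            valleys.append(i)
    return peaks, valleys
```

As an example, take a statistic that is 0.01 at every site except sites 100 to 119 (of 200), where it is 0.5. With windows of 10 sites, a step of 5, `width=2` and `threshold=0.4`, the sketch reports a single peak at the window starting at site 105.

The Ricker kernel has negative side lobes, so a strong peak is flanked by weaker minima in the response (and a strong valley by weaker maxima). The threshold therefore has to sit above that side-lobe level. A fuller analysis would also compare responses across several scales.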

Simulated and empirical datasets will be used to evaluate, improve and customise the software algorithms, followed by extensive user testing by research groups on-site at the Norwich Research Park. PopSeqle will then be introduced to the wider research community through training workshops, webinars, and a training video which will be made available along with a manual and associated publications. The software itself will be hosted by the Earlham Institute and will be freely available and open-source.

Planned Impact

unavailable
