PopSeqle: Software for Population Sequence data to Lower Errors

Lead Research Organisation: Earlham Institute
Department Name: UNLISTED

Abstract

PopSeqle is a fast, user-friendly software tool for quality control (QC) checks of population-based sequence data. The PopSeqle project aims to develop software that can perform these checks across multiple individuals and populations, using a population genetic framework to identify potential irregularities in next-generation sequencing (NGS) genome and transcriptome assemblies. The software is being developed in the new programming language Julia, and aims to quantify the signal in multiple sequence alignments that is caused by evolutionary forces and to separate it from the signal caused by sequencing errors.

PopSeqle identifies errors using wavelet transform analyses, which locate peaks and valleys in a signal of population genetic summary statistics across the sequence space. Sliding-window approaches are used to help identify and visualise outlying regions or regions of interest in the data. Other key developments are the ability to produce publication-ready graphics within the software interface, and the flexibility to accept and output different sequence data formats. To ensure that the software is accessible to all users, an intuitive graphical user interface (GUI) is also being developed to allow interactive analysis of data and outputs.
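The sliding-window and wavelet steps described above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not PopSeqle's Julia implementation. It assumes nucleotide diversity (mean pairwise differences per site) as the summary statistic and a single-level Haar transform as the wavelet; the abstract names neither, and the function names and toy alignment are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site among aligned sequences."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    if not pairs or n_sites == 0:
        return 0.0
    diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return diffs / (len(pairs) * n_sites)


def sliding_windows(seqs, width, step):
    """Return (start, diversity) for each window along the alignment."""
    length = len(seqs[0])
    result = []
    for start in range(0, length - width + 1, step):
        window = [s[start:start + width] for s in seqs]
        result.append((start, nucleotide_diversity(window)))
    return result


def haar_step(signal):
    """One level of the Haar wavelet transform.

    Returns (approximation, detail) coefficients. A large absolute detail
    coefficient marks an abrupt change between two neighbouring windows.
    """
    approx, detail = [], []
    for i in range(0, len(signal) - 1, 2):
        a, b = signal[i], signal[i + 1]
        approx.append((a + b) / 2 ** 0.5)
        detail.append((a - b) / 2 ** 0.5)
    return approx, detail


# Toy alignment (hypothetical data): the fourth sequence differs from the
# others only at sites 8-11, mimicking a localised run of sequencing errors.
seqs = [
    "ACGTACGTACGTACGT",
    "ACGTACGTACGTACGT",
    "ACGTACGTACGTACGT",
    "ACGTACGTCATGACGT",
]
windows = sliding_windows(seqs, width=4, step=4)
approx, detail = haar_step([pi for _, pi in windows])
# Index of the window pair with the sharpest change in diversity.
sharpest = max(range(len(detail)), key=lambda i: abs(detail[i]))
```

In this toy case the only non-zero detail coefficient belongs to the window pair covering sites 8 to 15, which is where the divergent sequence was altered. A real analysis would need a multi-level transform and a way to distinguish such spikes from genuine evolutionary signal, which is the problem the project sets out to address.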

Simulated and empirical datasets will be used to evaluate, improve and customise the software algorithms, followed by extensive user testing by research groups on-site at the Norwich Research Park. PopSeqle will then be introduced to the wider research community through training workshops, webinars, and a training video which will be made available along with a manual and associated publications. The software itself will be hosted by the Earlham Institute and will be freely available and open-source.

Publications
