PopSeqle: Software for Population Sequence data to Lower Errors

Lead Research Organisation: Earlham Institute

Department Name: UNLISTED

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Technical Summary

PopSeqle is a fast, user-friendly software tool for quality control (QC) checks of population-based sequence data. The PopSeqle project aims to develop software that performs these checks across multiple individuals and populations, using a population genetic framework to identify potential irregularities in NGS genome/transcriptome assemblies. The software is being developed in the new programming language 'Julia', and aims to quantify the signal in multiple sequence alignments that is caused by evolutionary forces and to separate it from the signal caused by sequencing errors.

PopSeqle identifies errors by using 'wavelet transform analyses' to locate peaks and valleys in a signal of population genetic summary statistics across the sequence space. Sliding-window approaches are used to help identify and visualise outlying regions or regions of interest in the data. Other key developments are the ability to produce publication-ready graphics within the software interface, and the flexibility to accept and output different sequence data formats. To ensure that the software is accessible to all users, an intuitive graphical user interface (GUI) is also being developed to allow interactive analysis of data and outputs.
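
As an illustration of the approach described above, the following is a minimal sketch and not PopSeqle's actual implementation. It is written in Python for readability, whereas PopSeqle itself is being developed in Julia. It assumes that the summary statistic is mean pairwise nucleotide diversity, that windows do not overlap, and that a single-level Haar transform stands in for the unspecified wavelet analysis. All function names, the toy alignment and the threshold are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pairwise_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    n_sites = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    if not pairs or n_sites == 0:
        return 0.0
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * n_sites)


def sliding_window_diversity(alignment, window, step):
    """Return (start, diversity) for each window along the alignment."""
    length = len(alignment[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [seq[start:start + window] for seq in alignment]
        results.append((start, pairwise_diversity(chunk)))
    return results


def haar_detail(signal):
    """Single-level Haar detail coefficients (scaled differences of adjacent pairs)."""
    return [(signal[i] - signal[i + 1]) / sqrt(2)
            for i in range(0, len(signal) - 1, 2)]


def flag_outliers(coefficients, threshold):
    """Indices of coefficients whose magnitude exceeds the (assumed) threshold."""
    return [i for i, c in enumerate(coefficients) if abs(c) > threshold]


# Toy alignment (hypothetical data): the third sequence carries a short
# divergent block that could reflect either real variation or an assembly error.
alignment = [
    "AAAAAAAAAAAAAAAA",
    "AAAAAAAAAAAAAAAA",
    "AAAAAAAATTTTAAAA",
]

windows = sliding_window_diversity(alignment, window=4, step=4)
signal = [value for _, value in windows]
detail = haar_detail(signal)
flagged = flag_outliers(detail, threshold=0.1)

for start, value in windows:
    print(start, round(value, 3))
print("flagged coefficient indices:", flagged)
```

In this toy case only the window that starts at position 8 has non-zero diversity, so the second Haar coefficient, which covers the windows at positions 8 and 12, is the one flagged. A real analysis would need a principled threshold and a way to tell evolutionary signal from sequencing error, which this sketch does not attempt.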

Simulated and empirical datasets will be used to evaluate, improve and customise the software algorithms, followed by extensive user testing by research groups on-site at the Norwich Research Park. PopSeqle will then be introduced to the wider research community through training workshops, webinars, and a training video which will be made available along with a manual and associated publications. The software itself will be hosted by the Earlham Institute and will be freely available and open-source.

Planned Impact

unavailable

Funded Value:

£6,277

Funded Period:

Sep 16 - Mar 17

Funder:

BBSRC

Project Status:

Closed

Project Category:

Institute Project

Project Reference:

BBS/E/T/000GP098

Principal Investigator:

Federica Di Palma

Research Topic:

Unclassified

Organisations

Earlham Institute (Lead Research Organisation)

People	ORCID iD
Federica Di Palma (Principal Investigator)
Graham Etherington (Co-Investigator)
