Establishment of International Plant and Insect Pathogen Sequence Database (IPIPSD) Using Existing Deep Sequencing Data

Lead Research Organisation: University of Oxford
Department Name: Zoology


This is an international collaborative project that aims to exploit existing datasets using new bioinformatics tools for pathogen screening and identification. The datasets were recently generated by two leading international consortia in the 1000 Plant (1KP) transcriptome and 1000 Insect Transcriptome Evolution (1KITE) projects. Ongoing bioinformatics analyses are co-ordinated by the China National Genebank (CNGB). The new pathogen screening tool was established in a previous NERC-funded project awarded to the PI under the Technology Proof of Concept programme. The new opportunity rests on the mutual benefits of conducting a pathogen screening study that combines the two large sequencing datasets with the new bioinformatics pipeline. The objective is to set up an International Plant and Insect Pathogen Sequence Database (IPIPSD) that can archive, sort and represent pathogen sequences from datasets generated by Next Generation Sequencing (NGS) technology. The 1KP and 1KITE projects were designed to make gene and gene function discoveries in non-model species living in a wide variety of environmental/ecological niches. The shotgun NGS sequencing strategy indiscriminately produced sequences derived from the plants and insects as well as from the pathogens that naturally infected them. We propose to search, harvest, annotate and sort pathogen sequences from the 1KP and 1KITE libraries. IPIPSD will be largely composed of sequences from non-model species and will thus be distinct from other existing databases. The database will be published in peer-reviewed scientific journals and made freely accessible to the public via the World Wide Web. To publicise IPIPSD and to promote the pathogen screening technology, an International Workshop on Pathogen Screening is proposed at the 9th International Conference on Genomics in November 2014.
Once set up, IPIPSD will facilitate pathogen screening in environmental samples, providing a much-needed knowledge advance in support of environmental/ecological studies. Furthermore, IPIPSD will serve as a platform for large, long-term international collaborations, e.g., the Genome 10000 (Genome 10K) and the 5000 Insect and Arthropod Genome Sequencing Initiative (i5K), which are also hosted at CNGB.

Planned Impact

The specific users are the academic community, including pathologists, virologists, microbiologists, plant scientists, entomologists, biologists, bioinformaticians, ecologists, and environmental scientists. This project will set up an International Plant and Insect Pathogen Sequence Database (IPIPSD) that facilitates pathogen screening in plants and insects. The database will be based on existing datasets produced by two current international projects: the 1000 Plant (1KP) transcriptome and the 1000 Insect Transcriptome Evolution (1KITE) projects. The vast majority of species used in these two projects are non-model species living under natural conditions. IPIPSD therefore has particular potential for application in environmental and ecological research. In the long term, IPIPSD will help ecologists to assess plant and insect anti-pathogen immunity under natural conditions. For biologists, a "bigger" view of pathogen profiles in experimental systems offers the opportunity to extend hypothesis testing to natural conditions. For environmental scientists, environmental change affects emerging disease issues not only in humans but also in the plant and animal communities that support the global ecosystem. Bioinformatics is becoming increasingly important in all aspects of the biological sciences. We will collaborate with the strong bioinformatics team at the China National Genebank (CNGB), China, to set up IPIPSD and to make it freely available to research communities worldwide via the World Wide Web. IPIPSD will be published in peer-reviewed scientific journals. Before IPIPSD is submitted for publication, an International Workshop on Pathogen Screening will be organised at the 9th International Conference on Genomics to publicise IPIPSD and to obtain comments and feedback from experts as well as from wider users.

Non-academic users who will benefit from this project in the long term are agricultural and horticultural managers, farmers, and conservation managers. Policy makers and the general public will also benefit from the scientific problems and questions solved by using IPIPSD and its associated bioinformatics tools. We expect to discover and record previously unknown pathogen prevalence in the species used in the 1KP and 1KITE projects. We do not expect all of these pathogens to pose immediate threats to the environment and human health, but we hypothesise that the balance between infections and host immunity is important for maintaining environmental health and wealth. All data generated in this project will be made available to the NERC Environmental Information Data Centre (EIDC) for release to the general public and for reuse.
We are aware that there may be commercial opportunities when IPIPSD is completed, particularly in the area of pathogen surveys of commercial goods that include live/raw plant and animal materials. The investigators have worked with the NERC technology transfer team and will work with the team again if opportunities develop.
The proposed IPIPSD offers an excellent opportunity to contribute UK scientific excellence to the international research community. It also enhances the capability to screen for pathogens in live/raw plant/food/insect exports and imports, potentially increasing the nation's health, wealth, and economic competitiveness.
Description We developed a virus screening pipeline for searching for virus signals in existing datasets generated by next-generation sequencing technology. Using the pipeline, many unexpected virus infections were detected in experimental samples, and some of the findings have been published in peer-reviewed scientific journals.
Exploitation Route We have developed a web-based system which is hosted by the China National Gene Bank. The website is
Sectors Agriculture, Food and Drink, Digital/Communication/Information Technologies (including Software), Environment, Pharmaceuticals and Medical Biotechnology

Description Advisor for virus screening
Geographic Reach Local/Municipal/Regional 
Policy Influence Type Influenced training of practitioners or researchers
Title Virus screening 
Description A bioinformatics pipeline was developed for screening virus sequences in next-generation sequencing datasets
Type Of Material Technology assay or reagent 
Year Produced 2015 
Provided To Others? Yes  
Impact This method reduced the computing resources required for screening viruses in deep sequencing datasets.
Title Virus database 
Description The database provides information on viruses identified in the 1000 Plant (1KP) transcriptome project.
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact The database will be included in future publications. 
Description Virus screening in China National Gene Bank 
Organisation Beijing Genomics Institute
Country China 
Sector Academic/University 
PI Contribution I designed and supervised the virus screening activities carried out by the collaborator, China National Gene Bank.
Collaborator Contribution The China National Gene Bank provided datasets and facilities.
Impact The establishment of the virus screening pipeline enabled the China National Gene Bank to screen for virus infections in its plant and insect datasets.
Start Year 2014
Description Virus screening workshop (Shenzhen) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Five internationally leading researchers gave talks on virus screening and anti-viral strategies at the virus workshop held during the 9th International Conference on Genomics, 09-12 Sept 2014, Shenzhen, China. The workshop attracted 40-70 academic and industrial researchers.
Year(s) Of Engagement Activity 2014