THE FLYDATA PROJECT: Decision Support and Semantic Organization of Laboratory Data in Drosophila Gene Expression Experiments (Revised)

Lead Research Organisation: University of Oxford
Department Name: Zoology

Abstract

In biology, images are increasingly important. Advances in microscopy now allow us to see when and where individual genes are active in tissues. In Oxford, we are undertaking experiments to investigate the expression of more than a thousand genes in the testis of the fruit fly Drosophila, one of the 'model' organisms whose genome sequence has been determined, in order to discover their role in sperm formation. For this we use a technique known as in situ hybridization, which permits us to image the location at which each selected gene is expressed along the sperm differentiation pathway. As they become available, our results will be published in a publicly accessible database, the Drosophila Testis Gene Expression Database, so that others can use them. Because many of these genes are similar to those in humans, such work could have importance for the clinical treatment of male infertility. Because we wish to study genes critical for normal sperm formation, we first measure the degree to which expression of each of the ~14,000 Drosophila genes is altered in mutant fly strains with abnormal sperm development, and relate these data to on-line information from FlyBase, a public database of genetic information about Drosophila. Only then can we compare the properties of all these genes and group them into functional classes from which to select representative individuals for in situ imaging. Not surprisingly, we find it impossible to hold all this information in our heads at one time, which makes these decisions difficult. Just keeping track of all the information is a complex and time-consuming task.
The purpose of the FlyData Project is to develop a simple but powerful computer-based information management and decision support system that will help us (a) to organise all the laboratory data arising from our Drosophila gene expression experiments in meaningful ways, (b) to relate them to on-line information, and (c) to obtain different views into this multi-dimensional information space using a standard web browser. A set of carefully designed graphical user interfaces will give us just the right combinations of information about one gene, or a group of genes, that we need to support our decision-making processes. The FlyData system will thus help us navigate the mass of data that confronts us. Further, it will record our research decisions, who made them, and when and why they were made, thus creating a complete record of our research 'journey'. By enabling us to annotate the raw data, it will both assist our subsequent writing of reports and scientific papers and enable us to automate the publication of our results to the Drosophila Testis Gene Expression Database. Because it will be developed through close interdisciplinary collaboration between biologists and computer scientists, the FlyData system will be closely matched to our research needs. It will use open standards to export original scientific observations and descriptive annotations in a computer-processable form, providing a foundation for exchange with other information management systems worldwide. We will use freely available, lightweight software tools and agile software development methods to build and test this system, which will be focused on immediate needs yet readily adapted to handle evolving requirements, such as the need to include an additional type of information.
We hope that, with further support, we will be able to enhance the FlyData system into a general-purpose, community-supported laboratory information management and decision support system for biological microscopy research.

Technical Summary

1. The FlyData information management system will provide lab-based decision support and semantic organization of data arising from our Drosophila gene expression experiments, using small lightweight software components, loosely coupled in the Representational State Transfer (REST) style.
2. The data structures and their inter-relationships will be described by a Drosophila Data Ontology written in OWL, which will be developed in consultation with the FlyData and FlyMine communities.
3. Agile software development will be undertaken using Python with TurboGears or Ruby on Rails to create a Model/View/Controller framework that decouples data access from business logic, and from data presentation and user interactions. This will be enhanced by the use of AJAX, enabling rapid development of customized links between web front ends and the underlying database server.
4. The system will be able to harvest data and metadata automatically from lab equipment, and to use gene identifiers to interrogate external databases (Affymetrix and FlyBase) to return relevant gene-specific data.
5. Hand-crafted user interfaces accessible via standard web browsers will enable biological researchers to input additional data, annotations and provenance information, to query the stored data along different semantic dimensions, and to inspect all the information relating to a particular gene in a single integrated tabbed interface.
6. All data objects will initially be stored in native format, uniquely identified using Life Science Identifiers.
7. The FlyData system will export data objects as RDF when required, and will permit queries using SPARQL. This will enable data sharing with colleagues elsewhere, and will facilitate the automated population of the public-facing Drosophila Testis Gene Expression Database.
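
As an illustration of how Life Science Identifiers and RDF export might fit together, the following minimal Python sketch builds an LSID for a stored image and serializes two annotations as N-Triples. It is a sketch under stated assumptions, not the project's implementation: the authority `flydata.example.org`, the namespace `insitu-image`, the ontology IRI, the property names `depictsGene` and `stage`, the `ImageRecord` class and the gene identifier `FBgn0000000` are all hypothetical placeholders. Only the general LSID layout (`urn:lsid:authority:namespace:object[:revision]`) is taken from the LSID convention.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values: the source names neither an LSID authority nor a namespace.
LSID_AUTHORITY = "flydata.example.org"
LSID_NAMESPACE = "insitu-image"
# Hypothetical IRI standing in for the planned Drosophila Data Ontology.
ONTOLOGY = "http://flydata.example.org/ontology#"


def make_lsid(object_id: str, revision: Optional[str] = None) -> str:
    """Build an LSID of the form urn:lsid:authority:namespace:object[:revision]."""
    parts = ["urn", "lsid", LSID_AUTHORITY, LSID_NAMESPACE, object_id]
    if revision is not None:
        parts.append(revision)
    return ":".join(parts)


@dataclass
class ImageRecord:
    """Hypothetical record for one in situ image and its annotations."""
    object_id: str
    gene_id: str  # a gene identifier, e.g. in FlyBase style
    stage: str    # free-text annotation entered by a researcher


def escape_literal(text: str) -> str:
    """Escape backslashes, double quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_ntriples(record: ImageRecord) -> str:
    """Serialize the record as N-Triples, using its LSID as the subject."""
    subject = f"<{make_lsid(record.object_id)}>"
    triples = [
        (subject, f"<{ONTOLOGY}depictsGene>", f'"{escape_literal(record.gene_id)}"'),
        (subject, f"<{ONTOLOGY}stage>", f'"{escape_literal(record.stage)}"'),
    ]
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    # Placeholder identifiers, for illustration only.
    print(to_ntriples(ImageRecord("img42", "FBgn0000000", "primary spermatocyte")))
```

Triples of this shape could be loaded into any RDF store and then queried with SPARQL, for example by selecting every subject whose `depictsGene` value matches a given gene identifier.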
