Gene coexpression network for the study of Rhizobium leguminosarum

Lead Research Organisation: University of Oxford
Department Name: Statistics

Abstract

Description and impact:
Rhizobium leguminosarum is a bacterium that fixes atmospheric nitrogen when associated with legumes. The ammonium salts produced by the bacteria are consumed by the host plants.
Understanding this process, and all the genes and proteins involved, is fundamental to improving crop growth. This improvement will lead to more sustainable agriculture, with reduced fertilizer inputs and lower CO2 and N2O emissions.
Aims and objectives:
The objective of the project is to carry out a statistical and network analysis of cutting-edge gene
expression and global mutational data. The network analysis will enable genes needed for superior symbiotic performance to be identified and introduced into commercial inocula. These inocula are used both in the UK and around the world, enabling substantial yield gains and contributing to sustainable agriculture with reduced fertilizer inputs as well as reduced CO2 and N2O emissions.
Novelty of the research methodology:
To create the gene co-expression network of the bacterium, we are using gene expression data (microarrays) obtained from the bacterium under different growth conditions. These data are extremely rich but noisy. To extract as much precise information as possible from the original data, we have employed several data-preprocessing techniques. Examples include different normalization procedures (e.g. quantile normalization) and the removal of the lowest expression values from each microarray. In addition, we studied the effect of excluding certain "genes" from the analysis, such as pseudogenes and genes that were not present across all the microarrays.
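As an illustration of the quantile normalization step mentioned above, the following is a minimal sketch and not the project's actual pipeline. It assumes a small matrix with genes as rows and microarrays as columns; the function name `quantile_normalize` and the toy values are hypothetical, and ties are broken by row order rather than averaged.

```python
def quantile_normalize(matrix):
    """Force every column (microarray) to share the same value distribution.

    matrix: list of rows (genes), each row a list with one value per array.
    Ties within a column are broken by row order (a simplification).
    """
    n_genes = len(matrix)
    n_arrays = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_arrays)]
    sorted_cols = [sorted(col) for col in columns]

    # Reference distribution: mean of the k-th smallest value across arrays.
    reference = [
        sum(col[k] for col in sorted_cols) / n_arrays for k in range(n_genes)
    ]

    normalized = [[0.0] * n_arrays for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Gene indices ordered from lowest to highest expression in array j.
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


# Toy data (made up): 4 genes measured on 3 arrays.
toy = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(toy)
```

After normalization every array contains the same set of values, so differences between arrays reflect only the rank of each gene within its array.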
The main idea of the analysis is to calculate the correlation between the expression profiles of each pair of genes studied. We then impose a threshold to keep only the strongest relationships. We employed several well-known correlation measures, such as the Pearson, Kendall and Spearman correlations. To select the method that works best, we use Monte Carlo-based methods. We use biological information available in databases such as KEGG, BioCyc, OperonDB and STRING to select biologically related groups of genes. We compute the number of edges among the genes of each group and compare that value with the result obtained by taking random genes from the network.
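The thresholded correlation network and the Monte Carlo comparison can be sketched as follows. This is a hedged illustration under stated assumptions: it uses only the Pearson correlation, thresholds on the absolute value of the correlation, and counts edges inside a single gene group. The gene names, expression values, threshold of 0.95 and function names are all hypothetical.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def build_network(expression, threshold):
    """Keep an edge for each gene pair with |correlation| >= threshold."""
    edges = set()
    for g1, g2 in itertools.combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def edges_within(edges, genes):
    """Number of edges whose two endpoints both lie in `genes`."""
    genes = set(genes)
    return sum(1 for a, b in edges if a in genes and b in genes)


def monte_carlo_p_value(edges, all_genes, group, n_samples=1000, seed=0):
    """Compare the group's edge count with random gene sets of equal size."""
    rng = random.Random(seed)
    pool = sorted(all_genes)
    observed = edges_within(edges, group)
    hits = sum(
        edges_within(edges, rng.sample(pool, len(group))) >= observed
        for _ in range(n_samples)
    )
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_samples + 1)


# Toy expression profiles (made up): geneA, geneB and geneC co-vary.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneC": [1.1, 1.9, 3.2, 3.9, 5.1],
    "geneD": [5.0, 1.0, 4.0, 2.0, 3.0],
    "geneE": [2.0, 5.0, 1.0, 4.0, 3.0],
    "geneF": [3.0, 3.0, 5.0, 1.0, 4.0],
}
edges = build_network(expression, threshold=0.95)
group = ["geneA", "geneB", "geneC"]  # stand-in for a pathway from a database
observed, p_value = monte_carlo_p_value(edges, expression, group)
```

A small p-value indicates that the biologically defined group is more densely connected than random gene sets of the same size. Repeating the comparison for networks built with different correlation measures gives one way to rank them.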
Lastly, to identify interesting groups of genes, we perform community detection on the network (Louvain method, configuration model). We have been able to detect clusters of genes involved in the same biological process (e.g. metabolic pathways). The long-term objective is to use this approach to find those genes in the network that are related to the nitrogen fixation process.
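The Louvain method searches for a partition that maximizes modularity, and the configuration model supplies the null expectation in that score. The sketch below does not implement the Louvain optimization itself. It only computes the modularity of a given partition for an unweighted, undirected network with at least one edge, to show the quantity being optimized. The function name, gene labels and toy network are hypothetical.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Modularity of a partition with the configuration-model null term.

    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for a, b in edges:
        degree_sum[membership[a]] += 1
        degree_sum[membership[b]] += 1
        if membership[a] == membership[b]:
            internal[membership[a]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy network (made up): two triangles of genes joined by a single edge.
toy_edges = [
    ("g1", "g2"), ("g1", "g3"), ("g2", "g3"),
    ("g4", "g5"), ("g4", "g6"), ("g5", "g6"),
    ("g3", "g4"),
]
split = {"g1": 0, "g2": 0, "g3": 0, "g4": 1, "g5": 1, "g6": 1}
single = {gene: 0 for gene in split}

q_split = modularity(toy_edges, split)    # two communities
q_single = modularity(toy_edges, single)  # everything in one community
```

Splitting the toy network into its two triangles scores higher than keeping all genes together, which is the kind of improvement a modularity-maximizing method looks for.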
Alignment to EPSRC's strategies and research areas:
- Living With Environmental Change (LWEC): Using Rhizobium inocula as fertilizers would reduce fertilizer inputs, as well as CO2 and N2O emissions, which would help to deal with environmental change.
- Mathematical sciences: To develop the interaction network, we are using and developing different mathematical and statistical tools that could also be applied to equivalent studies.
Companies or collaborators involved:
Nottingham-based Legume Technology Ltd, Units 3C & 3D, Eastbridgford Business Park, Kneeton Rd, Eastbridgford, Notts, NG13 8PJ. Legume Technology Ltd will provide advice from the beginning of the project onwards.

Publications

Studentship Projects

Project Reference | Relationship | Related To   | Start      | End        | Student Name
EP/R512333/1      |              |              | 01/10/2017 | 30/09/2021 |
1950255           | Studentship  | EP/R512333/1 | 01/10/2017 | 30/09/2021 | Javier Pardo Diaz