Gene coexpression network for the study of Rhizobium leguminosarum
Lead Research Organisation:
University of Oxford
Department Name: Statistics
Abstract
Description and impact:
Rhizobium leguminosarum is a bacterium that, in association with legumes, fixes atmospheric nitrogen. The ammonia salts produced by the bacteria are consumed by the host plants.
Understanding this process, and the genes and proteins involved in it, is fundamental to improving crop growth. This improvement will contribute to sustainable agriculture and reduce fertilizer inputs as well as CO2 and N2O emissions.
Aims and objectives:
The objective of the project is to carry out a statistical and network analysis of cutting-edge gene
expression and global mutational data. The network analysis will enable genes needed for superior symbiotic performance to be identified and introduced into commercial inocula. These inocula are used both in the UK and around the world, enabling substantial yield gains and contributing to sustainable agriculture with reduced fertilizer inputs as well as reduced CO2 and N2O emissions.
Novelty of the research methodology:
To create the gene co-expression network of the bacterium, we use gene expression data (microarrays) collected from the bacteria under different growth conditions. These data are extremely rich but noisy. To extract as much precise information as possible from the original data, we have employed several data-preprocessing techniques, for example different normalization procedures (e.g. quantile normalization) and the removal of the lowest expression values from each microarray. In addition, we studied the effect of excluding certain "genes" from the analysis, such as pseudogenes and genes that were not present on all the microarrays.
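The quantile normalization mentioned above can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual pipeline: the function name `quantile_normalize` and the toy values are hypothetical, and ties are broken by position rather than averaged, which a production implementation would usually handle more carefully.

```python
def quantile_normalize(arrays):
    """Force every microarray to share the same empirical distribution.

    `arrays` is a list of equally long lists; position g in each list holds
    the expression of gene g on that microarray (hypothetical layout).
    """
    n_genes = len(arrays[0])
    if any(len(a) != n_genes for a in arrays):
        raise ValueError("all arrays must measure the same genes")

    # Reference distribution: mean of the r-th smallest value across arrays.
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [
        sum(s[r] for s in sorted_arrays) / len(arrays) for r in range(n_genes)
    ]

    normalized = []
    for a in arrays:
        # Gene indices ordered from lowest to highest expression on this array.
        order = sorted(range(n_genes), key=lambda g: a[g])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized


# Toy data: three microarrays, four genes (made-up numbers).
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
```

After normalization every array contains the same set of values, so differences between arrays reflect only the ranking of the genes, not array-wide intensity shifts.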
The main idea of the analysis is to calculate the correlation between the expression profiles of each pair of studied genes. We then impose a threshold to keep only the strongest relationships. We have employed several well-known correlation measures, such as the Pearson, Kendall and Spearman correlations. To select the method that works best, we use Monte Carlo-based methods. We use biological information available in databases such as KEGG, BioCyc, OperonDB and STRING to select biologically related groups of genes. We compute the number of edges among the genes of each group and compare that value with the result obtained when taking random genes from the network.
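The thresholded correlation network and the Monte Carlo comparison against random gene sets can be sketched as below. Several choices here are assumptions rather than details given in the text: Pearson correlation is used, the threshold is applied to the absolute value, "edges of a group" is read as edges whose two endpoints both lie in the group, and the gene names, expression values and threshold are invented for illustration.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def build_network(expression, threshold):
    """Connect two genes when |correlation| >= threshold (assumed rule)."""
    edges = set()
    for g, h in itertools.combinations(sorted(expression), 2):
        if abs(pearson(expression[g], expression[h])) >= threshold:
            edges.add(frozenset((g, h)))
    return edges


def edges_within(edges, group):
    """Number of edges whose two endpoints both belong to `group`."""
    members = set(group)
    return sum(1 for e in edges if e <= members)


def monte_carlo_p_value(edges, genes, group, n_draws=1000, seed=0):
    """Share of random gene sets with at least as many internal edges."""
    rng = random.Random(seed)
    observed = edges_within(edges, group)
    hits = sum(
        edges_within(edges, rng.sample(genes, len(group))) >= observed
        for _ in range(n_draws)
    )
    return (hits + 1) / (n_draws + 1)


# Hypothetical expression profiles over six conditions.
expression = {
    "geneA": [1, 2, 3, 4, 5, 6],
    "geneB": [2, 4, 6, 8, 10, 13],
    "geneC": [1, 3, 2, 5, 4, 6],
    "geneD": [15, 9, 6, 6, 9, 15],
    "geneE": [5, 17, 14, 6, 3, 15],
    "geneF": [11, 7, 12, 12, 7, 11],
    "geneG": [9, 15, 0, 20, 5, 11],
    "geneH": [6, 5, 4, 4, 5, 6],
}
edges = build_network(expression, threshold=0.8)
pathway = ["geneA", "geneB", "geneC"]  # stand-in for a database-derived group
observed = edges_within(edges, pathway)
p_value = monte_carlo_p_value(edges, sorted(expression), pathway)
```

A small `p_value` indicates that the annotated group is more densely connected than random gene sets of the same size. Repeating the test for networks built with different correlation measures gives one way to compare them.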
Lastly, to identify interesting groups of genes, we perform community detection on the network (Louvain method with the configuration model). We have been able to detect clusters of genes involved in the same biological process (e.g. metabolic pathways). The long-term objective is to use this approach to find the genes in the network that are related to the nitrogen fixation process.
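The Louvain method searches greedily for a partition that maximizes modularity, which compares the number of edges inside each community with the number expected under the configuration model. The sketch below only evaluates that quality function for a given partition; it does not perform the Louvain optimization itself, and the toy graph and function name are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity of a partition of an unweighted, undirected graph.

    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c and
    d_c the total degree of the nodes in c (configuration-model null).
    """
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1

    total_degree = defaultdict(int)
    for node, d in degree.items():
        total_degree[community_of[node]] += d

    return sum(
        internal[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


# Toy network: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_modules = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
one_module = {node: "all" for node in range(6)}

q_split = modularity(edges, two_modules)
q_single = modularity(edges, one_module)
```

Splitting the toy network into its two triangles gives a higher modularity than keeping all nodes together, which is the kind of improvement the Louvain method looks for at each step.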
Alignment to EPSRC's strategies and research areas:
- Living With Environmental Change (LWEC): Using Rhizobium inocula as fertilizers would reduce fertilizer inputs as well as CO2 and N2O emissions, which would help in dealing with environmental change.
- Mathematical sciences: To develop the interaction network, we are using and developing mathematical and statistical tools that would also be useful for equivalent studies.
Companies or collaborators involved:
Nottingham-based Legume Technology Ltd, Units 3C & 3D, Eastbridgford Business Park, Kneeton Rd, Eastbridgford, Notts, NG13 8PJ. Legume Technology Ltd will provide advice from the beginning of the project onwards.
People | ORCID iD |
---|---|
G Reinert (Primary Supervisor) | |
Javier Pardo Diaz (Student) | |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/R512333/1 | | | 01/10/2017 | 30/09/2021 | |
1950255 | Studentship | EP/R512333/1 | 01/10/2017 | 30/09/2021 | Javier Pardo Diaz |
Description | We aim to generate a gene coexpression network for Rhizobium leguminosarum. In this network, the nodes are genes, and two genes are connected if they are coexpressed across the samples in our input. The input is a collection of microarrays that measure the expression of each gene. The state-of-the-art method for generating gene coexpression networks is based on the Pearson correlation. We have found that distance correlation gives better results than Pearson correlation when generating gene coexpression networks: the networks based on distance correlation are more stable and capture more biological information. We have constructed a gene coexpression network for Rhizobium leguminosarum using distance correlation. Using this network, we have been able to identify groups of genes that are involved in the same molecular processes. |
Exploitation Route | We are preparing an R package which allows other users to generate their own gene coexpression networks using our methodology. We are also planning to submit the current results and the code used so that they are open to the scientific community. |
Sectors | Agriculture, Food and Drink; Digital/Communication/Information Technologies (including Software); Environment; Healthcare; Pharmaceuticals and Medical Biotechnology |
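As an illustration of why distance correlation can capture relationships that Pearson correlation misses, the sketch below computes the standard sample distance correlation (double-centred pairwise distance matrices) for two one-dimensional profiles. It is a plain-Python illustration under stated assumptions, not the project's R package: the function names and the toy profiles are hypothetical.

```python
import math


def _centered_distances(values):
    """Pairwise |a - b| distances with row, column and grand means removed."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [
        [dist[i][j] - row_means[i] - row_means[j] + grand_mean for j in range(n)]
        for i in range(n)
    ]


def _mean_product(a, b):
    n = len(a)
    return sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / (n * n)


def distance_correlation(x, y):
    """Sample distance correlation of two equally long sequences."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2 = _mean_product(a, b)
    denom = math.sqrt(_mean_product(a, a) * _mean_product(b, b))
    if denom == 0:
        return 0.0
    return math.sqrt(max(dcov2, 0.0) / denom)


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical profiles with a purely non-linear dependence (y = x ** 2).
x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]
r = pearson(x, y)
dcor = distance_correlation(x, y)
```

For these toy profiles the Pearson correlation is zero while the distance correlation is clearly positive, so a network thresholded on distance correlation would keep an edge that a Pearson-based network would drop.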
Description | 21st Congress on Nitrogen Fixation - 10th-15th Oct 2019, Wuhan, China - Philip Poole |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Phil gave a talk at this international conference. He received many questions about his work and spent time exchanging ideas with colleagues in this research area. |
Year(s) Of Engagement Activity | 2019 |
URL | http://2019icnf.csp.escience.cn/dct/page/65580 |
Description | ComplexNetworks 2019 Conference Poster presentation |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Poster presentation at the 8th conference on Complex Networks and their Applications |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.complexnetworks.org/ |
Description | ISMB/ECCB 2019 Conference Poster presentation and short talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation of a poster and a short talk at the ISMB/ECCB 2019 Conference |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.iscb.org/ismbeccb2019 |
Description | International COSTNET18 Conference Poster presentation |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Study participants or study members |
Results and Impact | Presentation of a poster in the COSTNET18 Conference |
Year(s) Of Engagement Activity | 2018 |
URL | http://costnet18.wzim.sggw.pl/ |
Description | International COSTNET19 Conference Poster presentation and short talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Poster presentation and short talk given at the COSTNET19 Conference |
Year(s) Of Engagement Activity | 2019 |
URL | https://costnetbilbao.wordpress.com/ |
Description | Keble College Graduate Discussion Evening |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | Presentation about my research to members of Keble College |
Year(s) Of Engagement Activity | 2020 |
Description | Oxford Networks Seminar talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | Talk at the Oxford Networks Seminar |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.maths.ox.ac.uk/groups/networks/networks-seminar |
Description | VII International Symposium SRUK/CERU Poster presentation |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Poster presentation at the VII International Symposium SRUK/CERU |
Year(s) Of Engagement Activity | 2019 |
URL | https://sruk2019.com/ |