Open Science needs Open Data: Automated semantic description of biological data through Machine Learning
Lead Research Organisation: University of East Anglia
Department Name: Graduate Office
Abstract
Biological research is undergoing a data revolution, with huge amounts of data being generated every day. This is happening alongside increasing demands from funders and publishers to make these data available. Making data Findable, Accessible, Interoperable, and Reusable (FAIR) requires considerable effort from researchers to ensure that their data are described well enough for others to find and reuse them. Although more and more metadata standards are being produced to help researchers describe their data, adopting these standards is neither widespread nor easy. Biological data needs to be liberated.
This project will give the applicant an exciting opportunity to address the problem of badly described life science datasets by developing new machine learning algorithms that automate the annotation of life science data with ontology terms. The project aims to develop modern computational methods that help biologists better describe their data. It is hoped that this will improve data quality, making FAIR data easier for other researchers to access and providing richer metadata that can power meaningful data integration.
The student will learn a wide variety of scientific approaches and methodologies in machine learning, data management, ontologies, and community software development.
People | ORCID iD |
---|---|
Robert Davey (Primary Supervisor) | |
Lorcan Pigott-Dix (Student) | |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
BB/M011216/1 | | | 01/10/2015 | 31/03/2024 | |
2243628 | Studentship | BB/M011216/1 | 01/10/2019 | 30/09/2023 | Lorcan Pigott-Dix |
Description | A multi-domain ontology term recognition tool has been created that can assist with data annotation. |
Exploitation Route | The tool could be used to assist data stewards to scale their data curation efforts. |
Sectors | Chemicals, Environment, Healthcare, Pharmaceuticals and Medical Biotechnology, Other |
Description | Sponsorship for Student Symposium |
Organisation | PCR Biosystems |
Country | United Kingdom |
Sector | Private |
PI Contribution | Organised the inaugural Earlham Institute student symposium. |
Collaborator Contribution | Provided funding towards the cost of catering for the symposium. |
Impact | Further funding; engagement activities |
Start Year | 2022 |
Title | lorcanpd/adorNER: First release |
Description | adorNER is a deep-learning-based tool for identifying ontology terms in text. It takes multiple OBO ontologies as training input and outputs a model for identifying their terms. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | A pipeline for creating a multi-ontology neural dictionary for identifying ontology terms in text |
URL | https://zenodo.org/record/7373101 |
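The record above says adorNER is trained from several OBO ontologies at once. The following is a minimal sketch, not adorNER's code, of only the input side of such a pipeline: collecting each non-obsolete term's `name` and `synonym` labels from several OBO files into one shared dictionary. A naive greedy longest-match lookup stands in for the learned neural model purely to show what "identifying terms in text" means. The function names, the `EXA`/`EXB` prefixes and all term content are hypothetical.

```python
import re
from collections import defaultdict


def normalise(text):
    """Lower-case and split on non-alphanumerics so labels and text match alike."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def parse_obo_terms(obo_text):
    """Return (term_id, labels) for each non-obsolete [Term] stanza in OBO text."""
    terms, current = [], None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts; keep the previous term, track only [Term] stanzas
            if current and current["id"]:
                terms.append(current)
            current = {"id": None, "labels": [], "obsolete": False} if line == "[Term]" else None
            continue
        if current is None or ":" not in line:
            continue  # header lines, blank lines, non-Term stanzas
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "id":
            current["id"] = value
        elif tag == "name":
            current["labels"].append(value)
        elif tag == "synonym":
            # Synonym lines look like: "text" SCOPE [xrefs]; keep the quoted text
            match = re.match(r'"((?:[^"\\]|\\.)*)"', value)
            if match:
                current["labels"].append(match.group(1))
        elif tag == "is_obsolete" and value == "true":
            current["obsolete"] = True
    if current and current["id"]:
        terms.append(current)
    return [(t["id"], t["labels"]) for t in terms if not t["obsolete"]]


def build_term_dictionary(ontologies):
    """Merge labels from several ontologies into one label -> {term IDs} index."""
    index = defaultdict(set)
    for obo_text in ontologies:
        for term_id, labels in parse_obo_terms(obo_text):
            for label in labels:
                index[normalise(label)].add(term_id)
    return index


def find_terms(text, index, max_words=4):
    """Greedy longest-match dictionary lookup (a stand-in, not a neural model)."""
    tokens = normalise(text).split()
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_words, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in index:
                hits.append((span, sorted(index[span])))
                i += n
                break
        else:
            i += 1  # no label starts at this token
    return hits


# Hypothetical toy ontologies standing in for real OBO files
ONTOLOGY_A = """format-version: 1.2

[Term]
id: EXA:0000001
name: leaf
synonym: "foliage leaf" EXACT []

[Term]
id: EXA:0000002
name: root hair
"""

ONTOLOGY_B = """[Term]
id: EXB:0000010
name: drought stress
synonym: "water deficit" EXACT []

[Term]
id: EXB:0000011
name: old term
is_obsolete: true
"""

if __name__ == "__main__":
    idx = build_term_dictionary([ONTOLOGY_A, ONTOLOGY_B])
    print(find_terms("Root hair length under water deficit was measured in each leaf sample.", idx))
    # [('root hair', ['EXA:0000002']), ('water deficit', ['EXB:0000010']), ('leaf', ['EXA:0000001'])]
```

Keeping one index across ontologies means a label shared by two ontologies maps to both IDs. An exact-match lookup like this misses paraphrases and unseen wording, which is the gap a trained model is meant to close.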
Description | 30-minute presentation at SWAT4HCLS 2023, Basel |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | I gave a 30-minute presentation to academic and industry audiences on the research I had undertaken to develop a multi-domain ontology concept recognition tool. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.swat4ls.org/workshops/basel2023/scientific-programme-2023/ |