Open Science needs Open Data: Automated semantic description of biological data through Machine Learning

Lead Research Organisation: University of East Anglia
Department Name: Graduate Office

Abstract

Biological research is undergoing a data revolution, with huge amounts of data being generated every day. This is happening alongside increasing demands from funders and publishers to make these data available. Making data Findable, Accessible, Interoperable, and Reusable (FAIR) requires a great deal of effort from researchers to ensure that their data are described well enough for others to find and reuse them. Although more and more metadata standards are being produced to help researchers describe their data, these standards are neither widely used nor easy to apply. Biological data needs to be liberated.
This project will give the applicant an exciting opportunity to address the problem of badly described life science datasets by developing new machine learning algorithms that automate the annotation of life science data with ontology terms. It aims to develop modern computational methods that help biologists describe their data better. This should improve the quality of data, allowing other researchers to access FAIR data more easily, with richer metadata that can power meaningful data integration.
The applicant will learn a wide variety of scientific approaches and methodologies spanning machine learning, data management, ontologies, and community software development.

Publications


Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
BB/M011216/1       -             -             01/10/2015  31/03/2024  -
2243628            Studentship   BB/M011216/1  01/10/2019  30/09/2023  Lorcan Pigott-Dix
 
Description A multi-domain ontology term recognition tool has been created that can assist with data annotation.
Exploitation Route The tool could help data stewards scale their data curation efforts.
Sectors Chemicals, Environment, Healthcare, Pharmaceuticals and Medical Biotechnology, Other

 
Description Sponsorship for Student Symposium 
Organisation PCR Biosystems
Country United Kingdom 
Sector Private 
PI Contribution Organised the inaugural Earlham Institute student symposium.
Collaborator Contribution Provided money towards the cost of catering for the symposium.
Impact Further funding Engagement activities
Start Year 2022
 
Title lorcanpd/adorNER: First release 
Description adorNER is a deep-learning-based tool for identifying ontology terms in text. It takes multiple OBO ontologies as training inputs and outputs a model for identifying terms.
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact A pipeline for creating a multi-ontology neural dictionary for identifying ontology terms in text 
URL https://zenodo.org/record/7373101
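
For readers unfamiliar with the task, the input and output described in this record (OBO ontologies in, ontology term mentions out) can be illustrated with a minimal sketch. This is not adorNER's method: adorNER is deep-learning-based, whereas the sketch below is a plain dictionary lookup that only shows the shape of the problem. The function names `parse_obo_labels` and `find_terms`, the `EX:` identifiers, and the two-term ontology excerpt are all hypothetical and made up for illustration.

```python
import re

# Hypothetical two-term excerpt in OBO flat-file syntax (IDs are invented).
OBO_SNIPPET = """
[Term]
id: EX:0000001
name: apoptotic process
synonym: "apoptosis" EXACT []

[Term]
id: EX:0000002
name: cell population proliferation
synonym: "cell proliferation" EXACT []
"""


def parse_obo_labels(obo_text):
    """Map each lower-cased term name or synonym to its term ID."""
    labels = {}
    term_id = None
    in_term = False
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest here.
            in_term = line == "[Term]"
            term_id = None
        elif not in_term:
            continue
        elif line.startswith("id: "):
            term_id = line[4:].strip()
        elif line.startswith("name: ") and term_id:
            labels[line[6:].strip().lower()] = term_id
        elif line.startswith("synonym: ") and term_id:
            match = re.match(r'synonym: "(.+?)"', line)
            if match:
                labels[match.group(1).lower()] = term_id
    return labels


def find_terms(text, labels):
    """Return (start, end, matched_text, term_id) for each label found in text."""
    if not labels:
        return []
    # Longest labels first, so the alternation prefers the longer of two
    # labels that could match at the same position.
    alternatives = sorted(labels, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [
        (m.start(), m.end(), m.group(0), labels[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    labels = parse_obo_labels(OBO_SNIPPET)
    for hit in find_terms("Apoptosis and cell proliferation were measured.", labels):
        print(hit)
```

A lookup of this kind only finds exact names and synonyms; recognising paraphrased or novel mentions of a term is the sort of gap a learned model is meant to address.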
 
Description 30 minute presentation @ SWAT4HCLS 2023 - Basel 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact I gave a 30-minute presentation to an audience from academia and industry on the research I had undertaken to develop a multi-domain ontology concept recognition tool.
Year(s) Of Engagement Activity 2023
URL https://www.swat4ls.org/workshops/basel2023/scientific-programme-2023/