ConCur: Knowledge Base Construction and Curation
Lead Research Organisation:
University of Oxford
Department Name: Computer Science
Abstract
Knowledge graphs are graph-structured knowledge resources which are often expressed as triples such as ("UK", "hasCapital", "London") and ("London", "instanceOf", "City"). As well as such basic "facts", knowledge graphs often include structural knowledge about the domain, typically based on a hierarchy of entity types (also known as classes or concepts); e.g., ("City", "subClassOf", "HumanSettlement"). A knowledge graph that consists largely or wholly of structural knowledge is often called an ontology.
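As a minimal sketch of the triple representation described above, the three example triples can be stored as plain Python tuples, and the class hierarchy can be followed to derive the types an entity inherits. The function names `superclasses` and `types_of` are illustrative assumptions, not part of any tool from this project.

```python
# The three example triples from the abstract, as (subject, predicate, object).
triples = {
    ("UK", "hasCapital", "London"),
    ("London", "instanceOf", "City"),
    ("City", "subClassOf", "HumanSettlement"),
}

def superclasses(cls, triples):
    """Return every class reachable from `cls` by following subClassOf edges."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found

def types_of(entity, triples):
    """Return the asserted types of `entity` plus all inherited supertypes."""
    direct = {o for s, p, o in triples if s == entity and p == "instanceOf"}
    inherited = set()
    for cls in direct:
        inherited |= superclasses(cls, triples)
    return direct | inherited

london_types = types_of("London", triples)
print(sorted(london_types))
```

Here the structural triple about "City" lets a system conclude that London is also a "HumanSettlement", even though no triple states this directly.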
Some knowledge graphs are general purpose, such as Wikidata and the Google knowledge graph, while others are developed for specific domains such as medicine. They are rapidly gaining in importance and are playing a key role in many applications. For example, Google uses its knowledge graph for search, question answering and Google Assistant, while Amazon and Apple also use knowledge graphs to power their personal assistants Alexa and Siri, respectively. Knowledge graphs are widely used in the domain of health and wellbeing, e.g., for organising and exchanging information and to power clinical artificial intelligence (AI). One example is FoodOn, an ontology representing food knowledge such as fine-grained food product categorization, nutrition and allergens, as well as related activities such as agriculture.
Knowledge graph construction and maintenance is, however, very challenging, and may require a considerable amount of human effort. Notwithstanding the high cost of knowledge creation, knowledge graphs are often still biased, incomplete or too coarse-grained. Take HeLis, an ontology for health and lifestyle, as an example. Its food knowledge is quite simple and often represents many different variants with a single entity (e.g., "Banana" for all kinds and derivatives of bananas), and its knowledge of health is highly incomplete when compared with dedicated biomedical ontologies. In addition, it is hard to avoid errors such as incorrect facts and categorisations in knowledge graphs; e.g., FoodOn categorises soy milk as a kind of milk, but not as a kind of soy product. Such errors may be inherited from the information source or be caused by the construction procedure. These issues significantly impact the usefulness of knowledge graphs and the reliability of the systems that use them; e.g., the categorisation of soy milk could be dangerous if the knowledge graph were used in a food allergen alert system.
Therefore, effective knowledge graph construction and curation is urgently required and will play a critical role in exploiting the full value of knowledge graphs. As there are now many available knowledge resources, one possible approach is to use multiple sources to address both coverage and quality issues, e.g., via integration and cross-checking. For example, integrating HeLis with FoodOn would combine fine-grained categorization of food products (including bananas) with lifestyle knowledge. Moreover, cross-checking FoodOn with HeLis will reveal the problem with soy milk, which is correctly categorized as a soy product in HeLis. Automating the integration of knowledge resources is challenging, but combining semantic and learning-based techniques seems to be a very promising approach, and we have already obtained some encouraging preliminary results in this direction.
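The cross-checking idea can be sketched with two toy class hierarchies. The fragments below are hypothetical illustrations of the soy milk example, not data extracted from FoodOn or HeLis, and they assume the classes of the two sources have already been aligned under shared names (in practice, alignment is itself a hard step). The helper names `closure` and `missing_subsumptions` are likewise assumptions made for this sketch.

```python
def closure(edges):
    """Transitive closure of a set of (subclass, superclass) pairs."""
    closed = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closed):
            for c, d in list(closed):
                if b == c and (a, d) not in closed:
                    closed.add((a, d))
                    changed = True
    return closed

def missing_subsumptions(target, reference):
    """Subsumptions entailed by `reference` but absent from `target`.

    Only pairs whose classes both occur in `target` are reported, so the
    result is a list of candidate repairs rather than new vocabulary.
    """
    target_closed = closure(target)
    vocab = {cls for pair in target for cls in pair}
    return {
        (a, b)
        for a, b in closure(reference)
        if a in vocab and b in vocab and (a, b) not in target_closed
    }

# Hypothetical, pre-aligned fragments (illustrative only).
foodon_like = {("SoyMilk", "Milk"), ("Tofu", "SoyProduct")}
helis_like = {("SoyMilk", "SoyProduct")}

candidates = missing_subsumptions(foodon_like, helis_like)
print(candidates)
```

The reported pair is only a candidate: a purely symbolic comparison cannot tell whether the reference or the target is wrong, which is one reason for combining it with learning-based scoring and human review.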
The proposed research will therefore study a range of semantic and machine learning techniques, and how to combine them to support knowledge graph construction and curation. As well as its application to knowledge graph construction and curation, this research will also contribute to the development of new neural-symbolic theories, paradigms and methods, such as deep semantic embedding for learning representations for expressive knowledge, and knowledge-guided learning for addressing sample shortage problems. These techniques promise to revolutionize many AI and big data technologies.
Publications
Benedikt M
(2022)
Rewriting the Infinite Chase
Benedikt M
(2022)
Rewriting the infinite chase
in Proceedings of the VLDB Endowment
Chen J
(2023)
Zero-Shot and Few-Shot Learning With Knowledge Graphs: A Comprehensive Survey
in Proceedings of the IEEE
Chen J
(2021)
OWL2Vec*: embedding of OWL ontologies
in Machine Learning
Chen J
(2023)
Contextual semantic embeddings for ontology subsumption prediction
in World Wide Web
Description | * We have developed techniques based on language model (LM) fine-tuning and prompt engineering to support a range of knowledge engineering tasks, including ontology alignment and subsumption prediction. We initially used the BERT model to implement the BERTMap and BERTSubs tools for ontology alignment and subsumption prediction, respectively. We are currently investigating more recent and capable large language models (LLMs), such as Flan-T5-XXL and GPT-*, for such tasks. * We have explored the use of LLMs to extend ontologies. One direction is to discover new concepts mentioned in text by extending entity linking with NIL, and to insert them into the ontology by finding the right subsumption edges. Another direction is to use the structure of existing taxonomies and ontologies to identify missing concepts. In both cases, we explored pre-trained language models together with LLMs for inventing names for new concepts and for inserting them into the taxonomy or ontology. * We have investigated the extent to which LMs capture ontological knowledge. We have developed OntoLAMA, a tool for probing language models with knowledge extracted from ontologies, including subsumptions involving named and complex concepts. * We have developed a general and transferable hierarchy embedding technique named HiT, which is learned by projecting an LM's representation of each concept's text onto a Poincaré ball. It could be applied to ontology curation tasks such as subsumption prediction, and has the potential to be extended to subsumption-based semantic search. * We have implemented many of the above methods in DeepOnto, a new Python library for supporting ontology engineering with deep learning algorithms and tools, especially LMs. |
Exploitation Route | DeepOnto is being used by several other groups, including at Samsung. |
Sectors | Aerospace, Defence and Marine; Energy; Financial Services, and Management Consultancy; Healthcare; Manufacturing, including Industrial Biotechnology |
URL | https://krr-oxford.github.io/DeepOnto/ |
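For readers unfamiliar with the Poincaré ball used by the hierarchy embedding finding above, the following minimal sketch computes the standard hyperbolic distance in the unit ball, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). It illustrates the geometry only; it is not the project's code or training objective, and the function name `poincare_distance` is an assumption of this sketch.

```python
import math

def squared_norm(x):
    """Squared Euclidean norm of a sequence of floats."""
    return sum(c * c for c in x)

def poincare_distance(u, v):
    """Hyperbolic distance between two points strictly inside the unit ball."""
    nu, nv = squared_norm(u), squared_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = squared_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))

# Distances grow rapidly near the boundary, which leaves room for the many
# fine-grained concepts that sit far from the root of a hierarchy.
near_centre = poincare_distance((0.0, 0.0), (0.5, 0.0))
near_boundary = poincare_distance((0.0, 0.0), (0.99, 0.0))
print(near_centre, near_boundary)
```

In hierarchy embeddings of this kind, general concepts are typically placed near the centre of the ball and specific ones near the boundary, so that distance from the centre loosely tracks depth in the hierarchy.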
Description | We have a close collaboration with Samsung, and our DeepOnto library is being used for ontology development at Samsung. |
First Year Of Impact | 2023 |
Sector | Digital/Communication/Information Technologies (including Software) |
Impact Types | Economic |
Description | Collaboration with Bosch |
Organisation | Bosch Group |
Department | Bosch |
Country | Germany |
Sector | Private |
PI Contribution | PhD research |
Collaborator Contribution | Real-life problems and funding for PhD student |
Impact | PhD funding |
Start Year | 2021 |
Description | Collaboration with Samsung Research UK |
Organisation | Samsung |
Department | Samsung, UK |
Country | United Kingdom |
Sector | Private |
PI Contribution | Collaboration with Samsung Research UK |
Collaborator Contribution | Research problems and funding for PhD students and PDRAs |
Impact | Publications and funding |
Start Year | 2019 |
Description | Collaboration with Siemens |
Organisation | Siemens AG |
Country | Germany |
Sector | Private |
PI Contribution | PhD research |
Collaborator Contribution | Real-life problems and funding for PhD student |
Impact | PhD funding |
Start Year | 2019 |
Description | Keynote talk at Declarative AI conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Keynote at DeclarativeAI conference about our research and spin-out activities on knowledge graphs |
Year(s) Of Engagement Activity | 2022 |
Description | Keynote talk at LDAC conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Keynote at LDAC to present our research and spin-out activities on knowledge graphs |
Year(s) Of Engagement Activity | 2022 |
Description | Presentation at Huawei |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Talk at Huawei to inform them about our research and spin-out activities on knowledge graphs |
Year(s) Of Engagement Activity | 2022 |
Description | Presentation at SAP |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Talk at SAP to inform them about our research and spin-out activities on knowledge graphs |
Year(s) Of Engagement Activity | 2022 |