Innovative tools to enable exploration of complex and specialised data sets

Lead Research Organisation: University of Essex
Department Name: Computer Sci and Electronic Engineering

Abstract

In the age of Big Data, knowledge workers - individuals, companies and organisations whose primary focus is knowledge
and information extraction and usage - find it increasingly difficult to search for and identify accurate and relevant
information. In particular, in the domain of scientific literature and IP search, where the underlying corpora are growing at a
huge rate, this is a daunting task and human expertise and involvement remain critical. This project aims to develop a suite
of techniques and methods that will enable users to search for and identify relevant information within a corpus more
efficiently and effectively. The methods developed will deploy semantic-based analysis, domain and lexical linguistic
ontologies in order to first understand the user needs based on the underlying domain of application and subsequently
enable more accurate information retrieval through enhanced search and cross-reference of information. In addition, the
project aims to offer advanced user services through sharing of search strategies which will be identified by observing and
understanding patterns in users' search behaviours.

Planned Impact

The project represents a highly integrated research proposal that will generate significant impact for the company and
within the intended application domain, and more generally within the area of semantic information extraction from
multiple information resources.
Commercial Impact
The project will have direct impact for CambridgeIP and their work in providing efficient and effective services in the area of
scientific search including patent search and analysis. Both partners have significant expertise and experience in the areas
of document and corpora analysis and information extraction, but the combination of skills of CambridgeIP members and
the research expertise and experience of the academic staff from the University of Essex, who are also the leading
members of the ESRC Data Research Centre for Smart Analytics, is envisaged to bring about significant and tangible
outcomes.
More specifically, the project is envisaged to bring about significant improvements to the user services that CambridgeIP
currently delivers, increase the efficiency of the provided services as well as potentially increase their user base
significantly, including by giving SMEs access to services and products that were previously accessible only to well-funded
companies and organisations. At the heart of the project, we propose to develop methods and techniques which will allow firms to gain access to timely and affordable intelligence about their technological space. Enhanced access to intelligence
will strengthen a firm's R&D strategy and accelerate the commercialisation of inventions which would be dormant
otherwise.
Systems developed in this project will initially be demonstrated in the areas of Bioinformatics and Electronics,
but we envisage extending them to the Automotive, Telecoms and Nanotechnology (including Graphene) technology areas.
These are areas where clusters of inventors already exist in the UK and elsewhere. To this end, CambridgeIP's client base
in this space will be direct beneficiaries, but we will also work within the Eastern Academic Research Consortium (Eastern
ARC: this is a research collaboration which involves the Universities of Essex, Kent and East Anglia) and the University of
East Anglia in particular to identify other potential users and beneficiaries.
There will be significant interaction between CambridgeIP staff and University of Essex researchers throughout the project
as well as with other practitioners, companies and organisations via the ESRC Data Research Centre for Smart Analytics.
Beyond the impact of the provision of the CambridgeIP services, the developed methods and techniques would be generic
in nature and may be applied to other application areas where information from multiple resources needs to be extracted
and linked (e-commerce, e-government and healthcare applications for instance).
Academic Impact
Key academic groups this research will impact on are research groups in Information Retrieval in the Universities of
Sheffield and Glasgow as well as the University of Strathclyde. The University of Essex through the Language and
Computation Group as well as the ESRC Data Research Centre for Smart Analytics have links and relationships with these
research groups and we are planning to disseminate our research results to researchers in these institutions through
dedicated seminars which will provide the opportunity to meet other key academics in this research space. Seminars and
workshops, which will be organised within the ESRC Data Research Centre for Smart Analytics, will provide another means
to ensure academic impact to other key partners, researchers and practitioners. As the techniques developed will be
generic, they also have the potential to impact on the work carried out by the UK Data Archive which is hosted at the
University of Essex.

 
Description The primary aim of the project has been to develop new techniques and methods that would enable efficient and effective information retrieval in domains where users need to search through typically large amounts of technical information in the form of documents. Such users may not be experts, but are still required to sift through large numbers of documents in order to identify relevant information/knowledge.
We have developed methods, based on semantic web technologies, that can be used for knowledge and information extraction and for search in large scientific literature sources. These exploit a deep understanding of the domain (for instance bioinformatics) via the use of taxonomies and domain and lexical linguistic ontologies as well as the documents within the domain. The user can then be assisted and guided in their search through suggestions for refining the query, based on the understanding of the domain provided by multiple semantic-based resources. This enables faster and more efficient retrieval of information that is more relevant to the user's needs.
A prototype system has been developed which demonstrates the use of these methods in the domain of bioinformatics.
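The idea of guiding a search with semantic resources can be illustrated with a minimal sketch. It is not the project's prototype: the toy taxonomy (`NARROWER`), the synonym table (`SYNONYMS`) and both function names are hypothetical, chosen only to show how narrower terms and synonyms from a domain resource might be offered as refinements or used to expand a query.

```python
# Hypothetical, hand-written stand-ins for a domain taxonomy and a lexical
# resource; a real system would load these from ontologies.
NARROWER = {
    "protein": ["enzyme", "antibody"],
    "enzyme": ["kinase", "protease"],
}
SYNONYMS = {
    "antibody": ["immunoglobulin"],
}


def suggest_refinements(query, narrower=NARROWER, synonyms=SYNONYMS):
    """Return refinement suggestions for each query term known to the resources."""
    suggestions = {}
    for term in query.lower().split():
        options = [("narrower", t) for t in narrower.get(term, [])]
        options += [("synonym", t) for t in synonyms.get(term, [])]
        if options:
            suggestions[term] = options
    return suggestions


def expand_query(query, synonyms=SYNONYMS):
    """Build a Boolean query in which each term is OR-ed with its synonyms."""
    parts = []
    for term in query.lower().split():
        alternatives = [term] + synonyms.get(term, [])
        if len(alternatives) > 1:
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(term)
    return " AND ".join(parts)


if __name__ == "__main__":
    print(suggest_refinements("enzyme inhibitors"))
    print(expand_query("antibody therapy"))
```

In this sketch, terms that the resources do not know (such as "inhibitors") are simply passed through unchanged, so the expansion can sit in front of a standard keyword search engine.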
Exploitation Route The methods that we have developed as part of this project may have wider applicability beyond the domain that we have used to demonstrate them (bioinformatics). We have shown how information retrieval in complex domains can be enhanced if semantic-based resources such as ontologies, taxonomies and others can be utilized and reasoned over. These methods can be combined with standard search techniques.
Potential beneficiaries and users of this work are organisations and knowledge workers that are required to sift through huge amounts of information, such as documents, without necessarily being experts. Our methods can be integrated and combined with standard search techniques to enhance information retrieval.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description The project focused on developing novel exploration/search strategies for complex datasets and on developing complex queries. In its latter part, the project was undertaken in collaboration with Linguamatics (acquired by IQVIA in 2019). Many of the tools developed during the project were incorporated in the company's software tools made available to clients. Features such as the term variation options developed under the project are still widely used by customers of IQVIA. The ability to reuse search strategies has been particularly successful and has been further developed over time; most of the library queries that the company supplies to its customers are now built out of embedded sub-components. The work of the project has allowed the company to provide more easily maintainable, higher quality strategies and has been embedded within the services that the company delivers to its customers.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description Contributed to workshop organised by the African Institute for Mathematical Sciences on the use of artificial intelligence techniques for business purposes 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact I delivered a session as part of a workshop/event organised by the African Institute for Mathematical Sciences on the applications of artificial intelligence techniques including machine learning and recommendation technologies to a group of students from various countries in Africa. The event took place in South Africa.
Year(s) Of Engagement Activity 2017
 
Description Roundtable discussion organised by ObjectiveIT 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact This was a presentation given to about 20-25 business representatives on the use of artificial intelligence techniques including machine learning and recommendation technologies to support decision making in industry.
Year(s) Of Engagement Activity 2018