Interpreting and integrating mismatched data on the fly

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

This project is concerned with the problem of interpreting information and requests which are received during communication: for example, during an emergency response event. If the data source of the sender of the information or request and the data source of the receiver are not fully compatible (which is highly likely even within an organisation and almost certain between different organisations) then the received information or request may not be understood by the receiver. Our matching system is designed to explore the receiver's data source to find one or more terms which appear to be similar to the received term, in order to allow the receiver to interpret this term with respect to its own data. The implications of this match (for example, that a more general term is being used, that certain attributes of a term are being ignored or introduced, and so on) are explained to a user representing the receiver organisation, who can then approve the proposed match, after which interaction will proceed as if the two different terms were equivalent.
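The matching loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the project's implementation: the function names (`similarity`, `propose_matches`, `interpret`), the 0.6 threshold and the use of plain string similarity from the standard-library `difflib` module are all assumptions made for the example. A real matcher would draw on semantic resources rather than character overlap.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def propose_matches(received_term, local_terms, threshold=0.6):
    """Return local terms that look similar to the received term, best first.

    The threshold is a hypothetical tuning parameter.
    """
    scored = [(term, similarity(received_term, term)) for term in local_terms]
    candidates = [(term, score) for term, score in scored if score >= threshold]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


def interpret(received_term, local_terms, approve):
    """Offer each candidate to the user; return the first approved term.

    `approve` is a callback standing in for the user representing the
    receiver organisation. It returns True to accept a proposed match.
    """
    for term, score in propose_matches(received_term, local_terms):
        if approve(received_term, term, score):
            return term
    return None  # no candidate was approved; the term stays uninterpreted
```

For example, `interpret("fire_engine", ["fire_engines", "ambulance"], approve)` would offer `fire_engines` to the `approve` callback first. Only the receiver's own terms are searched, which reflects the local nature of the matching.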

The matching system is local to every participant that wishes to make use of it and so it is only able to access the data of the organisation which is using it. This therefore avoids privacy and security issues by ensuring that only data that has been approved for sharing in the current context can be accessed by others.

The aims of the project are:
- to investigate the kinds of mismatches that have been encountered in real emergency response scenarios and within real data sources, which will be provided by our collaborators (at Dstl, the Met Office, Scottish Resilience and Strathclyde Fire & Rescue).
- to implement a system which will apply techniques based on existing matching techniques to these data sources, to enable appropriate interpretation of unknown data, and to provide feedback to users so that they can approve these matches (or, in situations where the consequences of errors are low, allow matches to proceed automatically without user approval).
- to evaluate the system against real data, with feedback from expert users as to how useful and appropriate the suggested matches are.

This project will be exploratory and is designed to evaluate the potential of the approach. There are several factors which should be addressed before the system is ready to be used in the field that are outside the scope of this project, specifically:
- Carry out further analysis of data, and of mismatches that have actually occurred in emergency response situations. Performing this will be a key feature of this project, but this can only be an initial study;
- Design and implement new matching techniques to address any possible mismatches which are not addressed by those currently implemented. Depending on the complexity of the data, it may not be possible to cover every possible mismatch within this project;
- Provide a carefully designed web interface for user interaction with the system. This would require specialist design advice and will not be attempted during the project; instead, a simple interface will be provided.

The key output of this project will be an understanding of the potential of this approach, based on the evaluation of the implementation, and a roadmap for how to develop this prototype work into a system which can be used in the field.

Planned Impact

The ability to share information quickly and efficiently is essential for real-time communication and collaboration. This need is especially obvious in situations such as emergency response, where time is often critical and where many different organisations with data that is likely to be incompatible need to communicate to provide an effective response. The system we are proposing to develop will provide a way to enable this communication, by automatically matching unknown terms (from someone else's data source) to terms they appear to approximate (within one's own data source), and providing effective feedback to users that allows them to understand the impact of using this match. The system will be able to interpret data which is mismatched both semantically (by which we mean that different words which mean more or less the same thing are used) and structurally (where similar terms have different attributes) and will therefore be usable when complex information is being shared between participants who may or may not have interacted previously.
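A structural mismatch of the kind mentioned above, where similar terms carry different attributes, can be illustrated with a small Python sketch. The function name and the attribute names in the usage example are hypothetical and chosen only for illustration. The sketch shows one simple way the difference between two terms could be summarised for a user before they approve a match.

```python
def explain_structural_mismatch(received_attrs, local_attrs):
    """Summarise how the attributes of a received term differ from a local one.

    Both arguments are iterables of attribute names. The result groups them
    into attributes both terms share, attributes of the received term that
    the local term would ignore, and attributes the local term would
    introduce.
    """
    received, local = set(received_attrs), set(local_attrs)
    return {
        "shared": sorted(received & local),
        "ignored": sorted(received - local),     # present only in the received term
        "introduced": sorted(local - received),  # present only in the local term
    }
```

Given a received term with attributes `location` and `depth` and a local term with attributes `location` and `severity`, the summary would report `location` as shared, `depth` as ignored and `severity` as introduced.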

Because this matching is done locally - i.e., every participant can have the matching system installed on their own devices - there is no problem with security. A participant will receive information and/or a request to provide information, and, using the matching system, they will be able to assimilate this information or provide an appropriate response even if the information or request is phrased in terms which are not compatible with their data. There is no requirement to allow others to access their data.

There are several key areas of impact for this project.

- A set of matching techniques and a prototype tool.
Beneficiaries: both key responders who interact with one another frequently (including our collaborators: Dstl, the Met Office, Scottish Resilience and Strathclyde Fire & Rescue) and anyone who may become involved in a disaster response. Additionally, it could easily be adapted to those interested in automated interaction in different situations.
The prototype we create will enable sharing of data between separate groups and individuals. The tool will be open source and available to all. It will have been developed through interaction with end-users so that it will be suitable for their needs.

- Enabling wider participation.
Beneficiaries: anyone potentially involved in the response, such as non-local emergency responders, other public and private services, local and national businesses and the general public.
It will be straightforward for all such responders to share information and for this information to be quickly integrated into existing plans. The need to facilitate such interactions was strongly emphasised in the Pitt report.

- Improved resilience in the aftermath of disaster.
Beneficiaries: anyone potentially affected, socially or economically, by a major natural disaster such as flooding.
An increased ability to respond to the requirements of the situation will result in damage prevention and limitation.

- Continued development of a community that spans academia, industry and government, where those that deal with different aspects of these problems can interact and share ideas.
Beneficiaries: academics in these fields; on-the-ground emergency responders; interested participants in industry and government.
Academic techniques will become available to these participants and they will be educated about the possibilities being developed. Equally, the project gives academics a clearer understanding of the requirements of on-the-ground responders, and of industry and government in general, and provides an excellent test-bed for these academic techniques, making possible large-scale, real-world evaluation (which is often very difficult for techniques such as ontology matching).

Publications

Bundy A (2013) The interaction of representation and reasoning. in Proceedings. Mathematical, physical, and engineering sciences

Fiona McNeill (Author) (2012) Dynamic Data Sharing from Large Data Sources

 
Description: We have developed techniques for aligning pairs of databases, so that the owner of one database can query another for information not available in their own. We have focused on automatic, partial alignment, performed quickly and on demand to answer a specific question, in contrast to most alignment work, which is done manually for complete databases. These techniques were implemented in the CHAIN system. We have conducted a preliminary evaluation of CHAIN in the domain of emergency response. This will allow different agencies to share information as required during an emergency, without prior knowledge of which agencies will be involved or prior planning to share information. Problems in sharing information between agencies during emergencies are frequently cited in post-emergency reports. The CHAIN system currently works with RDF sources and relational databases, and can be extended to other representations.
Exploitation Route: The primary target of the research is the emergency response domain, and we have generated much interest from people in that domain in working with us to develop the techniques to a stage where they could be usable in the field. Additionally, the techniques are generic and potentially of use outwith emergency response, e.g., for communication between online services when planning a composite service.
Sectors: Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Healthcare; Security and Diplomacy; Other

 
Description: This project was only a one-year pilot. As a result of this project we have joined the ESSENCE Marie Curie Network with an RA/PhD and a brief to apply the techniques developed in the project to the emergency response domain. Safer Scotland and Dstl are Government partners in this project and are keen to adopt its results. The University of Trento is an academic partner. It has also formed the basis for an EPSRC fellowship application. Safer Scotland and Dstl were also partners on this, as was the Chief Fire Officers' Association, who were brought in based on their interest in the results of the project. Academic collaborators on this proposal are the University of Trento, CSIC at IIIA Barcelona, the University of Newcastle and the University of Glasgow.
First Year Of Impact: 2017
Sector: Security and Diplomacy
Impact Types: Policy & public services