Panda Alert Research Proposal

Lead Research Organisation: University of Cambridge

Department Name: Linguistics

Abstract

The Panda Alert System aims to improve the state of the art in early detection of disease
outbreaks by incorporating into its language models linguistic features that were not
previously known to indicate a risk to human health. The project will develop a fully
functioning, real-time alerting and mapping system.

A research project of this kind requires substantial use of software engineering
methodologies combined with novel research into unsupervised and semi-supervised
NLP models. We propose a layered, modular architecture whose application layers are
detachable from the core NLP engine; the engine should demonstrate a domain-independent
detection capability that transfers easily to new domains.

The most likely approach is a semi-supervised bootstrapping model that learns from a
small amount of training data and generalises to the unseen domain. Using recent
machine learning methods for NLP, such as deep neural networks, the model will analyse
news reports and flag potential risks to public health. All tools and datasets from
this research will be open-sourced and made available for download from public
repositories such as GitHub or Google Code.
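To illustrate what a bootstrapping model does, here is a minimal seed-expansion sketch in Python, loosely in the style of classic Yarowsky-type bootstrapping. It is not the project's model: the toy corpus, the seed term, the stop list, the `min_support` threshold and the function name `bootstrap_lexicon` are all hypothetical, and a real system would replace raw co-occurrence counts with a trained classifier's confidence scores.

```python
from collections import Counter

# Hypothetical stop list so that function words are never promoted.
STOPWORDS = frozenset({"a", "an", "the", "in", "by", "near", "of"})


def bootstrap_lexicon(sentences, seeds, min_support=2, max_rounds=5):
    """Grow a set of health-risk indicator terms from a few seed words.

    Hypothetical illustration of bootstrapping: each round treats sentences
    containing any known term as pseudo-labelled positives, then promotes every
    other word that appears in at least `min_support` of those sentences.
    Stops early when a round adds nothing new.
    """
    lexicon = set(seeds)
    tokenised = [set(s.lower().split()) for s in sentences]
    for _ in range(max_rounds):
        positives = [toks for toks in tokenised if toks & lexicon]
        # Count each candidate once per positive sentence (document frequency).
        support = Counter(
            w for toks in positives for w in toks - lexicon - STOPWORDS
        )
        new_terms = {w for w, n in support.items() if n >= min_support}
        if not new_terms:
            break  # converged: no candidate cleared the threshold
        lexicon |= new_terms
    return lexicon


corpus = [
    "cholera outbreak in village",
    "cholera outbreak confirmed by clinic",
    "cholera cases rise",
    "outbreak cases reported near river",
    "measles outbreak closes school",
    "football match ends in draw",
]
print(sorted(bootstrap_lexicon(corpus, {"cholera"})))
# ['cases', 'cholera', 'outbreak']
```

Starting from the single seed `cholera`, the sketch promotes `outbreak` in the first round and `cases` in the second, then stops. The unrelated football sentence never contributes, while `measles` occurs in only one positive sentence and is not promoted; raising or lowering `min_support` trades recall against precision.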

For future reusability, the core NLP engine must be domain-agnostic, so that it can be
extended and adapted to new event detection tasks through a clear programming
interface. The real-time analysis system will have to handle hundreds of thousands of
news articles and social media items per day in multiple languages.

The research part of the project will extend the state of the art in NLP and AI to
establish novel methods for detecting general discourse patterns, analysing statements
for known and unknown features that may indicate the target facts. The new technique
should be able to identify and report other target facts from informal online user
activity and news with minimal or no training.
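A domain-agnostic core with a clear programming interface could take roughly the following shape: the engine only routes text to registered plug-ins, and each new event-detection task supplies its own detector. This is a hypothetical Python sketch; the class names `EventDetector`, `KeywordDetector`, `CoreEngine` and `Alert` are illustrative assumptions, and the keyword matcher merely stands in for a trained NLP model.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Alert:
    detector: str        # which plug-in fired
    text: str            # the input item (news article, social media post, ...)
    evidence: list[str]  # matched cues supporting the alert


class EventDetector(ABC):
    """Hypothetical plug-in interface: one subclass per event-detection task."""

    name = "base"

    @abstractmethod
    def detect(self, text: str) -> list[str]:
        """Return evidence for a target event in `text`, or an empty list."""


class KeywordDetector(EventDetector):
    """Toy detector standing in for a trained, domain-specific model."""

    def __init__(self, name: str, terms: list[str]) -> None:
        self.name = name
        self.terms = {t.lower() for t in terms}

    def detect(self, text: str) -> list[str]:
        return sorted(set(text.lower().split()) & self.terms)


class CoreEngine:
    """Domain-agnostic core: it knows nothing about diseases, only detectors."""

    def __init__(self) -> None:
        self._detectors: list[EventDetector] = []

    def register(self, detector: EventDetector) -> None:
        self._detectors.append(detector)

    def process(self, text: str) -> list[Alert]:
        alerts = []
        for detector in self._detectors:
            evidence = detector.detect(text)
            if evidence:
                alerts.append(Alert(detector.name, text, evidence))
        return alerts


engine = CoreEngine()
engine.register(KeywordDetector("disease", ["cholera", "outbreak"]))
engine.register(KeywordDetector("flood", ["flood", "flooding"]))
for alert in engine.process("Cholera outbreak follows severe flooding"):
    print(alert.detector, alert.evidence)
# disease ['cholera', 'outbreak']
# flood ['flooding']
```

Adding a new task then means writing one new `EventDetector` subclass rather than touching the core. For scale, an assumed 500,000 items per day (within the "hundreds of thousands" above) averages about 5.8 items per second (500,000 / 86,400) before allowing for bursts during breaking news, so keeping each detector behind a single `detect` call should make it easy to spread the work across parallel workers.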

In year one, I shall focus mainly on geoparsing and toponym resolution: NLP techniques
for identifying place names in text and resolving them to geographical coordinates.
This task is easy for humans, but for a machine, deciding which London (UK, Canada,
etc.) a text refers to is non-trivial. I aim to improve on existing baselines by
researching a novel geoparsing method, and we aim to publish the findings next year
at a respected conference.
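The "which London" problem can be made concrete with two well-known baseline heuristics: prefer the candidate that lies closest to the other, unambiguous places mentioned in the same text, and otherwise fall back to the most populous candidate. The Python sketch below illustrates those baselines under stated assumptions; it is not the novel geoparsing method the project aims to develop, and the tiny `GAZETTEER` (with rounded coordinates and approximate populations) and the `resolve` function are hypothetical.

```python
import math

# Hypothetical mini-gazetteer: name -> [(label, lat, lon, approx. population)].
# Coordinates are rounded and populations approximate; illustration only.
GAZETTEER = {
    "london": [
        ("London, United Kingdom", 51.51, -0.13, 8_900_000),
        ("London, Ontario, Canada", 42.98, -81.25, 400_000),
    ],
    "toronto": [("Toronto, Ontario, Canada", 43.65, -79.38, 2_800_000)],
    "paris": [("Paris, France", 48.86, 2.35, 2_100_000)],
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def resolve(toponyms):
    """Map each known toponym to a single gazetteer entry (baseline heuristics)."""
    names = [t.lower() for t in toponyms if t.lower() in GAZETTEER]
    # Unambiguous names act as geographic context for the ambiguous ones.
    anchors = [GAZETTEER[n][0] for n in names if len(GAZETTEER[n]) == 1]
    resolved = {}
    for n in names:
        candidates = GAZETTEER[n]
        if len(candidates) == 1:
            best = candidates[0]
        elif anchors:
            # Context heuristic: smallest total distance to the anchor places.
            best = min(
                candidates,
                key=lambda c: sum(haversine_km(c[1], c[2], a[1], a[2]) for a in anchors),
            )
        else:
            # Population prior: with no context, pick the most populous candidate.
            best = max(candidates, key=lambda c: c[3])
        resolved[n] = best[0]
    return resolved


print(resolve(["London", "Toronto"])["london"])  # London, Ontario, Canada
print(resolve(["London"])["london"])             # London, United Kingdom
```

With "Toronto" as context, the Canadian London is chosen; on its own, or alongside "Paris", London resolves to the UK capital. Real gazetteers such as GeoNames list many more candidates per name, which is what makes the task hard at scale.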

Student:

Milan Gritta

Period of Study:

Sep 2015 - Sep 2018

Funder:

NERC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1649558

Research Topic:

Unclassified

Organisations

University of Cambridge (Lead Research Organisation)

People
Nigel Collier (Primary Supervisor)
Milan Gritta (Student)

Publications

Gritta M (2018) What's missing in geographical parsing? In Language Resources and Evaluation

Gritta M (2020) A pragmatic guide to geoparsing evaluation: Toponyms, Named Entity Recognition and pragmatics. In Language Resources and Evaluation

Gritta M (2017) What's missing in geographical parsing?

Gritta M (2017) Vancouver Welcomes You! Minimalist Location Metonymy Resolution

Gritta M (2019) Where are you talking about? Advances and challenges of geographic analysis of text with application to disease monitoring

Gritta M (2018) Which Melbourne? Augmenting Geocoding with Maps

Gritta M (2019) A Pragmatic Guide to Geoparsing Evaluation

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
NE/M009009/1			04/10/2015	31/12/2022
1649558	Studentship	NE/M009009/1	30/09/2015	29/09/2018	Milan Gritta

Key Findings
Impact Summary
Policy Influence
Research Databases and Models
Research Tools and Methods
Software and Technical Products
Engagement Activities


Description	We discovered novel ways to increase the effectiveness of geographic text analysis, which is particularly relevant to event monitoring such as tracking disease outbreaks. Accurate monitoring of breaking news events requires deploying the latest techniques in Natural Language Processing and Artificial Intelligence. The aim of the thesis was to show how these novel methods can significantly improve existing approaches to disease monitoring for the benefit of public health.
Exploitation Route	The audience is expected to come from a mixture of technical and policy research backgrounds. The findings show a path to more accurate information extraction for disease monitoring and, more broadly, for any other kind of event monitoring. The thesis can be consulted for ways to bring existing public health monitoring systems up to date with the latest techniques in artificial intelligence and computational linguistics.
Sectors	Agriculture, Food and Drink; Communities and Social Services/Policy; Education; Environment; Healthcare; Government, Democracy and Justice


Description	The findings of this thesis are expected to be used by international public health agencies (JRC in Europe, PHAC in Canada) that maintain automatic disease monitoring systems based on Natural Language Processing technology. The contribution consists mostly of technical advice and material for developing and maintaining such systems. The aim is to increase the capability and effectiveness of NLP monitoring systems for the benefit of public health.
First Year Of Impact	2019
Sector	Environment; Healthcare; Government, Democracy and Justice
Impact Types	Societal; Economic; Policy & public services


Description	Technical Research for Disease Monitoring
Geographic Reach	Multiple continents/international
Policy Influence Type	Influenced training of practitioners or researchers


Title	GitHub Resources
Description	NLP/AI resources for geoparsing and beyond; descriptions are provided in the repository.
Type Of Material	Improvements to research infrastructure
Year Produced	2016
Provided To Others?	Yes
Impact	Gives others access to state-of-the-art tools and resources for geographic text analysis.
URL	https://github.com/milangritta


Title	My GitHub Page
Description	This is where I store most of the code and resources generated by my research, including links to further resources.
Type Of Material	Data handling & control
Year Produced	2015
Provided To Others?	Yes
Impact	Allows anyone to see what I'm researching, try the code, download the data and replicate experiments.
URL	https://github.com/milangritta


Title	Research data supporting "Vancouver Welcomes You! Minimalist Location Metonymy Resolution"
Description	Complete supporting and replication data and code for the ACL publication. The paper was published in August 2017 at ACL 2017 (www.acl2017.org).
Type Of Material	Database/Collection of data
Year Produced	2017
Provided To Others?	Yes


Title	Research data supporting "What's missing in geographical parsing?"
Description	Full code and data required for replication and experimentation.
Type Of Material	Database/Collection of data
Year Produced	2017
Provided To Others?	Yes


Title	Research data supporting "Which Melbourne? Augmenting Geocoding with Maps"
Description	Unzip the files and read the README file for further instructions. See also my GitHub account (milangritta) for more information.
Type Of Material	Database/Collection of data
Year Produced	2018
Provided To Others?	Yes


Title	Software supporting 'A Pragmatic Guide to Geoparsing Evaluation'
Description	Code and data for the NCRF++ model described in the paper. For more information, download the file to view the README files within.
Type Of Technology	Software
Year Produced	2019


Description	Joint Research Centre Visit
Form Of Engagement Activity	Participation in an open day or visit at my research institution
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Visited the Joint Research Centre at the European Commission's Science Hub in Ispra, Italy. The purpose was to share research with the European Media Monitor research group and to gather experience and observe "Science in Action". I learnt how to create a case study for my PhD thesis.
Year(s) Of Engagement Activity	2018
