Panda Alert Research Proposal
Lead Research Organisation: University of Cambridge
Department Name: Linguistics
Abstract
The Panda Alert System aims to improve the state of the art in early disease outbreak
detection by incorporating into the language models linguistic features that were
not previously known to indicate a risk to human health. The project will
develop a fully functioning, real-time alerting and mapping system.
A research project of this kind requires substantial software engineering
expertise combined with novel research into unsupervised or
semi-supervised NLP models. We propose a layered, modular architecture
that is detachable from the core NLP engine and should demonstrate a universal
detection capability that transfers easily to new domains.
The most likely approach is a semi-supervised bootstrapping model, which
learns from a small amount of training data and generalises to the unknown domain.
Using recent machine learning methods for NLP, such as neural networks (deep
learning), the model will analyse news reports to flag risks to public health. All tools
and data sets from this research will be open-sourced and made available for download
from public repositories such as GitHub or Google Code.
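As a rough illustration of the bootstrapping idea, the following minimal sketch performs self-training with a simple word-count scorer. It is not the project's model: the scorer stands in for the neural network mentioned above, and the function names, the seed sentences, the "risk"/"other" labels and the confidence threshold are all assumptions made for the example.

```python
from collections import Counter
import math

def train(labelled):
    """Count word frequencies per label from (text, label) pairs."""
    counts = {"risk": Counter(), "other": Counter()}
    for text, label in labelled:
        counts[label].update(text.lower().split())
    return counts

def score(counts, text):
    """Log-odds that `text` is 'risk' (add-one smoothing, known words only)."""
    vocab = set(counts["risk"]) | set(counts["other"])
    total = {lab: sum(c.values()) + len(vocab) for lab, c in counts.items()}
    log_odds = 0.0
    for word in text.lower().split():
        if word in vocab:
            p_risk = (counts["risk"][word] + 1) / total["risk"]
            p_other = (counts["other"][word] + 1) / total["other"]
            log_odds += math.log(p_risk / p_other)
    return log_odds

def bootstrap(seed, unlabelled, threshold=0.5, rounds=3):
    """Self-training: repeatedly add confidently scored texts to the training set."""
    labelled = list(seed)
    pool = list(unlabelled)
    for _ in range(rounds):
        counts = train(labelled)
        confident, remaining = [], []
        for text in pool:
            s = score(counts, text)
            if abs(s) >= threshold:
                confident.append((text, "risk" if s > 0 else "other"))
            else:
                remaining.append(text)
        if not confident:
            break  # nothing new was learnt in this round
        labelled.extend(confident)
        pool = remaining
    return train(labelled), labelled

# Hypothetical seed data: one labelled example per class.
SEED = [
    ("outbreak of fever reported", "risk"),
    ("football match result reported", "other"),
]
# Hypothetical unlabelled headlines.
UNLABELLED = [
    "fever cases rise in city",
    "football fans celebrate in city",
    "cases rise after outbreak",
]

model, labelled = bootstrap(SEED, UNLABELLED)
for text, label in labelled:
    print(label, "|", text)
```

Each round retrains on the enlarged labelled set, so words that co-occur with the seed vocabulary (here "cases" and "rise") become evidence in later rounds. A real system would need a held-out set to check that early labelling mistakes are not being reinforced.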
For future reusability, it is vital that the core NLP engine is domain-agnostic so
that it can be extended and adapted to new event detection tasks via a clear
programming interface. The real-time analysis system will have to handle hundreds of
thousands of news articles and social media items per day in multiple languages.
The research part of the project will extend the state of the art in NLP and AI
to establish novel methods for detecting general discourse patterns, in
order to analyse statements for known and unknown features potentially indicative
of fact-bearing target knowledge. The new technique should be capable of
identifying and reporting other target knowledge and facts from casual
online user and news activity with minimal or no training.
In year one, I shall focus mainly on toponym resolution and geoparsing, that is,
NLP techniques for identifying place names in text and
resolving them to geographical coordinates. This task is easy for humans,
but deciding which London (UK, Canada, etc.) was
mentioned in a text is non-trivial for a machine. I aim to improve on the existing baselines by
researching a novel method of geoparsing and to publish the findings next year
at a respected conference.
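To make the two geoparsing steps concrete, here is a minimal sketch of toponym identification followed by resolution. It is not the method proposed here: the tiny gazetteer, its approximate coordinates and population figures, and the two heuristics (prefer the candidate nearest to an unambiguous place in the same text, otherwise fall back to the most populous candidate) are assumptions chosen only to illustrate the "which London?" problem.

```python
import math
import re

# Hypothetical mini-gazetteer: name -> candidates (label, lat, lon, population).
# Coordinates and populations are approximate and for illustration only.
GAZETTEER = {
    "London": [
        ("London, UK", 51.5074, -0.1278, 8_900_000),
        ("London, Ontario, Canada", 42.9849, -81.2453, 400_000),
    ],
    "Toronto": [("Toronto, Canada", 43.6532, -79.3832, 2_800_000)],
    "Paris": [("Paris, France", 48.8566, 2.3522, 2_100_000)],
}

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def find_toponyms(text):
    """Step 1 (identification): naive lookup of capitalised tokens in the gazetteer."""
    return [tok for tok in re.findall(r"[A-Z][a-z]+", text) if tok in GAZETTEER]

def resolve(text):
    """Step 2 (resolution): choose one gazetteer candidate per toponym."""
    names = find_toponyms(text)
    # Unambiguous toponyms in the same text act as geographic context.
    anchors = [GAZETTEER[n][0] for n in names if len(GAZETTEER[n]) == 1]
    resolved = {}
    for name in names:
        candidates = GAZETTEER[name]
        if anchors:
            # Prefer the candidate closest to any anchor.
            best = min(
                candidates,
                key=lambda c: min(haversine_km(c[1], c[2], a[1], a[2]) for a in anchors),
            )
        else:
            # Population baseline: pick the most populous candidate.
            best = max(candidates, key=lambda c: c[3])
        resolved[name] = best
    return resolved

for sentence in (
    "Flu cases were reported in London.",
    "Flu cases were reported in London and Toronto.",
):
    label, lat, lon, _ = resolve(sentence)["London"]
    print(sentence, "->", label, (lat, lon))
```

The population fallback is a common baseline for comparison, but it always picks the larger city, which is why the surrounding context matters in the second sentence.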
Organisations
People | ORCID iD |
Nigel Collier (Primary Supervisor) | |
Milan Gritta (Student) | |
Publications
Gritta M (2018) What's missing in geographical parsing? in Language resources and evaluation
Gritta M (2020) A pragmatic guide to geoparsing evaluation: Toponyms, Named Entity Recognition and pragmatics. in Language resources and evaluation
Gritta M (2017) What's missing in geographical parsing?
Gritta M (2017) Vancouver Welcomes You! Minimalist Location Metonymy Resolution
Gritta M (2018) Which Melbourne? Augmenting Geocoding with Maps
Gritta M (2019) A Pragmatic Guide to Geoparsing Evaluation
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
NE/M009009/1 | | | 04/10/2015 | 31/12/2022 | |
1649558 | Studentship | NE/M009009/1 | 30/09/2015 | 29/09/2018 | Milan Gritta |
Description | We discovered novel ways to increase the effectiveness of geographic text analysis. This is particularly relevant for event monitoring such as disease outbreaks. To monitor breaking news events accurately, we need to be able to deploy the latest techniques in Natural Language Processing and Artificial Intelligence. The aim of the thesis was to show how these novel methods can significantly improve existing approaches to disease monitoring for the benefit of public health. |
Exploitation Route | The audience will come from a mixture of technical and policy research backgrounds. The findings show a path to more accurate information extraction for disease monitoring, or for any other kind of event monitoring. The thesis can be consulted for ways to bring existing public health monitoring systems up to date with the latest techniques in artificial intelligence and computational linguistics. |
Sectors | Agriculture, Food and Drink; Communities and Social Services/Policy; Education; Environment; Healthcare; Government, Democracy and Justice |
Description | The findings of this thesis are expected to be used by international public health agencies (JRC Europe, PHAC Canada) that maintain automatic disease monitoring systems based on Natural Language Processing technology. This mostly involves technical advice and material for the development and maintenance of such technology. The aim is to increase the capability and effectiveness of NLP monitoring systems for the benefit of public health. |
First Year Of Impact | 2019 |
Sector | Environment; Healthcare; Government, Democracy and Justice |
Impact Types | Societal; Economic; Policy & public services |
Description | Technical Research for Disease Monitoring |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Influenced training of practitioners or researchers |
Title | GitHub Resources |
Description | NLP/AI Resources for Geoparsing and beyond. Descriptions in the repository. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | Gives others access to state-of-the-art tools and resources for geographic text analysis. |
URL | https://github.com/milangritta |
Title | My GitHub Page |
Description | This is where I store most of the code and resources generated by my research, including links to further resources. |
Type Of Material | Data handling & control |
Year Produced | 2015 |
Provided To Others? | Yes |
Impact | Allows anyone to see what I'm researching, try the code, download the data and replicate experiments. |
URL | https://github.com/milangritta |
Title | Research data supporting "Vancouver Welcomes You! Minimalist Location Metonymy Resolution" |
Description | Complete supporting/replication data and code for the ACL Publication. The paper was published in August 2017 at www.acl2017.org |
Type Of Material | Database/Collection of data |
Year Produced | 2017 |
Provided To Others? | Yes |
Title | Research data supporting "What's missing in geographical parsing?" |
Description | Full code and data required for replication and experimentation. |
Type Of Material | Database/Collection of data |
Year Produced | 2017 |
Provided To Others? | Yes |
Title | Research data supporting "Which Melbourne? Augmenting Geocoding with Maps" |
Description | Please unzip the files and read the README file for more instructions. Also visit my GitHub account for more information (milangritta) |
Type Of Material | Database/Collection of data |
Year Produced | 2018 |
Provided To Others? | Yes |
Title | Software supporting 'A Pragmatic Guide to Geoparsing Evaluation' |
Description | Code and data for the NCRF++ model described in the paper. For more information, download the file to view the README files within. |
Type Of Technology | Software |
Year Produced | 2019 |
Description | Joint Research Centre Visit |
Form Of Engagement Activity | Participation in an open day or visit at my research institution |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Visited the Joint Research Centre at the European Commission's Science Hub in Ispra, Italy. The purpose was to share research with the European Media Monitor research group and to gather experience and observe "Science in Action". I learnt how to create a case study for my PhD thesis. |
Year(s) Of Engagement Activity | 2018 |