Machine Learning Methods for Personalised, Abstractive Summarisation of Consumer-Generated Media

Lead Research Organisation: University of Sheffield

Department Name: Computer Science

Abstract

The success of Web 2.0 and CGM is based on tapping into the social nature of human interactions, by making it possible for people to voice their opinion, become part of a virtual community and collaborate remotely. If we take micro-blogging as an example, the growth in Twitter visits between 2008 and 2009 was over 1,000% and it is projected that by 2010 around 10% of all internet users will be on Twitter. This unprecedented rise in the volume and importance of online content has resulted in companies and individuals spending ever increasing amounts of time trying to keep up with relevant CGM. It is estimated that 700 person-hours per year is the absolute minimum that companies and public services need to spend on CGM monitoring, online user engagement, and discovery of new information. This fellowship is about helping people to cope with the resulting information overload, through automatic methods that are capable of adapting to an individual's information-seeking goals and briefly summarising the relevant media, thus supporting information interpretation and decision making. Automatic text summarisation is key to our goal and consists of compressing the meaning of text documents while preserving the relevant information contained within them. While there has been a lot of research on well-authored texts such as news, summarisation of social media is still in its infancy, with research focused on product reviews. A key experimental finding has been that due to the characteristics of social media (product reviews in particular) it is better first to abstract the relevant information from the different documents and sites and then to use natural language generation to create a fluent text based on this information. In this fellowship I will investigate and evaluate new machine learning methods for personalised, abstractive multi-document summarisation across different social media.
An example is a diachronic summary that combines Twitter posts, blog articles, and Facebook wall messages on a given topic. In contrast to previous work, we will pursue an inter-disciplinary approach, which will help us study the social dimension of CGM summarisation and establish actual user needs. The second research challenge is that the algorithms need to be robust in the face of this noisy, jargon-filled, and dynamic content, and need models capable of representing the contradictory and strongly temporal nature of CGM. A key novel contribution of our work is personalising the summaries, based on a model of user interests, goals, and social context. Issues such as trustworthiness, privacy, and online communities (with their hubs and authorities) will also play an important role. The fourth research challenge is to generate personalised abstractive summaries that can help users with sensemaking and content interpretation. An exciting element of my research will be in studying the different kinds of summaries that are useful for a variety of real users (companies, journalists, and the general public) through multi-disciplinary collaborations with the Press Association, British Telecom, the Oxford Internet Institute, and Sheffield's Department of Journalism. A key project deliverable will be a publicly available browser plugin that provides easy access to the automatically generated summaries. This will allow me to evaluate the project results with real users, on a large scale. It will also provide a new evaluation challenge for the Natural Language Generation community, as researchers will be able to compare their summarisers against those delivered by our open-source algorithms. Last but not least, the fellowship not only covers foundational multi-disciplinary research but also tests the results in several Digital Economy pilot experiments involving commercial partners (The Press Association, British Telecom, Fizzback).

Planned Impact

Since consumer-generated media have revolutionised our personal lives, the economy, and society as a whole, this research has wide-ranging implications and relevance. The five-year duration of this fellowship will give us the necessary flexibility, time and resources to undertake dissemination, exploitation, and training activities aimed at the multiple beneficiaries detailed below. Firstly, in order to achieve maximum impact, project results will be made open-source, which will include both computational algorithms and the data collections on which they were tested and evaluated. Having open-source results is needed not only because we aim to create an active research community around the project results, but also since we want to promote use by non-governmental organisations, educational institutions, and other non-profit users. Another important project result will be a free tool to help users become aware of the dangers of social media and allow them to monitor continuously what is being disclosed about them online (by themselves, family, and friends). In terms of economic impact, we will target: (i) the areas used as Digital Economy pilots, namely digital journalism, voice-of-the-customer service providers, and online brand, product, and reputation management; (ii) companies providing internet privacy and security services; (iii) companies providing product comparison services to consumers. Major beneficiaries are the Digital Economy sectors identified above. In addition to the project partners (BT, PA, Fizzback, and Nokia), through UK and international research projects the PI has built successful industrial collaborations with other large companies (Yahoo, Atos, Dassault Aviation, Elsevier, MPS Bank, Creditreform, NetInfo) and SMEs (Garlik, Innovantage, Ontotext, Matrixware, Mondeca, Ontoprise, ISOCO), where our previous research showed the potential of intelligent summarisation technology for knowledge management and business intelligence.
Further knowledge transfer opportunities arise through the Sheffield University connections to the digital and new media industries in the Sheffield city region, which are growing at a faster rate than anywhere else in the UK in terms of specialist companies and new jobs. A unique opportunity arises also from the 100m South Yorkshire Digital Region project, which will pilot the Next Generation Broadband and thus provide the required infrastructure for advanced digital economy applications. The research proposed here also has significant societal relevance. The first dimension is user privacy and trust, which are an important element of our work. We plan to promote the new technology to users and companies with identity theft products (e.g. Garlik, with whom we have worked in the past), in order to help people with tracking personal data divulged on public web sites and social networks. This research is relevant to policies such as: Digital Britain, EU's e2010, Global Focus on ICT in Development. We will reach out towards policy makers through the new UK Ethical ICT and young people stakeholder group and through Sheffield's Digital World outreach activities. Last, but not least, this fellowship will impact significantly the research careers of all team members. It will provide the PI with the unique opportunity to realise a step-change in her career, in order to join the 7% of female grade A staff in engineering and technology. The inter-disciplinary nature of our research team will help researchers to gain knowledge of fields complementary to their primary expertise and to develop new cross-disciplinary research skills. The PhD students will gain transferable research and presentation skills and be prepared for post-doctoral positions in academic or applied research.

Funded Value:

£591,754

Funded Period:

Oct 10 - May 18

Funder:

EPSRC

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

EP/I004327/1

Principal Investigator:

Kalina Bontcheva

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Artificial Intelligence (40%)

Human-Computer Interactions (20%)

Information & Knowledge Mgmt (40%)

Organisations

People	ORCID iD
Kalina Bontcheva (Principal Investigator)

Publications

Wang Hang (Author) (2011) Transition of Legacy Systems to Semantically Enabled Applications: TAO Method and Tools in Semantic Web

Supnithi T (2015) Toward Collaborative LCA Ontology Development: a Scenario-Based Recommender System for Environmental Data Qualification

Gorrell G (2015) The Semantic Web. Latest Advances and New Domains - 12th European Semantic Web Conference, ESWC 2015, Portoroz, Slovenia, May 31 -- June 4, 2015. Proceedings

Lytos Anastasios (2019) The evolution of argumentation mining: From models to social media and emerging tools in arXiv e-prints

Srijith P (2017) Sub-story detection in Twitter with hierarchical Dirichlet processes in Information Processing & Management

Srijith P. K. (2016) Sub-Story Detection in Twitter with Hierarchical Dirichlet Processes in arXiv e-prints

Augenstein Isabelle (2016) Stance Detection with Bidirectional Conditional Encoding in arXiv e-prints

Derczynski, L (2014) Spatio-temporal grounding of claims made on the web, in PHEME

Bontcheva Kalina (2013) Social Media and Information Overload: Survey Results in arXiv e-prints

Gorrell G (2018) Social Informatics - 10th International Conference, SocInfo 2018, St. Petersburg, Russia, September 25-28, 2018, Proceedings, Part I

Key Findings
Impact Summary
Policy Influence
Further Funding
Collaboration
Software and Technical Products
Engagement Activities


Description	The research has focused on analysing and summarising large-volume, high velocity social media content, e.g. during election campaigns. A number of open-source software tools and datasets have been developed and made available for reuse. Successful collaborations and follow-up projects have been established, alongside a set of high profile impact outcomes around analysing online misinformation and fake news.
Exploitation Route	Citations of the scientific publications; use of the software tools and datasets created; reports on the work by media organisations; take-up by companies and associated consulting opportunities.
Sectors	Digital/Communication/Information Technologies (including Software)


Description	The social media analysis and online abuse analysis tools developed as part of this award have been used to produce policy white papers and pursue impact-oriented projects with DCMS and FCDO.
First Year Of Impact	2017
Sector	Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Government, Democracy and Justice
Impact Types	Societal,Economic,Policy & public services


Description	Public Witness at Enquiry on Fake News by the UK Digital, Culture, Media and Sport Parliamentary Committee
Geographic Reach	National
Policy Influence Type	Contribution to a national consultation/review
URL	http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/digital-culture-med...


Description	Research was cited in this DCMS report
Geographic Reach	National
Policy Influence Type	Citation in other policy documents
URL	https://publications.parliament.uk/pa/cm201719/cmselect/cmcumeds/363/363.pdf


Description	2018 Faculty Research Award
Amount	$56,000 (USD)
Organisation	Google
Sector	Private
Country	United States
Start	05/2018
End	12/2020


Description	DecarboNet
Amount	€ 401,593 (EUR)
Funding ID	610829
Organisation	European Commission
Sector	Public
Country	European Union (EU)
Start	10/2013
End	09/2016


Description	EnviLOD
Amount	£55,816 (GBP)
Organisation	Jisc
Sector	Public
Country	United Kingdom
Start	10/2012
End	04/2013


Description	GATE Cloud Exploratory: Adapting the General Architecture for Text Engineering to Cloud Computing
Amount	£71,677 (GBP)
Funding ID	EP/I034092/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	01/2011
End	09/2011


Description	Horizon 2020 - SoBigData
Amount	€ 986,125 (EUR)
Funding ID	654024
Organisation	European Commission H2020
Sector	Public
Country	Belgium
Start	09/2015
End	08/2019


Description	Horizon 2020 COMRADES
Amount	€ 321,250 (EUR)
Funding ID	687847
Organisation	European Commission
Sector	Public
Country	European Union (EU)
Start	01/2016
End	12/2018


Description	Horizon 2020 KNOWMAK
Amount	€ 249,750 (EUR)
Funding ID	726992
Organisation	European Commission
Sector	Public
Country	European Union (EU)
Start	01/2017
End	12/2019


Description	Horizon 2020 WeVerify
Amount	€ 2,931,000 (EUR)
Funding ID	825297
Organisation	European Commission H2020
Sector	Public
Country	Belgium
Start	12/2018
End	11/2021


Description	PHEME
Amount	€ 2,916,000 (EUR)
Funding ID	No. 611233
Organisation	European Commission
Sector	Public
Country	European Union (EU)
Start	01/2014
End	12/2016


Description	This Digital News Initiative Innovation Funding
Amount	€ 50,000 (EUR)
Organisation	Google
Sector	Private
Country	United States
Start	01/2019
End	12/2019


Description	TrendMiner
Amount	€ 621,707 (EUR)
Funding ID	287863
Organisation	European Commission
Sector	Public
Country	European Union (EU)
Start	11/2011
End	10/2014


Description	uComp: Embedded Human Computation for Knowledge Extraction and Evaluation
Amount	£375,621 (GBP)
Funding ID	EP/K017896/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	11/2012
End	11/2015


Description	Buzzfeed UK - Analysis of misinformation and political abuse on Twitter during the UK EU membership referendum and the 2017 General Election
Organisation	BuzzFeed, Inc.
Country	United States
Sector	Private
PI Contribution	Analysed topics, voting intentions, online abuse, and suspected bot accounts.
Collaborator Contribution	They did manual post-analysis of the data and data visualisations, for presentation to general readers.
Impact	- https://www.buzzfeed.com/jamesball/3-million-brexit-tweets-reveal-leave-voters-talked-about-imm? - https://www.buzzfeed.com/tomphillips/twitter-abuse-of-mps-during-the-election-doubled-after-the - https://www.buzzfeed.com/tomphillips/we-found-45-suspected-bot-accounts-sharing-pro-trump-pro
Start Year	2016


Description	Collaboration with Asimina Vasilau, University of Birmingham
Organisation	University of Birmingham
Country	United Kingdom
Sector	Academic/University
PI Contribution	Collaboration around analysis and summarisation of privacy issues discussed in social media
Start Year	2012


Description	Collaboration with British Library on their Envia project
Organisation	The British Library
Country	United Kingdom
Sector	Public
PI Contribution	A short feasibility study for the British Library's Envia project, around use of text mining to improve the quality of metadata in their Envia digital library of environmental science content. A bid for a short follow-up collaborative project has been submitted to the JISC Digital Infrastructure programme.
Start Year	2011


Description	Collaboration with British Telecom
Organisation	BT Group
Country	United Kingdom
Sector	Private
PI Contribution	Collaboration around next-generation business intelligence and trend tracking in social media. In the final year of the project, once the CGM summarisation algorithms are mature and tested, we will work together on using these to analyse information from micro-blogs and forums, in order to improve customer relationship management.
Start Year	2010


Description	Collaboration with Fizzback
Organisation	Fizzback
Country	United Kingdom
Sector	Private
PI Contribution	Fizzback provide customer engagement services to UK companies in transport, communications, and retail. In the fourth project year, we will implement a DE pilot to study summarisation of customer feedback on transport services, received through noisier channels such as text messages and Twitter.
Start Year	2010


Description	Collaboration with Ontotext
Organisation	Ontotext
Country	Bulgaria
Sector	Private
PI Contribution	Ontotext is a Bulgarian SME specialising in semantic technology. I have been collaborating with them around the challenge of using Linked Data and their award-winning OWLIM semantic repository as knowledge sources for the CGM text mining and summarisation algorithms.
Start Year	2010


Description	Collaboration with Rob Procter, Univ. of Warwick
Organisation	University of Warwick
Country	United Kingdom
Sector	Academic/University
PI Contribution	The open-source social media analysis and summarisation tools
Collaborator Contribution	Qualitative studies of social media and annotated datasets
Impact	Workshop co-organised at WWW'2015. Multidisciplinary collaboration with computational social science researchers
Start Year	2013


Description	Collaboration with the NTAP project (http://ntap.no/)
Organisation	University of Bergen
Country	Norway
Sector	Academic/University
PI Contribution	The NTAP project is developing methods and tools to detect, analyse and visualize the distribution, flow and development of knowledge and opinions across online social networks. This is a collaboration between the Department of Information Science and Media Science at the University of Bergen and Uni Computing, a department in Uni Research (a university-owned research company); the project has several international and industrial partners. NTAP is funded by the Norwegian Research Council, VERDIKT program, and runs from January 2012 to July 2015.
Start Year	2013


Description	Collaboration with the Oxford Internet Institute (OII)
Organisation	University of Oxford
Department	Oxford Internet Institute
Country	United Kingdom
Sector	Private
PI Contribution	I am collaborating with social scientists from the OII, due to their expertise in social network analysis, trust and privacy, and visualising online networks. We work together on using these to build models of users' digital identity, based on information disclosed by them in social media. Our current focus is on geo-locating Wikipedia contributors and Twitter users.
Start Year	2010


Description	Collaboration with the Press Association
Organisation	Press Association
Country	United Kingdom
Sector	Private
PI Contribution	The PA is the UK's leading multi-media news and information provider and supplier of business-to-business media services. We are working together on CGM summarisation to support the needs of digital journalism. I have already interviewed several journalists as part of the user requirements gathering stage. Later on, they will be approached for evaluation. In addition, I have been training some of their staff in text mining tools, most notably using our GATE open-source text mining toolkit. Further knowledge transfer opportunities will be pursued through a related European project, which is currently under negotiation.
Start Year	2010


Description	Nesta (from 2015)
Organisation	Nesta
Country	United Kingdom
Sector	Charity/Non Profit
PI Contribution	Collaboration on Political Futures Tracker - analysing social media in the run up to the 2015 General Election to uncover the top political themes, how positive or negative people feel about them, and how far parties and politicians are looking to the future
Collaborator Contribution	Data visualisation and writing up of the blog posts
Impact	https://www.nesta.org.uk/news/political-futures-tracker
Start Year	2015


Description	TechCity UK (2018) - Analysis of Reddit posts
Organisation	Tech City UK
Country	United Kingdom
Sector	Public
PI Contribution	Analysis of social media content, to establish young people's attitudes towards careers in ICT - https://talent.techcityuk.com/future-talent-key-findings/
Collaborator Contribution	Manual data analysis, visualisation, and report write-up
Impact	https://talent.techcityuk.com/future-talent-key-findings/
Start Year	2017


Title	GATE Crowdsourcing Plugin
Description	The GATE Crowdsourcing plugin provides tools for two main types of crowdsourcing tasks: (a) annotation: present crowdworkers with a snippet of text (e.g. a sentence) and ask them to mark all the mentions of a particular annotation type; (b) classification: present crowdworkers with a snippet of text containing an existing annotation with several possible labels (e.g. meanings of a word), and ask them to select the most appropriate label (or "none of the above").
Type Of Technology	Webtool/Application
Year Produced	2014
Impact	The application is open-source. Crowdsourcing is an increasingly popular, collaborative approach for acquiring annotated corpora. Despite this, reuse of corpus conversion tools and user interfaces between projects is still problematic, since these are not generally made available. The GATE Crowdsourcing plugin offers infrastructural support for mapping documents to crowdsourcing units and back, as well as automatically generating reusable crowdsourcing interfaces for NLP classification and sequence annotation tasks.
URL	https://gate.ac.uk/wiki/crowdsourcing.html


Title	GATE Twitter part-of-speech tagger
Description	The tagger is an adapted and augmented version of a leading CRF-based tagger, customised for English tweets. It's released as both a GATE PR and also a standalone command-line tool (Java, so any operating system). It achieves 91% accuracy on tokens on our evaluation set, which is very high for this genre. Importantly, it has relatively high whole-sentence-correct performance.
Type Of Technology	Webtool/Application
Year Produced	2013
Impact	The tagger is available open-source. Part-of-speech tagging tweets is hard. The tagger achieves competitive accuracy, and uses the Penn Treebank tagset, so that all other tools should integrate seamlessly.
URL	https://gate.ac.uk/wiki/twitter-postagger.html
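
The two figures quoted for the tagger, token accuracy and whole-sentence-correct performance, measure different things. As a minimal sketch (this is not the GATE evaluation code; the function names and the toy tag sequences are illustrative assumptions), they could be computed as follows:

```python
from typing import List

TaggedCorpus = List[List[str]]  # one list of POS tags per sentence (or tweet)


def token_accuracy(gold: TaggedCorpus, predicted: TaggedCorpus) -> float:
    """Fraction of individual tokens whose predicted tag matches the gold tag."""
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, predicted):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("gold and predicted sentences must have equal length")
        correct += sum(g == p for g, p in zip(gold_sent, pred_sent))
        total += len(gold_sent)
    return correct / total


def sentence_accuracy(gold: TaggedCorpus, predicted: TaggedCorpus) -> float:
    """Fraction of sentences in which every single token is tagged correctly."""
    fully_correct = sum(g == p for g, p in zip(gold, predicted))
    return fully_correct / len(gold)


# Toy data (hypothetical): two short tweets tagged with Penn Treebank tags.
gold_tags = [["PRP", "VBP", "NN"], ["UH", "NNP"]]
predicted_tags = [["PRP", "VBP", "NN"], ["UH", "NN"]]

tok_acc = token_accuracy(gold_tags, predicted_tags)
sent_acc = sentence_accuracy(gold_tags, predicted_tags)
```

In this toy case a single wrong tag out of five leaves token accuracy at 0.8 but drops sentence accuracy to 0.5, which is why whole-sentence performance is the stricter of the two measures.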


Title	Rumour veracity classifier
Description	User-generated content such as tweets often makes claims that are unsubstantiated and possibly untrue. This service attempts to classify whether a text is discussing a rumour that is true, false or unverified. Our approach makes use of only the tweet content, which it passes through LSTM units that learn to distinguish between the three classes we aim to predict (true, false or unverified). However, the unique part of our approach is that prior to passing the tweet to the LSTM layer, it first looks within the tweet for some recurring information that is typically used by others to spread rumours, and makes adjustments on the input -- words carrying useful information are kept as they are, and others are downgraded in terms of contribution. This is achieved through an attention layer. We evaluated our approach on the RumourEval shared task 2017 test data and achieved over 60% accuracy, which is currently the state-of-the-art performance for this task.
Type Of Technology	Webtool/Application
Year Produced	2018
Impact	n/a
URL	https://cloud.gate.ac.uk/shopfront/displayItem/rumour-veracity
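
The input adjustment described for the rumour veracity classifier (informative words kept as they are, others downgraded) can be illustrated with a generic attention-style reweighting. This is only a sketch under stated assumptions and not the service's actual implementation: the per-token scores are supplied by hand here, whereas in a trained model they would be learned, and the LSTM that consumes the reweighted vectors is omitted. The names `softmax` and `downweight_tokens` are hypothetical.

```python
import math
from typing import List


def softmax(scores: List[float]) -> List[float]:
    """Turn raw relevance scores into weights that sum to 1."""
    highest = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def downweight_tokens(token_vectors: List[List[float]],
                      scores: List[float]) -> List[List[float]]:
    """Scale each token vector by its attention weight relative to the top token.

    The highest-scoring token keeps its vector unchanged (factor 1.0);
    every other token is shrunk in proportion to its weight.
    """
    weights = softmax(scores)
    peak = max(weights)
    return [[(w / peak) * x for x in vec]
            for vec, w in zip(token_vectors, weights)]


# Hypothetical two-token input: the first token is scored as more informative.
vectors = [[1.0, 2.0], [3.0, 4.0]]
reweighted = downweight_tokens(vectors, [2.0, 0.0])
```

Here the first vector passes through untouched, while the second is scaled by roughly exp(-2), so it contributes far less to whatever layer comes next.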


Title	The Political Futures Pipeline
Description	A pipeline designed to detect political topics, UK politician names (as valid at the 2017 General Election), abusive terms and sentiment, in addition to Twitter-specific data such as location (NUTS) where possible, hashtag, user names etc. It works best on tweets in the original Twitter JSON input format. Upload your own or harvest some with our Twitter Collector.
Type Of Technology	Webtool/Application
Year Produced	2018
Impact	n/a


Title	TwitIE
Description	TwitIE is an open-source GATE pipeline for Information Extraction over tweets, one of the noisiest forms of social media text.
Type Of Technology	Webtool/Application
Year Produced	2013
Impact	The application is open-source. NLP on social media data is hard. Content is often brief, contains mistakes, lacks context, and is uncurated - very different from the well-formed news text that tools typically operate over. TwitIE is a GATE pipeline for Information Extraction over tweets, one of the noisiest forms of social media text.
URL	https://gate.ac.uk/wiki/twitie.html


Title	Twitter user classification
Description	A pipeline to attempt to classify the author of a tweet as either a person, location or organization, based on clues found in their "user" profile metadata within the tweet. Within each broad "major type" a number of narrower "minor type" categories are also used. Output is given as an annotation AuthorClassification spanning the whole document, and when Twitter JSON is selected as the output format the classification is also added as a property "gate_classification" to the top-level "user" object in the tweet.
Type Of Technology	Webtool/Application
Year Produced	2018
Impact	n/a
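
When Twitter JSON is chosen as the output format, the classification is described as being stored under "gate_classification" on the top-level "user" object. A minimal sketch of reading it back is shown below; the helper name `author_classification` is hypothetical, and the shape of the stored value in the sample tweet is an assumption made for illustration, since the description does not specify it.

```python
import json
from typing import Any, Optional


def author_classification(tweet_json: str) -> Optional[Any]:
    """Return the value under user.gate_classification, or None if it is absent."""
    tweet = json.loads(tweet_json)
    user = tweet.get("user")
    if not isinstance(user, dict):
        return None
    return user.get("gate_classification")


# Hypothetical sample tweet; the classification value's structure is assumed.
sample_tweet = json.dumps({
    "text": "Example tweet text",
    "user": {
        "screen_name": "example_account",
        "gate_classification": {"majorType": "person"},
    },
})

classification = author_classification(sample_tweet)
unclassified = author_classification(json.dumps({"text": "No user object"}))
```

Returning None for tweets without the property keeps the helper usable on mixed input, where only some tweets have been through the pipeline.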


Title	YODIE Named Entity Disambiguation (English)
Description	YODIE is a named entity recognition and disambiguation system that identifies various types of named entities in text and attempts to link them to the most appropriate concept label in DBpedia. This version of YODIE operates on documents in English.
Type Of Technology	Webtool/Application
Year Produced	2018
Impact	n/a


Description	3 Million Brexit Tweets Reveal Leave Voters Talked About Immigration More Than Anything Else
Form Of Engagement Activity	A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Media reporting on our groundbreaking analysis of tweets on the EU membership referendum
Year(s) Of Engagement Activity	2016
URL	https://www.buzzfeed.com/jamesball/3-million-brexit-tweets-reveal-leave-voters-talked-about-imm


Description	9th GATE Training Course: Mining social media content with GATE
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Summer school lectures on social media analysis
Year(s) Of Engagement Activity	2016


Description	A workshop on fake news analysis
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	Around 50 researchers, students, and participants from industry participated in the practical workshop, learning about tools and methods for analysing fake news and misinformation on social media.
Year(s) Of Engagement Activity	2017
URL	http://www.sobigdata.eu/computational-fake-news-analysis-practical-workshop


Description	AAAI 2015
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	'Towards detecting rumours in social media' - presented at AAAI Workshop on AI for Cities 2015
Year(s) Of Engagement Activity	2015


Description	Blog on On GATE, Text Analysis, Summarisation, Linked Data and Mining Social Media
Form Of Engagement Activity	A magazine, newsletter or online publication
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	A blog on my research on text mining and summarisation of social media, with focus on Twitter.
Year(s) Of Engagement Activity	2012,2014,2015,2016,2017


Description	Blog post on the Brexit Analyser
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	A blog post on our work on analysing tweets on the EU membership referendum
Year(s) Of Engagement Activity	2016
URL	http://gate4ugc.blogspot.co.uk/2016/07/the-tools-behind-our-brexit-analyser.html


Description	GATE for social media mining - a tutorial
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	A half-day tutorial on text mining and summarisation of Twitter, Facebook, and other social media.
Year(s) Of Engagement Activity	2013


Description	Invited talk - University of Cambridge
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	Invited talk at the University of Cambridge, which resulted in questions and discussions afterwards with researchers in related fields.
Year(s) Of Engagement Activity	2015
URL	http://talks.cam.ac.uk/talk/index/53855


Description	Invited talk at British Telecom
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Industry/Business
Results and Impact	Invited talk at British Telecom, Ipswich, UK, on Natural Language Processing for Social Media
Year(s) Of Engagement Activity	2015


Description	Keynote talk at the CHIST-ERA 2011 conference
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	At present, 200 million Twitter users send 140 million tweets a day, Facebook has 750 million active users, who spend over 700 billion minutes per month on the site, and increasingly knowledge is generated utilising the "wisdom of the crowd", on Wikipedia, Quora, and other similar sites. This unprecedented rise in the volume and importance of online textual content has resulted in companies and individuals increasingly struggling with information overload, or, as Clay Shirky defines it - a filter failure. This talk will discuss how text analytics and natural language processing can help address these issues, through the development of methods capable of extracting useful knowledge from noisy, contradictory content; inferring an individual's information seeking goals; offering personalised information access, and making use of distributed human computation, by harnessing the knowledge of a large number of humans. I will also touch upon the challenge of developing research infrastructures for experimentation with large-scale Text-to-Knowledge (T2K) analytics, at affordable costs for research teams and companies.
Year(s) Of Engagement Activity	2011
URL	http://conference2011.chistera.eu/communication/kalina-bontcheva


Description	Oral evidence at the UK DCMS parliamentary committee enquiry on fake news
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Policymakers/politicians
Results and Impact	Gave oral evidence to the DCMS parliamentary committee, based on my research on methods for analysing misinformation on social media. Questions included: What is 'fake news'? Where does biased but legitimate commentary shade into propaganda and lies? What impact has fake news on public understanding of the world, and also on the public response to traditional journalism? If all views are equally valid, does objectivity and balance lose all value? Is there any difference in the way people of different ages, social backgrounds, genders etc use and respond to fake news? Have changes in the selling and placing of advertising encouraged the growth of fake news, for example by making it profitable to use fake news to attract more hits to websites, and thus more income from advertisers?
Year(s) Of Engagement Activity	2017
URL	http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/digital-culture-med...


Description	PHEME booth at ICT 2015
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Opportunity to engage and network with the 6000+ participants at the event
Year(s) Of Engagement Activity	2015


Description	PROMISE Winter School 2013 Bridging between Information Retrieval and Databases
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The aim of the PROMISE Winter School 2013 on "Bridging between Information Retrieval and Databases" is to give participants a grounding in the core topics that constitute the multidisciplinary area of information access and retrieval to unstructured, semi-structured, and structured information. The school is a week-long event consisting of guest lectures from invited speakers who are recognized experts in the field. The school is intended for PhD students, Masters students or senior researchers such as post-doctoral researchers from the fields of databases, information retrieval, and related fields.
Year(s) Of Engagement Activity	2013
URL	http://www.promise-noe.eu/documents/10156/91889f81-86ba-46e4-9f50-3186d1be2340


Description	Participation in a Wilton Park dialogue on Computational propaganda
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Policymakers/politicians
Results and Impact	Wilton Park dialogue: Computational propaganda. Wednesday 21 - Friday 23 March 2018 | WP1605. A specially convened dialogue. This 'offline' event was invite only and was held under the 'Wilton Park Protocol', whereby all discussion is off the record.
Year(s) Of Engagement Activity	2018


Description	Presentation at EACL'2014
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Tutorial: Natural language processing for social media at EACL'2014
Year(s) Of Engagement Activity	2014
URL	http://eacl2014.org/tutorial-social-media


Description	RDSM Workshop 2015
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Workshop: Rumors and Deception on Social Media: Detection, Tracking, and Visualization at WWW 2015, RDSM Workshop
Year(s) Of Engagement Activity	2015


Description	Twitter account - https://twitter.com/kbontcheva
Form Of Engagement Activity	A magazine, newsletter or online publication
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Kalina Bontcheva tweets about her research
Year(s) Of Engagement Activity	2010,2011,2012,2013,2014,2015,2016,2017


Description	Workshop on Scalability in Natural Language Processing
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	This workshop, held with RANLP 2013, aims to introduce contemporary work and to discuss novel methods for natural language processing at a large scale, and explore how the resulting technology and methods can be reused in applications both on the Web and in the physical world.
Year(s) Of Engagement Activity	2013


Description	Workshops at 7th GATE summer school
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Workshops in Mining social media with GATE at the 7th GATE summer school.
Year(s) Of Engagement Activity	2013,2014
URL	https://gate.ac.uk/conferences/fig/fig7.html


Description	Written evidence submitted to the EC public consultation on fake news
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Policymakers/politicians
Results and Impact	The consultation collected written submissions answering the following questions: (1) Definition of fake information and their spread online; (2) Assessment of measures already taken by platforms, news media companies and civil society organisations to counter the spread of fake information online; (3) Scope for future actions to strengthen quality information and prevent the spread of disinformation online.
Year(s) Of Engagement Activity	2017
URL	https://ec.europa.eu/info/consultations/public-consultation-fake-news-and-online-disinformation_en
