Machine Learning Methods for Personalised, Abstractive Summarisation of Consumer-Generated Media
Lead Research Organisation:
University of Sheffield
Department Name: Computer Science
Abstract
The success of Web 2.0 and CGM is based on tapping into the social nature of human interactions, by making it possible for people to voice their opinion, become part of a virtual community and collaborate remotely. If we take micro-blogging as an example, the growth in Twitter visits between 2008 and 2009 was over 1,000% and it is projected that by 2010 around 10% of all internet users will be on Twitter. This unprecedented rise in the volume and importance of online content has resulted in companies and individuals spending ever increasing amounts of time trying to keep up with relevant CGM. It is estimated that 700 person hours per year is the absolute minimum that companies and public services need to spend on CGM monitoring, online user engagement, and discovery of new information. This fellowship is about helping people to cope with the resulting information overload, through automatic methods that are capable of adapting to individuals' information-seeking goals and briefly summarising the relevant media, thus supporting information interpretation and decision making. Automatic text summarisation is key to our goal and consists of compressing the meaning of text documents while preserving the relevant information contained within them. While there has been a lot of research on well-authored texts such as news, summarisation of social media is still in its infancy, with research focused on product reviews. A key experimental finding has been that, due to the characteristics of social media (product reviews in particular), it is better first to abstract the relevant information from the different documents and sites and then to use natural language generation to create a fluent text based on this information. In this fellowship I will investigate and evaluate new machine learning methods for personalised, abstractive multi-document summarisation across different social media.
An example is diachronic summaries that combine Twitter posts, blog articles, and Facebook wall messages on a given topic. In contrast to previous work, we will pursue an inter-disciplinary approach, which will help us study the social dimension of CGM summarisation and establish actual user needs. The second research challenge is that the algorithms need to be robust in the face of this noisy, jargon-laden and dynamic content, and need models capable of representing the contradictory and strongly temporal nature of CGM. A key novel contribution of our work is personalising the summaries, based on a model of user interests, goals, and social context. Issues such as trustworthiness, privacy, and online communities (with their hubs and authorities) will also play an important role. The fourth research challenge is to generate personalised abstractive summaries that can help users with sensemaking and content interpretation. An exciting element of my research will be in studying the different kinds of summaries that are useful for a variety of real users (companies, journalists, and the general public) through multi-disciplinary collaborations with the Press Association, British Telecom, the Oxford Internet Institute, and Sheffield's Department of Journalism. A key project deliverable will be a publicly available browser plugin that provides easy access to the automatically generated summaries. This will allow me to evaluate the project results with real users, on a large scale. It will also provide a new evaluation challenge for the Natural Language Generation community, as researchers will be able to compare their summarisers against those delivered by our open-source algorithms. Last but not least, the fellowship not only covers foundational multi-disciplinary research but also tests the results in several Digital Economy pilot experiments involving commercial partners (The Press Association, British Telecom, Fizzback).
Planned Impact
Since consumer-generated media have revolutionised our personal lives, the economy, and society as a whole, this research has wide-ranging implications and relevance. The five-year duration of this fellowship will give us the necessary flexibility, time and resources to undertake dissemination, exploitation, and training activities aimed at the multiple beneficiaries detailed below. Firstly, in order to achieve maximum impact, project results will be made open-source, which will include both computational algorithms and the data collections on which they were tested and evaluated. Open-source results are needed not only because we aim to create an active research community around the project results, but also because we want to promote use by non-governmental organisations, educational institutions, and other non-profit users. Another important project result will be a free tool to help users become aware of the dangers of social media and allow them to monitor continuously what is being disclosed about them online (by themselves, family, and friends). In terms of economic impact, we will target: (i) the areas used as Digital Economy pilots, namely digital journalism, voice-of-the-customer service providers, and online brand, product, and reputation management; (ii) companies providing internet privacy and security services; (iii) companies providing product comparison services to consumers. Major beneficiaries are the Digital Economy sectors identified above. In addition to the project partners (BT, PA, Fizzback, and Nokia), through UK and international research projects the PI has built successful industrial collaborations with other large companies (Yahoo, Atos, Dassault Aviation, Elsevier, MPS Bank, Creditreform, NetInfo) and SMEs (Garlik, Innovantage, Ontotext, Matrixware, Mondeca, Ontoprise, ISOCO), where our previous research showed the potential of intelligent summarisation technology for knowledge management and business intelligence.
Further knowledge transfer opportunities arise through the Sheffield University connections to the digital and new media industries in the Sheffield city region, which are growing at a faster rate than anywhere else in the UK in terms of specialist companies and new jobs. A unique opportunity arises also from the 100m South Yorkshire Digital Region project, which will pilot the Next Generation Broadband and thus provide the required infrastructure for advanced digital economy applications. The research proposed here also has significant societal relevance. The first dimension is user privacy and trust, which are an important element of our work. We plan to promote the new technology to users and companies with identity theft products (e.g. Garlik, with whom we have worked in the past), in order to help people with tracking personal data divulged on public web sites and social networks. This research is relevant to policies such as: Digital Britain, EU's e2010, Global Focus on ICT in Development. We will reach out towards policy makers through the new UK Ethical ICT and young people stakeholder group and through Sheffield's Digital World outreach activities. Last, but not least, this fellowship will impact significantly the research careers of all team members. It will provide the PI with the unique opportunity to realise a step-change in her career, in order to join the 7% of female grade A staff in engineering and technology. The inter-disciplinary nature of our research team will help researchers to gain knowledge of fields complementary to their primary expertise and to develop new cross-disciplinary research skills. The PhD students will gain transferable research and presentation skills and be prepared for post-doctoral positions in academic or applied research.
Organisations
- University of Sheffield (Lead Research Organisation)
- University of Bergen (Collaboration)
- BuzzFeed, Inc. (Collaboration)
- University of Warwick (Collaboration)
- NESTA (Collaboration)
- Ontotext (Bulgaria) (Collaboration)
- BT Group (Collaboration)
- University of Oxford (Collaboration)
- The British Library (Collaboration)
- Fizzback (Collaboration)
- Tech City UK (Collaboration)
- Press Association (Collaboration, Project Partner)
- University of Birmingham (Collaboration)
- RELX Group (Netherlands) (Project Partner)
- BT Group (United Kingdom) (Project Partner)
- The Fizzback Group Ltd. (Project Partner)
- Nokia (Finland) (Project Partner)
- University of Oxford (Project Partner)
People
- Kalina Bontcheva (Principal Investigator)
Publications
Wang Hang (Author)
(2011)
Transition of Legacy Systems to Semantically Enabled Applications: TAO Method and Tools
in Semantic Web
Lytos Anastasios
(2019)
The evolution of argumentation mining: From models to social media and emerging tools
in arXiv e-prints
Srijith P
(2017)
Sub-story detection in Twitter with hierarchical Dirichlet processes
in Information Processing & Management
Srijith P. K.
(2016)
Sub-Story Detection in Twitter with Hierarchical Dirichlet Processes
in arXiv e-prints
Augenstein Isabelle
(2016)
Stance Detection with Bidirectional Conditional Encoding
in arXiv e-prints
Derczynski, L
(2014)
Spatio-temporal grounding of claims made on the web, in PHEME
Bontcheva Kalina
(2013)
Social Media and Information Overload: Survey Results
in arXiv e-prints
Description | The research has focused on analysing and summarising large-volume, high velocity social media content, e.g. during election campaigns. A number of open-source software tools and datasets have been developed and made available for reuse. Successful collaborations and follow-up projects have been established, alongside a set of high profile impact outcomes around analysing online misinformation and fake news. |
Exploitation Route | Citations of the scientific publications; use of the software tools and datasets created; reports on the work by media organisations; take-up by companies and associated consulting opportunities. |
Sectors | Digital/Communication/Information Technologies (including Software) |
Description | The social media analysis and online abuse analysis tools developed as part of this award have been used to produce policy white papers and pursue impact-oriented projects with DCMS and FCDO. |
First Year Of Impact | 2017 |
Sector | Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Government, Democracy and Justice |
Impact Types | Societal,Economic,Policy & public services |
Description | Public Witness at Enquiry on Fake News by the UK Digital, Culture, Media and Sport Parliamentary Committee |
Geographic Reach | National |
Policy Influence Type | Contribution to a national consultation/review |
URL | http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/digital-culture-med... |
Description | Research was cited in this DCMS report |
Geographic Reach | National |
Policy Influence Type | Citation in other policy documents |
URL | https://publications.parliament.uk/pa/cm201719/cmselect/cmcumeds/363/363.pdf |
Description | 2018 Faculty Research Award |
Amount | $56,000 (USD) |
Organisation | |
Sector | Private |
Country | United States |
Start | 05/2018 |
End | 12/2020 |
Description | DecarboNet |
Amount | € 401,593 (EUR) |
Funding ID | 610829 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 10/2013 |
End | 09/2016 |
Description | EnviLOD |
Amount | £55,816 (GBP) |
Organisation | Jisc |
Sector | Public |
Country | United Kingdom |
Start | 10/2012 |
End | 04/2013 |
Description | GATE Cloud Exploratory: Adapting the General Architecture for Text Engineering to Cloud Computing |
Amount | £71,677 (GBP) |
Funding ID | EP/I034092/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 01/2011 |
End | 09/2011 |
Description | Horizon 2020 - SoBigData |
Amount | € 986,125 (EUR) |
Funding ID | 654024 |
Organisation | European Commission H2020 |
Sector | Public |
Country | Belgium |
Start | 09/2015 |
End | 08/2019 |
Description | Horizon 2020 COMRADES |
Amount | € 321,250 (EUR) |
Funding ID | 687847 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2016 |
End | 12/2018 |
Description | Horizon 2020 KNOWMAK |
Amount | € 249,750 (EUR) |
Funding ID | 726992 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2017 |
End | 12/2019 |
Description | Horizon 2020 WeVerify |
Amount | € 2,931,000 (EUR) |
Funding ID | 825297 |
Organisation | European Commission H2020 |
Sector | Public |
Country | Belgium |
Start | 12/2018 |
End | 11/2021 |
Description | PHEME |
Amount | € 2,916,000 (EUR) |
Funding ID | 611233 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2014 |
End | 12/2016 |
Description | Digital News Initiative Innovation Funding |
Amount | € 50,000 (EUR) |
Organisation | |
Sector | Private |
Country | United States |
Start | 01/2019 |
End | 12/2019 |
Description | TrendMiner |
Amount | € 621,707 (EUR) |
Funding ID | 287863 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 11/2011 |
End | 10/2014 |
Description | uComp: Embedded Human Computation for Knowledge Extraction and Evaluation |
Amount | £375,621 (GBP) |
Funding ID | EP/K017896/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 11/2012 |
End | 11/2015 |
Description | Buzzfeed UK - Analysis of misinformation and political abuse on Twitter during the UK EU membership referendum and the 2017 General Election |
Organisation | BuzzFeed, Inc. |
Country | United States |
Sector | Private |
PI Contribution | Analysed topics, voting intentions, online abuse, and suspected bot accounts. |
Collaborator Contribution | They did manual post-analysis of the data and data visualisations, for presentation to general readers. |
Impact | - https://www.buzzfeed.com/jamesball/3-million-brexit-tweets-reveal-leave-voters-talked-about-imm? - https://www.buzzfeed.com/tomphillips/twitter-abuse-of-mps-during-the-election-doubled-after-the - https://www.buzzfeed.com/tomphillips/we-found-45-suspected-bot-accounts-sharing-pro-trump-pro |
Start Year | 2016 |
Description | Collaboration with Asimina Vasilau, University of Birmingham |
Organisation | University of Birmingham |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Collaboration around analysis and summarisation of privacy issues discussed in social media |
Start Year | 2012 |
Description | Collaboration with British Library on their Envia project |
Organisation | The British Library |
Country | United Kingdom |
Sector | Public |
PI Contribution | A short feasibility study for the British Library's Envia project, around the use of text mining to improve the quality of metadata in their Envia digital library of environmental science content. A bid for a short follow-up collaborative project has been submitted to the JISC Digital Infrastructure programme. |
Start Year | 2011 |
Description | Collaboration with British Telecom |
Organisation | BT Group |
Country | United Kingdom |
Sector | Private |
PI Contribution | Collaboration around next-generation business intelligence and trend tracking in social media. In the final year of the project, once the CGM summarisation algorithms are mature and tested, we will work together on using these to analyse information from micro-blogs and forums, in order to improve customer relationship management. |
Start Year | 2010 |
Description | Collaboration with Fizzback |
Organisation | Fizzback |
Country | United Kingdom |
Sector | Private |
PI Contribution | Fizzback provide customer engagement services to UK companies in transport, communications, and retail. In the fourth project year, we will implement a DE pilot to study summarisation of customer feedback on transport services, received through noisier channels such as text messages and Twitter. |
Start Year | 2010 |
Description | Collaboration with Ontotext |
Organisation | Ontotext |
Country | Bulgaria |
Sector | Private |
PI Contribution | Ontotext is a Bulgarian SME specialising in semantic technology. I have been collaborating with them around the challenge of using Linked Data and their award-winning OWLIM semantic repository as knowledge sources for the CGM text mining and summarisation algorithms. |
Start Year | 2010 |
Description | Collaboration with Rob Procter, Univ. of Warwick |
Organisation | University of Warwick |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | The open-source social media analysis and summarisation tools |
Collaborator Contribution | Qualitative studies of social media and annotated datasets |
Impact | Workshop co-organised at WWW'2015. Multidisciplinary collaboration with computational social science researchers |
Start Year | 2013 |
Description | Collaboration with the NTAP project (http://ntap.no/) |
Organisation | University of Bergen |
Country | Norway |
Sector | Academic/University |
PI Contribution | The NTAP project is developing methods and tools to detect, analyse and visualize the distribution, flow and development of knowledge and opinions across online social networks. This is a collaboration between the Department of Information Science and Media Science at the University of Bergen and Uni Computing, a department in Uni Research (a university-owned research company); the project has several international and industrial partners. NTAP is funded by the Norwegian Research Council, VERDIKT program, and runs from January 2012 to July 2015. |
Start Year | 2013 |
Description | Collaboration with the Oxford Internet Institute (OII) |
Organisation | University of Oxford |
Department | Oxford Internet Institute |
Country | United Kingdom |
Sector | Private |
PI Contribution | I am collaborating with social scientists from the OII, due to their expertise in social network analysis, trust and privacy, and visualising online networks. We work together on using these to build models of users' digital identity, based on information disclosed by them in social media. Our current focus is on geo-locating Wikipedia contributors and Twitter users. |
Start Year | 2010 |
Description | Collaboration with the Press Association |
Organisation | Press Association |
Country | United Kingdom |
Sector | Private |
PI Contribution | The PA is the UK's leading multi-media news and information provider and supplier of business-to-business media services. We are working together on CGM summarisation to support the needs of digital journalism. I have already interviewed several journalists as part of the user requirements gathering stage. Later on, they will be approached for evaluation. In addition, I have been training some of their staff in text mining tools, most notably using our GATE open-source text mining toolkit. Further knowledge transfer opportunities will be pursued through a related European project, which is currently under negotiation. |
Start Year | 2010 |
Description | Nesta (from 2015) |
Organisation | Nesta |
Country | United Kingdom |
Sector | Charity/Non Profit |
PI Contribution | Collaboration on Political Futures Tracker - analysing social media in the run up to the 2015 General Election to uncover the top political themes, how positive or negative people feel about them, and how far parties and politicians are looking to the future |
Collaborator Contribution | Data visualisation and writing up of the blog posts |
Impact | https://www.nesta.org.uk/news/political-futures-tracker |
Start Year | 2015 |
Description | TechCity UK (2018) - Analysis of Reddit posts |
Organisation | Tech City UK |
Country | United Kingdom |
Sector | Public |
PI Contribution | Analysis of social media content, to establish young people's attitudes towards careers in ICT - https://talent.techcityuk.com/future-talent-key-findings/ |
Collaborator Contribution | Manual data analysis, visualisation, and report write-up |
Impact | https://talent.techcityuk.com/future-talent-key-findings/ |
Start Year | 2017 |
Title | GATE Crowdsourcing Plugin |
Description | The GATE Crowdsourcing plugin provides tools for two main types of crowdsourcing tasks: a) annotation: present crowdworkers with a snippet of text (e.g. a sentence) and ask them to mark all the mentions of a particular annotation type; b) classification: present crowdworkers with a snippet of text containing an existing annotation with several possible labels (e.g. meanings of a word), and ask them to select the most appropriate label (or "none of the above"). |
Type Of Technology | Webtool/Application |
Year Produced | 2014 |
Impact | The application is open-source. Crowdsourcing is an increasingly popular, collaborative approach for acquiring annotated corpora. Despite this, reuse of corpus conversion tools and user interfaces between projects is still problematic, since these are not generally made available. The GATE Crowdsourcing plugin offers infrastructural support for mapping documents to crowdsourcing units and back, as well as automatically generating reusable crowdsourcing interfaces for NLP classification and sequence annotation tasks. |
URL | https://gate.ac.uk/wiki/crowdsourcing.html |
Title | GATE Twitter part-of-speech tagger |
Description | The tagger is an adapted and augmented version of a leading CRF-based tagger, customised for English tweets. It's released as both a GATE PR and also a standalone command-line tool (Java, so any operating system). It achieves 91% accuracy on tokens on our evaluation set, which is very high for this genre. Importantly, it has relatively high whole-sentence-correct performance. |
Type Of Technology | Webtool/Application |
Year Produced | 2013 |
Impact | The tagger is available open-source. Part-of-speech tagging tweets is hard. The tagger achieves competitive accuracy, and uses the Penn Treebank tagset, so that all other tools should integrate seamlessly. |
URL | https://gate.ac.uk/wiki/twitter-postagger.html |
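The tagger entry quotes two different measures: accuracy on tokens and whole-sentence-correct performance. The sketch below shows one way the two could be computed, assuming gold and predicted tags are available as parallel lists of per-sentence tag lists. The function name and the toy data are illustrative assumptions and are not part of the GATE tagger or its evaluation set.

```python
from typing import List, Tuple


def tagging_scores(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float]:
    """Return (token accuracy, whole-sentence accuracy) for parallel tag sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")

    total_tokens = 0
    correct_tokens = 0
    correct_sentences = 0
    for gold_tags, pred_tags in zip(gold, pred):
        if len(gold_tags) != len(pred_tags):
            raise ValueError("each sentence must have one predicted tag per token")
        matches = sum(1 for g, p in zip(gold_tags, pred_tags) if g == p)
        total_tokens += len(gold_tags)
        correct_tokens += matches
        # A sentence only counts as correct if every one of its tokens is right.
        if matches == len(gold_tags):
            correct_sentences += 1

    token_acc = correct_tokens / total_tokens if total_tokens else 0.0
    sentence_acc = correct_sentences / len(gold) if gold else 0.0
    return token_acc, sentence_acc


# Toy data (hypothetical): two short tweets tagged with Penn Treebank tags.
gold = [["PRP", "VBP", "NN"], ["UH", "NNP"]]
pred = [["PRP", "VBP", "NN"], ["UH", "NN"]]
token_acc, sentence_acc = tagging_scores(gold, pred)
```

A single wrong tag makes the whole sentence count as incorrect, so whole-sentence accuracy is never higher than token accuracy when sentences are of similar length, which is why the two figures are reported separately.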
Title | Rumour veracity classifier |
Description | User-generated content such as tweets often makes claims that are unsubstantiated and possibly untrue. This service attempts to classify whether a text is discussing a rumour that is true, false or unverified. Our approach makes use of only the tweet content, which it passes through LSTM units that learn to distinguish between the three classes we aim to predict (true, false or unverified). However, the unique part of our approach is that, prior to passing the tweet to the LSTM layer, it first looks within the tweet for recurring information that is typically used by others to spread rumours, and adjusts the input accordingly: words carrying useful information are kept as they are, and others are downgraded in terms of contribution. This is achieved through an attention layer. We evaluated our approach on the RumourEval shared task 2017 test data and achieved over 60% accuracy, which is currently the state-of-the-art performance for this task. |
Type Of Technology | Webtool/Application |
Year Produced | 2018 |
Impact | n/a |
URL | https://cloud.gate.ac.uk/shopfront/displayItem/rumour-veracity |
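The input adjustment described for the rumour veracity classifier (informative words kept as they are, others downgraded) can be illustrated with a small sketch. It is not the service's implementation: it assumes each word already has an embedding vector and a relevance score, and it scales each embedding by its softmax attention weight divided by the largest weight, so the highest-scoring word is left unchanged. In the real system the scores would be learned and the rescaled sequence would then go to the LSTM, which is omitted here. The function name and the toy values are hypothetical.

```python
import math
from typing import List


def attention_rescale(embeddings: List[List[float]], scores: List[float]) -> List[List[float]]:
    """Scale each word embedding by its attention weight relative to the top-scoring word."""
    if not scores or len(embeddings) != len(scores):
        raise ValueError("need one relevance score per word embedding")

    peak = max(scores)
    # softmax(s_i) / max_j softmax(s_j) simplifies to exp(s_i - peak):
    # 1.0 for the most relevant word, below 1.0 for every other word.
    relative_weights = [math.exp(s - peak) for s in scores]
    return [
        [value * weight for value in vector]
        for vector, weight in zip(embeddings, relative_weights)
    ]


# Toy input (hypothetical): two words with 2-dimensional embeddings.
embeddings = [[1.0, 2.0], [3.0, 4.0]]
scores = [2.0, 0.0]  # the first word is treated as more informative
rescaled = attention_rescale(embeddings, scores)
```

Dividing by the largest weight is a design choice made for this sketch so that the phrase "kept as they are" holds literally for the top-scoring word; a standard attention layer would usually apply the normalised softmax weights directly.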
Title | The Political Futures Pipeline |
Description | A pipeline designed to detect political topics, UK politician names (as valid at the 2017 General Election), abusive terms and sentiment, in addition to Twitter-specific data such as location (NUTS) where possible, hashtag, user names etc. It works best on tweets in the original Twitter JSON input format. Upload your own or harvest some with our Twitter Collector. |
Type Of Technology | Webtool/Application |
Year Produced | 2018 |
Impact | n/a |
Title | TwitIE |
Description | TwitIE is an open-source GATE pipeline for Information Extraction over tweets, one of the noisiest forms of social media text. |
Type Of Technology | Webtool/Application |
Year Produced | 2013 |
Impact | The application is open-source. NLP on social media data is hard. Content is often brief, contains mistakes, lacks context, and is uncurated - very different from the well-formed news text that tools typically operate over. |
URL | https://gate.ac.uk/wiki/twitie.html |
Title | Twitter user classification |
Description | A pipeline to attempt to classify the author of a tweet as either a person, location or organization, based on clues found in their "user" profile metadata within the tweet. Within each broad "major type" a number of narrower "minor type" categories are also used. Output is given as an annotation AuthorClassification spanning the whole document, and when Twitter JSON is selected as the output format the classification is also added as a property "gate_classification" to the top-level "user" object in the tweet. |
Type Of Technology | Webtool/Application |
Year Produced | 2018 |
Impact | n/a |
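When Twitter JSON is chosen as the output format, the user classification pipeline is described as adding a "gate_classification" property to the top-level "user" object. A minimal sketch of reading that property back is shown below. The helper name and the sample tweet are hypothetical, and the value "person" is only a placeholder: the record does not specify the exact shape of the property's value, so the code returns it unchanged.

```python
import json
from typing import Any, Optional


def author_classification(tweet_json: str) -> Optional[Any]:
    """Return the "gate_classification" value from a tweet's "user" object, if present."""
    tweet = json.loads(tweet_json)
    user = tweet.get("user")
    if not isinstance(user, dict):
        return None
    # The value is returned as-is because its structure is not documented here.
    return user.get("gate_classification")


# Hypothetical pipeline output; only the property name comes from the description.
sample = json.dumps(
    {
        "text": "Trains delayed again this morning",
        "user": {"screen_name": "example_user", "gate_classification": "person"},
    }
)
label = author_classification(sample)
```

Tweets that were not processed by the pipeline, or that lack a "user" object, yield None rather than raising an error.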
Title | YODIE Named Entity Disambiguation (English) |
Description | YODIE is a named entity recognition and disambiguation system that identifies various types of named entities in text and attempts to link them to the most appropriate concept label in DBpedia. This version of YODIE operates on documents in English. |
Type Of Technology | Webtool/Application |
Year Produced | 2018 |
Impact | n/a |
Description | 3 Million Brexit Tweets Reveal Leave Voters Talked About Immigration More Than Anything Else |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Media reporting on our groundbreaking analysis of tweets on the EU membership referendum |
Year(s) Of Engagement Activity | 2016 |
URL | https://www.buzzfeed.com/jamesball/3-million-brexit-tweets-reveal-leave-voters-talked-about-imm |
Description | 9th GATE Training Course: Mining social media content with GATE |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Summer school lectures on social media analysis |
Year(s) Of Engagement Activity | 2016 |
Description | A workshop on fake news analysis |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Around 50 researchers, students, and participants from industry participated in the practical workshop, learning about tools and methods for analysing fake news and misinformation on social media. |
Year(s) Of Engagement Activity | 2017 |
URL | http://www.sobigdata.eu/computational-fake-news-analysis-practical-workshop |
Description | AAAI 2015 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | 'Towards detecting rumours in social media' - presented at AAAI Workshop on AI for Cities 2015 |
Year(s) Of Engagement Activity | 2015 |
Description | Blog on GATE, Text Analysis, Summarisation, Linked Data and Mining Social Media |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | A blog on my research on text mining and summarisation of social media, with focus on Twitter. |
Year(s) Of Engagement Activity | 2012,2014,2015,2016,2017 |
Description | Blog post on the Brexit Analyser |
Form Of Engagement Activity | Engagement focused website, blog or social media channel |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | A blog post on our work on analysing tweets on the EU membership referendum |
Year(s) Of Engagement Activity | 2016 |
URL | http://gate4ugc.blogspot.co.uk/2016/07/the-tools-behind-our-brexit-analyser.html |
Description | GATE for social media mining - a tutorial |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | A half-day tutorial on text mining and summarisation of Twitter, Facebook, and other social media. |
Year(s) Of Engagement Activity | 2013 |
Description | Invited talk - University of Cambridge |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Invited talk at the University of Cambridge, which resulted in questions and discussions afterwards with researchers in related fields. |
Year(s) Of Engagement Activity | 2015 |
URL | http://talks.cam.ac.uk/talk/index/53855 |
Description | Invited talk at British Telecom |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Industry/Business |
Results and Impact | Invited talk at British Telecom, Ipswich, UK, on Natural Language Processing for Social Media |
Year(s) Of Engagement Activity | 2015 |
Description | Keynote talk at the CHIST-ERA 2011 conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | At present, 200 million Twitter users send 140 million tweets a day, Facebook has 750 million active users, who spend over 700 billion minutes per month on the site, and increasingly knowledge is generated utilising the "wisdom of the crowd", on Wikipedia, Quora, and other similar sites. This unprecedented rise in the volume and importance of online textual content has resulted in companies and individuals increasingly struggling with information overload, or, as Clay Shirky defines it - a filter failure. This talk will discuss how text analytics and natural language processing can help address these issues, through the development of methods capable of extracting useful knowledge from noisy, contradictory content; inferring an individual's information seeking goals; offering personalised information access, and making use of distributed human computation, by harnessing the knowledge of a large number of humans. I will also touch upon the challenge of developing research infrastructures for experimentation with large-scale Text-to-Knowledge (T2K) analytics, at affordable costs for research teams and companies. |
Year(s) Of Engagement Activity | 2011 |
URL | http://conference2011.chistera.eu/communication/kalina-bontcheva |
Description | Oral evidence at the UK DCMS parliamentary committee enquiry on fake news |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Policymakers/politicians |
Results and Impact | Gave oral evidence to the DCMS parliamentary committee, based on my research on methods for analysing misinformation on social media. Questions included: What is 'fake news'? Where does biased but legitimate commentary shade into propaganda and lies? What impact has fake news on public understanding of the world, and also on the public response to traditional journalism? If all views are equally valid, does objectivity and balance lose all value? Is there any difference in the way people of different ages, social backgrounds, genders etc use and respond to fake news? Have changes in the selling and placing of advertising encouraged the growth of fake news, for example by making it profitable to use fake news to attract more hits to websites, and thus more income from advertisers? |
Year(s) Of Engagement Activity | 2017 |
URL | http://data.parliament.uk/writtenevidence/committeeevidence.svc/evidencedocument/digital-culture-med... |
Description | PHEME booth at ICT 2015 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Opportunity to engage and network with the 6000+ participants at the event |
Year(s) Of Engagement Activity | 2015 |
Description | PROMISE Winter School 2013 Bridging between Information Retrieval and Databases |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | The aim of the PROMISE Winter School 2013 on "Bridging between Information Retrieval and Databases" is to give participants a grounding in the core topics that constitute the multidisciplinary area of information access and retrieval to unstructured, semi-structured, and structured information. The school is a week-long event consisting of guest lectures from invited speakers who are recognized experts in the field. The school is intended for PhD students, Masters students or senior researchers such as post-doctoral researchers from the fields of databases, information retrieval, and related fields. |
Year(s) Of Engagement Activity | 2013 |
URL | http://www.promise-noe.eu/documents/10156/91889f81-86ba-46e4-9f50-3186d1be2340 |
Description | Participation in a Wilton Park dialogue on Computational propaganda |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Policymakers/politicians |
Results and Impact | Wilton Park dialogue: Computational propaganda, Wednesday 21 - Friday 23 March 2018 (WP1605). A specially convened dialogue. This 'offline' event was invite-only and held under the 'Wilton Park Protocol', whereby all discussion is off the record. |
Year(s) Of Engagement Activity | 2018 |
Description | Presentation at EACL'2014 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Tutorial: Natural language processing for social media at EACL'2014 |
Year(s) Of Engagement Activity | 2014 |
URL | http://eacl2014.org/tutorial-social-media |
Description | RDSM Workshop 2015 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Workshop: Rumors and Deception on Social Media: Detection, Tracking, and Visualization (RDSM), held at WWW 2015 |
Year(s) Of Engagement Activity | 2015 |
Description | Twitter account - https://twitter.com/kbontcheva |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Kalina Bontcheva tweets about her research |
Year(s) Of Engagement Activity | 2010,2011,2012,2013,2014,2015,2016,2017 |
Description | Workshop on Scalability in Natural Language Processing |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | This workshop, held with RANLP 2013, aims to introduce contemporary work, to discuss novel methods for natural language processing at a large scale, and to explore how the resulting technology and methods can be reused in applications both on the Web and in the physical world. |
Year(s) Of Engagement Activity | 2013 |
Description | Workshops at 7th GATE summer school |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Workshops on mining social media with GATE at the 7th GATE summer school. |
Year(s) Of Engagement Activity | 2013,2014 |
URL | https://gate.ac.uk/conferences/fig/fig7.html |
Description | Written evidence submitted to the EC public consultation on fake news |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Policymakers/politicians |
Results and Impact | The consultation collected written submissions on the following topics: (1) definition of fake information and its spread online; (2) assessment of measures already taken by platforms, news media companies and civil society organisations to counter the spread of fake information online; (3) scope for future actions to strengthen quality information and prevent the spread of disinformation online. |
Year(s) Of Engagement Activity | 2017 |
URL | https://ec.europa.eu/info/consultations/public-consultation-fake-news-and-online-disinformation_en |