Digital Social Research Tools, Tension Indicators and Safer Communities: a demonstration of the Cardiff Digital Research Platform (CDRP)

Lead Research Organisation: Cardiff University
Department Name: School of Social Sciences

Abstract

This demonstrator project will develop software tools for harvesting data from social media sites (such as Facebook), focusing in particular on how such data can be automatically aggregated and stored for subsequent analysis. The demonstrator will make use of an existing API to ensure an early prototype is available to start the pilot study (in tension indicators/community cohesion indicators), followed by further refinement of the prototype alongside the Cardiff Digital Research Platform (CDRP, described in appendix 1, figure 1 in the case for support). The tools implemented within the demonstrator are therefore part of the larger CDRP, which is intended to provide a software infrastructure for supporting data collection and analysis (covering both data analytics and visualisation) from real-time online sources, along with traditional Web-based sources and expert-curated data collections (provided and managed by government agencies). The tools will be validated in a "community cohesion" pilot study by focusing on particular social media sources.

The tools being developed in the project are general in scope: they can be used in other areas of the social sciences and adapted to address other research questions. The project is also intended to demonstrate how social media data can be combined with data from official sources (such as the Office for National Statistics) to support social sciences research. The outcome of the pilot study is intended to highlight the benefits and limitations of using social media data and, in particular, how such an approach can translate to other areas of social sciences research where social media data (often of limited quality and "messy") can augment expert-curated data (obtained through "traditional" questionnaire and field-based activities). The deliverables from the project will include software tools for harvesting data from social network sites and an online guide outlining the use of such tools within a pilot study, highlighting particular lessons learned from this process (such as methodological, technical, ethical and privacy issues) which could translate to other areas within social sciences and beyond. In particular, the project will demonstrate how the CDRP enables the analysis of information fed through social media in 'real time' (and, therefore, the rapid and multiple testing of propositions about the frequency, distribution and content of social media communications) whilst also archiving this data in ways that can be stored by the ESRC and subject to secondary data analysis by subsequent users (see also the 'data management plan' for the project). This demonstrator also provides the basis for undertaking further study on issues of: (i) data quality arising from the use of social media data and (ii) privacy and user engagement in supporting social media data analytics and trending.

Planned Impact

Who will benefit?

This work is intended to benefit the social sciences community in the first instance, by combining social media data with survey data that may have been collected by the research team (using survey and questionnaire-based terrestrial data collection). The demonstrator, which focuses on the analysis of community cohesion indicators, will provide a case study that can help validate the effectiveness of aggregating social media data and provide a basis for assessing data quality and volume in supporting social sciences research questions. This work will also be of interest to computer scientists developing analysis and visualisation algorithms that can be applied to data harvested from social media sites.

Researchers in strategic marketing and consumer behaviour analysis will benefit by using the tools to assess the impact of a product launch on a particular user community. Similarly, researchers in journalism and media will benefit from a better understanding of emerging trends in user-generated social content, similar in scope to Twitter and Google Trends (via online portals such as "Trendsmap").

How will they benefit?

The tools developed in this project will enable social media data to be considered alongside data obtained from curated sources. The data harvesting tools produced as part of the demonstrator can be used to populate data sources to which a variety of text mining and analysis algorithms can be applied. This data may also be visualised using currently available tools such as Netvizz and Gephi.

To achieve this, the project will entail:

- Development of a project web site, providing the data harvesting and analysis tools produced from this demonstrator.

- A workshop, organised at Cardiff University, to demonstrate how the tools can be used alongside social networking sites for data capture, storage and analysis.

- An online guide on harvesting, analysing and representing social media data for researchers across disciplines and non-academic users. This guide will highlight the methodological and ethical concerns with using such data in various contexts.

- Publications produced for both the social sciences (focusing on how the tool can be used to support social sciences research) and computer science research (focusing on data analysis and visualisation of the data) communities.

Publications

Burnap P (2014) COSMOS: Towards an integrated and scalable service for analysing social media on demand in International Journal of Parallel, Emergent and Distributed Systems

Burnap P (2015) Detecting tension in online communities with computational Twitter analysis in Technological Forecasting and Social Change

Burnap P (2013) Making sense of self-reported socially significant data using computational methods in International Journal of Social Research Methodology

Jeffrey Morgan (Author) (2012) Twitter Analytics

Luke Sloan (Author) (2013) Using Social Media with Survey Data

 
Description This demonstrator project developed the Collaborative Online Social Media Observatory (COSMOS) that facilitates the ethical automated harvesting of data from social media and other open digital sources (e.g. administrative and curated), and focused on how such data can be linked and stored for subsequent social science analysis. The data analysis tools developed during this project, including gender, location, frequency, network, topic, sentiment and tension analysis, were validated in a pilot study that focused on racial tension on Twitter. In this study we focused in particular on developing a tension-monitoring tool, using our method of collaborative algorithm design that combined the expertise of social science and computer science knowledge. The evaluation of this tool evidenced its superior efficacy compared to more conventional computational methods (see Burnap et al 2013; Williams et al 2013).
This project also evaluated the methodological issues surrounding the use of social media data in the social sciences and established three lines of argument, namely that these new data: i) act as a surrogate for traditional research designs; ii) augment traditional designs, and should be used alongside them; and iii) re-orientate social research around new digital objects, populations and units of analysis (Edwards et al 2013). The pilot study also highlighted how social media analysis is distinctive in capturing naturally occurring data at the level of populations in near real time. Consequently it offers the possibility of studying social processes as they unfold, as contrasted with their official construction through the use of temporally static 'terrestrial' curated datasets.

Methodological evaluation:
In our methodological evaluation of social media data we have distinguished three basic lines of argument about its prospective impact on social research. Some commentators suggest this innovation generates methods and data that can act as a surrogate for more traditional research designs. Others argue that social media re-orientate social research around new objects, populations and techniques of analysis. It can also be argued that digital social research augments, but needs to be used in conjunction with, traditional methods (see Table 1). We have used C. Wright Mills' classic statement of The Sociological Imagination to clarify the distinctive contribution of social media research: what can it do that traditional methods cannot in understanding how social relations are constituted, how they can change and how they generate social identities? We argue that social media research is distinctive in capturing naturally occurring data at the level of populations in near real time. Consequently, it offers the possibility of studying social processes as they unfold at the level of populations, as contrasted with their official construction through the use of curated datasets. We also explored the ethical, as well as the technological, implications of social media observatories, focussing in particular on tensions between the 'panoptic' and 'synoptic' powers of digital observatories and the possibilities of a 'signature science' (see Edwards et al 2013; Sloan et al 2013).

Demonstrator results:
This demonstrator project developed COSMOS, which facilitates the ethical automated harvesting of data from social media and other open digital sources (e.g. administrative and curated), and focused on how such data can be linked and stored for subsequent social science analysis. The data analysis tools developed during this project include gender, location, frequency, network, topic, sentiment and tension analysis. We developed collaborative algorithm design, a process whereby social science concepts are codified into computational algorithms that automate the analysis process for social media data. Using this process we developed a social media tension monitoring engine. Our results indicate that this engine is more accurate at detecting levels of tension in social media streams than machine learning methods and sentiment analysis (Burnap et al 2013). This is an important finding given that existing police practices rely on sentiment analysis (Williams et al 2013). We have also developed a social network analysis tool for social media data which visualises connectivity metrics together with tension metrics (see appendix 1 on ROS). We have also been successful in 'mashing' curated and administrative open data with social media data. The ONS API and Police API are integrated into COSMOS and we can currently overlay Tweet data with crime and demographic census data on a map of the UK (see appendix 2 on ROS).
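The idea of 'mashing' social media data with curated data can be illustrated with a minimal Python sketch. It is not the COSMOS implementation: the area codes, field names, figures and the function `link_tweets_to_areas` are all hypothetical, and it assumes each tweet has already been assigned an area code that also keys the curated statistics.

```python
from collections import Counter

# Invented records: each tweet is assumed to carry an area code already
# (for example, derived from its geocode). Codes here are placeholders.
tweets = [
    {"text": "example a", "area": "AREA_A"},
    {"text": "example b", "area": "AREA_A"},
    {"text": "example c", "area": "AREA_B"},
    {"text": "example d", "area": None},  # no usable location
]

# Invented curated figures keyed by the same (hypothetical) area codes.
area_stats = {
    "AREA_A": {"population": 1000, "recorded_crimes": 20},
    "AREA_B": {"population": 500, "recorded_crimes": 5},
}


def link_tweets_to_areas(tweets, area_stats):
    """Join per-area tweet counts onto curated area statistics."""
    counts = Counter(t["area"] for t in tweets if t.get("area") in area_stats)
    linked = {}
    for area, stats in area_stats.items():
        n = counts.get(area, 0)
        linked[area] = {
            **stats,
            "tweets": n,
            # Normalise by population so areas of different size are comparable.
            "tweets_per_1000": 1000 * n / stats["population"],
        }
    return linked


linked = link_tweets_to_areas(tweets, area_stats)
```

Each entry in `linked` then holds the curated figures and the tweet-derived measures side by side, which is the form needed for overlaying both on a map or comparing them statistically.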
Exploitation Route This project has contributed to innovation and broadening practice as part of the Digital Social Research Demonstrator programme. Our project was the first ESRC investment to focus on the potential of social media communications to inform social science research. This has assisted in bringing together social and computer scientists to study the methodological and empirical dimensions of Big 'Social' Data. Our continuing objective is to establish a coordinated international social science response to this new form of data in order to address next-generation research questions.

COSMOS captures Big Social Data at the level of populations in near real time. This offers researchers the possibility of studying social processes as they unfold at the level of populations, as contrasted with their official construction through the use of 'terrestrial' curated and administrative datasets. Systematic data mining and mixed-method analysis in relation to key social science concerns and questions are now possible; COSMOS provides a means of operationalising the next-generation 'social computational tool kit'. It also provides a means of augmenting social science research training through the provision of new methodological tools and options for researchers conducting social inquiry in the 21st century. COSMOS will launch to academics in 2014.
Sectors Digital/Communication/Information Technologies (including Software); Education; Government, Democracy and Justice; Security and Diplomacy

URL http://www.cs.cf.ac.uk/cosmos/
 
Description Scientific: The science that has informed the development of COSMOS has been published in high-end peer-reviewed journals, has been presented at 16 research events in the UK and Australia and is linked to the capture of six more research grants (circa £800k) that aim to further develop and test the platform (including 3 ESRC grants; see Outcomes on ROS). To date we have edited two journal special issues (Policing & Society and International Journal of Social Research Methodology) that showcase the science conducted during this project (see Wall & Williams 2013; Housley et al 2013). In addition we have published five peer-reviewed research articles (see Burnap et al 2013a, b; Edwards et al 2013; Sloan et al 2013; Williams et al 2013). We also won best conference paper at the IEEE International Conference on Cloud Computing and Services Science. During the project we established collaborations with the universities of Warwick, Manchester, Edinburgh, St Andrews, Wolverhampton, Leeds and UCL. We also established research connections with the universities of Princeton, Singapore, New York and Rutgers. We were funded by the University of Queensland to visit and present on COSMOS and are now working on a joint platform development. COSMOS has established relationships with several academic individuals and research groups within the Universities of Edinburgh, Leeds, Manchester, Wolverhampton, Southampton, Queensland, Beijing Normal, Singapore, Rutgers and New York, and the Karlsruhe Institute of Technology and Institute of Education (CLOSER). Individuals from several of these universities have acted as Beta testers for COSMOS 1.0 and have found value in the software and in our methodological work undertaken to date. The Institute for Social Change (Prof. Gibson, Manchester), the Wales Institute of Social and Economic Research, Data and Method (Prof. Rees Jones, Cardiff), the UK Data Archive (Dr. Corti, Essex), the ESRC National Centre for Research Methods (Prof. Sturgis, Southampton) and the joint MRC and ESRC centre for Cohort and Longitudinal Studies Enhancement Resources (Prof. Elliott, Institute of Education) have recognised the value added by the COSMOS programme.
Broader Society: During the project we have established research connections with key industry and public sector partners, including: Google UK; Twitter UK & US; Airbus Group; Fujitsu and High Performance Computing Wales; Sage Publications; Office for National Statistics; Food Standards Agency; Cabinet Office Office for Cyber Security and Information Assurance and the Identity Assurance Programme; Home Office Business Intelligence and Shared Services Programme; College of Policing; Metropolitan Police Service; Association of Chief Police Officers; UK Data Archive; and the Welsh Government Equality, Diversity and Inclusion Division. We are currently in discussions with the Airbus Group in relation to licensing our tension engine and other digital tools. Discussions are on-going with Fujitsu and High Performance Computing Wales to host the COSMOS platform, making it a sustainable resource for academics. Meetings are also on-going with Sage Publications (London and Washington DC) in relation to distributing the COSMOS platform in the North American HE sector. Work is about to begin with the Welsh Government that will stress test the tension engine in specific urban and rural locations in Wales. We are also working with the Metropolitan Police Service on the development of a crime and disorder detection engine which will extend the tension monitoring tool. Several non-academic research end-users have engaged with the COSMOS 1.0 platform as Beta testers, including the Office for National Statistics Big Data Innovations Lab and the Food Standards Agency. NatCen (Mr. Morrell, Group Head, Health, Social Attitudes and Environment), Sage Publishers (Dr. Brindle, Publisher, Online Content), the Office for National Statistics (Mr. Swier, Principal Methodologist, Big Data Project), the Metropolitan Police Service (D.S. Jervis, Head of Communication Intelligence Unit) and the Food Standards Agency (Dr Thomas, Head of Analytics & Mr. Baker, Head of Social Media) have recognised the value added by the COSMOS programme.
First Year Of Impact 2012
Sector Digital/Communication/Information Technologies (including Software); Education; Government, Democracy and Justice; Security and Diplomacy
Impact Types Societal

 
Description COSMOS influence on practice
Geographic Reach National 
Policy Influence Type Influenced training of practitioners or researchers
Impact Several non-academic research end-users have engaged with the COSMOS 1.0 platform as Beta testers, including the Office for National Statistics Big Data Innovations Lab and the Food Standards Agency. These organisations are evaluating the utility of COSMOS to meet departmental objectives.
 
Description Centre for Cyberhate Research & Policy: Real-Time Scalable Methods & Infrastructure for Modelling the Spread of Cyberhate on Social Media
Amount £383,983 (GBP)
Funding ID ES/P010695/1 
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 04/2017 
End 12/2019
 
Description Detecting Tension and Cohesion in Local Communities with Social Media
Amount £51,040 (GBP)
Organisation Airbus Group 
Sector Academic/University
Country France
Start 09/2014 
End 05/2015
 
Description ESRC Capital Funding: Social Data Science Lab - Continuation of Methods and Infrastructure Development for Open Data Analytics in Social Research
Amount £607,960 (GBP)
Funding ID ES/P008755/1 
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 01/2017 
End 01/2020
 
Description Hate Speech and Social Media: Understanding Users, Networks and Information Flows
Amount £124,986 (GBP)
Funding ID ES/K008013/1 
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 04/2013 
End 08/2014
 
Description High Performance Computing, Scalability and Big 'Social' Data
Amount £80,000 (GBP)
Organisation Fujitsu 
Sector Private
Country Japan
Start 10/2014 
End 05/2015
 
Description Requirements Analysis for Social Media Analysis Research Tools
Amount £5,000 (GBP)
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 07/2012 
End 08/2012
 
Description Social Media and Prediction: Crime Sensing, Data Integration and Statistical Modelling
Amount £194,138 (GBP)
Funding ID DU/512589112 
Organisation Economic and Social Research Council 
Sector Public
Country United Kingdom
Start 04/2013 
End 04/2014
 
Description Supporting Empirical Digital Social Research for the Social Sciences with a Virtual Research Environment
Amount £55,519 (GBP)
Organisation Jisc 
Sector Public
Country United Kingdom
Start 08/2012 
End 03/2013
 
Description Understanding the role of social media in the aftermath of youth suicides
Amount £188,000 (GBP)
Funding ID PR-R5-0912-11008 
Organisation Government of Wales 
Department Department of Health
Sector Public
Country United Kingdom
Start 07/2013 
End 03/2015
 
Title COSMOS Platform 
Description This demonstrator project developed the Collaborative Online Social Media Observatory (COSMOS) that facilitates the ethical automated harvesting of data from social media and other open digital sources (e.g. administrative and curated), and focused on how such data can be linked and stored for subsequent social science analysis. The data analysis tools developed during this project, including gender, location, frequency, network, topic, sentiment and tension analysis, were validated in a pilot study that focused on racial tension on Twitter. In this study we focused in particular on developing a tension-monitoring tool, using our method of collaborative algorithm design that combined the expertise of social science and computer science knowledge. The evaluation of this tool evidenced its superior efficacy compared to more conventional computational methods (see Burnap et al 2013; Williams et al 2013). This project also evaluated the methodological issues surrounding the use of social media data in the social sciences and established three lines of argument: that these new data: i) act as a surrogate for traditional research designs; ii) augment traditional designs, and should be used alongside; and iii) re-orientate social research around new digital objects, populations and units of analysis (Edwards et al 2013). The pilot study also highlighted how social media analysis is distinctive in capturing naturally occurring data at the level of populations in near real time. Consequently it offers the possibility of studying social processes as they unfold, as contrasted with their official construction through the use of temporally static 'terrestrial' curated datasets.
Type Of Material Improvements to research infrastructure 
Year Produced 2013 
Provided To Others? Yes  
Impact COSMOS has established relationships with several academic individuals and research groups within the Universities of Edinburgh, Leeds, Manchester, Wolverhampton, Southampton, Queensland, Beijing Normal, Singapore, Rutgers and New York, and the Karlsruhe Institute of Technology and Institute of Education (CLOSER). Individuals from several of these universities have acted as Beta testers for COSMOS 1.0 and have found value in the software and in our methodological work undertaken to date. The Institute for Social Change (Prof. Gibson, Manchester), the Wales Institute of Social and Economic Research, Data and Method (Prof. Rees Jones, Cardiff), the UK Data Archive (Dr. Corti, Essex), the ESRC National Centre for Research Methods (Prof. Sturgis, Southampton) and the joint MRC and ESRC centre for Cohort and Longitudinal Studies Enhancement Resources (Prof. Elliott, Institute of Education) have recognised the value added by the COSMOS programme. Several non-academic research end-users have engaged with the COSMOS 1.0 platform as Beta testers, including the Office for National Statistics Big Data Innovations Lab and the Food Standards Agency. NatCen (Mr. Morrell, Group Head, Health, Social Attitudes and Environment), Sage Publishers (Dr. Brindle, Publisher, Online Content), the Office for National Statistics (Mr. Swier, Principal Methodologist, Big Data Project), the Metropolitan Police Service (D.S. Jervis, Head of Communication Intelligence Unit) and the Food Standards Agency (Dr Thomas, Head of Analytics & Mr. Baker, Head of Social Media) have recognised the value added by the COSMOS programme.
 
Title COSMOS Social Media Linked Database 
Description COSMOS routinely collects a random 1% of the global Twitter feed daily (circa 5M tweets) as well as all geocoded tweets in the UK (circa 500K tweets daily). COSMOS also hosts several bespoke datasets collected using keywords (including 'Obama Election'; 'Olympics'; 'Woolwich'; 'Boston Marathon'; 'Caroline Criado Perez'; 'Benefits Street/Britain'; 'Horse Meat'; 'Ebola'). These datasets will become a resource for UK academics. As Twitter forbids the direct sharing of its datasets, we are engineering technical solutions to make these resources available more widely. We are also in the process of negotiating elevated data access with Twitter US.
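The two kinds of collection described above (geocoded UK tweets and keyword-based datasets) can be sketched as simple filters in Python. This is an illustration only, not the COSMOS collection code: the tweet structure, the function names `in_uk` and `matches_keywords`, the sample tweets and the bounding box are all assumptions, and the box is a crude rectangle that also covers parts of neighbouring countries.

```python
# Approximate rectangle around the UK (an assumption for illustration only;
# it also includes parts of Ireland and northern France).
UK_BOX = {"min_lon": -8.7, "max_lon": 1.8, "min_lat": 49.8, "max_lat": 60.9}


def in_uk(tweet):
    """Return True if the tweet has coordinates inside the rough UK box."""
    coords = tweet.get("coordinates")  # hypothetical (lon, lat) tuple or None
    if coords is None:
        return False
    lon, lat = coords
    return (UK_BOX["min_lon"] <= lon <= UK_BOX["max_lon"]
            and UK_BOX["min_lat"] <= lat <= UK_BOX["max_lat"])


def matches_keywords(tweet, keywords):
    """Return True if any keyword occurs in the tweet text, ignoring case."""
    text = tweet["text"].lower()
    return any(k.lower() in text for k in keywords)


# Invented sample tweets for demonstration.
sample = [
    {"text": "Horse meat found in ready meals", "coordinates": (-3.18, 51.48)},
    {"text": "Morning run by the river", "coordinates": (-74.0, 40.7)},
    {"text": "Olympics opening ceremony tonight", "coordinates": None},
]

uk_geocoded = [t for t in sample if in_uk(t)]
keyword_hits = [t for t in sample if matches_keywords(t, ["Horse Meat", "Olympics"])]
```

A production collector would more plausibly apply such filters at the streaming API rather than after download, and would use proper boundary polygons instead of a rectangle.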
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact This database of tweets is one of a few in the UK. COSMOS is in the process of joining the Web Science Trust Web Observatory at the University of Southampton. 
URL http://www.cosmosproject.net
 
Description COSMOS and Office for National Statistics Partnership 
Organisation Office for National Statistics
Country United Kingdom 
Sector Private 
PI Contribution Team visits to ONS sites to demonstrate the COSMOS platform and gather requirements for the integration of the ONS API
Collaborator Contribution COSMOS was selected to be an ONS API Case Study. Visits to Cardiff to discuss requirements for incorporating the ONS API into the COSMOS platform. Technical support.
Impact The ONS API will be incorporated into COSMOS following beta testing. This partnership is multi-disciplinary: computer and social sciences.
Start Year 2012
 
Title COSMOS Social Media Data Access, Storage and Big Data Analysis Platform 
Description This demonstrator project developed the Collaborative Online Social Media Observatory (COSMOS) platform that facilitates the ethical automated harvesting of data from social media and other open digital sources (e.g. administrative and curated), and focused on how such data can be linked and stored for subsequent social science analysis. The data analysis tools developed during this project, including gender, location, frequency, network, topic, sentiment and tension analysis, were validated in a pilot study that focused on racial tension on Twitter. In this study we focused in particular on developing a tension-monitoring tool, using our method of collaborative algorithm design that combined the expertise of social science and computer science knowledge. The evaluation of this tool evidenced its superior efficacy compared to more conventional computational methods (see Burnap et al 2013; Williams et al 2013).
Type Of Technology Software 
Year Produced 2014 
Open Source License? Yes  
Impact Both the Office for National Statistics and the Food Standards Agency are using the COSMOS platform to support departmental objectives. The software is also being used by several universities in the UK and Australia. 
URL http://www.cs.cf.ac.uk/cosmos/
 
Description Appendix 1 & 2 of EoA Report 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Appendix 1 & 2 of EoA Report

Year(s) Of Engagement Activity 2013