ESRC Capital Funding: Social Data Science Lab - Continuation of Methods and Infrastructure Development for Open Data Analytics in Social Research

Lead Research Organisation: Cardiff University
Department Name: Computer Science

Abstract

The Social Data Science Lab has been established with the mission of democratising access to big social data among the academic, public and third sectors, and to support real-time social data analytics for research, policy & practice. The proposed capability project is designed to address existing technical and methodological shortcomings in our ability to marshal big social data for social research purposes. In particular, the project will provide enhanced and sustainable social media data collection and analysis technologies to academic, public and third sector researchers. This capability project will achieve this by:

1. Providing the required technical resource and expertise to: i) optimise existing Lab social media analytics technologies for more efficient social media data collection, transformation and analysis across operating platforms (Windows, Linux, Mac OS); ii) integrate existing social media analytics tools (such as demographic and text classification tools); iii) make existing Lab tools extendable by the researcher community; iv) adapt Lab tools to user requirements and changes in social media provider technologies (i.e. API changes); and v) support researchers in their social media data and analysis needs via dedicated Lab training and working papers;

2. Providing the required social science resource and expertise to: i) liaise with the researcher community to gather social media tool requirements; ii) liaise with ESRC administrative, local government and consumer big data centres; iii) write training materials and coordinate capability building activities; and iv) support researchers in their data and analysis queries;

3. Ensuring the required investigator time to: i) manage the optimisation and enhancement of Lab tools and to implement a 'sustainability' business model; ii) manage existing partnerships with public, private and third sector users and the various ESRC big data network centres in the UK and elsewhere; iii) develop new partnerships with data providers; and iv) inform and oversee the development of world-leading training and capability building in Big Social Data Analytics;

4. Exploring options for the sustainable processing of social media analytics within UK HE research infrastructure and providing an options paper for use by existing ESRC big data investments, and ultimately Phase 3 of the network.

Planned Impact

The project will have five main categories of beneficiary: (1) academic communities in the fields of social science, computer science, health studies and medicine, and arts and humanities; (2) government agencies that have a remit to engage with big social data; (3) law enforcement agencies; (4) voluntary sector organisations; and (5) private corporations with an interest in big social data. The main activities for realising potential benefits to these groups are:

1. The provision of free access to Lab social media analytics technologies for not-for-profit use;

2. The provision of free access to Lab social media analytics capability building and training materials, including webinars and online support community;

3. The invitation to an International Conference on Computational Social Science;

4. The recruitment onto the MSc in Social Data Science Part-Time route;

5. The continued support of industry-Lab partnerships (with the likes of Admiral Insurance and Airbus);

6. The continued support of government-Lab partnerships (with the likes of the ONS Data Science Campus, Home Office, Ministry of Justice and the Department for International Development);

7. The recruitment of non-academic government, voluntary and industry members onto the Steering Committee for the Lab social media analytics capability programme.

The Social Data Science Lab will leverage its existing relationships to achieve these activities. Existing links include 1) Private sector: Twitter US & UK; Google UK; Airbus Group; Admiral Insurance; RAND Corporation; RAND Europe; Fujitsu and High Performance Computing Wales; Sage Publications; and NatCen Social Research, 2) Public sector: Ministry of Justice; Home Office; Food Standards Agency; Department of Health; Department for International Development; Office for National Statistics Data Science Campus; Welsh Government; College of Policing; Metropolitan Police Service; City of London Police; UK Data Archive, and 3) Third sector: Tell Mama; Community Security Trust; Race Equality First; Stonewall.

Lab social media analytics technologies have been used by over 1000 organisations in over 30 countries, including all UK Russell Group universities, several top US universities (including Stanford, Cornell and MIT) and many non-academic institutions (including BBC; Foreign and Commonwealth Office; Citizens Foundation Iceland; Girl Guides; Dept. Work and Pensions; Bolton Council; MySociety; Police Foundation; West Midlands Police; CPS; Shelter; Scottish Government; Dept of Health; ONS; South Lanarkshire Council; Community Security Trust; Cabinet Office; College of Policing; Public Health Canada; Salvation Army; Institute for Sustainable Communication; Detroit Crime Commissions; Understanding Animal Research; British Geological Survey; Medway Council; European Space Agency; UK MOD Army; National Response Center for Cyber Crimes (Pakistan); National Library of Scotland; Dept. for Culture Media and Sport; HEFCW; Carmarthen County Council; Ceredigion Council; Airbus Group; British Institute of Human Rights; McKinsey; US Army; Fair Trials Intl.; Turkish National Police; Intl. Civil Society Centre; and Public Health England). We will provide all enhanced social media analytics technologies to these organisations and continue to support data and analysis needs where required and possible.

Cardiff University recognises the Social Data Science Lab's impact to date, and is supporting its impact plan going forward by locating it within the Nesta-sponsored Social Science Research Park (SPARK). See Pathways to Impact for further details.
 
Title COSMOS - Automation of the download request process. 
Description Users can request to download the COSMOS application. Previously, an administrator had to approve each download request manually and send the user a link to the latest version. This process has been automated with a CRON job and a PHP script that handle the email received from the user, store it in a Google Drive document, and send out a welcome email with the link to download COSMOS. 
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? No  
Impact Saved the development and administration teams a considerable amount of time and helped the research community to receive a faster response. 
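
The automated flow described above can be sketched as follows. This is an illustration only, written in Python rather than the PHP used by the Lab, and it performs no real scheduling, mail delivery or Google Drive access. The names RequestLog, build_welcome_email and process_pending_requests, the email wording, and the rule of skipping requesters who are already recorded are all assumptions made for the sketch, not details taken from the COSMOS implementation.

```python
from dataclasses import dataclass, field


@dataclass
class RequestLog:
    """Hypothetical stand-in for the stored list of requesters
    (a Google Drive document in the description above)."""
    emails: list = field(default_factory=list)


def build_welcome_email(address: str, download_url: str) -> dict:
    """Compose a welcome message carrying the download link (wording is assumed)."""
    return {
        "to": address,
        "subject": "Welcome to COSMOS",
        "body": f"Download the latest version here: {download_url}",
    }


def process_pending_requests(pending, log: RequestLog, download_url: str) -> list:
    """One scheduled run: record each new requester and queue a welcome email."""
    outbox = []
    for raw in pending:
        address = raw.strip().lower()
        # Assumption: ignore malformed addresses and requesters already recorded.
        if "@" not in address or address in log.emails:
            continue
        log.emails.append(address)
        outbox.append(build_welcome_email(address, download_url))
    return outbox
```

A scheduler such as cron would call process_pending_requests at a fixed interval with the requests received since the last run; the returned list would then be handed to whatever mail transport is in use.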
 
Title COSMOS - Bug Tracker 
Description Any issues in design and coding that cause incorrect results are considered software bugs. In a software development life cycle, tracking bugs is one of the most important aspects. Where a user has encountered an issue and submitted it via the Help Desk, but the team could not resolve it immediately, then it is reported as a bug using the COSMOS Bug tracker. 
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? No  
Impact The Bug Tracker improves connectivity between COSMOS development team members and plays an important role in providing feedback to the user when the bugs are resolved. It is also proving key in helping the team prioritize issues during software development sprints, and in delivering a high-quality sustainable product. The Bug tracker will continue to provide the development team with information on how to fix and improve COSMOS over the full duration of the capability grant. 
 
Title COSMOS Help Desk 
Description The help desk was designed as a one-stop-shop support mechanism for those COSMOS users encountering issues with the platform. Users are able to submit a 'ticket' detailing their issue and track their submission/check status of their problem, to identify when it is resolved. 
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? No  
Impact We are now able to maintain a continued dialogue with users until an issue is resolved, and our team is able to build up a knowledge base around the errors encountered. User-driven feature requests can also be submitted, making it easier for us to feed them into future enhancements and development plans. 
URL http://www.cosmos-support.net
 
Description Cardiff University Airbus Centre of Excellence in Cyber Security Analytics 
Organisation Airbus Group
Department Airbus Operations
Country United Kingdom 
Sector Private 
PI Contribution Burnap is the director of the Centre; Anthi is a core IoT researcher within the Centre. Burnap leads IoT research for Airbus in the context of Industrial IoT.
Collaborator Contribution Airbus are providing support to build an industrial IoT testbed as part of the IoTDepends project - this will underpin the research co-produced by Cardiff University and Airbus
Impact £760k research project funded by Endeavr Wales to study intrusion detection and probabilistic modeling of cyber attacks on Industry Control Systems (SCADA); £1.8m EPSRC research project studying the impact of IoT and sensors embedded in products of the future to support automated "Chatty Factories" of the Future; Journal article in Computers and Security (Malware Classification and Machine Learning); Journal article in IEEE Computer (Goal Oriented Risk Modeling); Journal article research has been transitioned into enhanced products and services within Airbus (Malware Classification -> SOC, Risk Modeling -> Risk consulting business)
Start Year 2017
 
Title COSMOS - Automation of the download request process. 
Description Users can request to download the COSMOS application. Previously, an administrator had to approve each download request manually and send the user a link to the latest version. This process has been automated with a CRON job and a PHP script that handle the email received from the user, store it in a Google Drive document, and send out a welcome email with the link to download COSMOS. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Impact Saved the development and administration teams a considerable amount of time. 
 
Title COSMOS - Bug Tracker 
Description Any issues in design and coding that cause incorrect results are considered software bugs. In a software development life cycle, tracking bugs is one of the most important aspects. Where a user has encountered an issue and submitted it via the Help Desk, but the team could not resolve it immediately, then it is reported as a bug using the COSMOS Bug tracker. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Open Source License? Yes  
Impact The Bug Tracker improves connectivity between COSMOS development team members and plays an important role in providing feedback to the user when the bugs are resolved. It is also proving key in helping the team prioritize issues during software development sprints, and in delivering a high-quality sustainable product. The Bug tracker will continue to provide the development team with information on how to fix and improve COSMOS over the full duration of the capability grant. 
URL http://cosmos-support.net/reportbug
 
Description A meeting with UK Data Archive 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact UKDA are archiving New Forms of Data, including social media. From our two meetings, it was evident that COSMOS could be enhanced by adding the functionality to link to UKDA datasets containing tweet IDs, to 'rehydrate' them via the Twitter API. This enhancement would function in the same way as a DOI that points to the dataset used to produce the results in an academic paper. The enhancement would also allow users of COSMOS to deposit tweet IDs into the UKDA. These rehydration and deposit functions would facilitate big social data reuse and study replication. The enhancement fits with UKDA, ESRC and the Social Data Science Lab strategic priorities, and therefore we have agreed to collaborate on this over the coming year.
UKDA are in need of documentation for data reusers on the technical, methodological and ethical reuse of social media data. The Lab has agreed to co-author this documentation with UKDA.
UKDA are trialling the use of their new High-Performance Computing infrastructure in the storage, management, and analysis of big data sources (currently smart energy meter data). It was agreed that the Lab would work with UKDA on experimenting with social media analysis using this architecture. It may be possible to use this architecture to power some of the back-end COSMOS processes, speeding up analysis for heavy users significantly.
Year(s) Of Engagement Activity 2017
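
The 'rehydration' idea discussed with UKDA, turning a deposited list of tweet IDs back into full records, can be sketched as below. This is a minimal illustration, not the COSMOS or UKDA implementation: the lookup argument is a hypothetical callable standing in for a real API client, and the batch size of 100 IDs per request is an assumption that should be checked against the provider's current documentation. IDs that the lookup does not return (for example deleted or protected posts) are reported separately, which matters for study replication.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

BATCH_SIZE = 100  # assumed per-request limit, not taken from the source


def batched(ids: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` IDs."""
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def rehydrate(
    ids: Iterable[str],
    lookup: Callable[[List[str]], Dict[str, dict]],
    size: int = BATCH_SIZE,
) -> Tuple[Dict[str, dict], List[str]]:
    """Return (found, missing): full records by ID, and IDs no longer available."""
    found: Dict[str, dict] = {}
    missing: List[str] = []
    for batch in batched(ids, size):
        result = lookup(batch)  # hypothetical client call, injected by the caller
        for tweet_id in batch:
            if tweet_id in result:
                found[tweet_id] = result[tweet_id]
            else:
                missing.append(tweet_id)
    return found, missing
```

Passing the lookup in as a parameter keeps the sketch independent of any particular API version and makes it straightforward to test with a stub.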
 
Description A meeting with Public Health England 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact PHE is interested in developing a customized plugin to be used as part of the COSMOS application. The meeting, held in March 2018, aimed to collect requirements as part of our sustainability plan for COSMOS.
Year(s) Of Engagement Activity 2018
 
Description Meeting with ADRC-Wales 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact We will jointly explore the possibility of providing training to ADRC-W in Swansea around the topic of social data science (e.g. social network analysis).
COSMOS will be utilised as an in-house resource within ADRC-W to provide the function of social media data analysis and linking with administrative data.
We will deliver a seminar to the ADRC network on social media analysis (showcasing COSMOS).
We will co-develop an emotion extraction plugin for COSMOS at the request and design of ADRC-W.
Year(s) Of Engagement Activity 2017
 
Description Meeting with MOPAC and MPS 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Potential collaboration to explore: developing an intelligent methodology to detect the spread of hate (this will form part of the model integration later);
detecting the networks associated with users posting hate, and exploring network metrics, especially the networks of haters and the hated.
COSMOS requirements: - Top 10 hashtags. - For hate targeted at a certain individual, how many people have been targeted by attacks? An @mention is one way to identify this, but it will also pick up counter-speech and responses. - Include the frequency of hate posts that contain @mentions in the text.
Year(s) Of Engagement Activity 2018
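
Two of the COSMOS requirements raised in this meeting, the top 10 hashtags and the frequency of posts containing @mentions, can be sketched as below. This is an illustration under stated assumptions rather than the COSMOS implementation: the function names and regular expressions are hypothetical, the input is assumed to be a list of post texts that an upstream classifier has already flagged, and, as noted in the meeting, a simple @mention count will also include counter-speech and replies.

```python
import re
from collections import Counter

# Simplified patterns (assumption): a tag or handle is a run of word characters.
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def top_hashtags(posts, n=10):
    """Return the n most frequent hashtags (case-insensitive) with their counts."""
    counts = Counter(
        tag.lower() for text in posts for tag in HASHTAG.findall(text)
    )
    return counts.most_common(n)


def mention_frequency(posts):
    """Count, for each mentioned account, how many posts mention it."""
    counts = Counter()
    for text in posts:
        # Use a set so an account mentioned twice in one post is counted once.
        counts.update({name.lower() for name in MENTION.findall(text)})
    return counts
```

The number of distinct accounts in the result of mention_frequency gives a rough upper bound on how many individuals were targeted; separating attacks from counter-speech would need a further classification step.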
 
Description Meeting with Welsh Water Board representative 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Public/other audiences
Results and Impact Explored a potential collaboration with Welsh Water to develop a customized plugin based on Welsh Water's requirements. Welsh Water wants to learn more about its customers, especially as more companies are expected to join the water sector within 5 years. Welsh Water is therefore looking to integrate more intelligent social network tools to improve engagement with its customers.
Year(s) Of Engagement Activity 2018