Towards large-scale Cultural Analytics in the Arts and Humanities

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Literature Languages & Culture

Abstract

The UK has a world-leading cultural and creative economy, and every year there are hundreds of thousands of events, festivals, concerts, plays, and gigs, varying in scale from the very small and informal to the large and coordinated. Events tell us much about the creative landscape. However, although data about them is produced to facilitate listings and ticket sales, no service exists through which researchers can access this recently produced commercial data to generate accurate, data-led analysis and visualisation of the UK's creative sector. Additionally, the data created by the events industry is large and complex, and involves commercial providers who have developed novel business models around data scraping, gathering, and dissemination. Any researcher who wishes to use this data must negotiate not only access to the data but also access to compute at scale, in order to generate new understanding that may be of use in event planning and policy, or in identifying potential trends and opportunities for investment and support. Of particular interest at the moment is gathering accurate information on the effect of the COVID-19 pandemic on the UK's events industry: although events data exists, no accurate reports have yet used it to understand the effects of the pandemic and the industry's recovery. Providing a data service that could support researchers requires investigating how best to provide this data to Arts and Humanities (A&H) researchers, many of whom have not undertaken data analysis at this scale. A needs analysis must therefore be carried out with A&H researchers, whilst also working closely with industry contacts to understand the landscape of events data and how this data (as an example of the type of data produced by the UK's cultural industries) can be provided to A&H researchers as a supported service that negotiates relationships between data providers and processors. 
Any service in this area also needs to consider privacy, copyright, and intellectual property, as well as particular user needs. Our research will support the development and design of a data repository for the capture and analysis of UK cultural and creative industries data at scale, focusing particularly on events-based data. We will undertake a range of scoping and user needs analyses with a diverse community from industry, academia, and data service providers. We will show how A&H researchers are already using, or could in future use, events-based data, and the impact this type of research may have on understanding our economy, cultural environment, and physical infrastructure planning. We will undertake a pilot study with our project consultants The List, the UK's major provider of events listings data, who have over 15 years of experience operating in this area. We will aim to understand how researchers can analyse over 2GB of data covering 2.5m events organised in the UK between 2017 and 2021. Our outputs will include a specification for a cultural and creative industries data service that includes capital, operational, and support costs, providing a roadmap for how to build a service that can support the UK's A&H researchers in understanding the cultural and heritage industries at scale. We will also propose a skills and capacity building programme for the A&H research community in accessing and using this type of creative industry information from a large-scale data service that uses High Performance Computing and Data Analytics. 
The University of Edinburgh is uniquely placed to carry out such a study, given its existing expertise in, and collaborations between, national UK computing infrastructure, major events such as the festivals, creative industry researchers, and the university's recent major investment in digital approaches across the Arts, Humanities, and Social Sciences through the Edinburgh Futures Institute (www.efi.ed.ac.uk).

 
Description
Top 3 Findings:
1. The data generated by the UK's creative and cultural industries are large-scale and complex, and have enormous untapped potential both for research and for providing insights to the creative industries.
2. The A&H research community has the desire and expertise to greatly benefit from access to large-scale cultural events data.
3. To benefit fully from access to this data, the A&H research and wider user community requires two forms of support: infrastructure (the LCAAH Data Service) and help in accessing that service (the LCAAH Data Lab).

Further findings:
4. A rich dataset of cultural events data already exists (curated by Data Thistle) that can form the basis of a cultural events data service.
5. There are multiple other data sources that can be linked to Data Thistle's data to provide even richer research opportunities.
6. There are complexities specific to contemporary cultural data that necessitate support for researchers working with them.
7. The size and complexity of the data requires high-performance or large-scale compute, which is beyond the capacity of most A&H research communities.
8. Because large-scale cultural data requires large-scale compute, this is an area where we can invest in the skills of A&H researchers, as well as provide scalable infrastructure and support structures that facilitate research in the A&H.
9. We have identified additional benefits to industry from such a data service, particularly regarding business models and innovation. We show that making the most of cultural events data requires building close relationships between academia and industry partners.
10. We recommend that any data service for the A&H adopts our proposed model of supporting both the infrastructure and a support service, to ensure adoption.
Exploitation Route
Top 5 Recommendations:
1. The LCAAH Data Service should be built using a Private Cloud infrastructure model utilising the Edinburgh International Data Facility.
2. Provision of large-scale cultural events data should start with the What's On data available via Data Thistle, with additional sources added as research projects require them.
3. The LCAAH Data Lab should provide support for projects using these data from inception to completion.
4. Ongoing training provision for A&H sector and industry will be needed to enable work with large-scale cultural data.
5. Investment should be made in research data infrastructure that can be scaled up and can incorporate additional linked datasets in the future, as well as in the supporting lab service, to ensure adoption. Any data service provided in the Arts and Humanities will require both of these elements (infrastructure and support).
Sectors Creative Economy, Leisure Activities, including Sports, Recreation and Tourism

URL https://blogs.ed.ac.uk/tolcaah/
 
Description
Our findings and Top 5 Recommendations (listed under Exploitation Route) are informing the AHRC's Infrastructure programme about data services. We have yet to hear back from the AHRC on how it is taking our recommendations forward.
Sector Creative Economy, Digital/Communication/Information Technologies (including Software), Leisure Activities, including Sports, Recreation and Tourism
Impact Types Cultural, Policy & public services