Digital Humanities Data Hive: Accessing Humanities Data At Scale

Lead Research Organisation: University of Glasgow
Department Name: School of Humanities

Abstract

This project will scope the establishment of the Digital Humanities Data Hive ("DH2"). This is conceived as an active and interactive national data centre for the arts and humanities where the rich and complex data at the heart of research can flourish in new and unexplored ways. Central to the project is a respect for the hybrid and diverse array of data that are the basis for arts and humanities research. Our proposal rejects narrow definitions of data types and disciplines, and instead builds on the UK's long history of expertise in digital text, image, and multimedia collections; within our scope is any humanities data that has the potential to be transformed and have value added through the data-driven integration of research data and tools.

Our research, carried out in close collaboration with the arts and humanities and data science communities, will explore how DH2 will make possible completely new ways of doing digitally enabled work. This will enable us to scope, develop and design a suite of tightly integrated services for both data storage and use. Our approach centres on building tools for mining, analysing, exploring and linking data alongside its storage, thus ensuring that data within a future centre or centres is firmly embedded within the research data lifecycle. By so doing, our project will foster new ways of thinking about creating digitally driven research in the arts and humanities and, in parallel, make the case that the creation of tools and services for using data for scholarship must be an integrated aspect of any data-focused project in the humanities. We know that, at present, the 'human infrastructure' of researchers, developers, users, and repository managers does not wish to interact with raw data stores, but instead with the holistic confluence of data and tools. Therefore, the combined data/interface/tool ecosystem that our project will design will be essential for the identification of future cross-disciplinary opportunities and new and emerging transformative uses of humanities data. Finally, by centring re-use, we want to create an environment where there is greater transparency around the use and analysis of data and where methods and workflows for research are open and can be observed, critiqued, and replicated.

The DH2 proposal that this project will scope will include a methodological layer of tools and services for using data at scale, offering a service that will allow people to run their own data through tools and create results based on comparisons and integration with larger bodies of data. To do this, we will develop an evidence-based, fully costed project plan and full technical specifications for a data centre which has two key elements: a Data Service that will federate new and existing data repositories via an abstraction layer, and a Data Lab, an analytical layer of tools for data manipulation, mining, re-use, and visualisation that will itself use that abstraction layer. Our project will therefore be a crucial step towards a national data service for the arts and humanities, bringing together human infrastructures, results-oriented services, and state-of-the-art technical developments.
