Born-digital big data and methods for history and the humanities

Lead Research Organisation: University of London
Department Name: Inst of Historical Research

Abstract

In recent years we have all become familiar with the notion of information overload, the digital deluge, the information explosion, and numerous variations on this idea. At the heart of this phenomenon is the growth of born-digital big data, a term which encompasses everything from aggregated tweets and Facebook posts to government emails, from the live and archived web to data generated by wearable and household technology. While there has been a growing interest in big data and the humanities in recent years, as exhibited notably in the AHRC's digital transformations theme, most academic research in this area has been undertaken by computer scientists and in emerging fields such as social informatics. As yet, there has been no systematic investigation of how humanities researchers are engaging with this new type of primary source, of what tools and methods they might require in order to work more effectively with big data in the future, and of what might constitute a specifically humanities approach to big data research. What kinds of questions will this data allow us to ask and answer? How can we ensure that this material is collected and preserved in such a way that it meets the requirements of humanities researchers? What insights can scholars in the humanities gain from ground-breaking work in the computer and social sciences, and from the archives and libraries that are concerned with securing all of this information?

The proposed research Network will bring together researchers and practitioners from all of these stakeholder groups, to determine whether there is a genuine humanities approach to born-digital big data, and to establish how this might inform, complement and draw on other disciplines and practices. Over the course of three workshops, one to be held at The National Archives in Kew, one at the Institute of Historical Research, University of London, and one at the University of Cambridge, the Network will address the current state of the field; establish the most appropriate tools and methods for humanities researchers for whom born-digital material is an important primary source; discuss the ways in which researchers and archives can work together to facilitate big data research; identify the barriers to engagement with big data, particularly in relation to skills; and work to build an engaged and lasting community of interest.

The focus of the Network will be on history, but it will also encompass other humanities and social science disciplines. It will also include representatives of non-humanities disciplines, for example the computer, social and information sciences. Cross-disciplinary approaches and collaborative working are essential in such a new and complex area of investigation, and the Network relates to the current highlight notice encouraging the exploration of innovative areas of cross-disciplinary enquiry. While there has for some time been a recognition of the value of greater engagement between researchers in the humanities and the sciences in the development of new approaches to and understandings of born-digital big data, only very tentative first steps have been made towards realising this aim (for example forthcoming activity organised by the Turing Institute). The Network will provide a forum from which to launch precisely this kind of cross-disciplinary discussion, defining a central role for the humanities.

During the 12 months of the project all members of the Network will contribute to a web resource, which will present key themes and ideas to both an academic and wider audience of the interested general public. External experts from government, the media and other relevant sectors will also be invited to contribute, to ensure that the Network takes account of a range of opinions and needs. The exchange of knowledge and experience that takes place at the workshops will also be distilled into a white paper, which will be published under a CC-BY licence in month 12 of the Network.

Planned Impact

The proposed Network will have significant impact upon a range of audiences and sectors, including:

1 Institutions with responsibility for collecting and preserving big data
Archives, libraries and other memory institutions are increasingly concerned with the collection, preservation and ultimately publication of born-digital public data. If this material is to be of value to both current and future researchers it is essential that it is archived, described and structured in ways which meet their requirements and support innovative study and analysis. The proposed Network will provide a forum for the exchange of knowledge and development of methods among humanities researchers, computer scientists and archive and records professionals, establishing a community of interest in this emerging field.

2 Government and policy-makers
Government is one of the biggest creators and consumers of born-digital big data, and it has clear responsibilities to its citizens in how it stores, manages and analyses this information. The research Network is ideally placed to inform government policy in this area, notably in relation to ethical and methodological considerations, and to mediate the conversation between government and public about big data and its implications for interaction between the two.

3 The general public
A significant proportion of the born-digital big data generated in the late 20th and early 21st centuries is either produced by or contains information about ordinary men and women. It concerns their healthcare, their financial status, their social media lives, their professional and personal identities. The discourse about big data is often dominated by fears over surveillance and the consequent invasion of privacy, concerns which are only likely to be exacerbated by, among other things, the increasing prevalence of wearable technologies. The Network will address these issues in its workshops, but particularly in its public-facing communication, including a web resource which will present the issues in a clear and accessible form.

4 Journalists and other mainstream media-based researchers
Born-digital big data has become a mainstay of newspaper and other media coverage, from simple infographics to in-depth reflections on the importance of the archived web (e.g. 'Can the Internet be archived?' http://www.newyorker.com/magazine/2015/01/26/cobweb). The Network will help to provide context for this analysis and generate both resources and expertise on which journalists and other media-based researchers can draw.

5 Funding bodies in the humanities
There has been a great deal of investment in digital humanities scholarship in the past two decades, and in recent years this has begun to include big data research and the analysis of born-digital material such as the web (both live and archived) and various forms of social media. By assessing the current state of the field and identifying those areas where it would be most useful to conduct further research, the Network will help to inform future funding priorities and establish a benchmark against which to measure success.

The Network's members will gain significant professional benefits from involvement in the project. The opportunity to share expertise with and learn from colleagues in other sectors and with different disciplinary and national perspectives will be invaluable in terms of career progression, and will support the preparation of further joint research proposals, notably at the EU level. The research and professional skills that they will develop in the course of the project will also equip them both to offer advice and guidance within their host institutions and to undertake consultancy work within the higher education and archive sectors.

Publications

 
Description The research network highlighted the importance not only of interdisciplinary but particularly of cross-sectoral working. The challenges posed by born-digital data (at scale) for memory institutions and researchers are too large and complex to be solved by individuals, or even within individual sectors or disciplines. They require new and agile collaborations to be developed between the organisations responsible for collecting, preserving and ultimately publishing that data and the researchers who both want to use it themselves and can help to inform future access arrangements. The collaborations seeded by this network have already borne fruit in a developing programme of events, including for example a series of 'Digital experimentation' workshops organised by The National Archives which involve members of the network. The PI is convening a web archiving workshop as part of this programme in the summer of 2018. One of the key lessons from the network is that the most interesting questions for this particular group of partners are not to do with issues of technology, but rather with cultures of data creation and reuse. Key themes to emerge include reflecting on the digital body of the 'other' and the ways in which volume of data may serve to obscure already marginalised voices (even though they may be present for the first time in archived materials); exploring archival gaps and absences, and how these can be contextualised for researchers now and in the future; and investigating whether it is possible to capture digital creativity as part of the archiving process. The groundwork has already been laid for at least two major research grant applications. Finally, the network has highlighted the importance of clear communication, and specifically the development of a common language for discussing digital research between and among different stakeholders.
Exploitation Route It is anticipated that the findings of this project will inform the development of a range of research projects in both the Galleries, Libraries, Archives and Museums (GLAM) and Higher Education sectors. The project has established research themes of common interest across these sectors, identified parameters for successful interdisciplinary and cross-sectoral collaboration, and suggested methodologies for (re)publishing and using born digital archives. The findings also have the potential to shape delivery and access arrangements for digital archival data now and in the future.
Sectors Digital/Communication/Information Technologies (including Software), Culture, Heritage, Museums and Collections

 
Description UNESCO UK Memory of the World Committee
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
URL https://www.unesco.org.uk/designation/memory-of-the-world-inscriptions-in-the-uk-uk-register/
 
Description Membership of the Turing Institute Data Science and Digital Humanities Interest Group 
Organisation Alan Turing Institute
Country Unknown 
Sector Academic/University 
PI Contribution As an external researcher in the interest group, I have been able to contribute expertise in relation to working with web archives and born-digital data for historical research.
Collaborator Contribution The main aims of the group are to strengthen relationships and build collaborations at the intersection between data science and digital humanities. Our goal is to raise the profile of data-driven humanities research at the Turing, open up future collaborations, and strengthen the Turing's links with organisations such as the British Library, The National Records of Scotland and The UK National Archives. The group will show the key role that can be played by The Alan Turing Institute in the area of Digital Humanities by demonstrating that data science research can answer questions relevant to the humanities and vice versa, thus benefiting both fields. This will be achieved with meetings, workshops, and joint research projects. Translating fundamental research in data science into lasting impact in the humanities requires interdisciplinary efforts, through the sharing of perspectives, methods and knowledge. The interest group builds on the organisers' extensive experience in interdisciplinary research on historical data and brings together people from a range of different disciplines.
Impact A workshop was held at the University of Edinburgh in 2018, which fed into the UKRI infrastructure road map consultation.
Start Year 2017
 
Description RecordDNA 
Organisation Northumbria University
Country United Kingdom 
Sector Academic/University 
PI Contribution Member of the steering committee for the AHRC-funded RecordDNA research network, providing advice and guidance.
Collaborator Contribution Leading the research network.
Impact No outputs as yet.
Start Year 2017
 
Description RecordDNA 
Organisation The National Archives
Country United Kingdom 
Sector Public 
PI Contribution Member of the steering committee for the AHRC-funded RecordDNA research network, providing advice and guidance.
Collaborator Contribution Leading the research network.
Impact No outputs as yet.
Start Year 2017
 
Description RecordDNA 
Organisation University College London
Department Department of Information Studies
PI Contribution Member of the steering committee for the AHRC-funded RecordDNA research network, providing advice and guidance.
Collaborator Contribution Leading the research network.
Impact No outputs as yet.
Start Year 2017
 
Description 'The future of the past' public roundtable 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact This roundtable discussion was held as part of a series of public seminars organised under the theme of 'History now and then'. It addressed how future historians might judge today's historiography, what we over- or under-emphasise, big data and big history, and how history is changing in the digital age. One of the aims of the event was to raise awareness of the changing nature of historians' primary sources in a digital age, and in particular to encourage attendees to think about how they handle their personal digital archives.
Year(s) Of Engagement Activity 2017
 
Description 'Will history survive the digital age?', BBC History magazine article 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact The article in BBC History magazine discussed the challenges for historians of working with large-scale born-digital sources, and also the actions that people can take to make sure that their own digital records are preserved for future researchers.
Year(s) Of Engagement Activity 2017
 
Description BBC World Histories 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Thinkpiece published in the February/March issue of BBC World Histories on the question 'Is the world changing faster than ever before?'
Year(s) Of Engagement Activity 2018
 
Description CPD25 M25 Consortium of Academic Libraries event 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Presentation about the use and promotion of born digital archives at a CPD25 event on 'My Digital Tools Bring all the Researchers to the Library - Marketing your Library in the 21st Century'. The main aim of the presentation was to demonstrate how to engage humanities researchers with 'difficult' digital collections.
Year(s) Of Engagement Activity 2016
 
Description Digital History summer school, Lausanne 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Opening keynote at a Digital History summer school organised by the University of Lausanne in June 2017.
Year(s) Of Engagement Activity 2017
URL https://www.dhsummerschool.ch/
 
Description Future past: researching archives in the digital age 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation on common misconceptions among researchers about born digital archives as part of a symposium organised by the British Records Association and the Institute of Historical Research on 'Future past: researching archives in the digital age'.
Year(s) Of Engagement Activity 2017
 
Description Public debate 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Participation in a roundtable on the subject of 'Web Archives: truth, lies and politics in the 21st century'. The web and social media play a key role in the circulation of news in the 21st century. But increasingly it is becoming difficult to separate fact from fiction and untruth, or even to agree on what constitutes fact. These problems are heightened by the speed with which information can be shared, modified or deleted, the personalisation (both explicit and hidden) that determines which news we see online, and the difficulties of establishing authorship and provenance. This public roundtable discussed the role of web and social media archives in helping us, as digital citizens, to navigate through this complex and changing information landscape.
Year(s) Of Engagement Activity 2017
URL https://archivedweb.blogs.sas.ac.uk/digital-conversations/
 
Description The National Archives Big Ideas seminar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Presentation to staff at The National Archives of the UK of 'Born digital big data and approaches for history and the humanities'. Feedback indicated that the talk raised awareness of the challenges associated with digital archiving.
Year(s) Of Engagement Activity 2017
 
Description Web Archiving Week 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A week of web archiving events and activities was organised in London on 12-16 June 2017. The centrepiece of the programme was a major international conference combining the second RESAW Conference and the rescheduled IIPC Web Archiving Conference, 14-16 June. The week began with a two-day Archives Unleashed hackathon, and a public debate was held on the evening of 14 June, as part of the British Library's series of Data Conversations.

Web Archiving Week was hosted by the British Library and the School of Advanced Study, University of London, and organised with the support and assistance of the IIPC, RESAW (A Research Infrastructure for the Study of Archived Web Materials), The National Archives and Archives Unleashed.
Year(s) Of Engagement Activity 2017
URL https://archivedweb.blogs.sas.ac.uk/