International Research Collaboration Network in Computational Archival Science (IRCN-CAS)

Lead Research Organisation: King's College London
Department Name: Digital Humanities

Abstract

The large-scale digitisation of analogue archives, the emerging diverse forms of born-digital archive, and the new ways in which researchers across disciplines (as well as the public) wish to engage with archival material, are disrupting traditional archival theories and practices, and are presenting challenges for practitioners and researchers who work with archival material. They also offer enhanced possibilities for scholarship, through the application of computational methods and tools to the archival problem space, and, more fundamentally, through the integration of 'computational thinking' with 'archival thinking'. This potential has led the collaborators in this proposal to identify Computational Archival Science (CAS) as a new field of study (see http://dcicblog.umd.edu/cas/).

This Network will extend this prior work by engaging directly with the archive community and archival practitioners. The specific research focus of the Network will be on the use of computational methods for contextualising digital records, both digitised and born-digital. Contextualisation is one of the central challenges for the modern archive highlighted in the Digital Strategy 2017-19 from The National Archives, and is an area in which computational methods have great potential. The Network will reach out directly to archives and archival practitioners, establishing systematic collaborations and research programmes that leverage and extend the innovative research being carried out within individual institutions in this area.

The context of a record is key for understanding its value as historical evidence, and the ability to map out and provide access to that context is key for conferring value on what would otherwise be (relatively) disconnected pieces of information, enabling them to be used effectively - found, understood and re-purposed - by historians and other archive-centric scholars drawing on the archival evidence base. The increasingly digital nature of the archive provides opportunities as well as challenges for addressing this question, by using a range of computational methods for meeting the increasingly complex demands of both archival users and practitioners.

This Network will organise a series of interconnected events to explore this question of contextualisation, whether through capturing metadata, enhancing records by semantic tagging, or indeed contextualising records with other records, and thus connecting up previously disconnected information into 'knowledge graphs'. We will not focus on specific technologies, but rather examine a range of technologies with potential for meeting this challenge, including natural language processing, graph technologies, machine learning, probabilistic approaches, and other methods from the broad field of data science and AI. The focus will thus be on the research question rather than the technology.

The events will be organised as two research symposia and two 'datathons'. The first symposium will be a participatory event focusing on identifying and opening up the questions we will explore, with participants from the complementary fields of information/archival science, computer/data science, and archival practice. The datathons will conduct hands-on experiments relating to some of these questions, through small teams of researchers - with a particular focus on early career researchers and graduate students - who will work collaboratively on small projects applying computational methods to a range of challenges relating to record contextualisation. One datathon will focus on digitised records, the other on born-digital records. The closing symposium will then draw these threads together, focusing on reflection, synthesis, and identifying a future programme of work and collaborations, with a view to implementing some of the ideas generated in practice. We will produce a final white paper that integrates the Network's conclusions and recommendations for further work.

Planned Impact

1) Archive professionals, and other cultural/heritage organisations

The area addressed by the Network will have a profound effect on the nature of archives, on how researchers and other users interact with them, and on how archivists create and manage them. Not only will it result in new opportunities for archives in terms of the service they provide, it will also affect the role of the archival profession in the face of AI-assisted recordkeeping and other forms of automation. The Network will, as a consequence, also impact on archival education and training, by making recommendations, developing strategies and specific case studies, and producing materials that can be integrated into MA programmes for archivists and other information professionals. The Network's linking of professional practice and academia will facilitate this, and will also provide the context for carrying out trials within existing educational programmes.

2) Government and policy makers

The ability to explore the context of records is key both to supporting transparency around the creation, implementation and impact of government policy and actions, and to providing an evidence base for government policy development, two fundamental aspects of the role of official government archives such as The National Archives and the Maryland State Archives. The research area addressed by the Network will thus impact on the process of government, and in particular on the creation, implementation and evaluation of policy decisions.

3) Digital technology and AI industries

The digital industries, in particular data science and AI, are a key part of industrial strategy and future economic development in both the UK and USA. While there is a great deal of private sector R&D being undertaken in this area, much of this is focused on datasets generated on the Web or Internet (e.g. in marketing) or on datasets generated by specific industrial sectors (e.g. smart energy grids, medicine). Archival data presents its own challenges of heterogeneity and complexity, and provides a challenging context for identifying and trialling new methods and tools, or enhancing existing ones. The Network is thus expected to drive new technological developments in this area.

4) General public (and also professions such as journalism)

The general public are key stakeholders in government archives. They are in many cases what the records in the archives are "about", and they are the people who are affected by the decisions and processes that the archives document. The integration of 'computational contextualisation' within archives will increase the transparency of these decisions and processes, and consequently the public's understanding of them. In particular, such methods, developed by future programmes that will be facilitated by the Network's activities, will in the longer term make archives more accessible to the public, increasing inclusivity and reducing the 'information divide' between people with the expertise and resources to negotiate archives, and those without.

5) Funding bodies

The Network will help to identify areas for which further research, exploration and development would be of value, and provide an evidence base on which funders will be able to draw when establishing priority areas for support. In addition, the Network will in the long run impact upon the value for money obtained by funders of archives and cultural organisations (and taxpayers), as their collections will become more open.

Publications

Description The project ran two events in which we investigated - through hands-on sessions involving mixed groups of archival practitioners, data scientists, graduate students, and historians - how computational methods can be used to enhance access to or use of archival materials. The focus was not so much on specific technologies, but rather on the methodological issues of applying them.
Exploitation Route Archives and archival practitioners: use of results to enhance access to and use of their collections.
Higher education (e.g. in archival studies, information science): the project has produced case studies that can be used as exemplars in education.
Sectors Education; Culture, Heritage, Museums and Collections

 
Description Findings have been integrated into research-led teaching (at PGT level) at KCL and UMD, and have fed into archival work at TNA.
First Year Of Impact 2020
Sector Education; Culture, Heritage, Museums and Collections
Impact Types Cultural,Societal,Economic

 
Description Advanced Information Collaboratory (AIC), an international collaboration based at the University of Maryland 
Organisation University of Maryland
Country United States 
Sector Academic/University 
PI Contribution The new initiative grew out of this AHRC network and other joint activities.
Collaborator Contribution The new initiative grew out of this AHRC network and other joint activities.
Impact N/A
Start Year 2020
 
Description Collaboration with The Turing Institute 
Organisation Alan Turing Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution Staff from the Turing Institute were active participants in our data exploration workshop held at The National Archives, and the final network symposium was hosted at the Turing Institute, who are now key partners in the Network going forward.
Collaborator Contribution Exchange of knowledge and skills through discussion and active participation in events.
Impact White paper on data event in progress.
Start Year 2019
 
Description Special issue of ACM Journal on Computing and Cultural Heritage (JOCCH) addressing Computational Archival Science 
Organisation The National Archives
Country United Kingdom 
Sector Public 
PI Contribution My Co-Is from The National Archives and the University of Maryland worked together on the proposal and the call for papers, which has been accepted. The submission deadline for the special issue is 31 August 2020, and the issue will be published in 2021. Mark Hedges is the corresponding editor; the other editors are Eirini Goudarouli (The National Archives) and Richard Marciano (University of Maryland).
Collaborator Contribution See previous.
Impact None yet - but the special issue itself will be the main outcome. I will update this when the publication date/details are available.
Start Year 2019
 
Description Special issue of ACM Journal on Computing and Cultural Heritage (JOCCH) addressing Computational Archival Science 
Organisation University of Maryland
Country United States 
Sector Academic/University 
PI Contribution My Co-Is from The National Archives and the University of Maryland worked together on the proposal and the call for papers, which has been accepted. The submission deadline for the special issue is 31 August 2020, and the issue will be published in 2021. Mark Hedges is the corresponding editor; the other editors are Eirini Goudarouli (The National Archives) and Richard Marciano (University of Maryland).
Collaborator Contribution See previous.
Impact None yet - but the special issue itself will be the main outcome. I will update this when the publication date/details are available.
Start Year 2019
 
Description IRCN-CAS Network final public symposium at the Alan Turing Institute 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Approximately 100 people attended a public symposium at the Alan Turing Institute. This resulted in the enlargement of our network with practitioners in the archives and data science fields.
Year(s) Of Engagement Activity 2020
URL https://www.turing.ac.uk/events/computational-archival-science-cas-symposium