International Research Collaboration Network in Computational Archival Science (IRCN-CAS)
Lead Research Organisation:
King's College London
Department Name: Digital Humanities
Abstract
The large-scale digitisation of analogue archives, the emerging diverse forms of born-digital archive, and the new ways in which researchers across disciplines (as well as the public) wish to engage with archival material, are disrupting traditional archival theories and practices, and are presenting challenges for practitioners and researchers who work with archival material. They also offer enhanced possibilities for scholarship, through the application of computational methods and tools to the archival problem space, and, more fundamentally, through the integration of 'computational thinking' with 'archival thinking'. This potential has led the collaborators in this proposal to identify Computational Archival Science (CAS) as a new field of study (see http://dcicblog.umd.edu/cas/).
This Network will extend this prior work by engaging directly with the archive community and archival practitioners. The specific research focus of the Network will be on the use of computational methods for contextualising digital records, both digitised and born-digital. Contextualisation is one of the central challenges for the modern archive highlighted in the Digital Strategy 2017-19 from The National Archives, and is an area in which computational methods have great potential. The Network will reach out directly to archives and archival practitioners, establishing systematic collaborations and research programmes that leverage and extend the innovative research being carried out within individual institutions in this area.
The context of a record is key for understanding its value as historical evidence, and the ability to map out and provide access to that context is key for conferring value on what would otherwise be (relatively) disconnected pieces of information, enabling them to be used effectively - found, understood and re-purposed - by historians and other archive-centric scholars drawing on the archival evidence base. The increasingly digital nature of the archive provides opportunities as well as challenges for addressing this question, by using a range of computational methods for meeting the increasingly complex demands of both archival users and practitioners.
This Network will organise a series of interconnected events to explore this question of contextualisation, whether through capturing metadata, enhancing records by semantic tagging, or indeed contextualising records with other records, and thus connecting up previously disconnected information into 'knowledge graphs'. We will not focus on specific technologies, but rather examine a range of technologies with potential for meeting this challenge, including natural language processing, graph technologies, machine learning, probabilistic approaches, and other methods from the broad field of data science and AI. The focus will thus be on the research question rather than the technology.
The events will be organised as two research symposia and two 'datathons'. The first symposium will be a participatory event focusing on identifying and opening up the questions we will explore, with participants from the complementary fields of information/archival science, computer/data science, and archival practice. The datathons will conduct hands-on experiments relating to some of these questions, through small teams of researchers - with a particular focus on early career researchers and graduate students - who will work collaboratively on small projects applying computational methods to a range of challenges relating to record contextualisation. One datathon will focus on digitised records, the other on born-digital records. The closing symposium will then draw these threads together, focusing on reflection, synthesis, and identifying a future programme of work and collaborations, with a view to implementing some of the ideas generated in practice. We will produce a final white paper that integrates the Network's conclusions and recommendations for further work.
Planned Impact
1) Archive professionals, and other cultural/heritage organisations
The area addressed by the Network will have a profound effect on the nature of archives, on how researchers and other users interact with them, and on how archivists create and manage them. Not only will it result in new opportunities for archives in terms of the service they provide, it will impact upon the role of the archival profession in the face of AI-assisted recordkeeping and other forms of automation. The Network will, as a consequence, also impact on archival education and training, by making recommendations, developing strategies and specific case studies, and producing materials that can be integrated into MA programmes for archivists and other information professionals. The Network's linking of professional practice and academia will facilitate this, and will also provide the context for carrying out trials within existing educational programmes.
2) Government and policy makers
The ability to explore the context of records is key both to supporting transparency around the creation, implementation and impact of government policy and actions, and to providing an evidence base for government policy development, two fundamental aspects of the role of official government archives such as The National Archives and the Maryland State Archives. The research area addressed by the Network will thus impact on the process of government, and in particular on the creation, implementation and evaluation of policy decisions.
3) Digital technology and AI industries
The digital industries, in particular data science and AI, are a key part of industrial strategy and future economic development in both the UK and USA. While there is a great deal of private sector R&D being undertaken in this area, much of this is focused on datasets generated on the Web or Internet (e.g. in marketing) or on datasets generated by specific industrial sectors (e.g. smart energy grids, medicine). Archival data presents its own challenges of heterogeneity and complexity, and provides a challenging context for identifying and trialling new methods and tools, or enhancing existing ones. The Network is thus expected to drive new technological developments in this area.
4) General public (and also professions such as journalism)
The general public are key stakeholders in government archives. They are in many cases what the records in the archives are "about", and they are the people who are affected by the decisions and processes that the archives document. The integration of 'computational contextualisation' within archives will increase the transparency of these decisions and processes, and consequently the public's understanding of them. In particular, such methods, developed by future programmes that will be facilitated by the Network's activities, will in the longer term make archives more accessible to the public, increasing inclusivity and reducing the 'information divide' between people with the expertise and resources to negotiate archives, and those without.
5) Funding bodies
The Network will help to identify areas for which further research, exploration and development would be of value, and provide an evidence base on which funders will be able to draw when establishing priority areas for support. In addition, the Network will in the long run impact upon the value for money obtained by funders of archives and cultural organisations (and taxpayers), as their collections will become more open.
Publications
Description | The project ran two events in which we investigated - through hands-on sessions involving mixed groups of archival practitioners, data scientists, graduate students, and historians - how computational methods can be used to enhance access to or use of archival materials. The focus was not so much on specific technologies, but rather on the methodological issues of applying them. |
Exploitation Route | Archives and archival practitioners: use of results to enhance access to and use of their collections. Higher education (e.g. in archival studies, information science): the project has produced case studies that can be used as exemplars in education. |
Sectors | Education; Culture, Heritage, Museums and Collections |
Description | Findings have been integrated into research-led teaching (at PGT level) at KCL and UMD, and have fed into archival work at TNA. |
First Year Of Impact | 2020 |
Sector | Education; Culture, Heritage, Museums and Collections |
Impact Types | Cultural, Societal, Economic |
Description | Advanced Information Collaboratory (AIC), an international collaboration based at the University of Maryland |
Organisation | University of Maryland |
Country | United States |
Sector | Academic/University |
PI Contribution | The new initiative grew out of this AHRC network and other joint activities. |
Collaborator Contribution | The new initiative grew out of this AHRC network and other joint activities. |
Impact | N/A |
Start Year | 2020 |
Description | Collaboration with The Turing Institute |
Organisation | Alan Turing Institute |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Staff from the Turing Institute were active participants in our data exploration workshop held at The National Archives, and the final network symposium was hosted at the Turing Institute, which is now a key partner in the Network going forward. |
Collaborator Contribution | Exchange of knowledge and skills through discussion and active participation in events. |
Impact | White paper on data event in progress. |
Start Year | 2019 |
Description | Special issue of ACM Journal on Computing and Cultural Heritage (JOCCH) addressing Computational Archival Science |
Organisation | The National Archives |
Country | United Kingdom |
Sector | Public |
PI Contribution | My Co-Is from The National Archives and the University of Maryland worked together on the proposal and the call for papers, which has been accepted. The submission deadline for the special issue is 31 August 2020, and the issue will be published in 2021. Mark Hedges is the corresponding editor; the other editors are Eirini Goudarouli (The National Archives) and Richard Marciano (University of Maryland). |
Collaborator Contribution | See previous. |
Impact | None yet - but the special issue itself will be the main outcome. I will update this when the publication date/details are available. |
Start Year | 2019 |
Description | Special issue of ACM Journal on Computing and Cultural Heritage (JOCCH) addressing Computational Archival Science |
Organisation | University of Maryland |
Country | United States |
Sector | Academic/University |
PI Contribution | My Co-Is from The National Archives and the University of Maryland worked together on the proposal and the call for papers, which has been accepted. The submission deadline for the special issue is 31 August 2020, and the issue will be published in 2021. Mark Hedges is the corresponding editor; the other editors are Eirini Goudarouli (The National Archives) and Richard Marciano (University of Maryland). |
Collaborator Contribution | See previous. |
Impact | None yet - but the special issue itself will be the main outcome. I will update this when the publication date/details are available. |
Start Year | 2019 |
Description | IRCN-CAS Network final public symposium at the Alan Turing Institute |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Approximately 100 people attended a public symposium at the Alan Turing Institute. This resulted in the enlargement of our network with practitioners in the archives and data science fields. |
Year(s) Of Engagement Activity | 2020 |
URL | https://www.turing.ac.uk/events/computational-archival-science-cas-symposium |