ARCHANGEL - Trusted Archives of Digital Public Records

Lead Research Organisation: University of Surrey
Department Name: Vision, Speech and Signal Processing (CVSSP)

Abstract

The aim of ARCHANGEL is to ensure the long-term sustainability of digital archives through the design, development and trialling of transformational new distributed ledger technology (DLT) to promote accessibility and ensure integrity of content, whilst maximising its impact through novel business models for commodification and open access.

Archives and Memory Institutions (AMIs) are the lens through which future generations will perceive today; they form the authoritative economic, social and cultural memory of a nation. For example, The National Archives (>15 petabytes) is one of the world's largest and oldest AMIs, responsible for preserving the digital record of the UK Government, e.g. key decisions made by Ministers and advice received. Some of this information is made open, some kept closed for decades. AMIs are founded upon the principle of public trust, of being neutral and completely trustworthy; the immutability and integrity of AMIs are essential to maintaining their objectivity. Yet world history is littered with examples where this objectivity has been compromised, e.g. through expunging of physical records during times of political unrest. Today's digital age presents new socio-technical challenges to AMIs around the safeguarding of data. Digital public records are intangible and so are easy to remove or modify without the modification necessarily being detectable. Indeed, in some cases records have to be modified to ensure their continued accessibility as formats change. The curation of data is also accompanied by the need to maintain associated code to render that data for presentation, often across decades. How should decisions over migration or prioritisation of maintenance be taken, or audited? What are the implications of migrations resulting in minor losses of fidelity one hundred years from now? How can the public be sure that digital content, when released, is fundamentally unaltered from the original? Existing archival practice is ill-equipped to respond to such issues, and is in urgent need of disruption to keep pace with our transformation into an increasingly digital society, so ensuring the integrity and impartiality of knowledge for future generations.

ARCHANGEL is an 18-month socio-technical feasibility study co-creating and evaluating a novel prototype DLT service with end-users to determine how archival practices, sustainable models and public attitudes could evolve in the presence of a trusted decentralised technology to prove content integrity and ensure open access to digital public archives. From a technological standpoint, ARCHANGEL will leverage cutting-edge machine learning to collect robust digital signatures derived from digitised physical and born-digital content, within a permissioned DLT. Both the signatures and the programmatic code to render content and verify its provenance and integrity will be encoded within the DLT. Novel business models for sustaining the DLT, e.g. via contributed effort (proof of work), will be explored at the points of creation and consumption using a cross-AMI model in which a single DLT is contributed to by multiple AMIs, across disciplines and nations, mitigating the risk of archive distortion by its operating AMI. Impact is not limited to traditional AMIs, but extends to any digital public archive: university research data repositories (linked to DOI); better management of corporate memory in multi-nationals (e.g. financial/regulatory compliance, managing records of prior art in tech companies).
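The idea in the paragraph above, registering a content signature in a shared ledger and later checking a released record against it, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Ledger` class, the `register`, `verify` and `chain_is_intact` names, and the record identifiers are hypothetical, a plain SHA-256 digest stands in for the machine-learned signatures the project proposes, and a simple in-memory hash chain stands in for a permissioned DLT. It is not the ARCHANGEL or Guardtime implementation.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first block


def content_digest(data: bytes) -> str:
    """SHA-256 of the content (assumed stand-in for a learned signature)."""
    return hashlib.sha256(data).hexdigest()


def _hash_body(body: dict) -> str:
    """Hash a block body deterministically via sorted-key JSON."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


@dataclass
class Ledger:
    """Hypothetical append-only hash chain shared by several archives."""
    blocks: list = field(default_factory=list)

    def register(self, record_id: str, data: bytes, archive: str) -> dict:
        """Append a block holding the record's digest, linked to the previous block."""
        prev_hash = self.blocks[-1]["block_hash"] if self.blocks else GENESIS
        body = {
            "record_id": record_id,
            "archive": archive,
            "digest": content_digest(data),
            "prev_hash": prev_hash,
        }
        block = {**body, "block_hash": _hash_body(body)}
        self.blocks.append(block)
        return block

    def chain_is_intact(self) -> bool:
        """Recompute every block hash and link; any edit to a past block fails."""
        prev = GENESIS
        for block in self.blocks:
            body = {k: v for k, v in block.items() if k != "block_hash"}
            if block["prev_hash"] != prev or block["block_hash"] != _hash_body(body):
                return False
            prev = block["block_hash"]
        return True

    def verify(self, record_id: str, data: bytes) -> bool:
        """Check that released content matches a digest registered for the record."""
        digest = content_digest(data)
        return any(
            b["record_id"] == record_id and b["digest"] == digest
            for b in self.blocks
        )
```

A released record would be checked with `ledger.verify(record_id, released_bytes)`, and the ledger itself with `ledger.chain_is_intact()`. One limitation of this stand-in is worth noting: an exact hash changes under any format migration, so it cannot by itself distinguish a legitimate migration from tampering, which is presumably part of the motivation for the more robust content signatures the project describes.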

To undertake this adventurous and ambitious project we have formed a strategic multi-disciplinary partnership uniting a world-leading group in multi-modal signal processing (CVSSP), the Centre for the Digital Economy (CODE) within Surrey Business School, and a consortium of AMI stakeholders including The National Archives and Tim Berners-Lee's Open Data Institute (ODI). The infrastructure will be developed with DLT platform provider Guardtime, and impact accelerated via Methods Digital.

Planned Impact

ARCHANGEL will transform the sustainability of digital public archives, delivering long-term horizontal impact across all sectors within the Digital Economy that benefit from enhanced integrity and accessibility of such archives. Examples include archives of policy evidence (government and public services), research data (e.g. institutions working with medical or climate data), regulatory compliance data (finance, commerce) and intellectual property data (patent filing bodies). Society benefits more broadly through Archives and other Memory Institutions (AMIs), from cultural archives to the National Archives of government. Through partnership with the Open Data Institute (ODI), ARCHANGEL will engage many organisations, both public sector and commercial, as well as cross-industry sector groups, in the building of the ARCHANGEL infrastructure to increase efficiency and generate more value from data, whether open, shared or closed. Decentralisation is an important factor in creating shared trustworthy data infrastructures, so this work will have applications in many areas across the economy.

ARCHANGEL will deliver vertical impact to specific sectors within the AMI landscape, driven through end-user partner The National Archives (TNA) - one of the world's largest and oldest archives, with >185km of physical records and capability to store >15 petabytes of digital content. TNA sets standards and accredits other archives through the Archive Service Accreditation scheme. ARCHANGEL will impact the record of government through TNA, whose leadership role enables dissemination of best practice to others (interest has already been received from two additional cultural archives). A founder member of the Digital Preservation Coalition (DPC), TNA is ideally placed to ensure ARCHANGEL's outcomes reach other AMIs beyond the archives sector, such as the British Library and the National Records of Scotland, through the DPC. ARCHANGEL is directly aligned with TNA's priorities, which cite digital as its biggest strategic challenge. With the exponential growth of digital content (especially social media), ARCHANGEL is timely, coinciding with the deluge of government departments transferring digital records to TNA for long-term preservation and the mandated reduction in timing for transfer. UK society is impacted more broadly through TNA's adoption, since trusted records of government are needed to support policy development and assess its impact; to provide accountability for decisions; to share knowledge; and to enable departments to provide accurate and comprehensive evidence to inquiries or legal actions. Partner Methods Digital frequently contracts with key government verticals and big business to drive digital transformation and will assist with exploitation and dissemination.

ARCHANGEL engages significantly with end-users throughout, adopting an open, participatory co-design process. Engagement begins with sandpits scoping the scenarios and real-world problems to address, followed by iterative live trials. We will organise a cross-disciplinary stakeholder workshop and engage relevant RCUK networks (e.g. CREDIT), initiating a series of mini-projects to develop case studies showcasing ARCHANGEL in the wild to maximise impact and uptake of the research.

We are committed to public engagement, which will see outcomes packaged into film-based deliverables as part of a schools and broader public outreach programme engaging the public in future-scoping workshops, further informing research objectives. Academic impact will be delivered through targeting top-tier internationally-focused venues in the multi-modal signal processing and secure systems space, e.g. PAMI, ICCV, IJCAI, ACM TISSEC, TDSC. Of potentially high impact is the adaptation of Guardtime's global DLT platform to deliver an entirely novel DLT-based infrastructure for trusted public digital archives. This could open new information assurance markets for Guardtime and DLT providers.
