Oceanic Exchanges: Tracing Global Information Networks In Historical Newspaper Repositories, 1840-1914

Lead Research Organisation: University College London
Department Name: School of European Languages, Culture and Society

Abstract

Newspapers were the first big data for a mass audience. Their dramatic expansion over the nineteenth century created a global culture of abundant, rapidly circulating information. In scholarship on the period, however, the significance of the newspaper has largely been defined in metropolitan and national terms, and collection and digitization by local institutions have further situated newspapers within a national context. "Oceanic Exchanges: Tracing Global Information Networks in Historical Newspaper Repositories, 1840-1914" (OcEx) brings together leading efforts in computational periodicals research from the US, Mexico, Germany, the Netherlands, Finland and the UK to examine patterns of information flow across national and linguistic boundaries in nineteenth-century newspapers by linking digitized newspaper corpora currently siloed in national collections. OcEx seeks to break through the conceptual, institutional, and political barriers which have limited working with big humanities data by bringing together historical newspaper experts from different countries and disciplines around common questions; by actively crossing, through computational analysis, the national boundaries that have previously separated digitized newspaper corpora; and by illustrating the global connectedness of nineteenth-century newspapers in ways hidden by typical national organizations of digital cultural heritage. We propose to coordinate the efforts of this six-nation team to: build classifiers for textual and visual similarity of related newspaper passages; create a networked ontology of different genres, forms, and textual elements that emerged during the nineteenth century; model and visualise textual migration and viral culture; model and visualise conceptual migration and translation of texts across regional, national, and linguistic boundaries; analyze the sensitivity and generality of results; and release public collections.
For scholars of nineteenth-century periodicals and intellectual history, OcEx uncovers the ways that the international was refracted through the local as news, advice, vignettes, popular science, poetry, fiction, and more. By revealing the global networks through which texts and concepts traveled, OcEx creates an abundance of new evidence about how readers around the world perceived each other through the newspaper. These insights may reshape the assumptions that underpin research by scholars in comparative literature, translation studies, transnational and intellectual history, and beyond. Computational linguistics provides building blocks (recognizing translation, paraphrasing, text reuse) that can enable scholarly investigations with both historical and contemporary implications. At the same time, such methods raise fundamental questions regarding the validity and reliability of their results (such as the effects of OCR-related noise, or the imperfect comparability of corpora). Finally, by linking research across large-scale digital newspaper collections, OcEx will offer a model for national libraries and other data custodians that host large-scale data for digital scholarship. The project will test the accessibility and interoperability of emerging and well-established newspaper digitisation efforts and output clear recommendations for structuring such development in future.

Planned Impact

Oceanic Exchanges will have significant impact upon a range of audiences and sectors, including academic researchers, in particular scholars of nineteenth-century periodicals and intellectual history, computational linguists and digital humanists; data custodians at institutions with responsibility for collecting and preserving digitised newspapers, e.g. national libraries and other heritage institutions; commercial publishers of historical newspapers; and the wider historically interested public, journalists, educators and policy-makers. For scholars of nineteenth-century periodicals and intellectual history, Oceanic Exchanges uncovers the ways that the international was refracted through the local as news, advice, vignettes, popular science, poetry, fiction, and more. By revealing the global networks through which texts and concepts travelled, Oceanic Exchanges creates an abundance of new evidence about how readers around the world perceived each other through the newspaper. These insights may reshape the assumptions that underpin research by scholars in comparative literature, translation studies, transnational and intellectual history, and beyond. For computer science-related disciplines like computational linguistics and visualisation, Oceanic Exchanges represents an appealing application domain that offers a range of tasks (recognising translation, paraphrasing, text reuse, etc.) and fundamental challenges (such as robustness to OCR-related noise or assessing the comparability of corpora). The results from these disciplines then provide quantitative methods for pursuing research questions on the transmission of texts and topics across languages, with both historical and contemporary implications. Finally, by linking research across large-scale digital newspaper collections, Oceanic Exchanges offers a model for national libraries and other data custodians that host large-scale data for digital scholarship.
The project will test the accessibility and interoperability of emerging and well-established newspaper digitization efforts and output clear recommendations for structuring such development in future. Data custodians will be represented on the project's advisory board. Oceanic Exchanges will also reach out to radio and newspaper journalists and the wider historically interested public by submitting articles to popular science and humanities journals, and by creating and maintaining a public-facing website highlighting popular findings and case studies from the project. Many project outputs can be used by educators and policy-makers well beyond the project's actual lifespan, thus ensuring lasting impact.
 
Description Our work has made important contributions to two main areas: (i) we have cast new light on how and why digital newspaper archives take the form that they currently do; and (ii) we have advanced the state of the art in cross-collection text analysis of selected North-Atlantic and Anglophone-Pacific retrodigitised nineteenth-century newspapers.
(i) How and why digital newspaper archives take the form that they currently do
The many digitisation activities that have been undertaken in recent years have resulted in millions of pages of historical newspapers being made available to end users under various licenses. The numerous digital newspaper archives that have been created often give end users the impression that they are working with self-contained and complete archives. This impression can be reinforced by the paucity of information that accompanies some digital newspaper archives about why the material they contain was chosen and curated, how it was obtained, and from which specific source material it was transformed into a digital copy. In addition to a wide-ranging literature review, we conducted a series of semi-structured interviews with librarians, archivists and digital content managers in public institutions and commercial companies based in Australia, the Netherlands, the UK and the USA. From these interviews we derived new information and reflections on the complex interplay of institutional, intellectual, economic, technical, practical and social factors that have shaped decisions about the inclusion and exclusion of digitised newspapers in and from online archives. We also drew attention to the use that some digital publishers are making of user analytics, an issue of which many users will be unaware. A key outcome of this work is a set of recommendations for digitisers of cultural heritage materials:
engage in critical (self-)reflection on the implicit and explicit selection criteria that shape their collections;

provide detailed selection rationales that inform users about the inclusion and exclusion of materials in and from the digital archive;

acknowledge and communicate the role that funding bodies, internal and advisory boards, user feedback, and tracked behaviour play in ongoing changes to collections or their access points;

inform users of how their actions are being tracked, and that future goods and services may be built upon this analysis of their behaviour.

Most importantly, this information should not be stored as an auxiliary report for the select few who request it, but bundled with the digital archive as a living document that responsibly educates all users about the nature of the digital archive at every level of resolution: the collection, title, issue, article and corresponding metadata.
We hope that our recommendations will be taken up and that our article will also prompt further reflection on the kinds of critical frameworks that must be developed for use by those who seek to research, learn from, teach with and make creative use of digital newspaper archives.

(ii) Advancing the state of the art in cross-collection text analysis of selected North-Atlantic and Anglophone-Pacific retrodigitised nineteenth-century newspapers
The Oceanic Exchanges team published a substantial open access resource that will advance the state of the art in cross-collection text analysis of selected North-Atlantic and Anglophone-Pacific retrodigitised nineteenth-century newspapers. We also hope that the approach set out in the report will be taken up by other researchers who wish to engage in foundational research on approaches to cross-collection computational analysis. The numerous newspaper digitisation projects that have been undertaken in recent years have resulted in the remediation of many millions of pages of nineteenth-century newspapers. Yet researchers who wish to pursue questions about global history, for example, have often found it difficult to carry out data-driven research across those digitised collections. As our report discusses, there are many reasons for this: digitisation projects are often undertaken in national settings, whereas newspapers often participated in global conversations; standards that could overarch and integrate numerous, disparate digital newspaper collections have not been implemented; the shape and scope of digitised newspaper collections are informed by a multiplicity of situated contexts, which can be difficult for those external to digitisation projects to establish; and, although digital newspapers are often encoded in line with METS/ALTO, for example, notable variations exist in how those metadata specifications are applied to digital newspaper collections.

To respond to this, and to further research that takes place across digital newspaper collections, this 200-page report brings together qualitative data, metadata and paradata about selected digitised newspaper databases. It provides crucial historical and contextual information about the circumstances under which those collections came into being. It provides a textual ontology that describes the relationships between the informational units that make up the respective databases, between the data and metadata of the different collections, and between analogue newspapers and their retrodigitised representations. Also included are maps which support the visual inspection and comparison of data across disparate newspaper collections, along with JSON or XPath paths to the data.
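As an illustration of why variation in METS/ALTO encoding matters for path-based access, here is a minimal Python sketch; it is not part of the Atlas or its tooling. It rebuilds plain-text lines from an ALTO file by joining the CONTENT attribute of each String element inside each TextLine, and it reads the namespace URI from the root element so that the same code handles files declared against the ALTO v2 and v3 namespaces. The function name alto_lines and the two sample fragments are hypothetical and hand-made rather than drawn from any collection; real files also carry hyphenation (HYP elements, SUBS_CONTENT attributes), OCR confidence scores and layout coordinates, which this sketch ignores.

```python
import xml.etree.ElementTree as ET


def alto_lines(xml_text: str) -> list[str]:
    """Return one plain-text string per ALTO <TextLine> (hypothetical helper).

    The namespace URI is read from the root element, so the same function
    copes with files encoded against different ALTO schema versions.
    """
    root = ET.fromstring(xml_text)
    # ElementTree stores namespaced tags as "{uri}local"; recover the uri.
    uri = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else ""

    def q(local: str) -> str:
        """Qualify a local element name with the document's namespace."""
        return f"{{{uri}}}{local}" if uri else local

    lines = []
    for text_line in root.iter(q("TextLine")):
        # Each <String> holds one OCR'd word in its CONTENT attribute.
        words = [s.get("CONTENT", "") for s in text_line.iter(q("String"))]
        lines.append(" ".join(words))
    return lines


# Hand-made fragments of the same line, encoded against ALTO v2 and v3.
PAGE_V2 = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock>
    <TextLine><String CONTENT="LATEST"/><SP/><String CONTENT="INTELLIGENCE"/></TextLine>
  </TextBlock></PrintSpace></Page></Layout>
</alto>"""
PAGE_V3 = PAGE_V2.replace("ns-v2#", "ns-v3#")

print(alto_lines(PAGE_V2))  # ['LATEST INTELLIGENCE']
print(alto_lines(PAGE_V3))  # ['LATEST INTELLIGENCE']
```

Reading the namespace from each document, rather than hard-coding a single ALTO version, is a small instance of the defensive handling that cross-collection work requires. A fuller pipeline would instead consult a per-collection mapping of paths to the data, of the kind the report provides.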
Exploitation Route We have had some initial interest in our findings from commercial scholarly publishers and from the library sector. Their interest is especially in the recommendations that we might make about the level of information that they could deliver to the scholarly end user of a digitised resource about the selection rationale that underpinned that resource.
Sectors Creative Economy, Digital/Communication/Information Technologies (including Software), Culture, Heritage, Museums and Collections

URL https://oceanicexchanges.org/
 
Description The findings of the project have impacted the publishing and GLAM (Galleries, Libraries, Archives and Museums) sector via the Atlas of Digitised Newspapers and Metadata published by the project, which is "an open access guide to digitised newspapers around the world. Its initial selection is limited in scope, being comprised of the ten databases (including the aggregator Europeana) for which we were able to secure access and licensing to the machine-readable data. Nonetheless, it aims to form the foundation of a wider mapping of collections beyond its current North Atlantic and Anglophone-Pacific focus. It brings together their histories and digitisation choices with a deeper look at the language of the digitised newspaper, the evolution of newspaper terminology and the variety of metadata available in these collections. It explores how machine-readable information about an issue, volume, page, and author is stored in the digital file alongside the raw content or text, and provides a controlled vocabulary designed to be used across disciplines, within academia and beyond" (https://eadh.org/projects/atlas-digitised-newspapers-and-metadata). We informed the publishing and memory sector about this research via a series of lectures and seminars set out in the corresponding sections of Researchfish. The impact that we have had has been to raise awareness within the cultural sector of the importance of documenting the deep history of the digitisation choices (writ large) that have informed newspaper collections. This impact is attested, for example, by the way that the websites of cultural heritage digitisation initiatives link to the Atlas and recommend that their users be aware of the histories of their collections documented in the Atlas (e.g. https://www.deutsche-digitale-bibliothek.de/content/newspaper/fragen-antworten?lang=en under the question "Where can I find further information on the Deutsches Zeitungsportal?").
That the Deutsches Zeitungsportal links to the (external) Atlas to provide crucial context about the history of its collection suggests that it values the additional information that the Atlas provides about its internal collections. Similarly, we have noticed that a number of library guides recommend that their users consult the Atlas, either to learn more about the histories of specific collections or for its methodological aspects and how these may inform the source criticism of digital resources. Examples include the Michigan State University Library Research Guide (https://libguides.lib.msu.edu/c.php?g=95580&p=6293247) and the University of Ottawa Library (https://uottawa.libguides.com/Communication-en/news).
First Year Of Impact 2022
Sector Culture, Heritage, Museums and Collections
Impact Types Cultural

 
Description Empowering Users of Historical Digitised Newspapers Collections
Amount £83,725 (GBP)
Funding ID 1519 
Organisation Loughborough University 
Sector Academic/University
Country United Kingdom
Start 08/2019 
End 07/2020
 
Description UCL Grand Challenges Workshop: Facilitating New Connections Between the Disciplines and Professions that can Transform the Global Data Context
Amount £2,250 (GBP)
Organisation University College London 
Sector Academic/University
Country United Kingdom
Start 11/2018 
End 07/2019
 
Title Full Map of Digitised Newspaper Metadata 
Description A full map of metadata used by the various digitised newspaper databases discussed in the Atlas of Digitised Newspapers and Metadata 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact See narrative impact section 
URL https://figshare.com/articles/dataset/Full_Map_of_Digitised_Newspaper_Metadata/11560110
 
Description British Library Oral History Interview 
Organisation The British Library
Department British Library Labs
Country United Kingdom 
Sector Public 
PI Contribution Collaboration with the British Library about the digitization work on the digitised newspapers programme in the form of interviews conducted by Dr. Melodee Beals. This work is part of the Work Package 3 strand of the Oceanic Exchanges Project, which is concerned with mapping out a unified ontology of genres, forms, and textual elements to support transnational annotation of digitized newspapers and to develop a shared vocabulary for newspaper research. The work on this package includes questionnaires and interviews with staff from the digital curators team at the British Library.
Collaborator Contribution The partners are providing us with research data about the assumptions made during the various digitisation and encoding processes applied to the newspapers. This in turn will enable researchers on the Oceanic Exchanges project to further the development of a shared, source-specific, and culturally neutral ontology for describing the form and content of nineteenth-century newspapers, and to integrate this collection with other newspaper datasets from within the Oceanic Exchanges project.
Impact 1) The current research being undertaken on the British Library newspaper datasets will be presented at the British Library Labs Roadshow on 24 April 2018 by Dr Tessa Hauswedell (https://www.eventbrite.co.uk/e/working-with-the-british-librarys-digital-content-data-and-services-tickets-43754312326). 2) The work currently being undertaken on the oral history questionnaires will result in a co-authored research article.
Start Year 2018
 
Description "Oceanic Exchanges: Building a Transnational Understanding of Digitised Newspapers" British Library Roadshow, University of Lincoln 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The presentation aimed to raise awareness of our research into digital archives, and to provide specific, hands-on examples of how we engaged with the British Library collections in our research (as an example to other researchers) and what we were able to learn about the practical limitations and opportunities of these collections (for librarians, teachers and students using them). There was a large audience of nearly 50 individuals across the academic and library support services sector, with a significant range of questions immediately after the presentation and throughout the day. Many asked to be contacted when outputs were complete.
Year(s) Of Engagement Activity 2018
URL https://www.history-uk.ac.uk/sample-page/2018-working-with-the-british-librarys-digital-content-data...
 
Description A talk entitled: What can the Humanities do for digital technologies? 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Third sector organisations
Results and Impact I gave a talk at the thinkBIG Workshop on the Digital Humanities and Computational Social Sciences, Cumberland Lodge, Windsor. As a result of giving this lecture and networking, I got to know more colleagues working on the text mining of newspapers in the UK and beyond.
Year(s) Of Engagement Activity 2018
URL https://thinkbig.enm.bris.ac.uk/dh-css-workshop/
 
Description Emily Bell: 'Like so many roads on an ethereal map': Remixing the Archive through Metadata Integration. Archives, Access and AI: Working with Born-Digital and Digitised Archival Collections/Loughborough University London 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Third sector organisations
Results and Impact AHRC funded event on archives, access and AI including archivists, librarians, academics and others involved with collections.
Year(s) Of Engagement Activity 2020
URL https://ahrc.ukri.org/newsevents/events/calendar/archives-access-and-ai-working-with-born-digital-an...
 
Description Emily Bell: Linked News/Linked Data: Working Across Newspaper Collections Using the Semantic Web. New Directions in Nineteenth-Century Periodical Studies/University of Leeds (20 September 2019) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Event on new research in periodical studies. Laid the foundation for future collaborations on the Oceanic Exchanges Atlas.
Year(s) Of Engagement Activity 2019
URL https://newdirections19.wordpress.com/
 
Description Emily Bell: Questing for Interoperability with Newspaper Metadata (lightning talk and poster presentation). Linked Data and the Semantic Web for Humanities Research Spring School (LiSeH 2019)/University of Graz 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact A spring school on linked data, where we learned from one another's projects.
Year(s) Of Engagement Activity 2019
URL https://informationsmodellierung.uni-graz.at/en/institute/events/archive/spring-school-liseh-2019/
 
Description Emily Bell: Understanding Digitised Newspaper Databases Through Metadata (lightning talk). Software Sustainability Institute Collaborations Workshop/Loughborough University
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Understanding Digitised Newspaper Databases Through Metadata (lightning talk). Software Sustainability Institute Collaborations Workshop/Loughborough University
Year(s) Of Engagement Activity 2019
 
Description Historical Portals: Preservation, Access and Community Building with Digital Newspaper Archives at the History Research Seminar. Sheffield Hallam University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact Presentation to a mixed group of about 20 academic researchers, postgraduate researchers and interested support staff, which sparked questions and discussions for further research and development (feedback) on the possible impact and usefulness of project outputs to different audiences.
Year(s) Of Engagement Activity 2018
 
Description Invited lecture as part of STM: the Global Voice of Scholarly Publishing. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact I gave an invited lecture entitled "Oceanic Exchanges, research across large-scale digital newspaper collections" at the STM (the Global Voice of Scholarly Publishing) UK conference, as part of the strand "Digital humanities: Setting the stage to go digital in the humanities/social sciences".
Year(s) Of Engagement Activity 2018
URL https://www.stm-assoc.org/events/day-2-stm-week-2018-innovations/
 
Description Invited talk at the British Library Roadshow, 24th April 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact British Library Labs (BL Labs) promotes, inspires, and supports the use of the Library's digital collections and data. The team works on projects with researchers, developers, educators, entrepreneurs and artists from around the world.

The event included a series of presentations exploring the British Library's digital collections, how they have been used in various subject areas such as the Humanities, Computer Science and Social Sciences, and the lessons learned by working with researchers. The Roadshow showcased examples of the British Library's digital content and data, addressed some of the challenges and issues of working with it, and showed how interesting and exciting projects have been developed via the annual British Library Labs Competition and Awards.

Tessa Hauswedell spoke about the project "Oceanic Exchanges: Tracing Global Information Networks in Historical Newspaper Repositories, 1840-1914".

There was some good discussion around potential ideas of working with the Library's data for researchers and the wider public and the talk led to a further invitation to give an extended talk at the British Library in the Autumn of 2018.
Year(s) Of Engagement Activity 2018
URL https://blogs.ucl.ac.uk/dh/2018/05/01/bllabs2018/
 
Description Invited talk at the British Library Staff Talk Series, 12 December 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact This was a lunchtime talk about the Oceanic Exchanges Project for the British Library Staff as part of a series organised by the BL Digital Scholarship team. The talk was part of a season of talks on content mining for digital scholarship with cultural heritage collections.
The talk was given by Dr Julianne Nyhan and Dr Tessa Hauswedell at the invitation of Dr Rossitza Atanassova, Digital Curator at the BL. The aim was to showcase to professional archivists and curators how their collections are being utilised by researchers.
Year(s) Of Engagement Activity 2018
 
Description Launch event for the Atlas of Digitised Newspapers and Metadata 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Third sector organisations
Results and Impact The event was advertised as follows: In 2020, The Atlas of Digitised Newspapers and Metadata: Reports from Oceanic Exchanges was published as part of the AHRC/ESRC-funded project 'Oceanic Exchanges: Tracing Global Information Networks in Historical Newspaper Repositories, 1840-1914'. Join us for a free, virtual launch (taking place on Zoom) during which we will give an introduction to the Atlas, including a how-to session on working with the collections included in the report.

Hosted by UCLDH, the launch will include speakers Professor Ryan Cordell (Northeastern University), Professor Melissa Terras (University of Edinburgh), Dr Quintus van Galen (Senior Librarian, Nieuwe Veste) and the report's editors, Dr Melodee Beals (Loughborough University) and Dr Emily Bell (University of Leeds).

The Atlas of Digitised Newspapers is a comprehensive guide to the histories, structures and metadata of ten collections of digitised newspapers, including those held by:

Chronicling America (The Library of Congress)
The Hemeroteca Nacional Digital de México
The British Library
The Times Digital Archive
Delpher (Koninklijke Bibliotheek)
Europeana
The Suomen Kansalliskirjaston Digitoidut Sanomalehdet (The National Library of Finland)
Trove (The National Library of Australia)
Papers Past (The National Library of New Zealand)

This ambitious project brought together a consortium of cultural historians, computational linguists, literary scholars, digital curators, humanists, and computer scientists, and represents a collaboration between researchers in six countries. The resulting report provides readers with a deep contextualisation of the collections, including their history, selection process and digitisation projects, as well as detailed technical information about how to obtain, interpret, manipulate and map metadata and content across databases. A key resource for researchers of nineteenth-century periodicals, the report can be downloaded for free and has recently opened up to new contributors who can add further collections.

Find out more about the Atlas: https://www.digitisednewspapers.net.
Year(s) Of Engagement Activity 2021
URL https://www.ucl.ac.uk/digital-humanities/events/2021/jan/ucldh-online-book-launch-atlas-digitised-ne...
 
Description Plenary roundtable contribution
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Third sector organisations
Results and Impact I was invited to contribute to a plenary session of the European Association of Digital Humanities meeting in Galway, Ireland on the topic of 'Data in Digital Humanities'. The session sparked much discussion and debate and I have been invited to give other talks as a result of it.
Year(s) Of Engagement Activity 2018