Machines Reading Maps: Finding and Understanding Text on Maps

Lead Research Organisation: The Alan Turing Institute
Department Name: Research

Abstract

'Machines Reading Maps' (MRM) aims to change the way that humanists and heritage professionals interact with map images. Maps constitute a significant body of global cultural heritage, and they are being scanned at a rapid pace in the US and UK. However, most critical investigation of maps continues on a small scale, through close 'readings' of a few maps. Individual maps communicate through visual grammars, supplemented by text. But text on maps is an almost entirely untapped source for understanding how knowledge of place is constructed. Investigating map content at scale can teach us about what has been preserved and omitted in the cartographic record. Such knowledge is a key starting point for understanding why using map text to enrich collection metadata may be advisable (when collection records contain no geographic or locational information, or only the most superficial) or potentially harmful (when map text replicates colonial power structures).

Additionally, the right maps can be hard to find. Large map collections tend to be among those that are rarely catalogued at item level: one record is meant to capture the metadata for dozens, if not thousands, of sheets. This is a well-known research obstacle, one that has made historical maps, like serialized sources such as newspapers, challenging sources in the humanities. We envision a future where map collections can be searched based on their spatial content, similar to the way that digitised newspaper collections enable full-text searching across scanned pages. This project contributes to reversing the fortunes of historic map collections at the moment when many of them are being made available online. MRM will enable researchers and cultural institutions to generate and analyze map text data across collections and institutions, contributing to metadata creation and decolonization efforts, and enhancing accessibility and discoverability of uncatalogued or minimally catalogued sheets.

MRM builds on the project team's expertise around historical maps and map processing. Importantly, it refines an already robust tool for extracting text from maps (Strabo), developed by the US Co-I and colleagues on the Linked Maps project. Advancing software tools for handling new types of maps is essential to making text extraction a method that can be used in libraries and archives around the world. MRM generates data from scanned map collections and builds community among map and data curators, metadata and digital scholarship specialists, historians, and geographic information and data scientists. Working with partners at the National Library of Scotland (NLS), British Library (BL) and the Library of Congress (LC) who have extensive scanned map collections, this work unites research questions about the spatial experience of industrialization in 19th-c. Great Britain and social change in US cities during the 20th c. with GIScience expertise in using computational methods to process historical maps at scale.

By predicting what type of content text on maps represents (roads, buildings, mountains, etc.) and linking to gazetteers (indexes of places and related metadata, like locations), we unlock the potential for users to find and interpret maps by the thousands. Cultural institutions can feed map text data back into their work to study the geographical coverage of their collections, or investigate differences between existing metadata and reported locations of map labels. On a sheet-by-sheet basis, for example, MARC fields for subjects and topics can be enriched by map text. After processing US and UK maps and linking them to historical gazetteers, we test linking UK map labels to Scottish Trade Directories, and matching Sanborn map data to US census records, making a significant contribution to both British and American digital historical data. Such research test cases exemplify the versatility of map labels as primary sources.
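
As a rough illustration of what linking map labels to a gazetteer can involve, the sketch below matches an extracted label against a small list of place names using fuzzy string similarity from the Python standard library. It is not the method used by MRM or Strabo: the toy gazetteer, the function name link_label and the 0.8 similarity threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher

# Hypothetical toy gazetteer: place name -> (latitude, longitude).
GAZETTEER = {
    "Edinburgh": (55.9533, -3.1883),
    "Glasgow": (55.8642, -4.2518),
    "Aberdeen": (57.1497, -2.0943),
}


def link_label(label, gazetteer=GAZETTEER, threshold=0.8):
    """Return (name, coords, score) for the closest gazetteer entry.

    Returns None when no entry reaches the (assumed) similarity threshold.
    """
    best_name, best_score = None, 0.0
    for name in gazetteer:
        # Case-insensitive similarity ratio between 0.0 and 1.0.
        score = SequenceMatcher(None, label.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    if best_name is None or best_score < threshold:
        return None
    return best_name, gazetteer[best_name], best_score


if __name__ == "__main__":
    # A label with a plausible text-extraction error still links to an entry.
    print(link_label("Edinburgb"))
    # A label with no close match is left unlinked.
    print(link_label("Zzzz"))
```

Fuzzy matching of this kind tolerates small text-recognition errors, but a real pipeline would presumably also need the label's position on the sheet and historical name variants to choose between similarly named places.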
