Machines Reading Maps: Finding and Understanding Text on Maps

Lead Research Organisation: The Alan Turing Institute
Department Name: Research

Abstract

'Machines Reading Maps' (MRM) aims to change the way that humanists and heritage professionals interact with map images. Maps constitute a significant body of global cultural heritage, and they are being scanned at a rapid pace in the US and UK. However, most critical investigation of maps continues on a small scale, through close 'readings' of a few maps. Individual maps communicate through visual grammars, supplemented by text. But text on maps is an almost entirely untapped source for understanding how knowledge of place is constructed. Investigating map content at scale can teach us about what has been preserved and omitted in the cartographic record. Such knowledge is a key starting point for understanding why using map text to enrich collection metadata may be advisable (when collection records contain no geographic or locational information, or only the most superficial) or potentially harmful (when map text replicates colonial power structures).

Additionally, the right maps can be hard to find. Large map collections tend to be among those that are rarely catalogued at item level: one record is meant to capture the metadata for dozens, if not thousands, of sheets. This is a well-known research obstacle, one that has made historical maps, like serialized sources such as newspapers, challenging sources in the humanities. We envision a future where map collections can be searched based on their spatial content, similar to the way that digitised newspaper collections enable full-text searching across scanned pages. This project contributes to reversing the fortunes of historic map collections at the moment when many of them are being made available online. MRM will enable researchers and cultural institutions to generate and analyze this data across collections and institutions, contributing to metadata creation and decolonization efforts, and enhancing accessibility and discoverability of un- or minimally-catalogued sheets.

MRM builds on the project team's expertise around historical maps and map processing. Importantly, it refines an already robust tool for extracting text from maps (Strabo), developed by the US Co-I and colleagues on the Linked Maps project. Advancing software tools for handling new types of maps is essential to making text extraction a method that can be used in libraries and archives around the world. MRM generates data from scanned map collections and builds community among map and data curators, metadata and digital scholarship specialists, historians, and geographic information and data scientists. Working with partners at the National Library of Scotland (NLS), British Library (BL) and the Library of Congress (LC) who have extensive scanned map collections, this work unites research questions about the spatial experience of industrialization in 19th-c. Great Britain and social change in US cities during the 20th c. with GIScience expertise in using computational methods to process historical maps at scale.

By predicting what type of content text on maps represents (roads, buildings, mountains, etc.) and linking to gazetteers (indexes of places and related metadata, like locations), we unlock the potential for users to find and interpret maps by the thousands. Cultural institutions can feed map text data back into their work to study the geographical coverage of their collections, or investigate differences between existing metadata and reported locations of map labels. On a sheet-by-sheet basis, for example, MARC fields for subjects and topics can be enriched by map text. After processing US and UK maps and linking them to historical gazetteers, we test linking UK map labels to Scottish Trade Directories, and matching Sanborn map data to US census records, making a significant contribution to both British and American digital historical data. Such research test cases exemplify the versatility of map labels as primary sources.
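
As a rough illustration of the gazetteer-linking idea described above, the following is a minimal Python sketch, not the project's actual method. It matches a possibly misread map label to the closest name in a small gazetteer using approximate string matching from the standard library. The `GAZETTEER` entries, the `link_label` helper, and the `cutoff` value are hypothetical and chosen only for illustration; coordinates are approximate.

```python
from difflib import get_close_matches

# Hypothetical toy gazetteer: place name -> (feature type, latitude, longitude).
# Entries and coordinates are approximate and for illustration only.
GAZETTEER = {
    "Edinburgh": ("city", 55.95, -3.19),
    "Leith": ("district", 55.98, -3.17),
    "Arthur's Seat": ("hill", 55.94, -3.16),
}


def link_label(label, cutoff=0.8):
    """Return (name, record) for the closest gazetteer entry, or None.

    Matching is case-insensitive; `cutoff` is the minimum similarity
    (between 0 and 1) required to accept a candidate.
    """
    # Map lower-cased names back to their canonical spelling.
    lowered = {name.lower(): name for name in GAZETTEER}
    matches = get_close_matches(label.lower(), list(lowered), n=1, cutoff=cutoff)
    if not matches:
        return None
    name = lowered[matches[0]]
    return name, GAZETTEER[name]


if __name__ == "__main__":
    # A label with a missing final letter, an upper-case label,
    # and a label that is absent from the gazetteer.
    for label in ["Edinburg", "LEITH", "Glasgow"]:
        print(label, "->", link_label(label))
```

In a sketch like this, the feature type stored with each gazetteer entry stands in for the predicted content type of a label, and the coordinates are what would let a collection be searched by location.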
 
Description Having reached the end of the UK-funded timeline for the Machines Reading Maps project, we are proud that the work is already making an impact in changing the way that researchers and map collection curators understand the potential for working with text on maps. (The US side of the project has been extended until October 2023.)
We began Machines Reading Maps motivated to see whether we could create usable datasets of text identified on historical maps. Our goals were ambitious: to use this data to help map collections become more discoverable and to inspire critical research uses of these growing, digitized cartographic collections. In less than two years, we have designed foundational methods and tools that allow people to work with text on maps one sheet at a time as well as thousands of maps at a time. Importantly, our work has not only developed technical solutions for working with maps as data at scale; we have also theorized our work with text on maps to ensure that the people using our tools interact with maps in ways that account for the cultural features unique to historical maps as these change over time and place.
Finally, our public-facing demonstrations and workshops showcasing the mapKurator and bespoke Recogito tools have led to additional funding for this work from private map collector David Rumsey as well as substantial interest in future collaborations from historians, geographers, environmental scientists, and librarians and archivists. We are now exploring opportunities to develop partnerships with an international consortium of libraries to support development of these tools.
Exploitation Route Tools created by this project can be re-used by libraries and other collections of historical maps to improve discoverability of those materials. Data created from our experiments using these tools is also available to anyone to re-use for research that benefits from additional historical, geolocated structured data about place.
Sectors Creative Economy,Digital/Communication/Information Technologies (including Software),Education,Environment,Government, Democracy and Justice,Culture, Heritage, Museums and Collections,Transport

 
Description First, our research is having an impact on the communities supported by our library partners (National Library of Scotland and Library of Congress in particular). The bespoke Recogito annotation interface has been used in a crowdsourcing project with the National Library of Scotland to produce a dataset of 25k annotations of text on maps of 19th c. Edinburgh, which is now an open access dataset documented and available on the NLS Data Foundry website. The NLS Data Foundry is an important point of access for creative re-uses of cultural heritage data, and we look forward to hearing about future initiatives that use this data for a variety of purposes. Our work to experimentally process a sample from the Library of Congress's Sanborn fire insurance map collection is resulting in an innovative collaboration with the LC Labs group to release the derived data as an item in the main Library of Congress catalog, documented using their new standards for "Data Packages" (https://labs.loc.gov/work/experiments/cchc/). The data released in this way will be openly available as a foundational example of a large, derived data package that can be used as a benchmark for comparing against future methodological experiments in spotting text on maps. This will be possible because of our work to share both the machine-generated data (mapKurator output) and a set of gold standard data that machine-generated data can be evaluated against in the future. With the British Library, our remaining cultural heritage partner in the UK, we are organizing a final workshop to demonstrate the map annotation and processing tools developed on Machines Reading Maps and to scope opportunities for incorporating text on maps within BL workflows in the future, whether as derived datasets (as with the Library of Congress initiative) or as metadata within traditional catalog record fields.
The bespoke Recogito annotation platform has been adopted as software for a new ERC-funded project, documented elsewhere in this report. We are developing plans for this to become a permanently hosted public research tool for the growing community of people who wish to annotate maps. Findings from this project are being discussed widely across the cultural heritage sector, with an aim to continue supporting this work to improve results, test the methods on new collections, and build more robust workflows for integrating text on maps in library systems. For example, the new Associate University Librarian of Research Data Services at Stanford Libraries, Peter Leonard, included Machines Reading Maps as an example of cutting-edge opportunities for digital libraries in his first annual report to Library Trustees. Not surprisingly, therefore, with Stanford Libraries, we are organizing a conference in April 2023 at Stanford University, to scope the creation of and future financial support for an open source community around Machines Reading Maps tool development and use.
First Year Of Impact 2023
Sector Digital/Communication/Information Technologies (including Software),Education,Culture, Heritage, Museums and Collections
Impact Types Cultural
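
To make concrete what evaluating machine-generated data against gold standard data can involve, here is a deliberately simplified Python sketch. It assumes each map sheet is reduced to a list of label strings and scores a prediction by exact match after normalising case and whitespace. Real text-spotting evaluation usually also compares where each label sits on the sheet, which this sketch ignores. The function names and sample labels are hypothetical and are not taken from the released datasets.

```python
def normalise(text):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def precision_recall(predicted, gold):
    """Compare predicted labels with gold standard labels for one sheet.

    Both arguments are lists of strings. Labels are compared as sets
    after normalisation, so duplicates and positions are ignored.
    Returns a (precision, recall) tuple of floats.
    """
    pred_set = {normalise(p) for p in predicted}
    gold_set = {normalise(g) for g in gold}
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical labels for a single sheet.
    machine_output = ["High Street", "CANONGATE", "Cowgat"]
    gold_standard = ["High Street", "Canongate", "Cowgate", "Grassmarket"]
    p, r = precision_recall(machine_output, gold_standard)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Precision here is the share of machine-read labels that also appear in the gold standard, and recall is the share of gold standard labels that the machine found. A shared benchmark lets future methods be compared on the same figures.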

 
Description K. McDonough, invited member of JISC Task & Finish Group on AI & Machine Learning for Arts & Humanities Research in the UK
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
URL https://digitisation.jiscinvolve.org/wp/2023/02/03/is-ai-for-me/
 
Title Documentation and training material (V. Vitale) 
Description Documentation and training material, in GitHub Wiki format, about the use of our annotation platform, integrating manual and automated reading and enriching of digitised maps. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact The documents detailing how to use our platform have been used as support material in the workshops our project organised, as well as in those organised by some of our partners, either in collaboration with us or independently. 
URL https://github.com/machines-reading-maps/Tutorials-Newsletters/wiki
 
Title Entity Recognition API 
Description This entity linking API developed by UMN PhD student Jina Kim links labels identified by mapKurator models (including only partially recognised ones) to external knowledge bases. 
Type Of Material Improvements to research infrastructure 
Year Produced 2022 
Provided To Others? Yes  
Impact Enables semantic type prediction at scale for map text. 
URL https://github.com/machines-reading-maps/entity-recommendation-api
 
Title mapKurator (Z.Li) 
Description mapKurator automatically reads text from maps and links map text to external knowledge bases to generate semantically rich metadata for individual map scans. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact MapKurator is being tested in multiple collections beyond the scope of our initial plans for this grant. Success demoing mapKurator has resulted in further funding in the form of a personal gift from major private map collector David Rumsey to both The Alan Turing Institute and the University of Minnesota to aid ongoing work in improving mapKurator functionality on a wider variety of maps, including those in his collection. 
URL https://github.com/machines-reading-maps/map-kurator
 
Description R. Simon joins IN-ROME, an ERC project led by Barbara Borg at SNS Pisa 
Organisation Scuola Normale Superiore di Pisa
Country Italy 
Sector Academic/University 
PI Contribution Rainer Simon is on the teams of both Machines Reading Maps and IN-ROME. IN-ROME has access to Turing-based/MRM-hosted web application 'Recogito for Maps'.
Collaborator Contribution The IN-ROME project uses and supports the Machines Reading Maps-hosted annotation web application 'Recogito for Maps' developed and maintained by Rainer Simon (MRM team member).
Impact Continued use and development of historical map annotation features in the 'Recogito for Maps' web application developed by Rainer Simon and Machines Reading Maps.
Start Year 2022
 
Title Recogito for Machines Reading Maps 
Description Bespoke instance of the award-winning Recogito annotation platform customized for the map annotation tasks developed on Machines Reading Maps. 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact Transforming the possibility of using historical maps as primary source documents in the humanities and social sciences. This version of Recogito is developed to make machine learning-generated annotations accessible for editing by experts and the general public, as tested in Machines Reading Maps research use cases and a crowdsourcing collaboration with the National Library of Scotland. 
URL https://github.com/machines-reading-maps/mrm-recogito-ui
 
Description Annotation workshop. Europe data. (Valeria Vitale, Katherine McDonough, Rainer Simon) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This workshop presented our new, integrated annotation platform for the first time to our data partners, as well as other experts in the field of digital cartography, spatial humanities and metadata management. Participants had the chance to test the annotation interface in real time, and to understand better the potential of the tools we were developing. There was a formal feedback form afterwards, and the opinions of the workshop's participants informed the subsequent refinement of the tools. Further, more public-facing events were also planned with attendees of this workshop.
Year(s) Of Engagement Activity 2021
 
Description Annotation workshop. US data. (Valeria Vitale, Katherine McDonough, Zekun Li) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This workshop presented our new, integrated annotation platform for the first time to our data partners, as well as other experts in the field of digital cartography, spatial humanities and metadata management. Participants had the chance to test the annotation interface in real time, and to understand better the potential of the tools we were developing. There was a formal feedback form afterwards, and the opinions of the workshop's participants informed the subsequent refinement of the tools.
Year(s) Of Engagement Activity 2021
 
Description Blog post about NLS Crowdsourcing collaboration 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Blog post announcing how Machines Reading Maps methodological developments are being used for a crowdsourcing campaign with our partner, the National Library of Scotland.
Year(s) Of Engagement Activity 2022
URL https://www.turing.ac.uk/blog/how-can-machine-learning-help-us-unlock-historical-maps
 
Description Blog post on creating synthetic map data for National Library of Scotland 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Collaboratively written blog post with NLS maps curator Chris Fleet showcases ongoing research in Machines Reading Maps using NLS digitized map collections. In the first week, more than 400 people viewed the blog post with an average of over 4 minutes of engagement time.
Year(s) Of Engagement Activity 2021
URL https://blog.nls.uk/maps-with-a-sense-of-the-past/
 
Description Demo for David Rumsey Map Centre (Katherine McDonough, Yao-Yi Chiang, Valeria Vitale) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Supporters
Results and Impact We presented the aims of the project, and gave a demo of the annotation platform for the David Rumsey Map Centre in San Francisco (CA). This demo was a crucial step towards a new, funded collaboration with the centre, sponsored by David Rumsey himself.
Year(s) Of Engagement Activity 2022
 
Description Geo4Lib 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A video summarising the project's aims, methods and tools was part of the event dedicated to digital approaches to map collections organised by Stanford. The presentation was followed by questions, and was part of a larger debate.
Year(s) Of Engagement Activity 2022
URL https://geo4libcamp.github.io/
 
Description J. Kim, Y.Y. Chiang, Generating Geospatial Linked Data from Text Labels on Maps, University Consortium for Geographic Information Science 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation to US GIS community about using bespoke Recogito platform to annotate maps and select semantic types from schema.org.
Year(s) Of Engagement Activity 2022
 
Description K. McDonough, "Building Collaborations for Historical Research," Digitization and accessibility of cultural heritage collections DIGARV Seminar (Swedish Research Council), [virtual] 29 Oct. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited talk for Swedish Research Council-funded seminar on digitization and research using cultural heritage collections attended by approximately 50 people.
Year(s) Of Engagement Activity 2021
URL https://www.digarv.se/en/2021/09/research-infrastructures-in-heritage-institutions-zoom-workshop/
 
Description K. McDonough, Roundtable, "The Future of Spatial History" w/L. Scholz, J. Taylor, I. Gregory originally for Spatial Humanities 2021 but rescheduled as a Lancaster University DHangout, 13 Oct 2021 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Roundtable discussion about approaches to using computational methods to study spatial history attended by approx 30 researchers and postgraduate students in the digital humanities.
Year(s) Of Engagement Activity 2021
 
Description K. McDonough, Turing Catch-Up about Maps as Data research in Machines Reading Maps and Living with Machines 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Research update on Machines Reading Maps as part of presentation about working computationally with historical maps for the monthly, internal Turing catch up call in Jan 2022. Attended by 250 researchers from the Turing.
Year(s) Of Engagement Activity 2022
 
Description Linked Pasts Round Table (Katherine McDonough, Valeria Vitale) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Round Table was part of the large Linked Pasts Symposium, and increased our project's visibility in the field of Linked Open GeoData for the study of the Past. The discussion included guests from different fields, cultural heritage and academic research, and explored some of the more methodological aspects of our work. The Round Table was an interesting and engaging way to connect with two of our data partners (National Library of Scotland and Library of Congress), and an opportunity to reflect on our outputs in a critical way. It was also organised in collaboration with the OS200 project.
Year(s) Of Engagement Activity 2021
URL https://www.eventbrite.co.uk/e/reading-and-linking-places-in-text-and-maps-tickets-219024988637#
 
Description Linked Pasts workshop (Valeria Vitale, Katherine McDonough, Rainer Simon) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact This workshop was organised in the framework of the large Linked Pasts symposium, and increased our project's visibility in the field of Linked Open GeoData for the study of the Past. It was organised in collaboration with another project, OS200, after they discovered our annotation platform during a previous workshop. Attendees from several countries and different backgrounds joined the workshop, and tried the hands-on annotation activities. The workshop strengthened our relationship with a relevant project, focussing on a related corpus of digital maps and documents, and tested the robustness of the platform with a larger audience, as well as the clarity of the supporting material that we had produced. The discussion that followed provided us with very useful user feedback, as well as an exploration of different research applications, for example in the field of linguistics.
Year(s) Of Engagement Activity 2021
URL https://www.eventbrite.co.uk/e/reading-and-linking-places-in-text-and-maps-tickets-219024988637#
 
Description Newsletter for the Advisory Board 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The newsletter provided our advisory board with up-to-date information about the status of the project, and the areas where we would have welcomed their input and feedback.
Year(s) Of Engagement Activity 2021
URL https://github.com/machines-reading-maps/Tutorials-Newsletters/blob/main/Newsletter_2021_10.pdf
 
Description Presentation for Stanford Library staff (Y. Chiang and K. McDonough) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact We presented our project and our new collaboration with the David Rumsey Map Collection. We discussed the potential of the tools we are developing and how our data outcomes could be made interoperable with library standards. The presentation started a regular series of meetings to consolidate the grounds for collaboration.
Year(s) Of Engagement Activity 2022
 
Description Press release about new collaboration with David Rumsey Map Collection 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Announcement of the 12-month collaboration supported by a gift from David Rumsey to both the US and the UK sides of the Machines Reading Maps project. The announcement stimulated interest in our research methods and has led to further discussions about spinning off Machines Reading Maps into an open source community project in collaboration with Stanford Libraries.
Year(s) Of Engagement Activity 2022
URL https://www.turing.ac.uk/news/new-collaboration-promises-enrich-data-60000-historical-maps
 
Description TPS Coffee Chat (Katherine McDonough, Valeria Vitale) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A brief and informal presentation about the project, followed by a debate on the topics of place-based annotation. A dozen people joined the conversation, and we were able to compare our research with the work of other colleagues.
Year(s) Of Engagement Activity 2022
 
Description UCGIS Webinar (Yao-Yi Chiang) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation of the project and its goals to an audience of professionals, to gather initial feedback and foster possible future collaborations.
Year(s) Of Engagement Activity 2021
URL https://www.youtube.com/watch?v=F45jgjbhoIY
 
Description V. Vitale & K. McDonough, DH2022 short presentation "Machines Reading Maps: from text on maps to linked spatial data" 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation as part of DH2022 conference (Tokyo) attended by about 20 participants.
Year(s) Of Engagement Activity 2022
URL https://dh2022.dhii.asia/dh2022bookofabsts.pdf