Digital approaches to the capture and analysis of watermarks using the manuscripts of Isaac Newton as a test case

Lead Research Organisation: University of Cambridge
Department Name: History

Abstract

This project will investigate two research areas with general application in digital humanities scholarship, using the dispersed manuscript corpus of Isaac Newton as a test case. The immediate purpose of the test case will be to use artificial intelligence to assist with the identification and classification of watermarks in Newton material and, in the process, to build a general tool to assist with the organisation and dating of manuscripts. The project also has much wider significance. The project's first stage will be the methodological investigation of techniques for the production of images of watermarks which are suitable for automated analysis, using both new photography and the exploration of the potential latent in existing images. During the second stage, we will develop computer vision methods to systematically cluster and match the assembled corpus of watermark images across manuscripts and collections. Methods developed through this project will be transferrable to watermark collections beyond that of Newton's corpus, creating a methodology for scholars seeking to analyse, date, and organise historical collections via watermark matching, and for conservators seeking to establish standardised surveying and documentation methods while imaging and digitising watermarked documents. A final stage of the project will allow us to disseminate our findings through research workshops, web tools, and improvements to online databases, as well as traditional publications in journals.
Since the groundbreaking early twentieth-century research of Charles Moïse Briquet, watermarks have played a central part in the dating of otherwise undated manuscripts. Briquet's monumental 1907 catalogue, Les filigranes, made it possible, in principle, to date (and to some extent localise) pre-1600 watermarks found in manuscripts by reference to exemplars in the catalogue. While this catalogue and others have been digitised thanks to the Bernstein consortium (https://memoryofpaper.eu/), advances in research and technology have revealed the limitations of the traditional approach, which requires time-consuming procedures and some degree of expertise to identify each individual watermark. It is very difficult to find exact matches between watermarks in situ and those reproduced in any catalogue, first because the catalogues are far from comprehensive and, second, because each individual watermark is produced in two "twin" versions, never perfectly identical, and suffers deformation over time as a result of repeated use in the paper manufacturing process. By developing and enhancing new approaches and techniques to improve the acquisition and analysis of watermarks, we hope to solve basic problems and thereby provide benefit to all who must rely upon paper documents for chronological evidence.
While computer vision has made significant progress in recent years thanks to machine learning and artificial intelligence, this project will build on cutting-edge work already undertaken by the Ecole Nationale des Chartes and its partners (notably the computer scientists at École des Ponts ParisTech) to investigate the problem of matching images, specifically of watermarks, across formats (photographs and tracings). In creating a corpus of images used to train and develop the open source software created by the Ecole des Chartes we will build on recent work by The National Archives (TNA) to use comparatively affordable equipment and techniques to produce images of watermarks that are highly suitable for machine analysis. The project will develop and apply both of these approaches in order to attempt to enhance the computer-vision software so that it may be able to unlock the latent information held in thousands of existing images shot in reflected light which institutions have already digitised and made accessible through IIIF.

Publications

Voelkel, J. R. (2021) Chasing the Clues in Isaac Newton's Manuscripts in Distillations: Science History Institute

Mandelbrote, S. (2022) The Newton Project in European Mathematical Society Magazine

 
Description 1. It will be possible to produce imaging guidelines for the digitisation of watermarks, and to specify imaging behaviour for a variety of levels of photographic equipment. Following these guidelines will enable the production of useful images for computer analysis. Work is ongoing to publish such guidelines (Klein, J.A., Maximo Rocha, P., Pawlikowski, M.M., "A Comparative Study of Photographic Techniques for Paper Watermark Recognition", for submission to Journal of the American Institute for Conservation).
2. It is currently possible to train a machine-learning visual recognition tool to group watermarks by essential features to a level of approximately 70% accuracy. Training on a larger data set should allow the development of a tool with significantly higher accuracy. We have built an environment for such training and will be trying to improve the accuracy of recognition and to hone the factors used for recognition. Work is ongoing, using a consultant funded by the NEH section of this grant, to improve the performance of this tool. The draft software will be posted to GitHub and improved subsequently. A publication reporting on these findings is in progress (Johnson, W., Jones, H., Ilyas, C.M.A., Sethares, W. "Reading between the lines: computational approaches to paper in the manuscripts of Isaac Newton", for submission to Digital Scholarship in the Humanities).
3. Extensive imaging of paper used by Isaac Newton has been completed (at the University of Cambridge, the National Libraries, and the Huntington Library). These images have been made available in simple form in the Cambridge Digital Library gallery of images associated with the Newton Project, combined with transcriptions and associated metadata, or in the Chymistry of Isaac Newton pages hosted by Indiana University. The majority of the imaging, however, has been carried out to provide data on which to train the visual recognition tool: its use is therefore ongoing, deploying supercomputing facilities at Indiana University. Collaboration with the Bodmer Library (Bodmer Lab) and the National Library of Israel has generated many new, lower-quality images of watermarks. Taken together with the images prepared for artificial recognition, these images allow extensive manual scholarly work to group watermarks within Newton's archive. That work is currently ongoing and should provide an initial restructuring of the chronology of Newton's writings, pending the full development of the visual recognition tool. A database of the findings has been developed and is currently being tested at Indiana University.
4. Key collaborations have been established among repositories holding papers from Newton's archive on three continents and between scholars analysing the archive.
5. Redating of Newton manuscripts is ongoing but important findings have already been communicated (in presentations by Scott Mandelbrote and William R. Newman at the Huntington Library) relating to both theological and chymical papers. Mandelbrote, Newman, and Marc Adam Kolakowski intend to submit a paper (probably to Early Science and Medicine) on these results.
Exploitation Route We plan to publish imaging guidelines, prepared by the Cambridge University Library and the National Archives, for other institutions and individuals to use (timeline: to August 2024). This will support the development of best practice and the planned acquisition of appropriate equipment by library imaging services elsewhere.
We are not yet able to make our visual recognition tool public but hope to be able to do so, either through the University of Cambridge or Indiana University, by August 2024. Once public, it will allow any user to upload new images in an appropriate form and have them categorised.
Information about the dates of Newton's archive derived from the research of the project is being incorporated into the publicly available databases of the Newton Project and the Chymistry of Isaac Newton. This will have implications for all future scholarship on Newton's writings.
A closed meeting reporting progress and discussing future progress towards outcomes with interested parties in Cambridge was held in late March 2023; a public presentation of more complete findings was made at the Huntington Library in January 2024. This was attended by academic professionals and also by professionals involved in the development of digital photographic and analysis hardware and software.
Sectors Digital/Communication/Information Technologies (including Software)

Education

Culture

Heritage

Museums and Collections

URL https://live-events.nli.org.il/events/newton-watermark-project-yahuda-collection?doculang=false
 
Description Findings have been used to develop best practice in the imaging services of the Cambridge University Library, the National Archives, and the Huntington Library, San Marino, California. They have influenced the choices made by the Bodmer Lab (a collaboration between the University of Geneva and the Bodmer Library, Cologny, Switzerland) in the digitisation, TEI encoding, and study of the Bodmer Newton manuscript, and have also been communicated to the National Library of Israel (which together with the Cambridge University Library holds the UNESCO World Heritage listing for the papers of Isaac Newton). They have generated new entries in the Cambridge Digital Library, including collaborations with King's College, Cambridge, and The National Archives. They are being used to enhance devices used in the photographic processing of watermark images.
First Year Of Impact 2022
Sector Digital/Communication/Information Technologies (including Software), Culture, Heritage, Museums and Collections
Impact Types Cultural

 
Description Research and Collections Programme
Amount £3,800 (GBP)
Organisation University of Cambridge 
Department Cambridge University Library
Sector Academic/University
Country United Kingdom
Start 03/2022 
End 10/2022
 
Title King's College Collection on the Cambridge Digital Library 
Description Images and metadata for Alchemical and Theological Papers of Isaac Newton held at King's College (captured and presented as part of the watermark capture of the relevant manuscripts for this project) 
Type Of Material Database/Collection of data 
Year Produced 2024 
Provided To Others? Yes  
Impact Increased use of the digital library and the Newton Project transcriptions (see Newton Papers on Cambridge Digital Library section). 716 further images will be added during the completion of the NEH section of this project. 
URL https://cudl.lib.cam.ac.uk/collections/kingsrarebooksandmanuscripts/1
 
Title Newton Papers on the Cambridge Digital Library 
Description Augmentation of the resources in the Newton Papers section in the Cambridge Digital Library. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact 706 multispectral images and 944 reflected-light images, available as IIIF and as high-resolution zooming images through the platform. 
URL https://cudl.lib.cam.ac.uk/view/MS-ADD-09597-00002-00018-MSI/1
 
Title Newton Watermark Database 
Description Database of manuscripts produced by Newton, arranged by Newton Project identifier, with information about watermarks provided from edited results of Alan E. Shapiro's manuscript listing of watermarks, augmented with digital watermark images provided by this project. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? No  
Impact Redating of particular manuscripts; the database will be made public in due course and integrated with the Cambridge Digital Library. 
URL https://alchemy.sitehost.iu.edu/newtondocs
 
Description Huntington Library 
Organisation Huntington Library
Country United States 
Sector Academic/University 
PI Contribution Provision of imaging protocols and advice; provision of database advice
Collaborator Contribution Provision of images; development of imaging protocols; database content
Impact None as yet
Start Year 2021
 
Description Indiana University 
Organisation Indiana University Bloomington
Country United States 
Sector Academic/University 
PI Contribution We have begun photographing images that will be used as a dataset for computer analysis using supercomputing facilities at Indiana. We have begun working together on a database structure.
Collaborator Contribution Provision of supercomputing facilities. Provision of descriptions of manuscripts to be imaged.
Impact None as yet
Start Year 2021
 
Description King's College, Cambridge 
Organisation University of Cambridge
Department King's College Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution Selection of manuscript items for imaging and for future inclusion in the data set for computer analysis
Collaborator Contribution Provision of manuscript items, preparation of items for future photography
Impact None so far
Start Year 2021
 
Description National Library of Israel 
Organisation National Library of Israel
Country Israel 
Sector Public 
PI Contribution We delivered four one-hour talks as part of an online education programme hosted by the National Library of Israel. Talks were attended by 30-50 participants on each occasion.
Collaborator Contribution National Library of Israel staff (head of western special collections; project manager, education and culture) participated throughout the talks and provided the infrastructure for them. A follow-up visit to Cambridge was made by the head of western special collections.
Impact The outputs are the four talks available via the URL. The disciplines involved are history/ history of science, archive and collection management, and digital humanities.
Start Year 2022
 
Description The National Archives 
Organisation The National Archives
Country United Kingdom 
Sector Public 
PI Contribution Collaboration in production of imaging guidelines and selection of material to image; coding and identification of items for imaging.
Collaborator Contribution Production of draft imaging guidelines, technical assistance with establishing imaging equipment at the University Library; technical assistance with job search for post-doctoral appointment; selection and imaging of materials held by the National Archives.
Impact History, History of Science, Digital Humanities, Archive Sciences
Start Year 2021
 
Title Chain Line and Laid Line Analysis 
Description Software to develop work of Tamara Grossman in automatic analysis of chain lines and laid lines in watermark images 
Type Of Technology Software 
Year Produced 2024 
Impact On-going development of this technique in collaboration with researchers at Cornell University and University of Wisconsin-Madison 
URL https://github.com/Aqdus01/HiddenKnowledge
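 
The record above does not describe how the software works. Purely as an illustration of what automatic chain line analysis can involve, the following minimal sketch estimates the spacing between chain lines from a greyscale image. It assumes, hypothetically, that the chain lines run vertically and appear brighter than the surrounding sheet, as in transmitted light. The function names, the synthetic image, and the autocorrelation approach are illustrative assumptions and are not taken from the linked repository.

```python
def column_profile(image):
    """Mean brightness of each pixel column; image is a list of equal-length rows."""
    height = len(image)
    return [sum(row[x] for row in image) / height for x in range(len(image[0]))]


def estimate_spacing(profile, min_lag, max_lag):
    """Return the lag (in pixels) at which the profile best matches a shifted copy of itself."""
    mean = sum(profile) / len(profile)
    centred = [value - mean for value in profile]
    best_lag, best_score = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation: shorter overlaps at larger lags score lower,
        # which favours the basic spacing over its multiples.
        score = sum(centred[i] * centred[i + lag] for i in range(len(centred) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Synthetic stand-in for a real image: a bright vertical line every 25 pixels.
synthetic = [[200 if x % 25 == 0 else 120 for x in range(200)] for _ in range(40)]
spacing = estimate_spacing(column_profile(synthetic), min_lag=10, max_lag=60)
print(spacing)  # 25
```

Real images would need rotation correction, noise filtering, and a conversion from pixels to millimetres based on the capture resolution, none of which is attempted here.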
 
Title Image Matching and Clustering for Watermarks 
Description Image matching and clustering work developed by this grant, being tested and upgraded by consultant paid for by ongoing NEH grant at Indiana for release later in 2024 
Type Of Technology Software 
Year Produced 2024 
Impact Continuing refinement of AI matching of watermark images; ongoing research consultations with the VISE group at the University of Oxford and the development team at École des Ponts ParisTech 
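 
The record above does not specify the matching or clustering algorithm. As a hedged illustration of the clustering step only, the sketch below groups images whose pairwise similarity score meets a threshold, using single-linkage grouping with a union-find structure. The image names, the scores, and the threshold are hypothetical. The computer-vision step that would produce such scores is not shown.

```python
def cluster_by_similarity(names, scores, threshold):
    """Group items linked, directly or through a chain, by scores at or above the threshold."""
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (first, second), score in scores.items():
        if score >= threshold:
            parent[find(first)] = find(second)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical similarity scores in [0, 1] between four watermark images.
names = ["img_a", "img_b", "img_c", "img_d"]
scores = {
    ("img_a", "img_b"): 0.91,
    ("img_b", "img_c"): 0.85,
    ("img_a", "img_c"): 0.40,
    ("img_c", "img_d"): 0.12,
}
clusters = cluster_by_similarity(names, scores, threshold=0.8)
print(clusters)  # [['img_a', 'img_b', 'img_c'], ['img_d']]
```

Single linkage places img_a and img_c in the same group through img_b even though their direct score is low. That may suit marks from a mould that deforms gradually over time, but it can also chain unrelated marks together, so the threshold would need checking against manually grouped examples.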
 
Description Lectures on the Newton Watermark Project at the National Library of Israel 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Four public lectures (December 2022-March 2023) on aspects of the Watermarks project; follow-up visit to Cambridge by head of western special collections at the National Library of Israel.
Year(s) Of Engagement Activity 2022,2023
URL https://live-events.nli.org.il/events/newton-watermark-project-yahuda-collection?doculang
 
Description Presentation to Memory of Paper conference by Scott Mandelbrote and Marc Adam Kolakowski 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation of the project findings to members and associates of the Bernstein Project, the hosts of the standard online tools for watermark analysis and identification, at the annual meeting of the International Association of Paper Historians.
Year(s) Of Engagement Activity 2023
URL https://www.paperhistory.org/News/events/AbstractsVerona2023.pdf
 
Description Public lecture, Huntington Library 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Conference at the Huntington Library (19-20 January 2024), concluding with a public lecture (also broadcast) by Scott Mandelbrote and William R. Newman and a training session by Joel Klein
Year(s) Of Engagement Activity 2024
URL https://huntington.org/event/new-technologies-and-approaches-paper-and-ink-newtons-manuscripts