Digital approaches to the capture and analysis of watermarks using the manuscripts of Isaac Newton as a test case

Lead Research Organisation: University of Cambridge

Department Name: History

Abstract

This project will investigate two research areas with general application in digital humanities scholarship, using the dispersed manuscript corpus of Isaac Newton as a test case. The immediate purpose of the test case will be to use artificial intelligence to assist with the identification and classification of watermarks in Newton material and, in the process, to build a general tool to assist with the organisation and dating of manuscripts. The project also has much wider significance. The project's first stage will be the methodological investigation of techniques for the production of images of watermarks which are suitable for automated analysis, using both new photography and the exploration of the potential latent in existing images. During the second stage, we will develop computer vision methods to systematically cluster and match the assembled corpus of watermark images across manuscripts and collections. Methods developed through this project will be transferrable to watermark collections beyond that of Newton's corpus, creating a methodology for scholars seeking to analyse, date, and organise historical collections via watermark matching, and for conservators seeking to establish standardised surveying and documentation methods while imaging and digitising watermarked documents. A final stage of the project will allow us to disseminate our findings through research workshops, web tools, and improvements to online databases, as well as traditional publications in journals.
Since the groundbreaking early twentieth-century research of Charles Moïse Briquet, watermarks have formed a central part in the dating of otherwise undated manuscripts. Briquet's monumental 1907 catalogue, Les filigranes, made it possible, in principle, to date (and to some extent localise) pre-1600 watermarks found by researchers in manuscripts by reference to exemplars in Briquet's catalogue. While this catalogue and others have been digitised thanks to the Bernstein consortium (https://memoryofpaper.eu/), advances in research and technology have revealed the limitations of the traditional approach, which requires time-consuming procedures and some degree of expertise for the identification of each single watermark. It is very difficult to find exact matches between watermarks in situ and those reproduced in any catalogue, first due to the limited comprehensiveness of the catalogues, and, second, because each individual watermark is produced in two "twin" versions, never perfectly identical, and suffers deformation over time as a result of repeated use in the paper manufacturing process. By developing and enhancing new approaches and techniques to improve the acquisition and analysis of watermarks, we hope to solve basic problems and thereby provide benefit to all who must rely upon paper documents for chronological evidence.
Computer vision has made significant progress in recent years thanks to machine learning and artificial intelligence. This project will build on cutting-edge work already undertaken by the École nationale des chartes and its partners (notably the computer scientists at École des Ponts ParisTech) to investigate the problem of matching images, specifically of watermarks, across formats (photographs and tracings). In creating a corpus of images to train and develop the open-source software created by the École des chartes, we will build on recent work by The National Archives (TNA) on using comparatively affordable equipment and techniques to produce images of watermarks that are highly suitable for machine analysis. The project will develop and apply both of these approaches with the aim of enhancing the computer-vision software so that it can unlock the latent information held in thousands of existing images shot in reflected light, which institutions have already digitised and made accessible through IIIF.

Funded Value:

£202,667

Funded Period:

Feb 21 - Jan 24

Funder:

FIC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

AH/V009486/1

Principal Investigator:

Scott Mandelbrote

Research Subject:

History (14%)

Info. & commun. Technol. (56%)

Library & information studies (28%)

Research Topic:

Archives (14%)

Artificial Intelligence (28%)

History of Sci./Med./Technol. (14%)

Image & Vision Computing (28%)

Information & Knowledge Mgmt (14%)

Organisations

People	ORCID iD
Scott Mandelbrote (Principal Investigator)
Joel Klein (Co-Investigator)
Huw Jones (Co-Investigator)
William Newman (Co-Investigator)
Ruth Selman (Co-Investigator)
James Voelkel (Co-Investigator)

Publications

Mandelbrote, S. (2022) The Newton Project in European Mathematical Society Magazine

Voelkel, J. R. (2021) Chasing the Clues in Isaac Newton's Manuscripts in Distillations: Science History Institute

Key Findings
Impact Summary
Further Funding
Research Databases and Models
Collaboration
Software and Technical Products
Engagement Activities


Description	1. It will be possible to produce imaging guidelines for the digitisation of watermarks, and to specify imaging procedures for a variety of levels of photographic equipment. Following these guidelines will enable the production of useful images for computer analysis. Work is ongoing to publish such guidelines (Klein, J.A., Maximo Rocha, P., Pawlikowski, M.M., "A Comparative Study of Photographic Techniques for Paper Watermark Recognition", for submission to Journal of the American Institute for Conservation).
2. It is currently possible to train a machine-learning visual recognition tool to group watermarks by essential features to a level of approximately 70% accuracy. Training on a larger data set should allow the development of a tool with significantly enhanced accuracy. We have built an environment for such training and will be trying to improve the accuracy of recognition and to hone the factors used for recognition. Work is ongoing, using a consultant funded by the NEH section of this grant, to improve the performance of this tool. The draft software will be posted to GitHub and improved subsequently. A publication reporting on these findings is in progress (Johnson, W., Jones, H., Ilyas, C.M.A., Sethares, W. "Reading between the lines: computational approaches to paper in the manuscripts of Isaac Newton", for submission to Digital Scholarship in the Humanities).
3. Extensive imaging of paper used by Isaac Newton has been completed (at the University of Cambridge, the National Libraries, and the Huntington Library). These images have been made available in simple form in the Cambridge Digital Library gallery of images associated with the Newton Project, combined with transcriptions and associated metadata, or in the Chymistry of Isaac Newton pages hosted by Indiana University. The majority of the imaging, however, was undertaken to provide data on which to train the visual recognition tool; its use is therefore ongoing, drawing on supercomputing facilities at Indiana University. Collaboration with the Bodmer Library (Bodmer Lab) and the National Library of Israel has generated many new, lower quality images of watermarks. Taken together with the images prepared for automated recognition, these images allow extensive manual scholarly work to group watermarks within Newton's archive. That work is currently ongoing and should provide an initial restructuring of the chronology of Newton's writings, pending the full development of the visual recognition tool. A database of the findings has been developed and is currently being tested at Indiana University.
4. Key collaborations have been established among repositories holding papers from Newton's archive on three continents and between scholars analysing the archive.
5. Redating of Newton manuscripts is ongoing, but important findings relating to both theological and chymical papers have already been communicated (in presentations by Scott Mandelbrote and William R. Newman at the Huntington Library). Mandelbrote, Newman, and Marc Adam Kolakowski intend to submit a paper (probably to Early Science and Medicine) on these results.
Exploitation Route	We plan to publish imaging guidelines, prepared by the Cambridge University Library and the National Archives, for other institutions and individuals to use (timeline: to August 2024). This will support the development of best practice and the planned acquisition of appropriate equipment by library imaging services elsewhere. We are not yet able to make our visual recognition tool public but hope to do so, through either the University of Cambridge or Indiana University, by August 2024. Once public, it will allow any user to upload new images in an appropriate form and have them categorised. Information about the dates of manuscripts in Newton's archive derived from the project's research is being incorporated into the publicly available databases of the Newton Project and the Chymistry of Isaac Newton. This will have implications for all future scholarship on Newton's writings. A closed meeting with interested parties in Cambridge, reporting progress and discussing future steps towards outcomes, was held in late March 2023; a public presentation of more complete findings was made at the Huntington Library in January 2024. This was attended by academic professionals and also by professionals involved in the development of digital photographic and analysis hardware and software.
Sectors	Digital/Communication/Information Technologies (including Software),Education,Culture, Heritage, Museums and Collections
URL	https://live-events.nli.org.il/events/newton-watermark-project-yahuda-collection?doculang=false


Description	Findings have been used to develop best practice in the imaging services of the Cambridge University Library, the National Archives, and the Huntington Library, San Marino, California. They have influenced the choices made by the Bodmer Lab (a collaboration between the University of Geneva and the Bodmer Library, Cologny, Switzerland) in the digitisation, encoding for TEI, and study of the Bodmer Newton manuscript, and have also been communicated to the National Library of Israel (which together with the Cambridge University Library holds the UNESCO World Heritage listing for the papers of Isaac Newton). They have generated new entries in the Cambridge Digital Library, including collaborations with King's College, Cambridge, and The National Archives. They are being used to enhance devices used in the photographic processing of watermark images.
First Year Of Impact	2022
Sector	Digital/Communication/Information Technologies (including Software),Culture, Heritage, Museums and Collections
Impact Types	Cultural


Description	Research and Collections Programme
Amount	£3,800 (GBP)
Organisation	University of Cambridge
Department	Cambridge University Library
Sector	Academic/University
Country	United Kingdom
Start	03/2022
End	10/2022


Title	King's College Collection on the Cambridge Digital Library
Description	Images and metadata for Alchemical and Theological Papers of Isaac Newton held at King's College (captured and presented as part of the watermark capture of the relevant manuscripts for this project)
Type Of Material	Database/Collection of data
Year Produced	2024
Provided To Others?	Yes
Impact	Increased use of the digital library and the Newton Project transcriptions (see Newton Papers on Cambridge Digital Library section). 716 further images will be added during the completion of the NEH section of this project.
URL	https://cudl.lib.cam.ac.uk/collections/kingsrarebooksandmanuscripts/1


Title	Newton Papers on the Cambridge Digital Library
Description	Augmentation of the resources in the Newton Papers section in the Cambridge Digital Library.
Type Of Material	Database/Collection of data
Year Produced	2023
Provided To Others?	Yes
Impact	706 multispectral images; 944 reflected light images. Available as IIIF and as high-resolution zooming images through the platform.
URL	https://cudl.lib.cam.ac.uk/view/MS-ADD-09597-00002-00018-MSI/1


Title	Newton Watermark Database
Description	Database of manuscripts produced by Newton, arranged by Newton Project identifier, with information about watermarks provided from edited results of Alan E. Shapiro's manuscript listing of watermarks, augmented with digital watermark images provided by this project.
Type Of Material	Database/Collection of data
Year Produced	2023
Provided To Others?	No
Impact	Redating of particular manuscripts; the database will be made public in due course and integrated with the Cambridge Digital Library.
URL	https://alchemy.sitehost.iu.edu/newtondocs


Description	Huntington Library
Organisation	Huntington Library
Country	United States
Sector	Academic/University
PI Contribution	Provision of imaging protocols and advice; provision of database advice
Collaborator Contribution	Provision of images; development of imaging protocols; database content
Impact	None as yet
Start Year	2021


Description	Indiana University
Organisation	Indiana University Bloomington
Country	United States
Sector	Academic/University
PI Contribution	We have begun photographing images that will be used as a dataset for computer analysis using supercomputing facilities at Indiana. We have begun working together on a database structure.
Collaborator Contribution	Provision of supercomputing facilities. Provision of descriptions of manuscripts to be imaged.
Impact	None as yet
Start Year	2021


Description	King's College, Cambridge
Organisation	University of Cambridge
Department	King's College Cambridge
Country	United Kingdom
Sector	Academic/University
PI Contribution	Selection of manuscript items for imaging and for future inclusion in the data set for computer analysis
Collaborator Contribution	Provision of manuscript items, preparation of items for future photography
Impact	None so far
Start Year	2021


Description	National Library of Israel
Organisation	National Library of Israel
Country	Israel
Sector	Public
PI Contribution	We delivered four one-hour talks as part of an online education programme hosted by the National Library of Israel. Talks were attended by 30-50 participants on each occasion.
Collaborator Contribution	National Library of Israel staff (head of western special collections; project manager, education and culture) participated throughout the talks and provided the infrastructure for them. A follow-up visit to Cambridge was made by the head of western special collections.
Impact	The outputs are the four talks available via the URL. The disciplines involved are history/history of science, archive and collection management, and digital humanities.
Start Year	2022


Description	The National Archives
Organisation	The National Archives
Country	United Kingdom
Sector	Public
PI Contribution	Collaboration in production of imaging guidelines and selection of material to image; coding and identification of items for imaging.
Collaborator Contribution	Production of draft imaging guidelines, technical assistance with establishing imaging equipment at the University Library; technical assistance with job search for post-doctoral appointment; selection and imaging of materials held by the National Archives.
Impact	History, History of Science, Digital Humanities, Archive Sciences
Start Year	2021


Title	Chain Line and Laid Line Analysis
Description	Software to develop work of Tamara Grossman in automatic analysis of chain lines and laid lines in watermark images
Type Of Technology	Software
Year Produced	2024
Impact	On-going development of this technique in collaboration with researchers at Cornell University and University of Wisconsin-Madison
URL	https://github.com/Aqdus01/HiddenKnowledge


Title	Image Matching and Clustering for Watermarks
Description	Image matching and clustering work developed by this grant, being tested and upgraded by consultant paid for by ongoing NEH grant at Indiana for release later in 2024
Type Of Technology	Software
Year Produced	2024
Impact	Continuing refinement of AI matching of watermark images; ongoing research consultations with the VISE group at the University of Oxford and the development team at École des Ponts ParisTech


Description	Lectures on the Newton Watermark Project at the National Library of Israel
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Four public lectures (December 2022-March 2023) on aspects of the Watermarks project; follow-up visit to Cambridge by head of western special collections at the National Library of Israel.
Year(s) Of Engagement Activity	2022,2023
URL	https://live-events.nli.org.il/events/newton-watermark-project-yahuda-collection?doculang


Description	Presentation to Memory of Paper conference by Scott Mandelbrote and Marc Adam Kolakowski
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Presentation of the project findings to members and associates of the Bernstein Project, the hosts of the standard online tools for watermark analysis and identification, at the annual meeting of the International Association of Paper Historians.
Year(s) Of Engagement Activity	2023
URL	https://www.paperhistory.org/News/events/AbstractsVerona2023.pdf


Description	Public lecture, Huntington Library
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Conference at the Huntington Library (19-20 January 2024), concluding with a public lecture by Scott Mandelbrote and William R. Newman, which was also broadcast, and a training session by Joel Klein
Year(s) Of Engagement Activity	2024
URL	https://huntington.org/event/new-technologies-and-approaches-paper-and-ink-newtons-manuscripts
