Digital approaches to the capture and analysis of watermarks using the manuscripts of Isaac Newton as a test case
Lead Research Organisation:
University of Cambridge
Department Name: History
Abstract
This project will investigate two research areas with general application in digital humanities scholarship, using the dispersed manuscript corpus of Isaac Newton as a test case. The immediate purpose of the test case is to use artificial intelligence to assist with the identification and classification of watermarks in Newton material and, in the process, to build a general tool to assist with the organisation and dating of manuscripts. The project's first stage will investigate methods for producing images of watermarks suitable for automated analysis, both through new photography and by exploring the potential latent in existing images. During the second stage, we will develop computer vision methods to systematically cluster and match the assembled corpus of watermark images across manuscripts and collections. The project's significance extends well beyond Newton. Its methods will be transferable to other watermark collections, giving scholars a methodology for analysing, dating, and organising historical collections through watermark matching. They will also help conservators to establish standardised methods for surveying and documenting watermarks when imaging and digitising documents. A final stage will disseminate our findings through research workshops, web tools, improvements to online databases, and traditional journal publications.
Since the groundbreaking early twentieth-century research of Charles Moïse Briquet, watermarks have been central to the dating of otherwise undated manuscripts. Briquet's monumental 1907 catalogue, Les filigranes, made it possible, in principle, to date (and to some extent localise) pre-1600 watermarks found in manuscripts by reference to the exemplars it reproduces. This catalogue and others have been digitised thanks to the Bernstein consortium (https://memoryofpaper.eu/). Even so, advances in research and technology have revealed the limitations of the traditional approach, which is time-consuming and requires some expertise to identify each watermark. Exact matches between watermarks in situ and those reproduced in any catalogue are very difficult to find, for two reasons. First, the catalogues are far from comprehensive. Second, each watermark exists in two "twin" versions, one for each of a pair of moulds, which are never perfectly identical and which deform over time through repeated use in paper manufacture. By developing new approaches and techniques for acquiring and analysing watermarks, we hope to solve these basic problems and thereby benefit everyone who relies on paper documents for chronological evidence.
Computer vision has made significant progress in recent years thanks to machine learning and artificial intelligence. This project will build on cutting-edge work already undertaken by the École nationale des chartes and its partners (notably computer scientists at the École des Ponts ParisTech) on the problem of matching images, specifically of watermarks, across formats (photographs and tracings). We will create a corpus of images to train and develop the open-source software created by the École des chartes. In doing so, we will build on recent work by The National Archives (TNA) that uses comparatively affordable equipment and techniques to produce watermark images highly suitable for machine analysis. The project will develop and combine both approaches in an attempt to enhance the computer-vision software. The goal is software that can unlock the latent information held in thousands of existing images, shot in reflected light, which institutions have already digitised and made accessible through IIIF.
Publications
Voelkel, J. R. (2021). Chasing the Clues in Isaac Newton's Manuscripts. Distillations (Science History Institute).
Mandelbrote, S. (2022). The Newton Project. European Mathematical Society Magazine.
Description | The award period is to be extended by a year because Covid delayed the start of the research and imaging. Nevertheless, some key findings can already be reported:
1. It will be possible to produce guidelines for the digitisation of watermarks. These will specify imaging procedures for a range of levels of photographic equipment. Following them will enable the production of images suitable for computer analysis. Work on the guidelines is ongoing.
2. A machine-learning visual recognition tool can currently be trained to group watermarks by their essential features with approximately 70% accuracy. Training on a larger data set should significantly improve this accuracy. We have built an environment for such training and will try to improve recognition accuracy and refine the features used for recognition. This may require additional funding or collaborations within the extended term of the award.
3. Extensive imaging of paper used by Isaac Newton has been completed (at the University of Cambridge, the National Libraries, and the Huntington Library). These images will be made available in simple form in one of two places. The first is the Cambridge Digital Library gallery of images associated with the Newton Project, combined with transcriptions and associated metadata. The second is the Chymistry of Isaac Newton pages hosted by Indiana University. A first batch should be online in late March. Most of the imaging, however, has been undertaken to provide data for training the visual recognition tool. That training is ongoing, using supercomputing facilities at Indiana University. Collaboration with the Bodmer Library (Bodmer Lab) and the National Library of Israel has generated many new, lower-quality images of watermarks. Taken together with the images prepared for automated recognition, these images allow extensive manual scholarly work to group watermarks within Newton's archive. That work is ongoing and should provide an initial restructuring of the chronology of Newton's writings, pending full development of the visual recognition tool.
4. Key collaborations have been established among repositories on three continents holding papers from Newton's archive, and between scholars analysing the archive. |
Exploitation Route | We plan to publish imaging guidelines, prepared by Cambridge University Library and The National Archives, for use by other institutions and individuals (timeline: to August 2024). These will support the development of best practice and the planned acquisition of appropriate equipment by library imaging services elsewhere. We are not yet able to make our visual recognition tool public but hope to do so in due course, through either the University of Cambridge or Indiana University. Once public, it will allow any user to upload new images in an appropriate form and have them categorised. It should be possible to incorporate information about the dating of Newton's archive derived from the project's research into the publicly available databases of the Newton Project and the Chymistry of Isaac Newton. This will have implications for all future scholarship on Newton's writings. In late March 2023 we will have held a closed meeting in Cambridge, reporting progress and discussing next steps towards outcomes with interested parties. Extending the award to August 2024 will allow us to present more complete findings publicly at the Huntington Library in January 2024. |
Sectors | Education; Culture, Heritage, Museums and Collections |
URL | https://live-events.nli.org.il/events/newton-watermark-project-yahuda-collection?doculang=false |
Description | Findings have been used to develop best practice in the imaging services of Cambridge University Library, The National Archives, and the Huntington Library, San Marino, California. They have influenced the choices made by the Bodmer Lab (a collaboration between the University of Geneva and the Bodmer Library, Cologny, Switzerland) in the digitisation, TEI encoding, and study of the Bodmer Newton manuscript. They have also been communicated to the National Library of Israel, which, together with Cambridge University Library, holds the papers of Isaac Newton inscribed on UNESCO's Memory of the World Register. |
First Year Of Impact | 2022 |
Sector | Digital/Communication/Information Technologies (including Software); Culture, Heritage, Museums and Collections |
Impact Types | Cultural |
Description | Research and Collections Programme |
Amount | £3,800 (GBP) |
Organisation | University of Cambridge |
Department | Cambridge University Library |
Sector | Academic/University |
Country | United Kingdom |
Start | 03/2022 |
End | 10/2022 |
Description | Huntington Library |
Organisation | Huntington Library |
Country | United States |
Sector | Academic/University |
PI Contribution | Provision of imaging protocols and advice; provision of database advice |
Collaborator Contribution | Provision of images; development of imaging protocols; database content |
Impact | None as yet |
Start Year | 2021 |
Description | Indiana University |
Organisation | Indiana University Bloomington |
Country | United States |
Sector | Academic/University |
PI Contribution | We have begun photographing manuscripts to build a dataset for computer analysis using the supercomputing facilities at Indiana. We have also begun working together on a database structure.
Collaborator Contribution | Provision of supercomputing facilities. Provision of descriptions of manuscripts to be imaged. |
Impact | None as yet |
Start Year | 2021 |
Description | King's College, Cambridge |
Organisation | University of Cambridge |
Department | King's College Cambridge |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Selection of manuscript items for imaging and for future inclusion in the data set for computer analysis |
Collaborator Contribution | Provision of manuscript items, preparation of items for future photography |
Impact | None so far |
Start Year | 2021 |
Description | National Library of Israel |
Organisation | National Library of Israel |
Country | Israel |
Sector | Public |
PI Contribution | We delivered four one-hour talks as part of an online education programme hosted by the National Library of Israel. Talks were attended by 30-50 participants on each occasion. |
Collaborator Contribution | National Library of Israel staff (head of western special collections; project manager, education and culture) participated throughout the talks and provided the infrastructure for them. A follow-up visit to Cambridge was made by the head of western special collections. |
Impact | The outputs are the four talks, available via the URL. The disciplines involved are history and the history of science, archive and collection management, and digital humanities.
Start Year | 2022 |
Description | The National Archives |
Organisation | The National Archives |
Country | United Kingdom |
Sector | Public |
PI Contribution | Collaboration in production of imaging guidelines and selection of material to image; coding and identification of items for imaging. |
Collaborator Contribution | Production of draft imaging guidelines; technical assistance with setting up imaging equipment at the University Library; technical assistance with recruitment for the postdoctoral appointment; selection and imaging of materials held by The National Archives.
Impact | History, History of Science, Digital Humanities, Archive Sciences |
Start Year | 2021 |
Description | Lectures on the Newton Watermark Project at the National Library of Israel |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Four public lectures (December 2022-March 2023) on aspects of the Watermarks project; follow-up visit to Cambridge by head of western special collections at the National Library of Israel. |
Year(s) Of Engagement Activity | 2022,2023 |
URL | https://live-events.nli.org.il/events/newton-watermark-project-yahuda-collection?doculang |