The Bentham Papers Transcription Initiative
Lead Research Organisation:
UNIVERSITY COLLEGE LONDON
Department Name: Bentham Project
Abstract
The Bentham Papers deposited in UCL Library consist of 60,000 folios. This material has never been properly edited, most of it has never been published, and two-thirds of it remains un-transcribed. Much of its content is, therefore, unknown. The Bentham Project was established in 1959 in order to produce an authoritative edition of 'The Collected Works of Jeremy Bentham', and has to date published twenty-six volumes of a projected sixty-eight. In 2006 the Project completed an on-line database catalogue of the Bentham Papers, consisting of up to sixteen fields of information for each of the folios, including headings, dates, pagination, and titles. The database is currently used as an editing tool by the Bentham Project, allowing it, for instance, to identify all the manuscripts relevant to a particular work in a moment, rather than after several weeks of manual searching.
Part of the vision at the time of creating the database was to enhance it by linking it to transcripts and digital images of the manuscripts. The Bentham Project has transcribed around 20,000 folios, but 40,000 remain to be done. Transcription is the first and fundamental stage in the editing of Bentham's works. A pilot project has established that an interface can be created linking transcripts, digital images, and the database catalogue, and that the digital images are of such quality that they can be used for the purposes of transcription. The object of the present project is to develop a web-site which will integrate these various elements in a coherent way, and which will also implement a 'crowd-sourcing' exercise, whereby members of the public will be invited to submit their own transcripts of previously unread manuscripts.
The transcription project will have a limited duration of six months. We will provide around 12,500 images of Bentham's manuscripts, amounting to around 10,000 folios. Individuals will be able to download a transcription tool, take ownership of images of manuscripts for a limited duration, and enter the text into a transcription window. There will be a series of basic rules that they will be asked to follow. Transcripts will be submitted to the Bentham Project for moderating and, once approved, made available on-line. Transcribers will be awarded a merit mark for each successful submission, both as a virtual reward and as a means of identifying the truly dedicated, who might then be involved in some more formal way with the resource. We will establish a users' forum and a means of undertaking joint transcription, so that two or more people can co-operate in producing transcripts.
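The workflow described above (claim an image for a limited period, submit a transcript, have it moderated, earn a merit mark) can be sketched as a small state machine. This is only an illustration under stated assumptions: the class names, the status labels, and the seven-day claim period are hypothetical, since the proposal says only that ownership lasts 'a limited duration' and does not describe how the tool is implemented.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Dict, Optional

# Hypothetical lease length: the proposal specifies only "a limited duration".
CLAIM_DURATION = timedelta(days=7)


@dataclass
class ManuscriptImage:
    image_id: str
    status: str = "available"  # available -> claimed -> submitted -> approved
    claimed_by: Optional[str] = None
    claim_expires: Optional[datetime] = None
    transcript: Optional[str] = None


@dataclass
class TranscriptionDesk:
    images: Dict[str, ManuscriptImage] = field(default_factory=dict)
    merit_marks: Dict[str, int] = field(default_factory=dict)

    def claim(self, image_id: str, user: str, now: datetime) -> bool:
        """Give `user` temporary ownership of an image, if it is free."""
        img = self.images[image_id]
        # An expired, unsubmitted claim releases the image again.
        if img.status == "claimed" and img.claim_expires <= now:
            img.status = "available"
        if img.status != "available":
            return False
        img.status = "claimed"
        img.claimed_by = user
        img.claim_expires = now + CLAIM_DURATION
        return True

    def submit(self, image_id: str, user: str, text: str, now: datetime) -> bool:
        """Accept a transcript only from the current, unexpired owner."""
        img = self.images[image_id]
        if img.status != "claimed" or img.claimed_by != user or img.claim_expires <= now:
            return False
        img.transcript = text
        img.status = "submitted"
        return True

    def moderate(self, image_id: str, approved: bool) -> None:
        """Moderator decision: approve (awarding a merit mark) or release."""
        img = self.images[image_id]
        if img.status != "submitted":
            raise ValueError("only submitted transcripts can be moderated")
        if approved:
            img.status = "approved"
            self.merit_marks[img.claimed_by] = self.merit_marks.get(img.claimed_by, 0) + 1
        else:
            # Rejected work is discarded and the image returns to the pool.
            img.status = "available"
            img.claimed_by = None
            img.claim_expires = None
            img.transcript = None
```

Passing `now` explicitly rather than reading the clock inside each method keeps the expiry rule easy to test; a real tool would also need persistence and the joint-transcription feature, which this sketch leaves out.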
The transcriptions will eventually be used by the professional researchers at the Bentham Project when preparing the material for the new critical edition. They will also form part of an 'ideas bank'. Bentham wrote on a wide variety of subjects. His ideas were of enormous historical importance, but are also of great contemporary relevance. The transcripts will be readily searchable, so that researchers, whether academics or members of the general public, who are interested in a particular subject, can discover what Bentham thought about that subject. The Bentham Project's existing transcripts, which are currently in a proprietary format, and the newly submitted transcripts, will be encoded according to the protocol of the Text Encoding Initiative, thus guaranteeing the sustainability and transferability of the resource.
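To illustrate what encoding a transcript according to the Text Encoding Initiative might involve, the following minimal sketch wraps plain-text paragraphs in a skeletal TEI document using Python's standard library. The function name `build_tei`, the placeholder header text, and the sample inputs are assumptions made for illustration; the proposal does not describe the Project's actual encoding scheme, which would be considerably richer.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Serialise TEI elements in the default namespace rather than with an ns0: prefix.
ET.register_namespace("", TEI_NS)


def q(tag: str) -> str:
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_tei(title: str, source_note: str, paragraphs: list) -> str:
    """Wrap plain-text paragraphs in a minimal TEI document (hypothetical helper)."""
    root = ET.Element(q("TEI"))

    # A TEI header needs a fileDesc with title, publication, and source statements.
    header = ET.SubElement(root, q("teiHeader"))
    file_desc = ET.SubElement(header, q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, q("titleStmt"))
    ET.SubElement(title_stmt, q("title")).text = title
    publication = ET.SubElement(file_desc, q("publicationStmt"))
    ET.SubElement(publication, q("p")).text = "Illustrative sketch; not a published transcript."
    source = ET.SubElement(file_desc, q("sourceDesc"))
    ET.SubElement(source, q("p")).text = source_note

    # The transcript itself goes in text/body, one <p> per paragraph.
    body = ET.SubElement(ET.SubElement(root, q("text")), q("body"))
    for para in paragraphs:
        ET.SubElement(body, q("p")).text = para

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_tei("Sample folio", "Placeholder source description",
                    ["Placeholder transcript text."]))
```

Because the output is plain, namespaced XML rather than a proprietary format, any standard XML tool can read, search, or transform it, which is the sustainability argument made above.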
At the end of the project, a study will be produced, using both quantitative and qualitative data, on the way in which the database was used, and on the lessons to be drawn from it. A generic transcription tool will be made available for other humanities digital research projects to incorporate into their own web-sites.
Planned Impact
One of the key elements of this proposal is the interactive transcription tool. The very point is to engage the general public in the process of transcribing manuscripts, and hence stimulate greater interest in the life and thought of Jeremy Bentham. There is already significant passive interest in Bentham: many people have heard about Bentham's auto-icon (it was mentioned in a 'Guardian' editorial, 'In praise of University College London', on 10 October 2009, and features as a main attraction in the Ripley's Believe It Or Not Museum at Piccadilly Circus), and many are aware of the panopticon prison scheme. Bentham is taught on Religious Studies, Philosophy, and History courses in sixth-forms throughout the UK (we occasionally give lectures to school groups and we recently hosted two sixth-form work placement students). Our experience with visitors to the Bentham Project is that they are fascinated by Bentham's manuscripts, and are often keen to 'have a go' at reading them. We have received strong endorsement from a School's Extended Services Manager, responsible for 27 schools in Bedfordshire, and from a humanities teacher in a sixth-form college in Wigan, Greater Manchester, who believe the resource will be attractive to sixth-form teachers and students alike, as well as to Gifted and Talented children lower down the school.
We do, therefore, have a strong expectation that there will be a large pool of people who will be interested in transcribing Bentham manuscripts. The transcription tool will be kept straightforward, instructions will be clear, the digital images will be of high quality, and all will be presented in an attractive and intuitive interface. We will set up a users' forum, and permit joint transcription. Transcripts will be submitted to the Bentham Project, where they will be moderated, and the transcriber will receive a merit mark for each acceptable transcript (using reasonably generous criteria as to what is acceptable).
We intend that the resource should constitute an 'ideas bank', and should be used by a wide variety of persons who are interested in a wide variety of subject areas. Ideas do not suddenly appear in the mind from nowhere. One of our main sources of ideas is the great thinkers of the past, themselves drawing on traditions of thought, and sometimes inventing new ones of their own. By drawing on the wealth of ideas which the great thinkers of the past have bequeathed us, we can clarify our own thoughts, and often be pointed towards issues which had not even occurred to us. Bentham is particularly valuable in this respect, since his influence on our society has been profound, and he still has much to teach. We have a massive archive of his papers which has only been partially explored, and much of what has been explored remains in relatively inaccessible transcripts in the Bentham Project archive. Hence a key element of this proposal is to render the transcripts of the Bentham Papers searchable in a way that will be helpful to those who are in search of ideas.
The main target groups for the 'ideas bank' are policy makers (for instance think tanks) and the media. We are often asked, 'What would Bentham have said about ....?' The best response is, 'Well, here is what he did say about it.' Hence, when issues surrounding surveillance are debated, policy makers, for instance, might look to see what Bentham said about the panopticon prison; when issues surrounding openness in government, or corruption in public life, are being debated, they might look to Bentham's writings on representative democracy; and when issues concerning legal reform are being debated, they might look to Bentham's writings on codification.
The two elements of impact described here are intimately linked, in that the greater the amount of transcription undertaken, the more in
Organisations
- UNIVERSITY COLLEGE LONDON (Lead Research Organisation)
- Polytechnic University of Valencia (Collaboration)
- Naver Labs Europe (Collaboration)
- University of Innsbruck (Collaboration)
- Vienna University of Technology (Collaboration)
- University of London (Collaboration)
- National Centre for Scientific Research (NCSR) Demokritos (Collaboration)
- Direction de la Sécurité et de la Justice (Collaboration)
- UNIVERSITY OF EDINBURGH (Collaboration)
- Swiss Federal Institute of Technology in Lausanne (EPFL) (Collaboration)
- Democritus University of Thrace (Collaboration)
- University of Leipzig (Collaboration)
- University of Rostock (Collaboration)
- Xerox Corporation (Collaboration)
People |
ORCID iD |
Thomas Schofield (Principal Investigator) |
Publications

Causer T
(2014)
Crowdsourcing Bentham: Beyond the Traditional Boundaries of Academic History
in International Journal of Humanities and Arts Computing

Causer T
(2014)
Crowdsourcing Our Cultural Heritage

Causer T
(2018)
'Making such bargain': Transcribe Bentham and the quality and cost-effectiveness of crowdsourced transcription
in Digital Scholarship in the Humanities

Causer T
(2012)
Building a Volunteer Community: Results and Findings from 'Transcribe Bentham'
in Digital Humanities Quarterly

Causer T
(2012)
Transcription maximized; expense minimized? Crowdsourcing and editing The Collected Works of Jeremy Bentham
in Literary and Linguistic Computing

Gatos B
(2014)
Ground-Truth Production in the Transcriptorium Project

Moyle M
(2011)
Manuscript Transcription by Crowdsourcing: Transcribe Bentham
in LIBER Quarterly: The Journal of the Association of European Research Libraries

Prats Lopez M
(2015)
Extra-Organizational Learning: Learning Beyond Organizational Boundaries

Schofield P
(2015)
Jeremy Bentham and the computer age: reflections on crowdsourcing the transcription of handwritten documents
in Annual Bulletin of Resources and Historical Collections Office (shiryo-shitsu), The Library of Economics, The University of Tokyo

Tieberghien E
(2016)
Mapping the Bentham Corpus
Title | A film by UCL Media Services about Transcribe Bentham, & how it fits into the editorial work of the Bentham project. |
Type Of Art | Film/Video/Animation |
Description | (READ) - Recognition and Enrichment of Archival Documents |
Amount | € 8,220,716 (EUR) |
Funding ID | 674943 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2016 |
End | 06/2019 |
Description | (tranScriptorium) - tranScriptorium |
Amount | € 3,005,570 (EUR) |
Funding ID | 600707 |
Organisation | European Commission |
Sector | Public |
Country | European Union (EU) |
Start | 01/2013 |
End | 12/2015 |
Description | A Collaborative Project Between Faculty and Students at the University of Toronto and UCL Using Handwritten Text Recognition Technology and Topic Modelling |
Amount | £16,668 (GBP) |
Organisation | University College London |
Sector | Academic/University |
Country | United Kingdom |
Start | 07/2019 |
End | 03/2021 |
Description | The Consolidated Bentham Papers Repository |
Amount | £339,000 (GBP) |
Organisation | Andrew W. Mellon Foundation |
Sector | Private |
Country | United States |
Start | 09/2012 |
End | 09/2014 |
Description | Recognition and Enrichment of Archival Documents (READ) |
Organisation | Democritus University of Thrace |
Country | Greece |
Sector | Academic/University |
PI Contribution | The Bentham Project has made a key contribution to the development and adoption of Handwritten Text Recognition (HTR) and other technologies which are transforming, in a way that would have been barely imaginable a decade ago, how the public and researchers access holdings of archival collections. This is being achieved through the accurate transcription by computers of historical documents written in a variety of languages and scripts. Critical to developing HTR, and the associated technologies incorporated into Transkribus, were digital images and transcripts of the Bentham Papers. Together, this material provided a central test case for computer scientists to ensure the technology was sufficiently robust to contend with a variety of problems.
• The Bentham Papers contain numerous features that an effective HTR platform needs to handle, such as difficult handwriting, pages written in different hands (e.g. copyists and correspondents), headings, marginalia, faint pencil writing, skewed writing, crossings-out, interlineations, and occasional use of Latin, Greek, and French. As computer scientist Professor Enrique Vidal noted, if HTR could deal with the Bentham Papers, then it could deal with almost anything else.
• Transcripts produced by the Bentham Project were used to generate 'ground truth', that is, a precise transcript by which to train HTR models to read eighteenth- and nineteenth-century English handwriting. Early experiments resulted in a model with a Character Error Rate (CER) of around 18%; that is, about 82% of the characters on a fairly straightforward Bentham manuscript were correctly recognised. By the end of the READ programme, subsequent experiments produced models capable of achieving a CER of well below 5% on straightforward Bentham manuscripts, and a CER of 9% on the most complex.
• The Bentham ground truth was made available to computer scientists for use in research competitions linked to the 2014 International Conference on Frontiers of Handwriting Recognition and the 2015 International Conference on Document Analysis and Recognition.
• The HTR models produced using Bentham ground truth are freely available in Transkribus as off-the-shelf English language models for others to use and re-use.
• The Bentham Project worked with colleagues at the Universitat Politècnica de València to produce the 'Bentham Papers Indexing and Search' engine. Based on pattern recognition, the engine allows the user to search around 100,000 pages of the 'iconic' Bentham manuscripts collection without the need for transcription: a proof of concept of a potentially transformative technology for further widening access to historic manuscripts. |
Collaborator Contribution | The Recognition and Enrichment of Archival Documents (READ) research team, funded by the European Commission's Horizon 2020 programme from 2016 to 2019, was a pan-European consortium of computer scientists, computational linguists, archives and information professionals, and humanities scholars. It carried out fundamental research in the indexing, searching, and full transcription by computers of handwritten historic manuscripts, using Handwritten Text Recognition (HTR), Document Image Analysis, and Keyword Spotting technologies. The tools, once developed, were made widely available to the public, scholars, and research institutions through the Transkribus client, now a standard tool for the automated transcription of handwritten material, with tens of thousands of registered users. Transkribus is now supported and further developed by the READ COOP, a non-profit organisation whose subscribers include the British Library and the national archives of Finland, Luxembourg, Norway, and Sweden. The work of the READ team was given the final (highest) rating of 'outstanding' by the European Commission's assessment panel. Such was the success of the READ project that it received one of the five Horizon Impact Awards for 2020 from the Commission, out of a field of 225 applicants across all disciplines. |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
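The Character Error Rate figures quoted in the PI Contribution above can be made concrete with a short sketch. It uses the common definition of CER, the character-level edit (Levenshtein) distance between the ground-truth transcript and the machine output, divided by the length of the ground truth. This definition is an assumption here, since the record does not state exactly how the READ experiments calculated CER, and the function names and sample strings are illustrative only.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions
    needed to turn `hypothesis` into `reference`."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_char != hyp_char),  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the ground truth."""
    if not reference:
        raise ValueError("reference transcript must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ground_truth = "panopticon"   # what a human transcriber wrote
    htr_output = "panopticom"     # hypothetical machine reading with one wrong character
    print(f"CER: {character_error_rate(ground_truth, htr_output):.0%}")
```

Under this definition, reading '82% of characters correctly recognised' as the complement of an 18% CER is an approximation, because inserted characters count as errors without corresponding to any character in the ground truth.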
Description | Recognition and Enrichment of Archival Documents (READ) |
Organisation | Direction de la Sécurité et de la Justice |
Country | Switzerland |
Sector | Public |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
Description | Recognition and Enrichment of Archival Documents (READ) |
Organisation | NAVER LABS Europe |
Country | France |
Sector | Public |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
Description | Recognition and Enrichment of Archival Documents (READ) |
Organisation | National Centre for Scientific Research (NCSR) Demokritos |
Country | Greece |
Sector | Academic/University |
PI Contribution | The Bentham Project has made a key contribution to the development and adoption of Handwritten Text Recognition (HTR) and other technologies which are transforming, in a way that would have been barely imaginable a decade ago, how the public and researchers access holdings of archival collections. This is being achieved through the accurate transcription by computers of historical documents written in a variety of languages and scripts. Critical to developing HTR, and the associated technologies incorporated into Transkribus, were digital images and transcripts of the Bentham Papers. Together, this material provided a central test case for computer scientists to ensure the technology was sufficiently robust to contend with a variety of problems. • The Bentham Papers contain numerous features that an effective HTR platform needs to solve, such as difficult handwriting, pages written in different hands (e.g. copyists and correspondents), headings, marginalia, faint pencil writing, skewed writing, crossings-out, interlineations, and occasional use of Latin, Greek, and French. As computer scientist Professor Enrique Vidal noted that if HTR could deal with the Bentham Papers, then it could deal with almost anything else. • Transcripts produced by the Bentham Project were used to generate 'ground truth'-that is, a precise transcript by which to train HTR models to read eighteenth and nineteenth century English handwriting. Early experiments resulted in a model with a Character Error Rate (CER) of around 18%, that is 82% of the characters on a fairly straightforward Bentham manuscript were correctly recognised. By the end of the READ programme subsequent experiments produced models capable of achieving a CER of well below 5% on straightforward Bentham manuscripts, and a CER of 9% on the most complex. 
• The Bentham ground truth was made available to computer scientists for use in research competitions linked to the 2014 International Conference on Frontiers of Handwriting Recognition and the 2015 International Conference on Document Analysis and Recognition. • The HTR models produced using Bentham ground truth are freely available in Transkribus as off-the-shelf English language models for others to use and re-use. • The Bentham Project worked with colleagues at the Universitat Politècnica de València to produce the 'Bentham Papers Indexing and Search' engine. Based on pattern recognition, the engine allows the user to search around 100,000 pages of the 'iconic' Bentham manuscripts collection without the need for transcription-a proof-of-concept of a potentially transformative technology for further widening access to historic manuscripts. |
Collaborator Contribution | The Retrieval and Enrichment of Archival Documents (READ) research team, funded by the European Commission's Horizon 2020 programme from 2016 to 2019, was a pan-European consortium of computer scientists, computational linguists, archives and information professionals, and humanities scholars. It carried out fundamental research in the indexing, searching, and full transcription by computers of handwritten historic manuscripts, using Handwritten Text Recognition (HTR), Document Image Analysis, and Keyword Spotting technologies. The tools, once developed, were made widely available to the public, scholars, and research institutions through the Transkribus client, now a standard tool for the automated transcription of handwritten material, with tens of thousands of registered users. Transkribus is now supported and further developed by the READ COOP, a non-profit organisation whose subscribers include the British Library and the national archives of Finland, Luxembourg, Norway, and Sweden. The work of the READ team was given the final (highest) rating of 'outstanding' by the European Commission's assessment panel. Such was the success of the READ project that it received one of the five Horizon Impact Awards for 2020 from the Commission, out of a field of 225 applicants across all disciplines. |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
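The Character Error Rate (CER) figures quoted in the PI Contribution above can be made concrete with a small calculation. The sketch below assumes the common definition of CER as the character-level edit distance between the model output and the ground-truth transcript, divided by the length of the ground truth; the record itself does not say exactly how the READ experiments computed it. The function names and the sample strings are illustrative only and are not taken from Transkribus or from the Bentham ground truth.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # prev[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,                      # reference character missing from hypothesis
                curr[j - 1] + 1,                  # extra character in hypothesis
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the ground truth.
    Assumed definition; the source does not spell out its formula."""
    if not reference:
        raise ValueError("reference (ground truth) must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


# Hypothetical example: a 22-character ground-truth phrase and a
# recognised version that drops two characters.
ground_truth = "the greatest happiness"
recognised = "the greatst hapiness"
cer = character_error_rate(ground_truth, recognised)
print(f"CER = {cer:.1%}")
```

Under this definition, a CER of 18% corresponds to about 18 character edits per 100 characters of ground truth. Reading that as "82% of characters correct" is a close approximation rather than an identity, because characters wrongly inserted by the model also count as errors.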
Description | Retrieval and Enrichment of Archival Documents (READ) |
Organisation | Polytechnic University of Valencia |
Country | Spain |
Sector | Academic/University |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
Description | Retrieval and Enrichment of Archival Documents (READ) |
Organisation | Swiss Federal Institute of Technology in Lausanne (EPFL) |
Country | Switzerland |
Sector | Public |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
Description | Retrieval and Enrichment of Archival Documents (READ) |
Organisation | University of Edinburgh |
Country | United Kingdom |
Sector | Academic/University |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
Description | Retrieval and Enrichment of Archival Documents (READ) |
Organisation | University of Innsbruck |
Country | Austria |
Sector | Academic/University |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
Description | Retrieval and Enrichment of Archival Documents (READ) |
Organisation | University of Leipzig |
Country | Germany |
Sector | Academic/University |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
Description | Retrieval and Enrichment of Archival Documents (READ) |
Organisation | University of London |
Country | United Kingdom |
Sector | Academic/University |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
Description | Retrieval and Enrichment of Archival Documents (READ) |
Organisation | University of Rostock |
Country | Germany |
Sector | Academic/University |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
Description | Retrieval and Enrichment of Archival Documents (READ) |
Organisation | Vienna University of Technology |
Country | Austria |
Sector | Academic/University |
PI Contribution | The Bentham Project has made a key contribution to the development and adoption of Handwritten Text Recognition (HTR) and other technologies which are transforming, in a way that would have been barely imaginable a decade ago, how the public and researchers access holdings of archival collections. This is being achieved through the accurate transcription by computers of historical documents written in a variety of languages and scripts. Critical to developing HTR, and the associated technologies incorporated into Transkribus, were digital images and transcripts of the Bentham Papers. Together, this material provided a central test case for computer scientists to ensure the technology was sufficiently robust to contend with a variety of problems. • The Bentham Papers contain numerous features that an effective HTR platform needs to solve, such as difficult handwriting, pages written in different hands (e.g. copyists and correspondents), headings, marginalia, faint pencil writing, skewed writing, crossings-out, interlineations, and occasional use of Latin, Greek, and French. As computer scientist Professor Enrique Vidal noted that if HTR could deal with the Bentham Papers, then it could deal with almost anything else. • Transcripts produced by the Bentham Project were used to generate 'ground truth'-that is, a precise transcript by which to train HTR models to read eighteenth and nineteenth century English handwriting. Early experiments resulted in a model with a Character Error Rate (CER) of around 18%, that is 82% of the characters on a fairly straightforward Bentham manuscript were correctly recognised. By the end of the READ programme subsequent experiments produced models capable of achieving a CER of well below 5% on straightforward Bentham manuscripts, and a CER of 9% on the most complex. 
• The Bentham ground truth was made available to computer scientists for use in research competitions linked to the 2014 International Conference on Frontiers of Handwriting Recognition and the 2015 International Conference on Document Analysis and Recognition. • The HTR models produced using Bentham ground truth are freely available in Transkribus as off-the-shelf English language models for others to use and re-use. • The Bentham Project worked with colleagues at the Universitat Politècnica de València to produce the 'Bentham Papers Indexing and Search' engine. Based on pattern recognition, the engine allows the user to search around 100,000 pages of the 'iconic' Bentham manuscripts collection without the need for transcription: a proof-of-concept of a potentially transformative technology for further widening access to historic manuscripts. |
Collaborator Contribution | The Retrieval and Enrichment of Archival Documents (READ) research team, funded by the European Commission's Horizon 2020 programme from 2016-19, was a pan-European consortium consisting of computer scientists, computational linguists, archives and information professionals, and humanities scholars, which carried out fundamental research in the indexing, searching, and full transcription by computers of handwritten historic manuscripts, using Handwritten Text Recognition (HTR), Document Image Analysis, and Keyword Spotting technologies. The tools, once developed, were made widely available to the public, scholars, and research institutions through the Transkribus client, now a standard tool for the automated transcription of handwritten material with tens of thousands of registered users. Transkribus is now supported and further developed by the READ COOP, a non-profit organisation whose subscribers include the British Library, and the respective national archives of Finland, Luxembourg, Norway, and Sweden. The work of the READ team was given the final (highest) rating of 'outstanding' by the European Commission's assessment panel. Such was the success of the READ project that it received one of the five Horizon Impact Awards for 2020 from the Commission, out of a field of 225 applicants across all disciplines. |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
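The Character Error Rate (CER) figures quoted in the PI Contribution above can be illustrated with a short sketch. The record does not state how CER was computed; the code below assumes the common definition, namely the character-level edit (Levenshtein) distance between the HTR output and the ground-truth transcript, divided by the length of the ground truth. The function names `levenshtein` and `character_error_rate` and the sample strings are hypothetical and chosen only for illustration.

```python
def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `hypothesis` into `reference`."""
    # previous[j] holds the distance between the reference prefix processed
    # so far and the first j characters of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_char == hyp_char else 1
            current.append(min(
                previous[j] + 1,                      # character missing from hypothesis
                current[j - 1] + 1,                   # extra character in hypothesis
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not reference:
        raise ValueError("reference (ground truth) must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ground_truth = "greatest happiness"   # hypothetical ground-truth line
    htr_output = "greatest hapiness"      # hypothetical HTR output, one 'p' missing
    cer = character_error_rate(ground_truth, htr_output)
    print(f"CER: {cer:.1%}")  # 1 error over 18 characters, prints "CER: 5.6%"
```

Under this definition a CER of 18% corresponds to roughly 18 character-level errors per 100 ground-truth characters, which the record glosses as about 82% of characters being correctly recognised.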
Description | Retrieval and Enrichment of Archival Documents (READ) |
Organisation | Xerox Corporation |
Department | Xerox Research Centre Europe - XRCE |
Country | France |
Sector | Private |
PI Contribution | The Bentham Project has made a key contribution to the development and adoption of Handwritten Text Recognition (HTR) and other technologies which are transforming, in a way that would have been barely imaginable a decade ago, how the public and researchers access holdings of archival collections. This is being achieved through the accurate transcription by computers of historical documents written in a variety of languages and scripts. Critical to developing HTR, and the associated technologies incorporated into Transkribus, were digital images and transcripts of the Bentham Papers. Together, this material provided a central test case for computer scientists to ensure the technology was sufficiently robust to contend with a variety of problems. • The Bentham Papers contain numerous features that an effective HTR platform needs to handle, such as difficult handwriting, pages written in different hands (e.g. copyists and correspondents), headings, marginalia, faint pencil writing, skewed writing, crossings-out, interlineations, and occasional use of Latin, Greek, and French. As computer scientist Professor Enrique Vidal noted, if HTR could deal with the Bentham Papers, then it could deal with almost anything else. • Transcripts produced by the Bentham Project were used to generate 'ground truth', that is, a precise transcript with which to train HTR models to read eighteenth- and nineteenth-century English handwriting. Early experiments resulted in a model with a Character Error Rate (CER) of around 18%; that is, around 82% of the characters on a fairly straightforward Bentham manuscript were correctly recognised. By the end of the READ programme, subsequent experiments produced models capable of achieving a CER of well below 5% on straightforward Bentham manuscripts, and a CER of 9% on the most complex.
• The Bentham ground truth was made available to computer scientists for use in research competitions linked to the 2014 International Conference on Frontiers of Handwriting Recognition and the 2015 International Conference on Document Analysis and Recognition. • The HTR models produced using Bentham ground truth are freely available in Transkribus as off-the-shelf English language models for others to use and re-use. • The Bentham Project worked with colleagues at the Universitat Politècnica de València to produce the 'Bentham Papers Indexing and Search' engine. Based on pattern recognition, the engine allows the user to search around 100,000 pages of the 'iconic' Bentham manuscripts collection without the need for transcription: a proof-of-concept of a potentially transformative technology for further widening access to historic manuscripts. |
Collaborator Contribution | The Retrieval and Enrichment of Archival Documents (READ) research team, funded by the European Commission's Horizon 2020 programme from 2016-19, was a pan-European consortium consisting of computer scientists, computational linguists, archives and information professionals, and humanities scholars, which carried out fundamental research in the indexing, searching, and full transcription by computers of handwritten historic manuscripts, using Handwritten Text Recognition (HTR), Document Image Analysis, and Keyword Spotting technologies. The tools, once developed, were made widely available to the public, scholars, and research institutions through the Transkribus client, now a standard tool for the automated transcription of handwritten material with tens of thousands of registered users. Transkribus is now supported and further developed by the READ COOP, a non-profit organisation whose subscribers include the British Library, and the respective national archives of Finland, Luxembourg, Norway, and Sweden. The work of the READ team was given the final (highest) rating of 'outstanding' by the European Commission's assessment panel. Such was the success of the READ project that it received one of the five Horizon Impact Awards for 2020 from the Commission, out of a field of 225 applicants across all disciplines. |
Impact | Transkribus https://readcoop.eu/transkribus/ |
Start Year | 2016 |
Description | tranScriptorium |
Organisation | Polytechnic University of Valencia |
Country | Spain |
Sector | Academic/University |
PI Contribution | tranScriptorium is a STREP of the Seventh Framework Programme in the ICT for Learning and Access to Cultural Resources challenge. tranScriptorium is planned to last from 1 January 2013 to 31 December 2015. tranScriptorium aims to develop innovative, efficient and cost-effective solutions for the indexing, search and full transcription of historical handwritten document images, using modern, holistic Handwritten Text Recognition (HTR) technology. tranScriptorium will turn HTR into a mature technology by addressing the following objectives: - Enhancing HTR technology for efficient transcription. Departing from state-of-the-art HTR approaches, tranScriptorium will capitalize on interactive-predictive techniques for effective and user-friendly computer-assisted transcription. - Bringing the HTR technology to users. Expected users of the HTR technology belong mainly to two groups: a) individual researchers with experience in transcribing handwritten documents who are interested in transcribing specific documents; b) volunteers who collaborate in large transcription projects. The HTR technology will support the digitization of the handwritten materials. The outcomes of the tranScriptorium tools will be attached to the published handwritten document images. This includes not only full, correct transcriptions, but also partially correct transcriptions and other kinds of automatically produced metadata, useful for indexing and searching. |
Start Year | 2013 |
Description | 'Transcribe Bentham': Presentation to the Digital Communities winners' forum |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation to attendees of the Digital Communities category winners' forum, Ars Electronica festival, Brucknerhaus, Linz, 4 September 2011. |
Year(s) Of Engagement Activity | 2011 |
Description | Crowdsourcing: Utilizing the Power of the Many in Research |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | |
Results and Impact | Dr Causer was invited to take part in a panel on crowdsourcing, before an international audience of market researchers, to discuss ways in which different organisations have implemented crowdsourcing in what they do. The other panel members were: - Benita Matofska: Chief Sharer, People Who Share - Phil Geraghty: Managing Director, PeopleFund.it - Heidi Schneigansz: Idea Bounty |
Year(s) Of Engagement Activity | 2013 |
Description | Hacking the Past: An Archives Game Jam |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Around 50 people attended this event, organised in conjunction with The National Archives, to create 'games with a purpose', that is, games that encourage the transcription of archival material. |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.eventbrite.co.uk/e/hacking-the-past-an-archives-game-jam-tickets-53954846398# |
Description | The Bentham Hackathon, 22-23 October 2017 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Around 60 to 70 participants registered to take part in the 'Bentham Hackathon', organised in conjunction with UCL Centre for Digital Humanities, UCL Innovation and Enterprise, and IBM, to investigate how digital tools can be used to explore Bentham's life and work. |
Year(s) Of Engagement Activity | 2017 |
URL | https://blogs.ucl.ac.uk/transcribe-bentham/2017/10/24/project-update-bentham-hackathon-weekend/ |