A Collaboration between Classics and Astrophysics: An Advanced Multispectral Imaging Laboratory Optimised through Crowd-Sourced Statistical Analysis

Lead Research Organisation: University of Oxford
Department Name: Classics Faculty

Abstract

Transcribing ancient texts written on papyrus and other artefacts has traditionally been slow, painstaking and difficult work undertaken primarily by specialists. Recently, new techniques of digital imaging, including multispectral and 3-D imaging, such as those produced by the Imaging Papyri and Advanced Multispectral Imaging Projects at the University of Oxford (www.papyrology.ox.ac.uk/advancedMSI), have enabled new readings of previously indecipherable texts. Public interest in these electronic techniques of enhanced textual visualisation has been considerable. New techniques of imaging fragments of ancient Biblical texts, ancient lyric and dramatic poetry, and ancient historical accounts have been documented by popular programs on radio and television, including, for example, Discovery Channel and Nova.

Paradoxically, our team faces a huge problem and an equally tantalising opportunity: a huge number of texts await transcription by a small number of specialists, while large numbers of the interested public are keen to handle, read, and decipher the texts by themselves. In the proposed project we resolve to foster and enable this unprecedented public interest, and to harness its tremendous potential to speed, and indeed utterly transform, the way ancient texts are transcribed, deciphered, and understood. Our objective is not merely to broaden access to digital images of ancient texts, or simply to improve the quality of access, but to make use of scientific methods that will endow large populations of interested users (c. 200,000) with means of enhancing and optimising their own access and opportunity to transcribe and to decipher ancient documents that form part of our collective cultural heritage.

Thus we propose to establish a collaborative advanced multispectral imaging and multimedia laboratory for papyri and other inscribed ancient artefacts at the University of Oxford. The project will achieve new forms of participation and access for user groups of unprecedented size and diversity to digital images of different kinds of ancient artefacts, including images of published Oxyrhynchus and Herculaneum papyri created by the Imaging Papyri at Oxford Project. Interdisciplinary project collaboration of an entirely new kind and order, linking research teams in the fields of Classics (Greek and Latin Papyrology) and Astrophysics, will enable transcription and decipherment of ancient Greek texts captured on diagnostic multispectral images of papyri and other ancient artefacts by the use of methods and techniques of statistical weighting and analysis heretofore confined to the natural and astronomical sciences. Broad public access to the ancient texts will be dramatically increased not only in scale, but also in quality.

In this first phase of the project we will focus on delivery, transcription and decipherment of a large number of conventional RGB and black and white images of papyri and related artefacts. Text and images will be published online at http://www.papyrology.ox.ac.uk.


Through close collaboration and large-scale distributed storage, transport, and sharing of bulk multispectral image data with the Galaxy Zoo and Zoo Universe research teams at the University of Oxford (http://www.galaxyzoo.org), we will develop online public user interfaces which will enable statistically weighted contributions from users to be used to decipher ancient texts and to optimise diagnostic multispectral imaging workflows and MSI image post-processing algorithms. This work will be done in coordination with a grant pending from the John Fell Fund, Oxford, for establishing data-transport between Imaging Papyri and Galaxy Zoo. The result will be an interoperable data resource, secured for sustainable access through distributed storage with (1) Imaging Papyri NAS storage units; (2) storage in data facilities currently being designed by OUCS at Oxford; (3) large-scale bulk storage via Zoo's redundant cloud computing facilities.

Planned Impact

The project will benefit individuals, larger groups (e.g. Greek nationals), non-profit foundations, charities, and government bodies (e.g. Greek Ministry of Culture) concerned with preservation of Greek and European cultural heritage, museums, schools, and local and national councils of primary school and secondary school education. The project is also of considerable potential benefit in the commercial sector, through use of human-assisted OCR technology and its provision of convenient GUIs enabling user selection of multispectral imaging (MSI) algorithms. Museums and collections will remain interested in our imaging research and outputs, both as means of deciphering ancient texts and as a supplement to traditional conservation. The memberships of charity organisations such as the Herculaneum Society have shown continued interest in our digital imaging efforts over the past six years, as have several organisations in the national and international media (Discovery Channel, Der Spiegel, The Times, Telegraph, Nova).

The project will have an immediate impact on interested individuals from the point of its inception, on an unprecedented scale and with unprecedented diversity. The collaborating astrophysics research teams have developed projects (www.galaxyzoo.org) that have engaged the input of c. 200,000 users. In our project users will have the opportunity to educate themselves through special control training sets consisting of digitised multispectral images of papyri and associated digitised texts. Beginning with elementary sets and proceeding to sets representing more advanced levels of expertise, these training sets will enable users to learn to the extent of their ability and desire. The individual user will be able immediately to track the results both of his or her training and of subsequent efforts at transcribing individual letters and words on 'live', untranscribed papyri. User conjectures will be tracked in relation to final results reached in combination and collaboration with specialists and human-assisted OCR input. Because the training sets are readily adaptable to classroom instruction, and will consist of specimens incrementally graduated in difficulty, they could find broad use in schools at all levels and in university curricula. One interested population would be Greek nationals, who, familiar at the outset with Greek lettering, showed great interest in deciphering the papyri at a conference recently held by Dr Obbink's research teams in the Sackler Library, Oxford, in cooperation with the Greek Ministry of Culture (see http://www.papyrology.ox.ac.uk/POxy/newsitems/index.html).

To ensure further that these potential benefits are realised, we will publicise through several venues:

--our Advanced Multispectral Imaging exhibit in the new Reading and Writing Room of the Ashmolean Museum.

--our Advanced Multispectral Imaging Page: http://www.papyrology.ox.ac.uk/advancedMSI

--radio and television programs that have documented our projects before (Discovery Channel, Der Spiegel, The Times, Telegraph, Nova).

--the semi-annual Newsletters and public engagements of the Herculaneum Society, a registered UK charity.

--school outreach operations conducted by the Society.

--publication of the Oxyrhynchus Papyri series.

We will also work with and through the agency of the University of Oxford's Vice-Chancellor's forum on Impact, for which Dr Obbink sat on a panel in the summer of 2009.

The results and techniques of human-assisted OCR performed in cooperation with the astrophysics research teams will be published on our website, as will our GUIs and MSI post-processing algorithms.

Publications

Brusuelas, J. H. (2016) Plutarch, Vita Caesaris 45.8.4 - 46.1.3 in The Oxyrhynchus Papyri Vol. LXXXI

Brusuelas, J. H. (2016) Polybius, Histories 28.2.6.1 - 8.1 in The Oxyrhynchus Papyri Vol. LXXXI

Brusuelas, J. H. (2016) Epictetus, Discourses iv.11.2 - 12.1 in The Oxyrhynchus Papyri Vol. LXXXI

Brusuelas, J. H. (2016) Simonides, Elegiae in The Oxyrhynchus Papyri Vol. LXXXI

Brusuelas, J. H. (2016) Theognis, Elegiae 1.1117 - 1140 in The Oxyrhynchus Papyri Vol. LXXXI

Brusuelas, J. H. (2016) [Plutarch], De proverbiis Alexandrinorum 50 in The Oxyrhynchus Papyri Vol. LXXXI

Brusuelas, J.H. (2012) The Oxyrhynchus Papyri Vol. LXXVIII

Brusuelas, J.H. (2012) The Oxyrhynchus Papyri Vol. LXXVIII

Brusuelas, J.H. (2012) The Oxyrhynchus Papyri Vol. LXXVIII

Brusuelas, J.H. (2012) The Oxyrhynchus Papyri Vol. LXXVIII

Brusuelas, J. H. (2016) The Oxyrhynchus Papyri Vol. LXXXI

 
Description This project produced the Ancient Lives online platform, which uses a simple web-based interface to engage in crowdsourcing the transcription of a large number of Greek papyri. Combining human computing with machine intelligence, the fundamental goal of Ancient Lives is to rapidly transform image data from papyri into meaningful information that scholars can use to study Greek literature and Greco-Roman Egypt; information that once took generations to produce. Since its launch in 2011, the project has recorded well over 1.5 million transcriptions, the work of over 915,621 users. In order to process this big data, we have thus overseen development of algorithms for extracting the consensus transcriptions, the automated production of a database of published and unpublished Greek texts in Unicode, and the adaptation of bioinformatics algorithms for automated identification of texts from known authors. Ancient Lives users have particularly assisted in identifying new texts by Greek authors such as Simonides, Plutarch, Menander, and Epictetus. Concurrent with the aforementioned development, Ancient Lives has demonstrated that the general public, even with little to no training in the ancient Greek language, and based on the simple method of pattern recognition, can provide transcription data that papyrologists can use to enhance, evolve, and expedite their workflow. Moreover, Ancient Lives has generated a further research initiative that examines how we can use these crowdsourced transcriptions as a foundation for creating digital critical editions of ancient literary and paraliterary fragments.
Exploitation Route As a crowdsourced transcription project based on image data, Ancient Lives uniquely collects character input via x,y coordinates rather than text fields. Since the dataset is handwritten text on ancient Greek papyrus, a virtual keyboard for entering character data was required, as not everyone around the globe has a keyboard capable of entering ancient Greek. Subsequently, we have produced the Ancient Lives pipeline, the method for algorithmically extracting consensus transcriptions and line sequencing (creation of lines of text in a Unicode text file) after aggregating the x,y coordinate data. Initially this was done in MATLAB using kernel density estimation, a common machine learning technique. That method, however, took days to process the data. As an alternative, in 2015 we designed and developed a new computational pipeline (using only Python) that both executes in only a few minutes and produces consensus letter identifications with higher accuracy; this approach both identifies and leverages user expertise in order to increase the reliability of calculated consensus letter identifications. Overall, this methodology can easily be offered as a template to other projects with similar datasets that wish to engage in large-scale crowdsourcing and the processing of big data. Furthermore, the Classics outreach involved, bringing raw ancient data to the public, continues to generate interest in Classics and ancient Greek literature across generations. The website has particularly been used in classrooms for didactic purposes by teachers at various levels of instruction, from grammar school to undergraduate university classrooms. A new iteration of the Ancient Lives platform is in its final stages of development and will re-launch in Spring 2016. This new platform will also allow users to transcribe Coptic manuscripts, as well as papyri from collections outside of Oxford.
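The record does not give the implementation details of the Python pipeline, so the following is only an illustrative sketch of the general idea described above: aggregate x,y clicks into letter positions, take an expertise-weighted vote for each position, then sequence the letters into lines. All names (Click, cluster_clicks, user_weights, consensus_lines) and the parameters radius and line_gap are hypothetical. A simple greedy radius clustering stands in for the density-based step, and weighting users by their agreement with the unweighted majority is an assumption, not the project's documented method.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    user: str
    x: float
    y: float
    letter: str


def centroid(members):
    """Mean position of a group of clicks."""
    n = len(members)
    return sum(m.x for m in members) / n, sum(m.y for m in members) / n


def cluster_clicks(clicks, radius=10.0):
    """Greedily group clicks lying within `radius` pixels of a cluster centroid."""
    clusters = []
    for c in sorted(clicks, key=lambda c: (c.x, c.y)):
        for members in clusters:
            cx, cy = centroid(members)
            if (c.x - cx) ** 2 + (c.y - cy) ** 2 <= radius ** 2:
                members.append(c)
                break
        else:
            clusters.append([c])
    return clusters


def vote(members, weights):
    """Letter with the largest summed user weight (unknown users count 1.0)."""
    totals = defaultdict(float)
    for m in members:
        totals[m.letter] += weights.get(m.user, 1.0)
    # Sorting first makes tie-breaking deterministic.
    return max(sorted(totals), key=totals.get)


def user_weights(clusters):
    """Assumed expertise proxy: smoothed rate of agreement with the plain majority."""
    agree, seen = defaultdict(int), defaultdict(int)
    for members in clusters:
        majority = vote(members, {})
        for m in members:
            seen[m.user] += 1
            agree[m.user] += (m.letter == majority)
    return {u: (agree[u] + 1) / (seen[u] + 2) for u in seen}


def consensus_lines(clicks, radius=10.0, line_gap=12.0):
    """Return consensus text lines, top to bottom, each read left to right."""
    clusters = cluster_clicks(clicks, radius)
    weights = user_weights(clusters)
    letters = [(*centroid(m), vote(m, weights)) for m in clusters]
    letters.sort(key=lambda t: t[1])  # order by vertical position
    lines, current, last_y = [], [], None
    for cx, cy, letter in letters:
        # A large vertical jump starts a new line of text.
        if last_y is not None and cy - last_y > line_gap:
            lines.append(current)
            current = []
        current.append((cx, letter))
        last_y = cy
    if current:
        lines.append(current)
    return ["".join(letter for _, letter in sorted(row)) for row in lines]
```

In a sketch like this, a user who often disagrees with the majority receives a lower weight, so a split vote between a usually reliable and a usually unreliable transcriber resolves in favour of the former.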
Sectors Digital/Communication/Information Technologies (including Software); Education; Culture, Heritage, Museums and Collections

URL http://www.papyrology.ox.ac.uk/Ancient_Lives/
 
Description The Classics outreach involved in Ancient Lives, bringing raw ancient data to the public, continues to generate interest in Classics and ancient Greek literature across generations. The website has particularly been used in classrooms for didactic purposes by teachers at various levels of instruction, from grammar school to undergraduate university classrooms. Ancient Lives is also evolving. In 2017, the project will be relaunched with a new platform that allows users to transcribe not only Coptic manuscripts but also fragments from collections outside of Oxford. In order to promote further collaboration and outreach, we are becoming even more inclusive. Ancient Lives is a place for anyone in the world to interact with professional papyrologists and to ask questions freely and comfortably about ancient papyri, Greek literature, and early Christianity.
Sector Digital/Communication/Information Technologies (including Software); Education; Culture, Heritage, Museums and Collections
Impact Types Cultural

 
Title Ancient Lives Database 
Description Collection of over 1.5 million crowd-sourced online transcriptions of published and unpublished papyri. See www.ancientlives.org for details of project. 
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact Identification of a large number of previously unidentified texts by an online community of volunteers, some of which have been (or are about to be) edited and published.
URL http://www.ancientlives.org
 
Description Ancient Lives in the News 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Interview with the Independent before Dirk Obbink's main lecture at the World Monuments Fund (Britain) series of lectures held at the Royal Geographical Society near the Royal Albert Hall. This was an educational event open to the public.
Year(s) Of Engagement Activity 2016
URL http://www.independent.co.uk/news/science/ancient-egypt-citizen-scientists-reveal-tales-of-tragedy-u...
 
Description Ancient Lives visit to local schools 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Ancient Lives demonstration at the Cheney School in Oxford. Via a desktop computer, students were able to try out the Ancient Lives website under the guidance of project staff.
Year(s) Of Engagement Activity 2016