Proteus: Capturing the Big Data Problem of Ancient Literary Fragments

Lead Research Organisation: University of Oxford
Department Name: Classics Faculty

Abstract

The hugely successful Ancient Lives website, a crowd-sourced collaborative online project dealing with the archive of Greek papyri from Oxyrhynchus, has had tremendous public impact over the past two years and has generated a massive database: over 1.5 million transcriptions of the Oxyrhynchus papyri (over 7 million characters), the work of over 250,000 online collaborators. These include literary, sub-literary and documentary texts. To handle this mass of data with maximum efficiency we now propose to build Proteus, a system comprising both an innovative digital editing and research tool for scholars working on unpublished papyri and a larger web portal devoted to literary and subliterary fragments (sub-classified for history, biography, and so on). We aim to produce a single on-screen interface to enable comparison between original and edited versions of literary and subliterary texts, and to establish an online system that facilitates collaboration between scholars in the editorial process. Proteus' editing tool will change the way papyrologists and editors work, while its web portal, accessible to the world, will both instigate further research and provide an intelligent means of interfacing with the conjectures and variant readings that these fragments generate as they are re-edited over time. Whilst constructing this tool, we will also produce a critical edition of Servius' commentary on Virgil's Aeneid VI, serving as a prototype for subsequent digital critical editions of his commentary on other books of the epic. This prototype will offer a framework for creating digital editions of ancient commentaries in general, whose complex formatting requires innovative ways of digital reading, something beyond the codex experience. Given the biographical and historical nature of many of the texts generated, we will also take the opportunity to update a seminal reference work devoted to fragmentary evidence: Die Fragmente der Griechischen Historiker Continued. Part IV: Biography and Antiquarian Literature.

The project will thus have five main outputs:
1) The Proteus digital editing tool, through which scholars assigned to work on Oxyrhynchus papyri will edit their fragments and make them ready for print and digital publication.
2) The Proteus digital portal for searching literary and subliterary fragments and the collaborative re-editing of texts. We intend to immediately populate our site with 1,700 unpublished Bodleian papyri, 300 Antinoopolis papyri, and select unpublished Sackler papyri including Homer's Iliad and Homerica, Herodotus and the Septuagint.
3) The digital Servius environment: a prototype interface for digitally researching and creating digital editions of ancient commentaries. Servius' commentaries on Virgil's Aeneid book VI will become accessible online in the first instance.
4) The imaging of hundreds of thousands more unstudied papyri from our archive for processing in Ancient Lives and Proteus.
5) Immediate updating of Die Fragmente der Griechischen Historiker Continued. Part IV: Biography and Antiquarian Literature with new fragments.

Planned Impact

Our project will reach a wide-ranging audience, from school children to professional scholars. It is a digital environment designed to engage both traditional and non-traditional research communities. Its original point of interface, Ancient Lives, is a place where original documents, previously confined to the eyes of experts, are made available to the public through the use of a web interface: interested users go online, browse and access fragments of their choice, and contribute to the project by entering transcriptions, measurements and other information by means of online tools. The resulting data, collated and refined by algorithmic intelligence, will greatly contribute to our knowledge of the Graeco-Roman world.
In capitalizing on our unique and massive database of unedited and unpublished Greek-language texts, we intend now to provide even greater open access to project output. The flow of data through Proteus represents a change in the very methodology of classical philology and papyrology. Scholars from anywhere in the world will be able to edit fragments with the assistance of intelligent linguistic data mining algorithms. This merging of human and machine intelligence is designed to speed up the process of analysis, while at the same time ensuring accuracy is retained. Moreover, Proteus' research and re-editing environment will offer global access to project output, a virtual place where any interested party can access and work with our data. With fragment numbers in excess of one million, and the subsequent data they generate, Proteus will never be static, but will continually evolve and change over time. Better still, for the hundreds of thousands of volunteers who have entered transcriptions on the Ancient Lives site, Proteus will also be the place where they can see the results of their work and even engage further, alongside the scholarly community, in its collaborative re-editing environment.

 
Description Because the project has reached the end of its funding cycle without continuation funding, Proteus is still slowly undergoing debugging, testing, and upgrades, with a launch date now scheduled for 2018. Still, in terms of development and cyber infrastructure, Proteus has not only achieved its key goals, but also continues to evolve. In developing a platform that digitally captures the evolving data of Greek and Latin literary and subliterary papyri as they are edited and re-edited over time, the Proteus project has created a digital ecosystem both for creating next-generation born-digital critical editions and for generating the textual criticism that underwrites them.

The Proteus architecture consists of two components: the Digital Editor for Classical Philology (DELPHI) and the Proteus Search Interface. The project is implemented using Python, HTML5, CSS, JavaScript, a PostgreSQL database management system, and PostgresSearch for search functionality; these components are packaged together using Django, a high-level Web framework for the Python programming language.

Proteus offers a virtual space for parallel critical editing, a process whereby one or multiple scholars can produce a digital edition and even suggest conjectures through critical notes, all of which are then accessible for future research. In its first iteration Proteus focuses on papyrus fragments, and citable, scholarly use is of the utmost importance. Yet Proteus does not simply implement the attributes that make a Greek or Latin edition critical; it embraces the machine for what it is: not a book. A new text editor, data visualization, search, and version control are being employed to rethink how a user interfaces with a text that can constantly change.

DELPHI

With DELPHI at the core of Proteus' digital ecosystem, this platform facilitates the creation of born-digital critical editions. DELPHI is a text editor whose design is modeled on common Markdown editors and Integrated Development Environments (IDEs). Our editor employs its own Critical Syntax (CSYN) Markdown, a human-readable language for digital philology, alongside Critical Syntax for Papyri (CSYN-P), a modified and improved implementation of the EpiDoc XML schema; it also addresses the markup needs of Herculaneum papyri. As users create digital editions using CSYN Markdown, both CSYN-P and the HTML5 preview are generated automatically in real time. DELPHI thus removes the labor-intensive process of XML markup (though a user can choose to write XML directly) and improves quality control, since syntactical errors are flagged and easily isolated via the real-time translation. More importantly, in this very same fashion, DELPHI allows editors to generate the attributes that make an edition critical and thus citable in scholarship: a critical apparatus, testimonia, diplomatic editions, and even the palaeographical apparatus. Other highlights include: an in-browser Greek keyboard and menus for inserting papyrological and critical symbols (no downloaded third-party software or keyboards required); a diacritic menu inspired by the Apple OS X Character Accent Menu; JavaScript that makes data in the critical apparatus more engaging; and functionality for digitally editing marginalia and providing translations.
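The translation from a human-readable notation to XML, with syntax errors flagged along the way, can be illustrated with a minimal Python sketch. It does not reproduce CSYN Markdown or CSYN-P, whose syntax is not given here. As a stand-in it assumes a toy notation borrowed from the Leiden bracket conventions ([text] for lost text supplied by the editor, [...] for lost characters) and emits the standard EpiDoc elements supplied and gap. The function names to_epidoc and find_errors are hypothetical, and the HTML5 preview step is omitted.

```python
import re
from typing import List
from xml.sax.saxutils import escape

# Toy notation assumed for illustration only (NOT the real CSYN syntax):
#   [...]   lost characters, one dot per missing character
#   [text]  text lost on the papyrus and supplied by the editor
GAP = re.compile(r"\[(\.+)\]")
SUPPLIED = re.compile(r"\[([^\[\]]+)\]")


def find_errors(line: str) -> List[str]:
    """Return human-readable messages for unbalanced or nested brackets."""
    errors = []
    open_at = None  # column of the currently open '[', if any
    for col, ch in enumerate(line, start=1):
        if ch == "[":
            if open_at is not None:
                errors.append(f"nested '[' at column {col}")
            else:
                open_at = col
        elif ch == "]":
            if open_at is None:
                errors.append(f"unmatched ']' at column {col}")
            open_at = None
    if open_at is not None:
        errors.append(f"unclosed '[' at column {open_at}")
    return errors


def to_epidoc(line: str) -> str:
    """Translate one line of the toy notation into an EpiDoc-style fragment."""
    xml = escape(line)  # escape &, < and > before inserting our own tags
    # Gaps first, so that "[...]" is not mistaken for supplied text.
    xml = GAP.sub(
        lambda m: f'<gap reason="lost" quantity="{len(m.group(1))}" unit="character"/>',
        xml,
    )
    return SUPPLIED.sub(r'<supplied reason="lost">\1</supplied>', xml)


if __name__ == "__main__":
    source = "μῆνιν [ἄειδε] θεὰ [...]"
    problems = find_errors(source)
    # Only translate input that passed the syntax check.
    print(problems if problems else to_epidoc(source))
```

Running the check before the translation is what lets an editor of this kind point at the offending column instead of producing malformed XML; a real implementation would cover far more of the markup than these two constructs.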
Exploitation Route Proteus and its new digital text editor (DELPHI) have created a simple and stable way for users to create born-digital editions without engaging in any XML markup. Furthermore, these born-digital editions are critical; since they contain the necessary data, they can be cited by scholars in their research papers. Proteus is thus a model for how Digital Classics projects can not only benefit the Digital Classics community, but also ensure that scholars are actually using and citing these platforms in their research. Moreover, in creating DELPHI, we have substantially improved and upgraded the EpiDoc standard for XML, notably adding numerous tags that were missing but are required for creating digital editions. And, of course, in capturing and storing the way ancient literary and paraliterary fragments change over time, we are not only engaging in the latest research in version control, but also providing a simple and intuitive way of interfacing with and searching the data; Proteus does not simply house a digital critical edition of a given text, but is designed to spawn further editions of that text. The goal has been to move beyond the codex mindset, and this type of innovative development will affect users and creators of digital editions across disciplines. Due to the success of our method, we are now also working with the forthcoming Digital Latin Library project in the United States and the nascent Latin Literature Online project based at the University of Barcelona. Instead of papyrus fragments, we are looking at how to adapt Proteus to create born-digital editions based on texts transmitted in medieval codices.
Sectors Digital/Communication/Information Technologies (including Software), Education, Culture, Heritage, Museums and Collections

URL http://www.papyrology.ox.ac.uk/ProteusProject/
 
Description Proteus and its new digital text editor (DELPHI) have created a simple and stable way for users to create born-digital editions without engaging in any XML markup. Furthermore, these born-digital editions are critical; since they contain the necessary data, they can be cited by scholars in their research papers. Proteus is thus a model for how Digital Classics projects can not only benefit the Digital Classics community, but also ensure that scholars are actually using and citing these platforms in their research. Moreover, in creating DELPHI, we have substantially improved and upgraded the EpiDoc standard for XML. And, of course, in capturing and storing the way ancient literary and paraliterary fragments change over time, we are not only engaging in the latest research in version control, but also providing a simple and intuitive way of interfacing with and searching the data; Proteus does not simply house a digital critical edition of a given text, but is designed to spawn further editions of that text. The goal has been to move beyond the codex mindset, and this type of innovative development will affect users and creators of digital editions across disciplines. Due to the success of our method, we are now also working with the forthcoming Digital Latin Library project in the United States and the nascent Latin Literature Online project based at the University of Barcelona.
First Year Of Impact 2016
Sector Digital/Communication/Information Technologies (including Software), Education, Culture, Heritage, Museums and Collections
Impact Types Cultural

 
Title Proteus 
Description The Proteus website and its Digital Editor for Classical Philology (DELPHI) facilitate the creation of born-digital critical editions. Using a PostgreSQL database, it stores multiple editions, and multiple versions of a given edition, in XML files.
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact There are two areas of distinct impact: 1) The creation of born-digital critical editions without using any XML; the system allows for automated conversion into XML and HTML5 in real time. 2) A version control system that permits multiple editions of one text to be created, stored, and accessed in one place.
URL http://www.proteusproject.uk/
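
The arrangement described in this record, several parallel editions of one text with each edition keeping its own run of versions, can be pictured with a small in-memory Python sketch. This is an illustration under stated assumptions, not the Proteus schema: the class names Fragment and Edition, their fields, and the sample identifier are hypothetical, and a plain list of XML strings stands in for the PostgreSQL-backed storage mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Edition:
    """One editor's edition of a text, kept as an append-only list of versions."""
    editor: str
    versions: List[str] = field(default_factory=list)  # XML of each saved version

    def save(self, xml: str) -> int:
        """Store a new version and return its 1-based version number."""
        self.versions.append(xml)
        return len(self.versions)

    def latest(self) -> str:
        """Return the most recently saved version."""
        return self.versions[-1]


@dataclass
class Fragment:
    """A text that can carry several parallel editions, keyed by editor."""
    identifier: str
    editions: Dict[str, Edition] = field(default_factory=dict)

    def edition_by(self, editor: str) -> Edition:
        """Return the editor's edition, creating an empty one on first use."""
        if editor not in self.editions:
            self.editions[editor] = Edition(editor)
        return self.editions[editor]


if __name__ == "__main__":
    frag = Fragment("hypothetical-fragment-1")  # placeholder identifier
    frag.edition_by("Editor A").save("<ab>first reading</ab>")
    frag.edition_by("Editor A").save("<ab>revised reading</ab>")
    frag.edition_by("Editor B").save("<ab>alternative reading</ab>")
    for name, edition in frag.editions.items():
        print(name, len(edition.versions), edition.latest())
```

Because versions are only ever appended, earlier readings stay retrievable after a re-edit, which is the property that lets one text accumulate further editions over time.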