Shakespeare's Early Editions: Computational Methods for Textual Studies

Lead Research Organisation: De Montfort University

Department Name: School of Humanities

Abstract

We know what William Shakespeare wrote only because in his lifetime, and shortly after it, his works appeared in printed form from various small London publishers. We have none of his manuscripts, so all modern editions of Shakespeare are based on these surviving printed editions. About half of his works appeared during his lifetime in cheap single-play editions known as quartos and in 1623 (seven years after Shakespeare's death) a large collected works edition of 36 of his plays, known as the First Folio, was published with assistance from his fellow actors. Where we have both quarto and Folio versions of a play, they are never identical. Hundreds or thousands of 'variants' ranging from single words to whole lines, speeches, and even scenes are present or absent in one or other edition, or are entirely reworded and/or placed in a different part of the play. Unlike the plays, Shakespeare's poems were well published and present far fewer editorial problems.

Despite centuries of study, we cannot satisfactorily explain the quarto/Folio (Q/F) variants. Some will be errors made in the printing of one or other early edition, or in the prior copying of the lost manuscripts from which those printings were made. Others will be the results of censorship that required the toning down of religious expressions used as swear-words. Others still will be the results of Shakespeare changing his mind and revising a play after first composing it, or his fellow actors changing it with or without his consent. Just which reason explains each variant is hard to say because their results can be similar. As readers and editors of Shakespeare we want to find out which reason explains each variant because we want to correct the printer's errors and censorship but not to undo second thoughts and other kinds of revision in order to show modern readers what Shakespeare actually wrote. Where he or his fellows revised a play, we want to see how it stood before and after the revision in order to understand the motivations for changing it.

The newest discoveries about Shakespeare's habits of writing concern co-authorship. Scholars used to believe that except for short periods at the start and end of his career, Shakespeare habitually wrote on his own, but we now know that as many as one-third of his works were co-written with other dramatists. This has been shown by multiple independent studies using computational stylistics, which measures features of a writer's style that are invisible to the naked eye but can be counted by machines. For the past three decades, prevailing theories of authorship have suggested that where two writers collaborate on a work they blend their styles--effectively imitating one another--so that it would be all but impossible to decide later who wrote each part of the resulting composite work. Computer-aided analysis has proved this to be untrue: personal traits of writing can be discerned even where writers attempt to efface them.

The proposed project will use the latest techniques in computational stylistics to study the problem of the Q/F variants. The techniques are particularly suited to (indeed, were first developed for) the discrimination of random corruption from systematic alteration. This discrimination goes to the heart of the Q/F variants problem: we want to know which differences result from mere errors in transmission and which are something else. Now that we have reliable tools to discriminate authorial styles, and have a reasonable set of baseline style-profiles for most of Shakespeare's fellow dramatists, we ought to be able to see how far artistic revision by Shakespeare and/or his collaborators caused the differences between the early editions, which remain our only access to Shakespeare. The better we understand the Q/F differences, the better account we can give of what Shakespeare actually wrote.

Planned Impact

Who might benefit from the concentrated research phase of the project, and how?

* Individual readers and playgoers of Shakespeare (of all ages) who want to gain a better insight into what he wrote and when, including his collaborative activities, his professional career, and what is irretrievably lost to us because of errors in transmission, will be able to do so from our published results. This impact will begin 1+ years from project end and take the form of long-lasting improvement in artistic enjoyment.

* Theatre groups who want their productions to reflect the current state of knowledge about what Shakespeare wrote and how he did it will be able to do so from our published results. The success of the London replica Globe theatres and the attempts of other companies to emulate it show that paying audiences are deeply concerned with the original conditions under which Shakespeare's artistry was developed, and care about the details of his dramatic creativity. This impact will begin 2+ years from the project end and take the form of improved performances.

* Publishers of Shakespeare editions who want to give their readers--the general public as well as specialists--the latest state of knowledge about Shakespeare's processes of authorship and how they relate to theatrical practice in his time will be able to do so from our published results. This impact will begin 2+ years from project end and take the form of books that are better able to satisfy their readers' intellectual curiosity.

Who might benefit from the leadership/dissemination phase of the project, and how?

* The 'Link' from each host institution for the Travelling Roadshow will benefit from 40 hours of bespoke training in computational methods for textual analysis while in residence at the Centre for Textual Studies at De Montfort University. This impact will begin during the project and permanently enhance the Link's abilities.

* Individuals from each host institution who wish to develop their skills in computational methods for textual analysis will be able to do so by attending the Travelling Roadshow as it visits each regional centre. This impact will begin during the project and permanently enhance the attendees' abilities.

* The host institutions for the Travelling Roadshow will benefit in having their members' skills in computational approaches to textual analysis improved, not only broadening their institutional skill- and knowledge-bases in a burgeoning area of research and teaching, but also increasing their institutional capacity to undertake projects using such methodologies. This impact will begin during the project and will form permanent institutional improvement.

* The host institutions for the Travelling Roadshow will benefit from being able to offer two public performances (produced by the PI and performed by his undergraduate students) that give the general public an insight into how computers work and how they are able to store and process texts. These performances will help host institutions fulfil their own public engagement and outreach agendas. This impact will begin during the project and will form permanent institutional improvement.

* The general public will benefit from attending the Travelling Roadshow's two public performances on how computers work and how they are able to store and process writing. These performances are interactive: audience members will be invited to join in various hands-on activities on the stage. This impact will begin during the project and comprise a societal good of improved public understanding.

* The general public, including school groups and unaffiliated interested amateurs, will benefit from being able to attend the Literary Hackathon at De Montfort University in which hands-on training in computational methods will be applied. This impact will begin during the project and comprise a societal good of improved public understanding, including among school students.

Funded Value:

£249,156

Funded Period:

Nov 16 - Jul 18

Funder:

AHRC

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

AH/N007654/1

Principal Investigator:

Gabriel Egan

Research Subject:

Drama & theatre studies (20%)

Languages & Literature (20%)

Linguistics (60%)

Research Topic:

Computational Linguistics (20%)

Corpus Linguistics (20%)

English Language & Literature (20%)

Textual Editing & Bibliography (20%)

Theatre & History (20%)

Organisations

People	ORCID iD
Gabriel Egan (Principal Investigator / Fellow)

Publications

Brown P (2022) How the Word Adjacency Network (WAN) works in Digital Scholarship in the Humanities

Brown P (2021) How the Word Adjacency Network (WAN) algorithm works in Digital Scholarship in the Humanities

Colyvas K (2023) Changes in the length of speeches in the plays of William Shakespeare and his contemporaries: A mixed models approach in PLOS ONE

Eisen M (2018) Stylometric analysis of Early Modern period English plays in Digital Scholarship in the Humanities

Moscato P (2022) Multiple regression techniques for modelling dates of first performances of Shakespeare-era plays in Expert Systems with Applications

Segarra S (2019) A Response to Pervez Rizvi's Critique of the Word Adjacency Method for Authorship Attribution in ANQ: A Quarterly Journal of Short Articles, Notes and Reviews

Segarra S (2020) A Response to Rosalind Barber's Critique of the Word Adjacency Method for Authorship Attribution in ANQ: A Quarterly Journal of Short Articles, Notes and Reviews

Shakespeare, W (2017) The New Oxford Shakespeare Complete Works: Critical Reference Edition

Taylor, G (2017) The New Oxford Shakespeare Authorship Companion

Artistic and Creative Products
Key Findings
Impact Summary
Policy Influence
Further Funding
Research Databases and Models
Research Tools and Methods
Collaboration
Software and Technical Products
Engagement Activities


Title	Live theatrical performance called "Yes/No/Maybe"
Description	The theatre company Zoo Indigo were commissioned to produce a one-hour live performance concerning the intersection of computational methods with artistic creativity. The performance they created was called "Yes/No/Maybe" and it toured the UK as part of my Travelling Roadshow on computational methods for textual analysis and was given in Oxford, Bath, Liverpool, Leeds, and Glasgow.
Type Of Art	Performance (Music, Dance, Drama, etc)
Year Produced	2018
Impact	Post-show talks after each performance garnered oral evidence that audiences found its exploration of the intersection of artistic creativity with the increasing digitization of artforms to be both moving and revelatory.


Description	We have discovered that we can in fact objectively distinguish the early editions of Shakespeare's works that have traditionally been called 'bad quartos' (reputed to be texts that were corrupted in various ways) from the editions of the same works that are traditionally known as 'good quartos' and the 'Folio'. We have done this by advanced computational methods described in our original research proposal. We further discovered that we can put authorship and date limits (that is, approximations of when they were written and by whom) on the differences between the quarto and Folio editions of particular Shakespeare works, so that we are able to more confidently describe the processes of authorial and non-authorial revision that separate the early editions.
Exploitation Route	Anyone who wants to can read our reports, download our encoded texts and our software, and run similar tests for themselves using our examples.
Sectors	Creative Economy Digital/Communication/Information Technologies (including Software) Education Leisure Activities including Sports Recreation and Tourism
URL	http://see.dmu.ac.uk/blog/admin/static/SEE/389.html


Description	The societal impact was greater public appreciation of the ways that digital methods increasingly engage artistic practice. Also, researchers (including those in charge of arranging for the training of new researchers in the future) reported that their horizons had been broadened regarding the use of digital methods in areas they had not realized were amenable to such approaches.
First Year Of Impact	2018
Sector	Digital/Communication/Information Technologies (including Software),Education,Electronics,Leisure Activities, including Sports, Recreation and Tourism
Impact Types	Cultural Societal Policy & public services


Description	Broadening the horizons of UK researchers undertaking investigations into text
Geographic Reach	National
Policy Influence Type	Influenced training of practitioners or researchers
Impact	The sessions in my Travelling Roadshow that were aimed at researchers in UK universities and their research students were reported by attendees to have increased their capacity for undertaking their own research in the computational analysis of texts and to have opened their eyes to potential forms of intellectual enquiry that they did not know were possible. Attendees reported that they would pursue these possibilities within their own institutions, and in particular would seek to convince researchers not currently using computational methods for textual analysis that this kind of work might have something to offer them. That is exactly what the part of the Roadshow aimed at researchers was intended to achieve in order to fulfil its purpose of helping to transform how text-based research is done in this country.


Description	Talent Development Awards 2021
Amount	£7,235 (GBP)
Funding ID	TDA21\210025
Organisation	The British Academy
Sector	Academic/University
Country	United Kingdom
Start	01/2022
End	12/2022


Title	New software for analysing early modern printed plays and providing statistics about their language use
Description	As part of this project, the Post-Doctoral Research Associate, in collaboration with the PI Prof Egan, wrote software for undertaking certain kinds of statistical analysis of early modern printed plays. This software was written as shell scripts, Python code, and Haskell code and is given away freely on the project website.
Type Of Material	Improvements to research infrastructure
Year Produced	2017
Provided To Others?	Yes
Impact	Too soon to say, the project is still ongoing
URL	http://see.dmu.ac.uk/


Title	ShaLT website
Description	We built a website documenting the entire project and providing open access reports on all our work and the source code for all the software we wrote.
Type Of Material	Database/Collection of data
Year Produced	2016
Provided To Others?	Yes
Impact	Others in the field have reported that our source code has been valuable to their investigations
URL	http://see.dmu.ac.uk


Title	XML transcriptions of early printed plays by Shakespeare
Description	As part of this project we have paid graduate students to produce high-quality XML transcriptions of all the important early printed editions of all of Shakespeare's plays, using the Text Encoding Initiative tagset in such a way as to maximize the utility of these transcriptions for the purpose of answering questions that arise in computational stylistics.
Type Of Material	Database/Collection of data
Year Produced	2017
Provided To Others?	Yes
Impact	Too soon for impact: the project is ongoing
URL	http://see.dmu.ac.uk/


Description	Collaboration with a team of three electrical engineers at the University of Pennsylvania
Organisation	University of Pennsylvania
Country	United States
Sector	Academic/University
PI Contribution	At the UK end of this collaboration, I provided the knowledge and expertise on the early printed editions of Shakespeare and other writers of his time that shaped the investigation from the beginning. That is, where we went looking was decided by my knowledge of the intellectual field, what facts are known with tolerable certainty and what needed to be discovered. As we ran the computational experiments, I provided commentary on the significance of their results and pointed the team's attention in new directions in the light of them. When we came to writing up the experiments for publication (Segarra et al. 2017) I ensured that the mathematical claims meshed with the literary-historical claims to produce a coherent whole.
Collaborator Contribution	The US end of this collaboration provided the mathematical and computational expertise to run experiments on digital transcriptions of early printed texts by Shakespeare and other writers of his time. They wrote the software code and ran the statistical tests for significance. They wrote the first draft of the write up for publication (Segarra et al. 2017).
Impact	[1] Segarra, Santiago; Mark Eisen; Gabriel Egan; Alejandro Ribeiro (2016) "Attributing the authorship of the _Henry VI_ plays by word adjacency" Shakespeare Quarterly 67: 232-256. Multidisciplinary: computation + programming + statistics + literary history + book history. [2] Segarra, Santiago; Mark Eisen; Gabriel Egan; Alejandro Ribeiro (2017) "Stylometric analysis of Early Modern English plays" Digital Scholarship in the Humanities (advance online access). Multidisciplinary: computation + programming + statistics + literary history + book history.
Start Year	2014


Description	Collaboration with the Centre for Literary and Linguistic Computing (CLLC) at the University of Newcastle in Australia under the direction of Professor Hugh Craig
Organisation	University of Newcastle
Country	Australia
Sector	Academic/University
PI Contribution	My team created basic XML transcriptions of early modern printed plays for computational investigation. Using our knowledge of the early modern book history, we judged which plays and which editions to work on first and just how to apply the Text Encoding Initiative (TEI) guidelines in a way that would maximize the usefulness of these transcriptions for our purposes.
Collaborator Contribution	The Australian team took our minimally encoded XML transcriptions and added value to them by applying regularization of variant spellings and the identification of various parts-of-speech that we would look for in our analyses of these texts.
Impact	No outputs/outcomes yet: project still ongoing.
Start Year	2016


Title	An Open Source Python Script for Application of the Word Adjacency Network (WAN) Method of Authorship Attribution
Description	A Word Adjacency Network (WAN) is a mathematical system, a Markov chain, that represents the proximities of certain words within a text. Given a list of words-of-interest, typically the 100 or so function words that comprise about half of all spoken and written English, and a text to process, the WAN computer algorithm will record in a Markov chain (stored internally as a two-dimensional matrix) the averaged distances at which each of the words-of-interest is found from each of the others. It has been demonstrated that the habit of placing certain function words near to other function words is an authorial characteristic that varies from one writer to another in a distinctive and relatively consistent way and hence that a comparison of WANs derived from different texts can provide evidence for attributing authorship in cases where other evidence is not available. The comparison of a WAN derived from one text, say a piece of writing of disputed authorship, and the WAN derived from another text, say the securely attributed works of a plausible candidate for authorship of the disputed text, is a comparison of differing probability distributions. For this comparison the standard measure from Information Theory known as Kullback-Leibler divergence (more colloquially, relative entropy) may be used. When comparing the relative entropy between WANs from multiple texts, the lowest values tend to be between WANs from texts by the same author. This fact is exploited in application of the WAN method to the problem of authorship attribution: amongst the candidate authors for a disputed or suspect text we seek the one whose authorial canon is least different in this regard from the text to be attributed.
The software provided is written in the language Python (version 3) and takes as its input three ASCII text files: 1) a sample of writing, 2) another sample of writing, and 3) a list of words-of-interest. The software creates two Word Adjacency Networks, each representing the proximities of the words-of-interest for one of the two samples of writing, and then calculates and outputs the relative entropy between the two WANs.
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	I have been contacted by other scholars in the field (Dr Ros Barber, Goldsmiths College, and Pervez Rizvi, independent scholar) wanting to use the software and have given them additional advice on doing so.
URL	http://gabrielegan.com/WAN/
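
The WAN description above can be illustrated with a minimal Python sketch. This is not the project's released script: the function names (`tokenize`, `build_wan`, `relative_entropy`), the window size, the distance-decay weighting, the additive smoothing, and the unweighted averaging across rows are all assumptions made for illustration, and the sketch takes strings rather than the three input files that the released software expects. It shows only the general shape of the technique: build a row-normalised adjacency matrix over the words-of-interest for each text, then compare the two matrices row by row using Kullback-Leibler divergence.

```python
import math
import re


def tokenize(text):
    """Lower-case the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())


def build_wan(text, words_of_interest, window=5, decay=0.75, smoothing=1e-3):
    """Return a row-normalised adjacency matrix as a dict of dicts.

    wan[a][b] is the share of a's distance-weighted adjacency that goes to b,
    counting only occurrences of b within `window` tokens after a.
    `window`, `decay`, and `smoothing` are illustrative assumptions.
    """
    targets = sorted(set(words_of_interest))
    target_set = set(targets)
    tokens = tokenize(text)
    # Start every cell at a small value so each row is a full distribution.
    weights = {a: {b: smoothing for b in targets} for a in targets}
    for i, a in enumerate(tokens):
        if a not in target_set:
            continue
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            b = tokens[j]
            if b in target_set:
                # Nearer words contribute more than distant ones.
                weights[a][b] += decay ** (d - 1)
    wan = {}
    for a, row in weights.items():
        total = sum(row.values())
        wan[a] = {b: w / total for b, w in row.items()}
    return wan


def relative_entropy(wan_p, wan_q):
    """Average the Kullback-Leibler divergence KL(P_row || Q_row) over all rows.

    Both networks must be built from the same list of words-of-interest.
    Weighting every row equally is an assumption of this sketch.
    """
    total = 0.0
    for a, row_p in wan_p.items():
        row_q = wan_q[a]
        total += sum(p * math.log(p / row_q[b]) for b, p in row_p.items())
    return total / len(wan_p)


if __name__ == "__main__":
    function_words = ["the", "and", "of", "in"]
    disputed = "the king and the queen of the land"
    candidate = "of and the in of and the in"
    h = relative_entropy(build_wan(disputed, function_words),
                         build_wan(candidate, function_words))
    print(f"relative entropy: {h:.4f}")
```

In an attribution setting, the same comparison would be run against the securely attributed works of each candidate author, and the candidate yielding the lowest relative entropy would be treated as the closest stylistic match.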


Description	A Travelling Roadshow on Computers and Text
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Public/other audiences
Results and Impact	Using the money allocated in this grant, I devised a Travelling Roadshow that toured the UK visiting venues in Oxford, Bath, Liverpool, Leeds, and Glasgow. The topic of the Roadshow was computational analysis of texts of all kinds, and it included specific interactive sessions delivered by me and a colleague, Dr Paul Brown, on such themes as the use of statistics in studying texts, the notion of Relative Entropy, an introduction to programming for textual analysis, an introduction to data-mining using standard office productivity applications, the use of XML, XSLT, and XQuery, and an introduction to how computers work. Each Roadshow visit also included a live theatre performance called "Yes/No/Maybe", which I produced and which the theatre company Zoo Indigo was commissioned to create and deliver.
Year(s) Of Engagement Activity	2018


Description	Text Hackathon
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Public/other audiences
Results and Impact	85 members of the general public, aged 13 to 75, came to De Montfort University for an event described thus: << If you are interested in what we can do with computers to extract knowledge from big digital texts, then this event will interest you. How big do we mean? How about all the 19th-century novels? Or all the speeches in the UK Parliament since the Second World War? Or all the printed books published in England between the arrival of printing in 1475 and the year 1800? Or all the 18th-century newspaper reviews of London theatrical performances? Or all the 11,500,000 leaked Panama Papers (= the Mossack Fonseca files)? If there is a big dataset of text to be investigated, this event can show you how to extract knowledge from it. >>
Year(s) Of Engagement Activity	2017
URL	http://cts.dmu.ac.uk/events/hackathon


Description	Training the Links
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	The Centre for Textual Studies at De Montfort University has funding from the Arts and Humanities Research Council to take on tour around the UK a Travelling Roadshow on Computational Methods in Textual Studies, during Spring 2018. Each Roadshow event will comprise a series of talks, demos, and hands-on training events for students and staff at a UK university, followed by a public performance in which actors attempt to convey some of the wondrousness of machines being able to store and process human language. Each academic venue hosting a Roadshow has nominated a person, the Link, to liaise between the host and the Centre for Textual Studies. The Links will facilitate the Roadshow coming to their institutions and assist in delivering the sessions of the Roadshow there. Before that, the Links will come to the Centre for Textual Studies at De Montfort University in Leicester to receive a week of specialist training in digital techniques customized to their interests.
Year(s) Of Engagement Activity	2017
URL	http://cts.dmu.ac.uk/events/Roadshow-Links
