Shakespeare's Early Editions: Computational Methods for Textual Studies
Lead Research Organisation:
De Montfort University
Department Name: School of Humanities
Abstract
We know what William Shakespeare wrote only because in his lifetime, and shortly after it, his works appeared in printed form from various small London publishers. We have none of his manuscripts, so all modern editions of Shakespeare are based on these surviving printed editions. About half of his works appeared during his lifetime in cheap single-play editions known as quartos and in 1623 (seven years after Shakespeare's death) a large collected works edition of 36 of his plays, known as the First Folio, was published with assistance from his fellow actors. Where we have both quarto and Folio versions of a play, they are never identical. Hundreds or thousands of 'variants' ranging from single words to whole lines, speeches, and even scenes are present or absent in one or other edition, or are entirely reworded and/or placed in a different part of the play. Unlike the plays, Shakespeare's poems were well published and present far fewer editorial problems.
Despite centuries of study, we cannot satisfactorily explain the quarto/Folio (Q/F) variants. Some will be errors made in the printing of one or other early edition, or in the prior copying of the lost manuscripts from which those printings were made. Others will be the results of censorship that required the toning down of religious expressions used as swear-words. Others still will be the results of Shakespeare changing his mind and revising a play after first composing it, or his fellow actors changing it with or without his consent. Just which reason explains each variant is hard to say because their results can be similar. As readers and editors of Shakespeare we want to find out which reason explains each variant because we want to correct the printer's errors and censorship but not to undo second thoughts and other kinds of revision in order to show modern readers what Shakespeare actually wrote. Where he or his fellows revised a play, we want to see how it stood before and after the revision in order to understand the motivations for changing it.
The newest discoveries about Shakespeare's habits of writing concern co-authorship. Scholars used to believe that except for short periods at the start and end of his career, Shakespeare habitually wrote on his own, but we now know that as many as one-third of his works were co-written with other dramatists. This has been shown by multiple independent studies using computational stylistics, which measure features of a writer's style that are invisible to the naked eye but can be counted by machines. For the past three decades, prevailing theories of authorship have suggested that where two writers collaborate on a work they blend their styles--effectively imitating one another--so that it would be all but impossible to decide later who wrote each part of the resulting composite work. Computer-aided analysis has proved this to be untrue: personal traits of writing can be discerned even where writers attempt to efface them.
The proposed project will use the latest techniques in computational stylistics to study the problem of the Q/F variants. The techniques are particularly suited to (indeed, were first developed for) the discrimination of random corruption from systematic alteration. This discrimination goes to the heart of the Q/F variants problem: we want to know which differences result from mere errors in transmission and which are something else. Now that we have reliable tools to discriminate authorial styles, and have a reasonable set of baseline style-profiles for most of Shakespeare's fellow dramatists, we ought to be able to see how far artistic revision by Shakespeare and/or his collaborators caused the differences between the early editions, which remain our only access to Shakespeare. The better we understand the Q/F differences, the better account we can give of what Shakespeare actually wrote.
Planned Impact
Who might benefit from the concentrated research phase of the project, and how?
* Individual readers and playgoers of Shakespeare (of all ages) who want to gain a better insight into what he wrote and when, including his collaborative activities, his professional career, and what is irretrievably lost to us because of errors in transmission, will be able to do so from our published results. This impact will begin 1+ years from project end and take the form of long-lasting improvement in artistic enjoyment.
* Theatre groups who want their productions to reflect the current state of knowledge about what Shakespeare wrote and how he did it will be able to do so from our published results. The success of the London replica Globe theatres and the attempts of other companies to emulate it show that paying audiences are deeply concerned with the original conditions under which Shakespeare's artistry was developed, and care about the details of his dramatic creativity. This impact will begin 2+ years from the project end and take the form of improved performances.
* Publishers of Shakespeare editions who want to give their readers--the general public as well as specialists--the latest state of knowledge about Shakespeare's processes of authorship and how they relate to theatrical practice in his time will be able to do so from our published results. This impact will begin 2+ years from project end and take the form of books that are better able to satisfy their readers' intellectual curiosity.
Who might benefit from the leadership/dissemination phase of the project, and how?
* The 'Link' from each host institution for the Travelling Roadshow will benefit from 40 hours of bespoke training in computational methods for textual analysis while in residence at the Centre for Textual Studies at De Montfort University. This impact will begin during the project and permanently enhance the Link's abilities.
* Individuals from each host institution who wish to develop their skills in computational methods for textual analysis will be able to do so by attending the Travelling Roadshow as it visits each regional centre. This impact will begin during the project and permanently enhance the attendees' abilities.
* The host institutions for the Travelling Roadshow will benefit in having their members' skills in computational approaches to textual analysis improved, not only broadening their institutional skill- and knowledge-bases in a burgeoning area of research and teaching, but also increasing their institutional capacity to undertake projects using such methodologies. This impact will begin during the project and will form permanent institutional improvement.
* The host institutions for the Travelling Roadshow will benefit from being able to offer two public performances (produced by the PI and performed by his undergraduate students) that give the general public an insight into how computers work and how they are able to store and process texts. These performances will help host institutions fulfil their own public engagement and outreach agendas. This impact will begin during the project and will form permanent institutional improvement.
* The general public will benefit from attending the Travelling Roadshow's two public performances on how computers work and how they are able to store and process writing. These performances are interactive: audience members will be invited to join in various hands-on activities on the stage. This impact will begin during the project and comprise a societal good of improved public understanding.
* The general public, including school groups and unaffiliated interested amateurs, will benefit from being able to attend the Literary Hackathon at De Montfort University in which hands-on training in computational methods will be applied. This impact will begin during the project and comprise a societal good of improved public understanding, including among school students.
Organisations
- De Montfort University (Fellow, Lead Research Organisation)
- University of Pennsylvania (Collaboration)
- University of Newcastle (Collaboration)
- Loughborough University (Project Partner)
- Centre for Computing History (Project Partner)
- Liverpool John Moores University (Project Partner)
- University of Oxford (Project Partner)
- University of Strathclyde (Project Partner)
People |
ORCID iD |
Gabriel Egan (Principal Investigator / Fellow) |
Publications
Brown P
(2022)
How the Word Adjacency Network (WAN) works
in Digital Scholarship in the Humanities
Brown P
(2021)
How the Word Adjacency Network (WAN) algorithm works
in Digital Scholarship in the Humanities
Eisen M
(2018)
Stylometric analysis of Early Modern period English plays
in Digital Scholarship in the Humanities
Moscato P
(2022)
Multiple regression techniques for modelling dates of first performances of Shakespeare-era plays
in Expert Systems with Applications
Segarra S
(2019)
A Response to Pervez Rizvi's Critique of the Word Adjacency Method for Authorship Attribution
in ANQ: A Quarterly Journal of Short Articles, Notes and Reviews
Segarra S
(2020)
A Response to Rosalind Barber's Critique of the Word Adjacency Method for Authorship Attribution
in ANQ: A Quarterly Journal of Short Articles, Notes and Reviews
Shakespeare, W
(2017)
The New Oxford Shakespeare Complete Works: Critical Reference Edition
Taylor, G
(2017)
The New Oxford Shakespeare Authorship Companion
Title | Live theatrical performance called "Yes/No/Maybe" |
Description | The theatre company Zoo Indigo were commissioned to produce a one-hour live performance concerning the intersection of computational methods with artistic creativity. The performance they created was called "Yes/No/Maybe" and it toured the UK as part of my Travelling Roadshow on computational methods for textual analysis and was given in Oxford, Bath, Liverpool, Leeds, and Glasgow. |
Type Of Art | Performance (Music, Dance, Drama, etc) |
Year Produced | 2018 |
Impact | Post-show talks after each performance garnered oral evidence that audiences found its exploration of the intersection of artistic creativity with the increasing digitization of artforms to be both moving and revelatory. |
Description | We have discovered that we can in fact objectively distinguish the early editions of Shakespeare's works that have traditionally been called 'bad quartos' (reputed to be texts that were corrupted in various ways) from the editions of the same works that are traditionally known as 'good quartos' and the 'Folio'. We have done this by advanced computational methods described in our original research proposal. We further discovered that we can put authorship and date limits (that is, approximations of when they were written and by whom) on the differences between the quarto and Folio editions of particular Shakespeare works, so that we are able to more confidently describe the processes of authorial and non-authorial revision that separate the early editions. |
Exploitation Route | Anyone who wishes can read our reports, download our encoded texts and our software, and run similar tests for themselves using our examples. |
Sectors | Creative Economy Digital/Communication/Information Technologies (including Software) Education Leisure Activities including Sports Recreation and Tourism |
URL | http://see.dmu.ac.uk/blog/admin/static/SEE/389.html |
Description | The societal impact was greater public appreciation of the ways that digital methods increasingly engage artistic practice. Also, researchers (including those in charge of arranging for the training of new researchers in the future) reported that their horizons had been broadened regarding the use of digital methods in areas they had not realized were amenable to such approaches. |
First Year Of Impact | 2018 |
Sector | Digital/Communication/Information Technologies (including Software),Education,Electronics,Leisure Activities, including Sports, Recreation and Tourism |
Impact Types | Cultural Societal Policy & public services |
Description | Broadening the horizons of UK researchers undertaking investigations into text |
Geographic Reach | National |
Policy Influence Type | Influenced training of practitioners or researchers |
Impact | The sessions in my Travelling Roadshow that were aimed at researchers in UK universities and their research students were reported by attendees to have increased their capacity for undertaking their own research in the computational analysis of texts and to have opened their eyes to potential forms of intellectual enquiry that they did not know were possible. Attendees reported that they would pursue these possibilities within their own institutions, and in particular would seek to convince researchers not currently using computational methods for textual analysis that this kind of work might have something to offer them. That is exactly what the part of the Roadshow aimed at researchers was intended to achieve in order to fulfil its purpose of helping to transform how text-based research is done in this country. |
Description | Talent Development Awards 2021 |
Amount | £7,235 (GBP) |
Funding ID | TDA21\210025 |
Organisation | The British Academy |
Sector | Academic/University |
Country | United Kingdom |
Start | 01/2022 |
End | 12/2022 |
Title | New software for analysing early modern printed plays and providing statistics about their language use |
Description | As part of this project, the Post-Doctoral Research Associate, in collaboration with the PI Prof Egan, wrote software for undertaking certain kinds of statistical analysis of early modern printed plays. This software was written as shell scripts, Python code, and Haskell code and is given away freely on the project website. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | Too soon to say, the project is still ongoing |
URL | http://see.dmu.ac.uk/ |
Title | ShaLT website |
Description | We built a website documenting the entire project and providing open access reports on all our work and the source code for all the software we wrote. |
Type Of Material | Database/Collection of data |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | Others in the field have reported that our source code has been valuable to their investigations |
URL | http://see.dmu.ac.uk |
Title | XML transcriptions of early printed plays by Shakespeare |
Description | As part of this project we have paid graduate students to produce high-quality XML transcriptions of all the important early printed editions of all of Shakespeare's plays, using the Text Encoding Initiative tagset in such a way as to maximize the utility of these transcriptions for the purpose of answering questions that arise in computational stylistics. |
Type Of Material | Database/Collection of data |
Year Produced | 2017 |
Provided To Others? | Yes |
Impact | Too soon for impact: the project is ongoing |
URL | http://see.dmu.ac.uk/ |
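As an illustration of how TEI-encoded transcriptions of this kind can feed computational stylistics, the following is a minimal sketch in Python. It is not the project's software, and the project's actual encoding scheme is not documented here: the sample markup, the use of `sp`, `speaker`, `l`, and `p` elements in the standard TEI namespace, and the function name `words_by_speaker` are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Standard TEI P5 namespace; whether the project's files use it is an assumption.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented sample markup for illustration only, not one of the project's transcriptions.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <sp who="#Hamlet"><speaker>Ham.</speaker>
      <l>To be, or not to be, that is the question:</l>
    </sp>
    <sp who="#Ophelia"><speaker>Ophe.</speaker>
      <l>Good my Lord,</l>
      <l>How does your Honor for this many a day?</l>
    </sp>
  </body></text>
</TEI>"""


def words_by_speaker(xml_text):
    """Count lower-cased word forms per speaker, ignoring speech prefixes."""
    root = ET.fromstring(xml_text)
    counts = {}
    for sp in root.iterfind(".//tei:sp", TEI_NS):
        who = sp.get("who", "unknown")
        # Only verse lines (<l>) and prose paragraphs (<p>) hold spoken words;
        # the <speaker> element is skipped so speech prefixes are not counted.
        units = sp.findall("tei:l", TEI_NS) + sp.findall("tei:p", TEI_NS)
        for unit in units:
            text = "".join(unit.itertext()).lower()
            words = [w.strip(".,:;?!'\"") for w in text.split()]
            counts.setdefault(who, Counter()).update(w for w in words if w)
    return counts


counts = words_by_speaker(SAMPLE)
print(counts["#Hamlet"].most_common(3))
```

Separating spoken text from speech prefixes and stage directions is one reason structured markup is useful here: counts of function words per speaker or per scene can be taken without contamination from editorial apparatus.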
Description | Collaboration with a team of three electrical engineers at the University of Pennsylvania |
Organisation | University of Pennsylvania |
Country | United States |
Sector | Academic/University |
PI Contribution | At the UK end of this collaboration, I provided the knowledge and expertise on the early printed editions of Shakespeare and other writers of his time that shaped the investigation from the beginning. That is, where we went looking was decided by my knowledge of the intellectual field, what facts are known with tolerable certainty and what needed to be discovered. As we ran the computational experiments, I provided commentary on the significance of their results and pointed the team's attention in new directions in the light of them. When we came to writing up the experiments for publication (Segarra et al. 2017) I ensured that the mathematical claims meshed with the literary-historical claims to produce a coherent whole. |
Collaborator Contribution | The US end of this collaboration provided the mathematical and computational expertise to run experiments on digital transcriptions of early printed texts by Shakespeare and other writers of his time. They wrote the software code and ran the statistical tests for significance. They wrote the first draft of the write up for publication (Segarra et al. 2017). |
Impact | [1] Segarra, Santiago; Mark Eisen; Gabriel Egan; Alejandro Ribeiro (2016) "Attributing the authorship of the _Henry VI_ plays by word adjacency" Shakespeare Quarterly 67: 232-256. Multidisciplinary: computation + programming + statistics + literary history + book history. [2] Segarra, Santiago; Mark Eisen; Gabriel Egan; Alejandro Ribeiro (2017) "Stylometric analysis of Early Modern English plays" Digital Scholarship in the Humanities (advance online access). Multidisciplinary: computation + programming + statistics + literary history + book history. |
Start Year | 2014 |
Description | Collaboration with the Centre for Literary and Linguistic Computing (CLLC) at the University of Newcastle in Australia under the direction of Professor Hugh Craig |
Organisation | University of Newcastle |
Country | Australia |
Sector | Academic/University |
PI Contribution | My team created basic XML transcriptions of early modern printed plays for computational investigation. Using our knowledge of the early modern book history, we judged which plays and which editions to work on first and just how to apply the Text Encoding Initiative (TEI) guidelines in a way that would maximize the usefulness of these transcriptions for our purposes. |
Collaborator Contribution | The Australian team took our minimally encoded XML transcriptions and added value to them by applying regularization of variant spellings and the identification of various parts-of-speech that we would look for in our analyses of these texts. |
Impact | No outputs/outcomes yet: project still ongoing. |
Start Year | 2016 |
Title | An Open Source Python Script for Application of the Word Adjacency Network (WAN) Method of Authorship Attribution |
Description | A Word Adjacency Network (WAN) is a mathematical system, a Markov chain, that represents the proximities of certain words within a text. Given a list of words-of-interest, typically the 100 or so function words that comprise about half of all spoken and written English, and a text to process, the WAN computer algorithm will record in a Markov chain (stored internally as a two-dimensional matrix) the averaged distances at which each of the words-of-interest is found from each of the others. It has been demonstrated that the habit of placing certain function words near to other function words is an authorial characteristic that varies from one writer to another in a distinctive and relatively consistent way and hence that a comparison of WANs derived from different texts can provide evidence for attributing authorship in cases where other evidence is not available. The comparison of a WAN derived from one text, say a piece of writing of disputed authorship, and the WAN derived from another text, say the securely attributed works of a plausible candidate for authorship of the disputed text, is a comparison of differing probability distributions. For this comparison the standard measure from Information Theory known as Kullback-Leibler divergence (more colloquially, relative entropy) may be used. When comparing the relative entropy between WANs from multiple texts, the lowest values tend to be between WANs from texts by the same author. This fact is exploited in applying the WAN method to the problem of authorship attribution: amongst the candidate authors for a disputed or suspect text we seek the one whose authorial canon is least different in this regard from the text to be attributed. The software provided is written in the language Python (version 3) and takes as its input three ASCII text files: 1) a sample of writing, 2) another sample of writing, and 3) a list of words-of-interest. 
The software creates two Word Adjacency Networks, each representing the proximities of the words-of-interest for one of the two samples of writing, and then calculates and outputs the relative entropy between the two WANs. |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | I have been contacted by other scholars in the field (Dr Ros Barber, Goldsmiths College, Pervez Rizvi, independent scholar) wanting to use the software and have given them additional advice on doing so. |
URL | http://gabrielegan.com/WAN/ |
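The WAN description above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the released script: the function names `build_wan` and `relative_entropy`, the look-ahead `window`, the `decay` weighting by distance, the equal weighting of rows, and the decision to skip transitions absent from the second network are all choices made for the example and may differ from the published method.

```python
import math
from collections import defaultdict


def build_wan(tokens, words_of_interest, window=5, decay=0.75):
    """Build a row-normalised adjacency matrix (nested dicts) for one text.

    For each word-of-interest, nearby following words-of-interest are
    weighted by closeness; each row is then normalised to sum to 1 so it
    can be read as the transition probabilities of a Markov chain.
    `window` and `decay` are illustrative parameters, not published values.
    """
    targets = set(words_of_interest)
    weights = {w: defaultdict(float) for w in targets}
    for i, word in enumerate(tokens):
        if word not in targets:
            continue
        for d in range(1, window + 1):
            if i + d >= len(tokens):
                break
            other = tokens[i + d]
            if other in targets:
                # Adjacent words get weight 1; weight shrinks with distance.
                weights[word][other] += decay ** (d - 1)
    wan = {}
    for word, row in weights.items():
        total = sum(row.values())
        if total > 0:
            wan[word] = {other: w / total for other, w in row.items()}
    return wan


def relative_entropy(wan_p, wan_q):
    """Kullback-Leibler-style divergence of wan_p from wan_q.

    Simplifications (assumptions of this sketch): rows are weighted
    equally, and transitions missing from wan_q are skipped rather than
    treated as infinitely surprising.
    """
    total, rows = 0.0, 0
    for word, row_p in wan_p.items():
        row_q = wan_q.get(word)
        if row_q is None:
            continue
        rows += 1
        for other, p in row_p.items():
            q = row_q.get(other, 0.0)
            if q > 0:
                total += p * math.log(p / q)
    return total / rows if rows else 0.0


function_words = ["the", "and", "of", "to"]
sample_a = "the cat and the dog of the house and the man".split()
sample_b = "to the house of the man and to the dog of the cat".split()
wan_a = build_wan(sample_a, function_words)
wan_b = build_wan(sample_b, function_words)
print(relative_entropy(wan_a, wan_b))
```

In an attribution setting, the first sample would be the disputed text and the second the pooled canon of one candidate author; the calculation is repeated for each candidate and the lowest value taken as the closest stylistic match. Real use requires far longer samples than the toy strings shown here.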
Description | A Travelling Roadshow on Computers and Text |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Using the money allocated in this grant, I devised a Travelling Roadshow that toured the UK visiting venues in Oxford, Bath, Liverpool, Leeds, and Glasgow. The topic of the Roadshow was computational analysis of texts of all kinds, and it included specific interactive sessions delivered by me and a colleague, Dr Paul Brown, on such themes as the use of statistics in studying texts, the notion of Relative Entropy, an introduction to programming for textual analysis, an introduction to data-mining using standard office productivity applications, the use of XML, XSLT, and XQuery, and an introduction to how computers work. Each Roadshow visit also included a live theatre performance called "Yes/No/Maybe", of which I was the producer, commissioned from and delivered by the theatre company Zoo Indigo. |
Year(s) Of Engagement Activity | 2018 |
Description | Text Hackathon |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | 85 members of the general public, aged 13 to 75, came to De Montfort University for an event described thus: << If you are interested in what we can do with computers to extract knowledge from big digital texts, then this event will interest you. How big do we mean? How about all the 19th-century novels? Or all the speeches in the UK Parliament since the Second World War? Or all the printed books published in England between the arrival of printing in 1475 and the year 1800? Or all the 18th-century newspaper reviews of London theatrical performances? Or all the 11,500,000 leaked Panama Papers (= the Mossack Fonseca files)? If there is a big dataset of text to be investigated, this event can show you how to extract knowledge from it. >> |
Year(s) Of Engagement Activity | 2017 |
URL | http://cts.dmu.ac.uk/events/hackathon |
Description | Training the Links |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | The Centre for Textual Studies at De Montfort University has funding from the Arts and Humanities Research Council to take on tour around the UK a Travelling Roadshow on Computational Methods in Textual Studies, during Spring 2018. Each Roadshow event will comprise a series of talks, demos, and hands-on training events for students and staff at a UK university, followed by a public performance in which actors attempt to convey some of the wondrousness of machines being able to store and process human language. Each academic venue hosting a Roadshow has nominated a person, The Link, to liaise between the host and the Centre for Textual Studies. The Links will facilitate the Roadshow coming to their institutions and assist in delivering the sessions of the Roadshow there. Before that, the Links will come to the Centre for Textual Studies at De Montfort University in Leicester to receive a week of specialist training in digital techniques customized to their interests. |
Year(s) Of Engagement Activity | 2017 |
URL | http://cts.dmu.ac.uk/events/Roadshow-Links |