The Emergence of Egophoricity: a diachronic investigation into the marking of the conscious self

Lead Research Organisation: School of Oriental and African Studies
Department Name: East Asian Languages and Cultures

Abstract

This project investigates how certain Tibetan and Newar varieties express the perspective of the speaker in the sentence. In Lhasa Tibetan, for example, the auxiliary verb 'yin' can be used in three kinds of sentence. It appears where the speaker is the subject (nga em-chi yin '*I'm* a doctor'). It appears where the speaker identifies a personal relation or possession ('di nga'i bu-mo yin 'This is *my* daughter'). It also appears where the speaker chooses to emphasise who performed an action ('di khyed-rang-gi gsol-ja yin 'This is your tea [that *I* have made for you]'). Other Tibetan varieties, such as Jirel or South Mustang Tibetan, also exhibit egophoric markers like Lhasa Tibetan 'yin', but not always in the same contexts. In Newar varieties, which are also spoken in Nepal, however, egophoric marking consists of long vowels in verbal endings rather than separate (auxiliary) verbs (ji Manaj napalan-aa 'I (the speaker) met Manoj as planned' vs. ji Manaj napalan-a 'I met Manoj by coincidence'). Finally, this egophoric marking is not found in older stages of either Tibetan or Newar. The central question of this project is how and why specific grammatical markers of the speaker's involvement emerge over time. It also asks why they do so in slightly different ways, even in closely related languages. What subtle grammatical clues can be found in older stages of these languages that later result in egophoric marking?

In this project we first investigate how present-day Tibetan and Newar varieties grammatically express the speaker's involvement. For this purpose we will create annotated corpora: digital text collections enriched with linguistic information about the structure and meaning of each element in the sentence. No data are yet available for the highly endangered Lalitpur Newar variety, so we will conduct fieldwork in Nepal to document the language and collect texts for our corpora. We will then add the same linguistic information to historical texts. For example, older archival texts in South Mustang Tibetan will be compared to 18th- and 19th-century texts written in standard Classical Tibetan. This comparison will show how the present-day Lhasa Tibetan egophoric marker 'byung' developed. That marker indicates that the speaker is the recipient of an action (khong gis ngar yige btang byung 'He sent *me* a letter.'). Present-day South Mustang Tibetan also has a verb 'byung', which goes back to Old and Classical Tibetan 'byung' meaning 'receive, get'. Unlike in Lhasa Tibetan, however, this verb has not developed into an egophoric auxiliary in South Mustang Tibetan. The extensive and consistent linguistic annotation of our corpora will allow us to study subtle differences in the use of verbs like 'byung' systematically. Our corpora will contain not only morphosyntactic annotation but also information about meaning and function in discourse context. This puts us in a unique position to investigate complex grammatical phenomena like egophoricity. Studying egophoricity in a historical context also lets us test theories of language change, in particular those that make predictions about the triggers and mechanisms of change. Are language-internal factors (e.g. changes in phonology) responsible for the emergence of egophoric marking? Can language-external factors (language contact) play a role? Or can we observe a combination of factors in these languages, which throughout their history have been spoken by communities living in close proximity in Nepal?

Finally, even closely related Tibetan and Newar varieties exhibit some significant differences. Comparison with egophoric marking in other languages can therefore provide further clues to this complex phenomenon. In the final year of the project, we will put our findings from Tibetan and Newar in crosslinguistic perspective.

Publications

Title: Lalitpur Newar 2022
Description: Dataset of recordings, with metadata, for the study of the Lalitpur Newar dialect. Collected in 2022 with the participants' consent to its use for research purposes.
Type of material: Database/collection of data
Year produced: 2022
Provided to others? Yes
Impact: The dataset is too new to have produced much impact yet, but it is the first open-access corpus of spoken Newar.
URL: https://zenodo.org/record/7501051
 
Title: OCR model for Pracalit for Sanskrit and Newar MSS 16th to 19th C., Ground Truth
Description: Ground truth data (PNG and XML files) for an OCR model. The dataset will be continually updated. The model was originally trained on Transkribus as a PyLaia model. Its ground truth data consisted of Pracalit Unicode transcriptions of four Nepalese manuscripts:
- Staatsbibliothek zu Berlin, Hitopadesa (MIK I 4851) (mixed Newar and Sanskrit, dated 1561)
- Staatsbibliothek zu Berlin, Vetalapañcaviṃsati (HS. Or. 6414) (Newar, dated 1675)
- Cambridge Digital Library, Avalokitesvaraguṇakaraṇḍavyuha (MS Add. 1322) (Sanskrit, 18th century)
- Royal Asiatic Society Online Collection, Madhyamasvayaṃbhupuraṇa (RAS Hodgson MS 23) (Newar and Sanskrit, c. 1800)
Training used 441 pages and validation used 242 pages. The model does not recognise spacing, except for large gaps (i.e. for pictures or string holes). Newar word-divider markers may be omitted or transcribed as virama. In general, the model is designed for manuscripts written in scriptio continua and outputs scriptio continua in Pracalit Unicode. Transcription was performed by Dr Alexander O'Neill (SOAS University of London). Transcription of the Vetalapañcaviṃsati (HS. Or. 6414) and the Madhyamasvayaṃbhupuraṇa (RAS Hodgson MS 23) drew on two sources. The first was unpublished material provided by Dr Felix Otter (Philipps-Universität Marburg). The second was the published transcription in Shakya, Min Bahadur, and Shanta Harsha Bajracharya, eds. "Svayambhu Puraṇa." Lalitpur: Nagarjuna Institute of Exact Methods, 2001. Transcription of the Avalokitesvaraguṇakaraṇḍavyuha (MS Add. 1322) drew on the transcription provided by the Digital Sanskrit Buddhist Canon Project. That transcription is based on Lokesh Chandra, "Guṇakaraṇḍavyuhasutram," New Delhi: International Academy of Indian Culture, 1999.
Type of material: Database/collection of data
Year produced: 2022
Provided to others? Yes
Impact: This dataset has made handwritten text recognition (HTR) for Newar Pracalit possible for the first time.
URL: https://zenodo.org/record/6967421