Logic in semantic universals

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Philosophy Psychology & Language


Despite the immense diversity of human languages in the world, linguists have discovered robust common properties shared across languages. Since such cross-linguistic universals provide us with a window into the core cognitive basis of the linguistic ability we possess as a species, understanding their nature is a fundamental goal in the scientific study of human language. Research on cross-linguistic universals has been especially fruitful in the study of word meanings. For example, researchers have discovered common patterns in how languages organise colour terms (e.g. if a language has only three colour terms, they tend to be black, white, and red). An important open question concerns the cognitive basis of these universal patterns, i.e., why these universals exist and how they are rooted in the core properties of our ability to use language.

There is a gap in the current research that hinders us from answering this question. While the cognitive basis for the universal patterns in the meanings of content words (such as colour and kinship terms) has been thoroughly investigated, there has been scant research into the cognitive basis of the universals in the meanings of logical words. The situation is pressing because logical words provide the scaffolding for productivity, the central design feature of human language that sets it apart from the communication systems of other species; with the help of logical words like "and", we are able to produce an infinite number of sentences (e.g., we can recursively embed a sentence within a sentence "S and [S and [...]]"). Although linguists have observed many universal properties of the meanings of logical words (e.g. "No language has a single word that corresponds to 'not and'"), why the universals hold has yet to be thoroughly investigated.

Recent theoretical and methodological developments in linguistics and cognitive science finally enable us to fill this gap. Theoretically, advances in formal semantics now make it possible to analyse the meanings of various logical words in a unified framework. Methodologically, new research has shown that distinct aspects of linguistic communication can be separated and investigated systematically in the lab, using a set of experimental paradigms involving the learning and transmission of artificial languages. These methods have only just begun to be used to explore word meanings. In this project, we make use of these recent advances to address the research question above.

The project has the following core components, which integrate insights and methodologies from linguistics, logic, and cognitive science:
- Through cross-linguistic investigation, we will empirically evaluate semantic universals in logical vocabulary hypothesised in the literature. Moreover, we will aim to discover new semantic universals, focusing on understudied areas, such as modal words like "may" and "must" and mental-attitude verbs like "believe" and "be happy".
- We will construct and explore semantic/pragmatic theories of the universals and evaluate whether the theories explain the universals in terms of known core aspects of our linguistic communication systems.
- We will test the theories using behavioural experiments that allow us to measure distinct aspects of linguistic communication systems.

Through the research, we will uncover the cognitive basis of the cross-linguistic universals in logical word meanings. The dataset constructed through the cross-linguistic investigation will be used to develop a set of educational materials and activities that help students at different levels learn transferable logical reasoning skills by identifying the meanings of logical words (e.g. "and", "or") in English and in foreign languages. In addition, the dataset of logical inference patterns will be used to test and train AI systems that aim to automatically extract inferences from texts in multiple languages.
Description: There are two key outcomes from the award at the current stage. One is the construction of a dataset of modal expressions in a typologically diverse sample of 24 languages (Akan, Basque, Cantonese, Dutch, Greek, Hausa, Hebrew, Hindi, Hungarian, Igbo, Japanese, Khmer, Kiitharaka, Mandarin, Persian, Russian, Spanish, Telugu, Turkish, Vietnamese, Mapudungun, Korean, Tagalog and Thai). The dataset contains information about the force and flavours of modal expressions and allows researchers to assess hypotheses about potential cross-linguistic generalisations and variation in this domain. The other outcome is the discovery of a new potential cross-linguistic generalisation in the domain of negative modality. The generalisation states that, if a language lexicalises an impossibility modal, then the language lexicalises *deontic* impossibility, i.e., it has an item that expresses impossibility in terms of rules and obligations. The project is currently investigating potential theoretical explanations of this generalisation.
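As an illustration only, the generalisation above can be read as an implication that is checked language by language. The sketch below assumes a hypothetical record layout (`ModalEntry` with `force` and `flavours` fields) and uses invented placeholder languages and forms; it does not reflect the project's actual dataset format or findings. It adopts one possible reading of the generalisation: a language satisfies it if it has no lexicalised impossibility modal, or if at least one of its impossibility modals has a deontic flavour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModalEntry:
    """Hypothetical record for one modal expression (not the project's schema)."""
    language: str
    form: str
    force: str            # e.g. "possibility", "necessity", "impossibility"
    flavours: frozenset   # e.g. frozenset({"deontic", "epistemic"})


def satisfies_generalisation(entries):
    """Check the implication for the entries of a single language.

    True if the language has no impossibility modal (vacuously satisfied),
    or if some impossibility modal can express deontic impossibility.
    """
    impossibility = [e for e in entries if e.force == "impossibility"]
    if not impossibility:
        return True
    return any("deontic" in e.flavours for e in impossibility)


def check_by_language(entries):
    """Group entries by language and evaluate the implication for each."""
    by_language = {}
    for entry in entries:
        by_language.setdefault(entry.language, []).append(entry)
    return {lang: satisfies_generalisation(es) for lang, es in by_language.items()}


# Invented placeholder data, for illustration only.
TOY_DATA = [
    ModalEntry("LangA", "modal1", "impossibility", frozenset({"deontic", "epistemic"})),
    ModalEntry("LangA", "modal2", "possibility", frozenset({"epistemic"})),
    ModalEntry("LangB", "modal3", "necessity", frozenset({"deontic"})),
    ModalEntry("LangC", "modal4", "impossibility", frozenset({"epistemic"})),
]

RESULTS = check_by_language(TOY_DATA)
```

In this toy data, the placeholder "LangC" would count as a counterexample, because its only impossibility modal is purely epistemic; "LangB" satisfies the implication vacuously, since it has no impossibility modal at all.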
Exploitation Route: In addition to its scientific uses, the dataset will be beneficial for foreign-language education, as it illustrates the exact differences between corresponding logical vocabularies across languages. The data will also be beneficial for the teaching of logical-reasoning skills, since they provide natural-language examples for discussing logical reasoning in the classroom.

The data concerning the semantic properties of logical words gathered in the project will include a large set of entailment patterns licensed by a variety of logical words in a diverse set of languages. Such patterns are essential for the development of computational systems for natural language inference (NLI), which has applications in Question-Answering, Information Extraction, and Machine Translation. The dataset generated by the project differs from existing datasets in at least two respects. First, it will include various inference patterns arising from the fine-grained lexical semantics of clause-embedding predicates. Second, it will cover a diverse set of languages, making it useful for systems dealing with cross-linguistic tasks. Quality assurance of cross-linguistic NLI systems is of significant societal relevance.
Sectors: Digital/Communication/Information Technologies (including Software), Education