World Futures: Multimodal Viewpoint Construction by Russian International Media

Lead Research Organisation: University of Oxford
Department Name: Area Studies

Abstract

Discussions of futures have never been more important, or more challenging, than now, when the world finds itself fighting COVID-19. People engage in conversations about what our lives will look like after the pandemic, and so do the media.

The media often talk about futures to frame the way we think, and the media funded by the Russian State use viewpoint construction rooted in the depiction of futures as a subtle but powerful approach to manipulate and influence public opinion.

Viewpoint construction techniques are not just verbal but multimodal: they combine words, prosodic cues such as intonation and timing, gesture and movement, and other non-verbal elements. They rely heavily on culture, history, and other contextual knowledge. When used for manipulative purposes, they lead viewers to draw a false conclusion, make an assessment, or form an opinion that benefits the hostile actor, without being aware that this is happening.

In the true spirit of the digital humanities, our 2-year project will fuse cognitive and corpus-driven/corpus-based analyses of language, prosody, and gesture with area studies (regional knowledge: culture, literature, history, and society), while leveraging the latest developments in machine learning, natural language processing (NLP), and computer vision (CV) to tackle the problem at scale.

We will combine our multimodal analysis of Russian international media broadcasts (English and Russian) with the analysis of audiences' comments on social media to open a window onto viewpoint construction in the audiences' minds and thus provide additional validity to our linguistic analysis. Our project will break new ground in cognitive and corpus linguistics to significantly increase the overall reliability of large media data analysis currently used in linguistics as well as other humanities and the social sciences.

Our approach will help answer questions posed by academic researchers and Western policymakers in relation to future information threats and mitigations, as applied to Russia as a region of strategic importance. Ultimately, the project will go beyond the applied linguistics approach to disinformation analysis. It will enable researchers to test multimodal patterns found in a systematic way, thereby addressing one of the biggest challenges linguists face.

Furthermore, the project will contribute to answering such big questions in multimodal research in theoretical linguistics (semantics, syntax, pragmatics, comparative) as:

1) How are meanings constructed multimodally and how are multimodal meanings perceived?

2) How is the construction of meanings distributed across gestural, prosodic, and verbal modes? Are the "redundant" modes of gesture and prosody really redundant?

3) How do languages with quite distinct grammatical and phonetic properties, rooted in quite diverse historical and cultural backgrounds, realise the ideas of time and space multimodally while pursuing the same overarching communicative and pragmatic goals?
 
Description At this stage in our project we already have some preliminary findings and results stemming both from our manual annotation of data and from our subsequent manual analysis. Our manual analysis has been informed by our progress on automating the analysis of speech and gesture, and our work on that automation has in turn been informed by the insights gained through manual analysis. We presented some of our findings at the conference of the International Society for Gesture Studies in July 2022 and received valuable feedback. Furthermore, we have submitted our findings to date for presentation at three major international conferences in the fields of cognitive linguistics, contrastive linguistics, and gesture studies in July-September 2023.

Our project (funded by the AHRC and DFG grants) has been extended until 30 October 2024. The project's achievements to date include: 1) the development of a new interdisciplinary (linguistics, computer science, computer vision, engineering, area studies) methodology for speech-gesture analysis in media recordings at speed and scale; 2) the creation of large datasets in English, Russian, and Georgian composed of media recordings, with the main focus on RT (formerly Russia Today); 3) the creation of an annotated dataset of RT videos in English; 4) the processing (using OpenPose software and automatic face recognition) of our own large datasets as well as the NewsScape datasets -- a collection of TV news in several languages from multiple regions -- in order to prepare our training data for machine learning; 5) case studies of multimodal (speech and gesture) future depictions in RT videos and beyond, which reveal features and patterns specific to disinformation, propaganda, and manipulation; 6) research dissemination: presentation of our research at international conferences (2022-2024) and through several published conference papers and journal articles; 7) communication of our findings to our non-academic partner (Dstl). Overall, our project has already made substantial progress towards automating the multimodal analysis of media data, enabling our team (and the wider research community once all our project research is published) to analyse the speech, gestural, and corporal behaviour of speakers at speed and scale, making it much easier for researchers to see patterns, including detecting manipulation and disinformation that goes beyond simple fakes. This is especially important for the analysis of future depictions, since manipulation grounded in people's imagining of futures cannot be successfully countered through simple fact-checking or similar methods.
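The pose-based processing in point 4 can be illustrated with a minimal sketch. Assuming OpenPose-style output (a flat list of x, y, confidence triples per frame, in the BODY_25 keypoint layout), candidate gesture activity can be flagged by thresholding frame-to-frame wrist displacement. This is a hypothetical illustration, not the project's actual pipeline code:

```python
# Hypothetical sketch (not the project's code): flag candidate gesture
# activity from OpenPose-style 2D keypoints by thresholding the
# frame-to-frame displacement of the wrists.

import math

# BODY_25 keypoint indices for the right and left wrist in OpenPose output.
RIGHT_WRIST, LEFT_WRIST = 4, 7

def wrist_positions(frame_keypoints):
    """Extract (x, y) for both wrists from a flat [x0, y0, c0, x1, y1, c1, ...] list."""
    def point(idx):
        return (frame_keypoints[3 * idx], frame_keypoints[3 * idx + 1])
    return point(RIGHT_WRIST), point(LEFT_WRIST)

def gesture_frames(frames, threshold=15.0):
    """Return indices of frames where either wrist moves more than
    `threshold` pixels relative to the previous frame."""
    active = []
    prev = wrist_positions(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = wrist_positions(frame)
        if any(math.dist(p, q) > threshold for p, q in zip(prev, cur)):
            active.append(i)
        prev = cur
    return active
```

A fixed displacement threshold is only a crude baseline: distinguishing genuine co-speech gesture strokes from other body movement requires learned sequence models of the kind developed in the collaborations reported here.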
Exploitation Route Upon the completion of our project in October 2024, our published findings, our datasets, and the software we develop or customise will allow researchers worldwide to conduct analysis of multimodal big data faster and more efficiently. They will also allow our non-academic partner to conduct, at scale and speed, analysis of disinformation that relies specifically on depictions of futures.
Sectors Aerospace, Defence and Marine

 
Title Multimodal Corpus Pipeline 
Description 2022-2023: We developed a pipeline specifically for the type of media data collected within our project, including NLP, gesture detection, and biometric clustering. 2023-2024: We further developed this pipeline, adding automatic annotation of co-speech gesture and incorporation of those annotations into the ELAN software. We collected Georgian-language data for our project and developed a pipeline of computational tools specifically for processing Georgian media data (speech recognition, lemmatisation, incorporation into CQPweb). We have developed a new interdisciplinary methodology for the analysis of multimodal data and have created a new multimodal dataset which incorporates manual and automatic annotations for three modalities -- textual, acoustic, and visual. 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact The new methodology, which includes the application of our newly developed pipeline of computational tools, significantly reduces the time needed for manual annotation, thereby reducing the resources (labour and money) needed for analysis. 
URL https://www.frontiersin.org/articles/10.3389/fcomm.2024.1356702/abstract
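As an illustration of how automatic gesture annotations can be incorporated into ELAN, one common route is ELAN's CSV / tab-delimited text import, which accepts one row per annotation with a tier name, begin and end times, and an annotation value. The sketch below is hypothetical (the tier name and span format are assumptions, not the project's actual conventions):

```python
# Hypothetical sketch: serialise automatically detected gesture spans as
# tab-delimited rows suitable for ELAN's CSV / tab-delimited text import.
# Each row: tier name, begin time (s), end time (s), annotation value.

def spans_to_elan_rows(spans, tier="auto_gesture"):
    """spans: iterable of (begin_s, end_s, label) tuples.
    Returns one tab-delimited line per annotation, times to millisecond precision."""
    lines = []
    for begin, end, label in spans:
        lines.append(f"{tier}\t{begin:.3f}\t{end:.3f}\t{label}")
    return "\n".join(lines)
```

The resulting file can be added as a new tier alongside existing manual tiers, which is what makes side-by-side comparison of manual and automatic annotation practical.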
 
Description Collaboration with Dr Siddharth, University of Edinburgh 
Organisation University of Edinburgh
Department School of Informatics Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution Our team provides expertise on the qualitative analysis of gesture in communication, as well as insights and annotated data resulting from our manual analysis.
Collaborator Contribution Dr Siddharth provides expertise on computer vision and machine learning, guiding the machine learning engineer hired by our research team on the German side.
Impact Outcomes: various models for detection of gesture are being developed and evaluated.
Start Year 2022
 
Description Collaboration with Esam Ghaleb of the Institute for Logic, Language and Computation, University of Amsterdam 
Organisation University of Amsterdam
Country Netherlands 
Sector Academic/University 
PI Contribution Our project team worked together with the research team led by Esam Ghaleb to develop the model for automatic detection of co-speech gesture in videos.
Collaborator Contribution Our project team worked together with the research team led by Esam Ghaleb to develop the machine-learning model for automatic detection of co-speech gesture in videos.
Impact 1) Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Uhrig, P., Holler, J., Toni, I., Özyürek, A. and Fernández, R., 2024. Co-Speech Gesture Detection through Multi-phase Sequence Labeling. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 4007-4015). 2) A scholarly paper, "Temporal Alignment and Integration of Audio-Visual Cues for Co-Speech Gesture Detection", co-authored by Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Toni, I., Uhrig, P., Wilson, A., Holler, J., Özyürek, A. and Fernández, R., has been prepared for submission to "ACM Transactions on Multimedia Computing, Communications, and Applications. Special Issue on Deep Learning for Robust Human Body Language Understanding" on 15 March 2024.
Start Year 2023
 
Description Collaboration with Prof. Kita, University of Warwick 
Organisation University of Warwick
Department Department of Psychology
Country United Kingdom 
Sector Academic/University 
PI Contribution Our team's findings have contributed to the development of theory and methods in gesture studies (speech-gesture interaction).
Collaborator Contribution Professor Sotaro Kita has contributed to discussions on theory and methods developing in gesture studies, including insights stemming from the field of psychology.
Impact Our discussions have contributed to reformulating an aspect of our research design in line with psycholinguistics theory and methods.
Start Year 2022
 
Description Policy Engagement 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact We have been engaged in an ongoing dialogue with our UK non-academic partner, as indicated in our grant application, via email, phone conversations, and participation in the project event in June 2022.
Year(s) Of Engagement Activity 2022, 2023, 2024