World Futures: Multimodal Viewpoint Construction by Russian International Media

Lead Research Organisation: University of Oxford
Department Name: Area Studies

Abstract

Discussions of futures have never been more important, or more challenging, than now, when the world finds itself fighting COVID-19. People engage in conversations about what our lives will look like after the pandemic, and so do the media.

The media often talk about futures to frame the way we think, and the media funded by the Russian State use viewpoint construction rooted in the depiction of futures as a subtle but powerful approach to manipulate and influence public opinion.

Viewpoint construction techniques are not just verbal but multimodal: they combine words, prosodic cues such as intonation and timing, gesture and movement, and other non-verbal elements. They rely heavily on culture, history, and other contextual knowledge. When used for manipulative purposes, they lead viewers to draw a false conclusion, make an assessment, or form an opinion that benefits the hostile actor, without being aware that this is happening.

In the true spirit of the digital humanities, our 2-year project will fuse cognitive and corpus-driven/corpus-based analyses of language, prosody, and gesture with area studies (regional knowledge: culture, literature, history, and society), while leveraging the latest developments in machine learning, natural language processing (NLP), and computer vision (CV) to tackle the problem at scale.

We will combine our multimodal analysis of Russian international media broadcasts (English and Russian) with the analysis of audiences' comments on social media to open a window onto viewpoint construction in the audiences' minds and thus provide additional validity to our linguistic analysis. Our project will break new ground in cognitive and corpus linguistics to significantly increase the overall reliability of large media data analysis currently used in linguistics as well as other humanities and the social sciences.

Our approach will help answer questions posed by academic researchers and Western policymakers in relation to future information threats and mitigations, as applied to Russia as a region of strategic importance. Ultimately, the project will go beyond the applied linguistics approach to disinformation analysis. It will enable researchers to test multimodal patterns found in a systematic way, thereby addressing one of the biggest challenges linguists face.

Furthermore, the project will contribute to answering such big questions in multimodal research in theoretical linguistics (semantics, syntax, pragmatics, comparative) as:

1) How are meanings constructed multimodally and how are multimodal meanings perceived?

2) How is the construction of meanings distributed across gestural, prosodic, and verbal modes? Are the "redundant" modes of gesture and prosody really redundant?

3) How do languages with quite distinct grammatical and phonetic properties, rooted in quite diverse historical and cultural backgrounds, realise the ideas of time and space multimodally while pursuing the same overarching communicative and pragmatic goals?
 
Description At this stage in our project we already have some preliminary findings and results stemming both from our manual annotation of data and from our subsequent manual analysis. Our manual analysis has been informed by our progress on automating the analysis of speech and gesture, and our work on that automation has in turn been informed by the insights gained through manual analysis. We presented some of our findings at the conference of the International Society for Gesture Studies in July 2022 and received valuable feedback. Furthermore, we have submitted our findings to date for presentation at three major international conferences in the fields of cognitive linguistics, contrastive linguistics, and gesture studies in July-September 2023.

Our project (funded by the AHRC and DFG grants) has been extended until 30 October 2024. The project's achievements to date include: 1) the development of a new interdisciplinary (linguistics, computer science, computer vision, engineering, area studies) methodology for speech-gesture analysis in media recordings at speed and scale; 2) the creation of large datasets in English, Russian, and Georgian composed of media recordings, with the main focus on RT (formerly Russia Today); 3) the creation of an annotated dataset of RT videos in English; 4) the processing (using OpenPose software and automatic face recognition) of our own large datasets as well as the NewsScape datasets -- a collection of TV news in several languages from multiple regions -- in order to prepare our training data for machine learning; 5) case studies of multimodal (speech and gesture) future depictions in RT videos and beyond, which reveal features and patterns specific to disinformation, propaganda, and manipulation; 6) research dissemination: presentation of our research at international conferences (2022-2024) and through several published conference papers and journal articles; 7) communication of our findings to our non-academic partner (Dstl). Overall, our project has already made substantial progress towards automating the multimodal analysis of media data, enabling our team (and the wider research community once all our project research is published) to analyse the speech, gestural, and corporal behaviour of speakers at speed and scale, making it much easier for researchers to see patterns, including detecting manipulation and disinformation that goes beyond simple fakes. This is especially important for the analysis of future depictions, since manipulation grounded in people's imagining of futures cannot be successfully countered through simple fact-checking or similar methods.
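The pose-based processing in point 4 can be illustrated with a minimal sketch. Assuming OpenPose-style output (a flat list of x, y, confidence triples per frame, in the BODY_25 keypoint layout), candidate gesture activity can be flagged by thresholding frame-to-frame wrist displacement. This is a hypothetical illustration, not the project's actual pipeline code:

```python
# Hypothetical sketch (not the project's code): flag candidate gesture
# activity from OpenPose-style 2D keypoints by thresholding the
# frame-to-frame displacement of the wrists.

import math

# BODY_25 keypoint indices for the right and left wrist in OpenPose output.
RIGHT_WRIST, LEFT_WRIST = 4, 7

def wrist_positions(frame_keypoints):
    """Extract (x, y) for both wrists from a flat [x0, y0, c0, x1, y1, c1, ...] list."""
    def point(idx):
        return (frame_keypoints[3 * idx], frame_keypoints[3 * idx + 1])
    return point(RIGHT_WRIST), point(LEFT_WRIST)

def gesture_frames(frames, threshold=15.0):
    """Return indices of frames where either wrist moves more than
    `threshold` pixels relative to the previous frame."""
    active = []
    prev = wrist_positions(frames[0])
    for i, frame in enumerate(frames[1:], start=1):
        cur = wrist_positions(frame)
        if any(math.dist(p, q) > threshold for p, q in zip(prev, cur)):
            active.append(i)
        prev = cur
    return active
```

A fixed displacement threshold is only a crude baseline: distinguishing genuine co-speech gesture strokes from other body movement requires learned sequence models of the kind developed in the collaborations reported here.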
Exploitation Route Upon the completion of our project in October 2024, our published findings, our datasets, and the software we develop or customise will allow researchers worldwide to conduct analysis of multimodal big data faster and more efficiently. They will also allow our non-academic partner to conduct, at scale and speed, analysis of disinformation that relies specifically on depictions of futures.
Sectors Aerospace, Defence and Marine

 
Title Multimodal Corpus Pipeline 
Description 2022-2023: We developed a pipeline specifically for the type of media data collected within our project, including NLP, gesture detection, and biometric clustering. 2023-2024: We further developed this pipeline, adding automatic annotation of co-speech gesture and incorporation of those annotations into the ELAN software. We collected Georgian-language data for our project and developed a pipeline of computational tools specifically for processing Georgian media data (speech recognition, lemmatisation, incorporation into CQPweb). We have developed a new interdisciplinary methodology for the analysis of multimodal data and have created a new multimodal dataset which incorporates manual and automatic annotations for three modalities -- textual, acoustic, and visual. 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact The new methodology, which includes the application of our newly developed pipeline of computational tools, significantly reduces the time needed for manual annotation, thereby reducing the resources (labour and money) needed for analysis. 
URL https://www.frontiersin.org/articles/10.3389/fcomm.2024.1356702/abstract
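As an illustration of how automatic gesture annotations can be incorporated into ELAN, one common route is ELAN's CSV / tab-delimited text import, which accepts one row per annotation with a tier name, begin and end times, and an annotation value. The sketch below is hypothetical (the tier name and span format are assumptions, not the project's actual conventions):

```python
# Hypothetical sketch: serialise automatically detected gesture spans as
# tab-delimited rows suitable for ELAN's CSV / tab-delimited text import.
# Each row: tier name, begin time (s), end time (s), annotation value.

def spans_to_elan_rows(spans, tier="auto_gesture"):
    """spans: iterable of (begin_s, end_s, label) tuples.
    Returns one tab-delimited line per annotation, times to millisecond precision."""
    lines = []
    for begin, end, label in spans:
        lines.append(f"{tier}\t{begin:.3f}\t{end:.3f}\t{label}")
    return "\n".join(lines)
```

The resulting file can be added as a new tier alongside existing manual tiers, which is what makes side-by-side comparison of manual and automatic annotation practical.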
 
Description Collaboration with Dr Siddharth, University of Edinburgh 
Organisation University of Edinburgh
Department School of Informatics Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution Our team provides expertise on the qualitative analysis of gesture in communication, as well as insights and annotated data resulting from our manual analysis.
Collaborator Contribution Dr Siddharth provides expertise on computer vision and machine learning, guiding the machine learning engineer hired by our research team on the German side.
Impact Outcomes: various models for detection of gesture are being developed and evaluated.
Start Year 2022
 
Description Collaboration with Esam Ghaleb of the Institute for Logic, Language and Computation, University of Amsterdam 
Organisation University of Amsterdam
Country Netherlands 
Sector Academic/University 
PI Contribution Our project team worked together with the research team led by Esam Ghaleb to develop the model for automatic detection of co-speech gesture in videos.
Collaborator Contribution Our project team worked together with the research team led by Esam Ghaleb to develop the machine-learning model for automatic detection of co-speech gesture in videos.
Impact 1) Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Uhrig, P., Holler, J., Toni, I., Özyürek, A. and Fernández, R., 2024. Co-Speech Gesture Detection through Multi-phase Sequence Labeling. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 4007-4015). 2) A scholarly paper, "Temporal Alignment and Integration of Audio-Visual Cues for Co-Speech Gesture Detection", co-authored by Ghaleb, E., Burenko, I., Rasenberg, M., Pouw, W., Toni, I., Uhrig, P., Wilson, A., Holler, J., Özyürek, A. and Fernández, R., has been prepared for submission to "ACM Transactions on Multimedia Computing, Communications, and Applications. Special Issue on Deep Learning for Robust Human Body Language Understanding" on 15 March 2024.
Start Year 2023
 
Description Collaboration with Prof. Kita, University of Warwick 
Organisation University of Warwick
Department Department of Psychology
Country United Kingdom 
Sector Academic/University 
PI Contribution Our team's findings have contributed to the development of theory and methods in gesture studies (speech-gesture interaction).
Collaborator Contribution Professor Sotaro Kita has contributed to discussions on theory and methods developing in gesture studies, including insights stemming from the field of psychology.
Impact Our discussions have contributed to reformulating an aspect of our research design in line with psycholinguistics theory and methods.
Start Year 2022
 
Description Policy Engagement 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact We have been engaged in an ongoing dialogue with our UK non-academic partner, as indicated in our grant application, via email, phone conversations, and participation in the project event in June 2022.
Year(s) Of Engagement Activity 2022, 2023, 2024