Generative Modelling for Sequential Human Behaviour

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

The goal of this research is to analyse and model sequential human behaviour such as speech, facial animation and gestures. The first step towards achieving this is to identify the factors that contribute to and guide human behaviour. Generative models can reveal hidden structures in the data, often referred to as the latent representation of the data. As part of my research I want to focus on finding ways to disentangle these latent variables in order to gain control over different aspects of the generated sequences.
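To make the idea of disentanglement concrete, the minimal PyTorch sketch below (module names and layer sizes are illustrative assumptions, not a proposed architecture) splits a sequence's latent code into a static identity part and a per-frame dynamics part, so that each factor could be manipulated independently at generation time:

import torch
import torch.nn as nn

class DisentangledSequenceEncoder(nn.Module):
    """Hypothetical encoder producing a per-sequence identity code
    and a per-frame dynamics code from an input sequence."""
    def __init__(self, feat_dim=64, id_dim=16, dyn_dim=8):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 32, batch_first=True)
        self.to_identity = nn.Linear(32, id_dim)   # one vector per sequence
        self.to_dynamics = nn.Linear(32, dyn_dim)  # one vector per frame

    def forward(self, x):                          # x: (batch, time, feat_dim)
        h, last = self.rnn(x)
        identity = self.to_identity(last[-1])      # (batch, id_dim)
        dynamics = self.to_dynamics(h)             # (batch, time, dyn_dim)
        return identity, dynamics

enc = DisentangledSequenceEncoder()
identity, dynamics = enc(torch.randn(2, 25, 64))   # 2 sequences of 25 frames
print(identity.shape, dynamics.shape)              # (2, 16) and (2, 25, 8)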
I would like to explore recent generative methods and adapt them to handle sequential data. One such model is the generative adversarial network (GAN), which uses a discriminator network to drive the learning of a generator network. This approach has been very successful for generating static data, and its extension to sequences is a very active area of research. A further advantage of this approach is that it allows the use of multiple discriminator networks, capable of simultaneously capturing different aspects of real human behavioural data.
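The sketch below illustrates the multiple-discriminator idea under stated assumptions: a recurrent generator is trained against one discriminator that judges individual frames and another that judges the whole sequence, so that both static and temporal realism are enforced. All architectures and dimensions are placeholders for the example, not the networks I intend to build:

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, noise_dim=16, feat_dim=32):
        super().__init__()
        self.rnn = nn.GRU(noise_dim, 64, batch_first=True)
        self.out = nn.Linear(64, feat_dim)

    def forward(self, z):                    # z: (batch, time, noise_dim)
        h, _ = self.rnn(z)
        return self.out(h)                   # (batch, time, feat_dim)

class FrameDiscriminator(nn.Module):
    """Scores each frame independently for static realism."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64),
                                 nn.LeakyReLU(0.2),
                                 nn.Linear(64, 1))

    def forward(self, x):                    # x: (batch, time, feat_dim)
        return self.net(x).squeeze(-1)       # one score per frame

class SequenceDiscriminator(nn.Module):
    """Scores the whole sequence for temporal realism."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 64, batch_first=True)
        self.out = nn.Linear(64, 1)

    def forward(self, x):
        _, last = self.rnn(x)
        return self.out(last[-1]).squeeze(-1)  # one score per sequence

G, D_frame, D_seq = Generator(), FrameDiscriminator(), SequenceDiscriminator()
bce = nn.BCEWithLogitsLoss()
fake = G(torch.randn(4, 25, 16))
# The generator is pushed towards fooling both critics at once.
g_loss = bce(D_frame(fake), torch.ones(4, 25)) + bce(D_seq(fake), torch.ones(4))
print(g_loss.item())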
Another goal of this research is to understand the relationship between the signals that make up human behaviour, because these signals are often linked. A good example of this is the correlation between human speech and facial animation. I plan to exploit this relationship to perform speech-driven animation, which will greatly reduce the cost of producing computer-generated imagery (CGI).
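As a rough sketch of this kind of modality translation (input and output dimensions are assumptions for the example, e.g. MFCC audio features in and facial-animation parameters out), an audio encoder can summarise speech frame by frame while a decoder turns the summaries into face parameters:

import torch
import torch.nn as nn

class Speech2Face(nn.Module):
    """Illustrative speech-to-animation translator; not the
    project's actual architecture."""
    def __init__(self, audio_dim=13, face_dim=20):
        super().__init__()
        self.audio_enc = nn.GRU(audio_dim, 64, batch_first=True)
        self.face_dec = nn.Linear(64, face_dim)

    def forward(self, mfcc):                 # mfcc: (batch, time, audio_dim)
        h, _ = self.audio_enc(mfcc)
        return self.face_dec(h)              # (batch, time, face_dim)

model = Speech2Face()
face_params = model(torch.randn(1, 100, 13))  # e.g. 100 frames of MFCCs
print(face_params.shape)                      # torch.Size([1, 100, 20])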
Additionally, I believe it is important to research methods that model the changes in signals rather than the signals themselves. Such models may be better suited to capturing the dynamics of most natural systems, so I would like to investigate new network architectures that make these approaches possible.
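A minimal sketch of this idea, assuming a recurrent network and a simple running-sum integration (a stand-in for more sophisticated dynamics models), predicts per-step deltas rather than the signal itself:

import torch
import torch.nn as nn

class DeltaPredictor(nn.Module):
    """Predicts frame-to-frame changes and integrates them back
    into signal space; dimensions are illustrative assumptions."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 64, batch_first=True)
        self.to_delta = nn.Linear(64, feat_dim)

    def forward(self, x):                    # x: (batch, time, feat_dim)
        h, _ = self.rnn(x)
        deltas = self.to_delta(h)            # predicted per-step changes
        # Integrate the deltas so the output lives in signal space.
        return x[:, :1] + torch.cumsum(deltas, dim=1)

pred = DeltaPredictor()(torch.randn(2, 50, 32))
print(pred.shape)                            # torch.Size([2, 50, 32])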
Finally, as we seek to constantly improve the realism of generated human behavioural data, it is also important to find ways to distinguish generated signals from real ones. Generating highly realistic content can have serious security implications (e.g. identity theft), and therefore I would also like to explore ways of distinguishing between real and generated human behavioural data.
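On the detection side, a baseline could be as simple as a binary classifier over behavioural sequences; the sketch below is an assumed placeholder for illustration, not a proposed detector:

import torch
import torch.nn as nn

class FakeDetector(nn.Module):
    """Binary classifier labelling a sequence as real or generated;
    architecture and sizes are assumptions for the example."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, 64, batch_first=True)
        self.head = nn.Linear(64, 1)         # logit: real (1) vs generated (0)

    def forward(self, x):                    # x: (batch, time, feat_dim)
        _, last = self.rnn(x)
        return self.head(last[-1]).squeeze(-1)

detector = FakeDetector()
loss = nn.BCEWithLogitsLoss()(detector(torch.randn(8, 40, 32)),
                              torch.ones(8))  # 1 = "real" labels
print(loss.item())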

This research is in line with the goals of EPSRC in the fields of computer graphics and human-computer interaction, since it will enable the fast and efficient generation of animated characters.

Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
EP/R513052/1                                      01/10/2018   30/09/2023
2130174             Studentship    EP/R513052/1   01/10/2018   31/03/2022   Konstantinos Vougioukas
 
Description The work funded by this award focuses on finding methods for generating realistic human behavioural signals. Our research has shown that it is possible to synthesise facial animation videos and human speech that are often perceived as real by people. Specifically, we have focused on developing models that translate signals from one modality to another, which has many practical applications in fields such as telecommunications. In particular, for speech-driven animation, the generated talking-head videos we produce are characterised by a high level of realism and even exhibit spontaneous expressions such as blinks. Furthermore, we have shown that the intermediate representations obtained by our translation models can be used to improve the performance of emotion and speech recognition systems without the need for large amounts of annotated training data.
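As a hedged illustration of how such intermediate representations might be reused (the encoder, dimensions and class count are all assumptions for the example, not the project's actual models), one can freeze a pretrained audio encoder and train only a small classification head:

import torch
import torch.nn as nn

audio_enc = nn.GRU(13, 64, batch_first=True)   # stands in for a pretrained encoder
for p in audio_enc.parameters():
    p.requires_grad = False                    # keep the pretrained features fixed

emotion_head = nn.Linear(64, 6)                # e.g. six emotion classes

h, _ = audio_enc(torch.randn(4, 100, 13))      # (batch, time, 64) features
logits = emotion_head(h.mean(dim=1))           # pool over time, then classify
print(logits.shape)                            # torch.Size([4, 6])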
Exploitation Route The models produced by this research can be used for efficient content generation. They can also be used to create additional data for training audiovisual automated recognition systems. As this technology matures, it will likely be adopted by the entertainment industry for automated content generation and by the telecommunications industry to improve the transmission of human behavioural signals (e.g. speech, video). These generative models can also produce features, without requiring annotations, that are useful for a plethora of discriminative tasks (e.g. visual speech recognition, emotion recognition).
Sectors Digital/Communication/Information Technologies (including Software)

URL https://sites.google.com/view/facial-animation