
Controlled content creation with large-scale generative model

Lead Research Organisation: University of Oxford

Abstract

Brief description of the context of the research including potential impact

Recent generative models [1, 2, 3] have been successful in generating high-quality visual content that follows a user prompt. However, prompts can be ambiguous, so the controllability of generation must be improved before it can be used in real-life applications. Real content can provide fine-grained guidance, which has been used to control the motion of objects through video motion transfer [4, 5]. Controllable architectures, including the playable video line of work [6, 7, 8], also enable interactively updating a scene by applying actions to subjects (a toy sketch of this idea follows this paragraph). Improving these methods would enable high-quality controlled content in multiple modalities (video, 3D, audio, etc.). The proposed research could impact the creative process for virtual scenes, potentially lowering the cost of generating robotic simulations and of visual content produced by graphics artists.
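To make the "playable" style of control concrete, the following is a minimal, hypothetical sketch rather than any of the cited methods [6, 7, 8]: a discrete user action is embedded, drives a small latent dynamics model, and a placeholder decoder renders the updated state into the next frame. All names (PlayableWorld, step, the frame size) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class PlayableWorld(nn.Module):
    # Toy latent "world": an embedded action updates a recurrent scene state,
    # and a linear decoder stands in for a frame renderer.
    def __init__(self, n_actions: int = 4, dim: int = 32):
        super().__init__()
        self.action_emb = nn.Embedding(n_actions, dim)  # 'learned' action codes
        self.dynamics = nn.GRUCell(dim, dim)            # latent state update
        self.decoder = nn.Linear(dim, 3 * 8 * 8)        # toy 8x8 RGB frame

    def step(self, state, action):
        a = self.action_emb(action)           # (batch, dim) action embedding
        state = self.dynamics(a, state)       # apply the chosen action
        frame = self.decoder(state).view(-1, 3, 8, 8)
        return state, frame

world = PlayableWorld()
state = torch.zeros(1, 32)
for action in (0, 2, 1):                      # user-chosen actions, one per step
    state, frame = world.step(state, torch.tensor([action]))
print(frame.shape)                            # torch.Size([1, 3, 8, 8])
```

In a real playable-video system the decoder would be a learned video generator and the action codes would be discovered from unlabeled video; the sketch only shows the interaction loop of action in, updated scene out.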
Aims and Objectives

The aim is to develop novel generative methods to control visual scenes, including 2D/3D objects that change pose, shape and appearance or interact with the environment. This takes advantage of priors learned by large-scale models to generate tailored content prompted through text, images, sketches, audio, video or 'learned' actions in a playable setting.
Novelty of the research methodology

Controlling generative models has been a long-standing challenge, as control is what makes generated content usable in practice. Many techniques have been developed to guide generations [9, 10] (a guidance sketch follows this paragraph); however, they lack understanding of multiple modalities and fine-grained control. Additional training through finetuning limits their applicability to a narrow domain of generations. Existing motion transfer work also has difficulty extracting faithful, disentangled motion information from real content. Ideally, the full capability of pre-trained models should be preserved while they are being controlled.
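For context on the guidance techniques referred to above [9, 10], here is a minimal sketch of classifier-free guidance, a common way of steering a pre-trained diffusion model at sampling time without finetuning. The denoiser below is a stand-in placeholder, not a real model, and the guidance scale, conditioning embedding and tensor shapes are illustrative assumptions.

```python
import torch

def denoiser(x, t, cond):
    # Stand-in for a pre-trained noise-prediction network eps_theta(x, t, cond);
    # a fixed random linear map so the sketch runs end to end.
    g = torch.Generator().manual_seed(0)
    w = torch.randn(x.shape[-1], x.shape[-1], generator=g)
    out = x @ w
    if cond is not None:
        out = out + 0.1 * cond          # conditioning nudges the prediction
    return out

def cfg_epsilon(x, t, cond, scale=7.5):
    # Classifier-free guidance: run the denoiser with and without the condition,
    # then extrapolate towards the conditional prediction. A larger `scale`
    # trades diversity for stronger adherence to the prompt embedding.
    eps_uncond = denoiser(x, t, None)
    eps_cond = denoiser(x, t, cond)
    return eps_uncond + scale * (eps_cond - eps_uncond)

x = torch.randn(1, 16)      # current noisy latent
cond = torch.randn(1, 16)   # embedding of a text / image / sketch prompt
eps = cfg_epsilon(x, t=0.5, cond=cond)
print(eps.shape)            # torch.Size([1, 16])
```

This kind of guidance leaves the pre-trained weights untouched, which is why it is attractive compared with finetuning, but it only exposes a single scalar knob per condition; the fine-grained, multi-modal control discussed above goes beyond what this sketch provides.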
Alignment to EPSRC's strategies and research areas (which EPSRC research area the project relates to)

Further information on the areas can be found at http://www.epsrc.ac.uk/research/ourportfolio/researchareas/. This project relates to the research area of Artificial Intelligence technologies (https://www.ukri.org/what-we-do/browse-our-areas-of-investment-and-support/artificial-intelligence-technologies/), specifically targeting improving the autonomous and creative abilities of computer vision models as a tool for humans.
Any companies or collaborators involved

Funded by a Snap studentship and co-supervised by Fabio Pizzati (University of Oxford and MBZUAI) and Aliaksandr Siarohin (Snap).

Publications


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/S024050/1 | | | 30/09/2019 | 30/03/2028 |
2868700 | Studentship | EP/S024050/1 | 30/09/2023 | 29/09/2027 | Alex Pondaven