Visual AI: An Open World Interpretable Visual Transformer

Lead Research Organisation: University of Oxford

Department Name: Engineering Science

Abstract

With the advent of deep learning and the availability of big data, it is now possible to train machine learning algorithms for a multitude of visual tasks, such as tagging personal image collections in the cloud, recognizing faces, and 3D shape scanning with phones. However, each of these tasks currently requires training a neural network on a very large image dataset specifically collected and labelled for that task. The resulting networks are good experts for the target task, but they only understand the 'closed world' experienced during training and can 'say' nothing useful about other content, nor can they be applied to other tasks without retraining, nor do they have an ability to explain their decisions or to recognise their limitations. Furthermore, current visual algorithms are usually 'single modal': they 'close their ears' to the other modalities (audio, text) that may be readily available.

The core objective of the Programme is to develop the next generation of audio-visual algorithms that do not have these limitations. We will carry out fundamental research to develop a Visual Transformer capable of visual analysis with the flexibility and interpretability of a human visual system, and aided by the other 'senses' - audio and text. It will be able to continually learn from raw data streams without requiring the traditional 'strong supervision' of a new dataset for each new task, and deliver and distill semantic and geometric information over a multitude of data types (for example, videos with audio, very large scale image and video datasets, and medical images with text records).

The Visual Transformer will be a key component of next generation AI, able to address multiple downstream audio-visual tasks, significantly superseding the current limitations of computer vision systems, and enabling new and far reaching applications.

A second objective addresses transfer and translation. We seek impact in a variety of other academic disciplines and industry which today greatly under-utilise the power of the latest computer vision ideas. We will target these disciplines to enable them to leapfrog the divide between what they use (or do not use) today which is dominated by manual review and highly interactive analysis frame-by-frame, to a new era where automated visual analytics of very large datasets becomes the norm. In short, our goal is to ensure that the newly developed methods are used by industry and academic researchers in other areas, and turned into products for societal and economic benefit. To this end open source software, datasets, and demonstrators will be disseminated on the project website.

The ubiquity of digital images and videos means that every UK citizen may potentially benefit from the Programme research in different ways. One example is smart audio-visual glasses, that can pay attention to a person talking by using their lip movements to mask out other ambient sounds. A second is an app that can answer visual questions (or retrieve matches) for text-queries over large scale audio-visual collections, such as a person's entire personal videos. A third is AI-guided medical screening, that can aid a minimally trained healthcare professional to perform medical scans.

Planned Impact

The proposed programme encompasses new methodology and applied research in computer vision and other modalities (audio, text) that will enable analysis and search of image and video content while learning new things, with human-like flexibility and interpretability. These capabilities will encourage end user take up of computer vision technologies and commercial interest in embedding these technologies in products.

The Programme will have Economic and Societal impact by
1. Enabling UK industry to leverage AI in their activities with a key strategic advantage.
2. Developing new and improved computer vision technologies that will require substantially less training data to solve problems and are thus suitable for commercialisation by a wide range of companies.
3. Enhancing the visual and audio capabilities and knowledge base of UK industries, including small ones.
4. Enhancing quality of life by improving, for instance, healthcare capabilities, surveillance, environmental monitoring, and the means of accessing and enjoying personal digital media.
5. Reducing the cost and risk of collecting manual annotations for deploying AI technology, especially for sensitive data such as medical records.
6. Collaborating directly with companies and organizations that we have already identified, and will work with over the course of the Programme.
7. Training the next generation of computer vision researchers who will be equipped to support the imaging needs of science, technology and wider society for the future.

Impact on Knowledge includes
1. Realisation of new approaches to essential computer vision technology, and the dissemination of research findings through publications, conference presentations, summer school teaching, and the distribution of open source software and image databases.
2. Sharing knowledge with industrial collaborators via Transfer and Application Projects (TAPs) and other activities leading to adoption of advanced computer vision methods across many disciplines of science, engineering and medicine that currently do not use them.
3. Communication of advances to a public audience through website articles, Show and Tell events, social and broadcast media, and other co-ordinated public understanding activities.

Funded Value:

£5,912,096

Funded Period:

Dec 20 - Nov 25

Funder:

EPSRC

Project Status:

Active

Project Category:

Research Grant

Project Reference:

EP/T028572/1

Principal Investigator:

Andrew Zisserman

Research Subject:

Info. & commun. Technol. (95%)

Linguistics (5%)

Research Topic:

Artificial Intelligence (25%)

Computational Linguistics (5%)

Image & Vision Computing (70%)

Organisations

People	ORCID iD
Andrew Zisserman (Principal Investigator)
Alison Noble (Co-Investigator)
Andrea Vedaldi (Co-Investigator)
Dima Damen (Co-Investigator)
Hakan Bilen (Co-Investigator)	http://orcid.org/0000-0002-6947-6918

Publications

Author Name

Title Publication Date Published

Anciukevicius T (2023) RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation

Asano Y (2021) PASS: An ImageNet replacement for self-supervised pretraining without humans

Bain M (2021) Frozen in time: A joint video and image encoder for end-to-end retrieval

Bain M (2021) Automated audiovisual behavior recognition in wild primates in Science Advances

Bo Zhao (2021) Dataset Condensation with Differentiable Siamese Augmentation

Brown A (2021) Face, Body, Voice: Video Person-Clustering with Multiple Modalities

Brown A (2021) Automated Video Labelling: Identifying Faces by Corroborative Evidence

C Liu (2022) CounTR: Transformer-based Generalised Visual Counting

Chen H (2021) Localizing Visual Sounds the Hard Way

Chen H (2021) Audio-visual synchronisation in the wild

Choudhury S (2021) The curious layperson: fine-grained image recognition without expert labels

Croitoru I (2021) TeachText: CrossModal Generalized Distillation for Text-Video Retrieval

Dutta A (2021) Visual Analysis of Chapbooks Printed in Scotland

G Zhan (2022) A Tri-Layer Plugin to Improve Occluded Detection

Ge C (2021) Revitalizing CNN Attentions via Transformers in Self-Supervised Visual Representation Learning

Han T (2022) Temporal Alignment Networks for Long-term Video

Härenstam-Nielsen L (2023) Semidefinite Relaxations for Robust Multiview Triangulation

J Xie (2022) Segmenting Moving Objects via an Object-Centric Layered Representation

Jiao J (2021) Quantised Transforming Auto-Encoders: Achieving Equivariance to Arbitrary Transformations in Deep Networks

Karazija L (2021) ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation

Kaul P (2022) Label, Verify, Correct: A Simple Few Shot Object Detection Method

Kazakos E (2021) With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition

Kazakos E (2021) Slow-Fast Auditory Streams for Audio Recognition

Policy Influence
Further Funding
Research Databases and Models
Collaboration
Software and Technical Products
Engagement Activities


Description	Royal Society National Academies Data Reform Round Table Consultation
Geographic Reach	National
Policy Influence Type	Participation in a guidance/advisory committee


Description	Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group, Chair
Geographic Reach	Multiple continents/international
Policy Influence Type	Participation in a guidance/advisory committee
Impact	Quoting the aims from the report "We have three objectives for this report. Our first objective is that the use cases inspire those collecting and using data to consider the potential benefits of PETs for their own work, or in new collaborations with others. Second, for the evidence we present on barriers to adoption and standardisation to help inform policy decisions to encourage a marketplace for PETs. Finally, through our recommendations, we hope the UK will maximise the opportunity to be a global leader in PETs - both for data security and collaborative analysis - alongside emerging, coordinated efforts to implement PETs in other countries."
URL	https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/From-Privacy-to-Part...


Description	Royal Society Privacy Enhancing Technologies Working Group - policy report published (Chair)
Geographic Reach	National
Policy Influence Type	Participation in a guidance/advisory committee
Impact	The report has contributed to wider discussion of data sharing between government departments and a number of the recommendations have been followed up. It is well cited. A follow-on project is underway with the Alan Turing Institute which will report in 2022. The important message was to show that PETs are maturing as a technology and can be considered enablers to provide trusted sharing of data and to move the conversation away from security and accepting zero risk in sharing data. The work is relevant not only to my research area (health data science) but also to many other sectors which are data-driven.
URL	https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/privacy-enhancing-te...


Description	Envisioning Dante c.1472- c.1630
Amount	£805,620 (GBP)
Funding ID	AH/W005220/1
Organisation	Arts & Humanities Research Council (AHRC)
Sector	Public
Country	United Kingdom
Start	09/2022
End	09/2025


Description	Royal Society Research Professorship Enhanced research Expenses
Amount	£100,000 (GBP)
Funding ID	RF\ERE\210331
Organisation	The Royal Society
Sector	Charity/Non Profit
Country	United Kingdom
Start	10/2021
End	03/2024


Description	Studentship
Amount	£154,725 (GBP)
Organisation	Facebook
Sector	Private
Country	United States
Start	10/2021
End	09/2025


Description	Toshiba 2021
Amount	$200,000 (USD)
Organisation	Toshiba
Sector	Private
Country	Japan
Start	07/2021
End	03/2023


Title	EPIC-KITCHENS VISOR
Description	We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. Data published under the Creative Commons Attribution-NonCommercial 4.0 International License.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
Impact	The dataset can be used for audio event detection and the baseline code will be made publicly available.
URL	https://data.bris.ac.uk/data/dataset/2v6cgv1x04ol22qp9rm9x2j6a7/


Title	Image Change dataset
Description	Propose a scalable methodology for obtaining a large-scale change detection training dataset by leveraging existing object segmentation benchmarks. Introduce a co-attention based novel architecture that is able to implicitly determine correspondences between an image pair and find changes in the form of bounding box predictions. Contribute four evaluation datasets that cover a variety of domains and transformations, including synthetic image changes, real surveillance images of a 3D scene, and synthetic 3D scenes with camera motion. Evaluate our model on these four datasets and demonstrate zero-shot and beyond training transformation generalization.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
Impact	In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023. Future impact to be determined.
URL	https://arxiv.org/pdf/2209.14341.pdf


Title	Localizing Visual Sounds the Hard Way
Description	The objective of this work is to localize sound sources that are visible in a video without using manual annotations. Our key technical contribution is to show that, by training the network to explicitly discriminate challenging image fragments, even for images that do contain the object emitting the sound, we can significantly boost the localization performance.
Type Of Material	Database/Collection of data
Year Produced	2021
Provided To Others?	Yes
Impact	Localizing Visual Sounds the Hard Way Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman CVPR, 2021
URL	https://www.robots.ox.ac.uk/~vgg/research/lvs/


Title	PASS: An ImageNet replacement for self-supervised pretraining without humans
Description	PASS is a large-scale image dataset that does not include any humans and which can be used for high-quality pretraining while significantly reducing privacy concerns.
Type Of Material	Database/Collection of data
Year Produced	2021
Provided To Others?	Yes
Impact	YM. Asano, C. Rupprecht, A. Zisserman, A. Vedaldi PASS: An ImageNet replacement for self-supervised pretraining without humans NeurIPS Dataset Track, 2021
URL	https://www.robots.ox.ac.uk/~vgg/data/pass/


Title	Semantic Shift Benchmark
Description	Demonstrate that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes. Following the success of modern deep learning systems on closed-set visual recognition tasks, a natural next challenge is open-set recognition (OSR) (Scheirer et al., 2013). In the closed-set setting, a model is tasked with recognizing a set of categories that remain the same during both training and testing phases.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
Impact	Future impact to be determined
URL	https://www.robots.ox.ac.uk/~vgg/research/osr/#ssb_suite
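The 'none-of-above' decision described above can be illustrated with a minimal sketch. It assumes a common baseline, thresholding the maximum softmax probability of a closed-set classifier, which is not necessarily the method evaluated on this benchmark. The function names and the threshold value are hypothetical.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Convert raw classifier scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_open_set(logits: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the most likely known class, or None for 'none-of-above'.

    Assumption: a low maximum softmax probability signals an unseen category.
    The threshold of 0.5 is illustrative only and would need tuning.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None
```

With this rule, a confident score vector such as [5.0, 0.0, 0.0] is assigned to class 0, while a near-uniform one such as [0.1, 0.0, 0.05] is rejected as unknown.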


Title	Video Person-Clustering Dataset: a multi-modal TV-shows and movies dataset
Description	VPCD contains multi-modal annotations (face, body and voice) for all primary and secondary characters from a range of diverse TV-shows and movies.
Type Of Material	Database/Collection of data
Year Produced	2021
Provided To Others?	Yes
Impact	A. Brown, V. Kalogeiton, A. Zisserman Face, Body, Voice: Video Person-Clustering with Multiple Modalities
URL	https://www.robots.ox.ac.uk/~vgg/data/Video_Person_Clustering//


Title	Video-text Alignment HTM-Align dataset
Description	The objective is a temporal alignment network that ingests long term video sequences, and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment.
Type Of Material	Database/Collection of data
Year Produced	2022
Provided To Others?	Yes
Impact	Future impacts to be determined
URL	https://www.robots.ox.ac.uk/~vgg/research/tan/


Description	NLS Chapbooks
Organisation	National Library of Scotland
Country	United Kingdom
Sector	Academic/University
PI Contribution	We used our software to search and analyse the illustrations of the chapbooks.
Collaborator Contribution	Partner provided chapbooks in large quantities.
Impact	https://www.robots.ox.ac.uk/~vgg/research/chapbooks/
Start Year	2020


Description	National Consortium of Intelligent Medical Imaging
Organisation	National Consortium of Intelligent Medical Imaging
Sector	Academic/University
PI Contribution	A VisualAI postdoc (Jianbo Jiao) is providing expertise in building image-based deep learning models to assess COVID19 deterioration for hospital-based patients.
Collaborator Contribution	NCIMI is providing access to COVID19 data for a TAP project.
Impact	An initial evaluation of predictive modelling was performed using available covid-19 data. However, due to the small size of the data, and the fact that covid treatments for patients have significantly improved and better pathways for patients are in place, it was deemed not worth pursuing this work beyond the preliminary study. A report was written but has not been published.
Start Year	2021


Description	TAP VAI-02 1516 Project
Organisation	University of Copenhagen
Country	Denmark
Sector	Academic/University
PI Contribution	We created a visual search engine using images and metadata supplied by Matilde Malaspina at University of Copenhagen and Barbara Tramelli from University of Venice.
Collaborator Contribution	Partner provided images and metadata.
Impact	A talk at Venice Centre for Digital and Public Humanities (VeDPH) on 9th Dec. 2020
Start Year	2020


Description	TAP-VAI-03 16cIllustration Project
Organisation	Ca' Foscari University of Venice
Country	Italy
Sector	Academic/University
PI Contribution	We created a visually searchable database (https://www.robots.ox.ac.uk/~vgg/research/16ci/lyon/) of 16th century illustrations printed in Lyon.
Collaborator Contribution	Partner provided images and metadata.
Impact	The researchers at Venice Centre for Digital and Public Humanities are using this visual search engine as a research support tool.
Start Year	2021


Description	TAP-VAI-04 Frank-Scholten Archive
Organisation	Leiden University
Country	Netherlands
Sector	Academic/University
PI Contribution	Using our VISE software, we matched all the photographs to their corresponding negatives in the Frank-Scholten image archive.
Collaborator Contribution	They provide a dataset containing photographs and negatives captured by Frank-Scholten.
Impact	tbc
Start Year	2021


Description	TAP-VAI-08 Fish Pool Trajectory
Organisation	University of Oxford
Department	Department of Zoology
Country	United Kingdom
Sector	Academic/University
PI Contribution	We are developing tools and a workflow to detect and track a Picasso triggerfish moving in a fish tank to find the food target.
Collaborator Contribution	They provide a video dataset showing Picasso triggerfish in a fish pool.
Impact	tbc
Start Year	2021


Description	TAP-VAI-09 Fish Tank Obstacles
Organisation	University of Oxford
Department	Department of Zoology
Country	United Kingdom
Sector	Academic/University
PI Contribution	We are developing tools and a workflow to detect and track Picasso triggerfish navigating through obstacles to reach a food target in a fish tank.
Collaborator Contribution	They provide a video dataset showing Picasso triggerfish in a fish tank containing obstacles.
Impact	tbc
Start Year	2021


Description	TAP-VAI-10 Czech National Library/ Czech Academy of Sciences
Organisation	National Library of the Czech Republic
Country	Czech Republic
Sector	Public
PI Contribution	We are providing technical support to the RKD team for implementing our VISE software in their platform.
Collaborator Contribution	They are using our software tool (VISE)
Impact	tbc
Start Year	2021


Description	TAP-VAI-11 RKD
Organisation	Netherlands Institute for Art History
Country	Netherlands
Sector	Public
PI Contribution	We are providing technical support to the RKD for implementing a visual image search feature in its public-facing web portal, and for internal research, using our VGG Image Search Engine (VISE) software (https://www.robots.ox.ac.uk/~vgg/software/vise/).
Collaborator Contribution	The RKD provides millions of images and is now using our VISE software for visual search functionality.
Impact	Not yet.
Start Year	2021


Title	Audio-visual synchronisation
Description	The software performs audio-visual synchronisation, which requires a model to relate changes in the visual and audio streams. Prior work focused primarily on the synchronisation of talking-head videos. In contrast, open-domain videos often have only a small visual indication, i.e. the cue is sparse in space.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	Paper in British Machine Vision Conference (BMVC), 2022. Future impacts to be determined.
URL	https://iashin.ai/SparseSync


Title	Auditory Slow-Fast
Description	Recognising actions using auditory signal only
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	The paper won an outstanding paper award at ICASSP 2021 (3 papers selected out of 1400). Well referenced (46 stars). In a follow-up work by DeepMind [https://arxiv.org/pdf/2111.12124.pdf], this work is referred to as: "We find the Slowfast architecture is good at learning rich representations required by different domains", extending this work to speech and music audio.
URL	https://github.com/ekazakos/auditory-slow-fast


Title	Find Identical Images (FII)
Description	Identical images have the same image dimension (i.e. image width, image height, number of colour channels) and same pixel value in all corresponding pixel locations. FII is a command line tool to find all identical images in a folder. It can also find images that are common in two folders.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	tbc
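The definition of identical images given above (same width, height and number of colour channels, and the same value at every pixel location) can be sketched in a few lines. This is an illustrative sketch only, not the FII implementation: the `Image` type, the function names and the use of SHA-256 fingerprints are all assumptions, and images are held in memory rather than read from a folder.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Image(NamedTuple):
    """Hypothetical in-memory image: dimensions plus raw pixel bytes."""
    width: int
    height: int
    channels: int
    pixels: bytes  # row-major, one byte per channel value


def fingerprint(img: Image) -> str:
    """Hash the dimensions together with the pixel data.

    Equal fingerprints imply (up to hash collisions) equal dimensions
    and equal pixel values at every location.
    """
    digest = hashlib.sha256()
    digest.update(f"{img.width}x{img.height}x{img.channels}|".encode())
    digest.update(img.pixels)
    return digest.hexdigest()


def find_identical(images: Dict[str, Image]) -> List[List[str]]:
    """Group the names of identical images within one collection."""
    groups = defaultdict(list)
    for name, img in images.items():
        groups[fingerprint(img)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def find_common(folder_a: Dict[str, Image],
                folder_b: Dict[str, Image]) -> List[Tuple[str, str]]:
    """Return (name_in_a, name_in_b) pairs of images present in both collections."""
    index = defaultdict(list)
    for name, img in folder_a.items():
        index[fingerprint(img)].append(name)
    return sorted(
        (a, b)
        for b, img in folder_b.items()
        for a in index.get(fingerprint(img), [])
    )
```

Including the dimensions in the fingerprint matters: a 2x1 image and a 1x2 image can share the same pixel bytes without being identical under the definition above.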


Title	Generalised Visual Counting in Images
Description	Our goal is to develop a generalised visual object counting system, that augments humans' ability for recognising the number of objects in a visual scene. Specifically, generalised visual object counting refers to the problem of identifying the number of the salient objects of arbitrary semantic class in an image (i.e. open-world visual object counting) with arbitrary number of instance "exemplars" provided by the end user, to refer to the particular objects to be counted, i.e. from zero-shot to few-shot object counting.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	Future impact to be determined.
URL	https://arxiv.org/pdf/2208.13721.pdf


Title	Generalized Category Discovery
Description	We present a new setting: 'Generalized Category Discovery' and a method to tackle it. Our setting can be succinctly described as: given a dataset, a subset of which has class labels, categorize all unlabelled images in the dataset. The unlabelled images may come from labelled or novel classes. Our method leverages contrastively trained vision transformers to assign labels directly through clustering.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	Future impact to be determined
URL	https://github.com/prajwalkr/vtp#readme


Title	Image Counterfeit Spotter
Description	Counterfeit Spotter compares images of suspicious products with a reference image and confirms whether each is real or fake within seconds, right in your browser.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	Still receiving feedback and reporting
URL	https://www.robots.ox.ac.uk/~vgg/software/image-compare/counterfeit-spotter/#usecases


Title	ImageCompare
Description	Image Compare is a lightweight, standalone and offline application to visually compare a pair of images and highlight their differences. This application can be used on desktop computers and mobile phones without requiring installation as it runs entirely in a web browser.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	tbc


Title	Lip Reading
Description	To learn strong lip reading models that can recognise speech in silent videos.
Type Of Technology	Software
Year Produced	2022
Impact	Research has shown that the best models achieve state-of-the-art results, outperforming prior work trained on public data by a significant margin, and even industrial models trained on orders of magnitude more data. We have also designed a Visual Speech Detection model on top of our lip reading system that obtains state-of-the-art results on this task and even outperforms several audio-visual baselines.
URL	https://www.robots.ox.ac.uk/~vgg/research/vtp-for-lip-reading/


Title	List Annotator (LISA)
Description	List Annotator (LISA) is a standalone and light-weight HTML/CSS/JavaScript based application to efficiently annotate a large list of images. LISA is an open source project developed and maintained by the Visual Geometry Group (VGG) and released under a license that grants its users the freedom to use it for any purpose.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	tbc


Title	Motion Grouping
Description	This software implements the model as described in the paper. It includes a pre-trained model and inference code to apply to downstream images, as well as the training code to train the model from scratch. It also includes code to evaluate and benchmark the results against existing datasets (DAVIS2016, FBMS59, SegTrackv2, MoCA).
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	This code accompanies the paper: Self-supervised Video Object Segmentation by Motion Grouping Charig Yang, Hala Lamdouar, Erika Lu, Andrew Zisserman, Weidi Xie. ICCV 2021
URL	https://oxris.ox.ac.uk/viewobject.html?id=1190260&cid=1


Title	VGG Image Annotator (VIA)
Description	VGG Image Annotator is a simple and standalone manual annotation software for image, audio and video. VIA runs in a web browser and does not require any installation or setup. The complete VIA software fits in a single self-contained HTML page of size less than 400 Kilobyte that runs as an offline application in most modern web browsers.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	tbc


Title	VGG Image Search Engine (VISE)
Description	VGG Image Search Engine (VISE) is a free and open source software for visual search of a large number of images using an image as a search query.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	tbc


Title	VGG Visual Tracker (VVT)
Description	VGG Visual Tracker (VVT) is a tool for creating bounding box annotations on videos in a semi-automatic fashion, using class agnostic object trackers. VVT runs on modern web browsers (Chrome 65+, Firefox 60+, Safari 11+) and does not require any installation or setup. VVT is a variation of the VGG Image Annotator (VIA) v3 tool and uses the same data format. So, if you are already using VIA v3, the annotations are interoperable with your existing workflow. No changes required.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	tbc


Title	Visual Analysis of Chapbooks
Description	The chapbooks were produced cheaply to create everyday reading material and were the most popular reading material for the masses [1]. This dataset has been made freely available by the National Library of Scotland (NLS)
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	To reduce printing costs, these woodcuts were reused across multiple chapbooks. Helped researchers pursue many related research questions using software tools based on computer vision.
URL	https://data.nls.uk/data/digitised-collections/chapbooks-printed-in-scotland/


Description	(1) VIA: Image and Video Annotation; (2) Image Comparator; (3) Image search and retrieval
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Industry/Business
Results and Impact	Show and Tell Event at the Oxford Big Data Institute. Researchers at the Big Data Institute are now aware of our computer vision tools, which can significantly improve their existing research workflows.
Year(s) Of Engagement Activity	2022
URL	https://www.bdi.ox.ac.uk/


Description	ACH talk
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Forum for conversations on an expansive definition of digital humanities in a broad array of subject areas, methods, and communities of practice.
Year(s) Of Engagement Activity	2021
URL	https://drive.google.com/file/d/1CN5CDWPf4cLTT1NY9gyP-JxxvsCdzRG-/view


Description	AD-Manual Annotation of Radiology Images using VGG Image Annotator (VIA) online Course
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	The course website lists our manual annotation tool (VIA). This provides a lot of exposure to our software, which is available to the world as an open source tool. This tool significantly speeds up annotation work for professionals who need to annotate large volumes of visual data.
Year(s) Of Engagement Activity	2020
URL	https://folio47.wixsite.com/rp-course/radiology-preprocessor-workflow


Description	AD-TAP Outcome Presentation for Leiden University
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Professional Practitioners
Results and Impact	Presented the outcome of our collaboration with Leiden University. The team at Leiden University were extremely excited to see the results from our visual search engine. They said that they were "jumping like a child" after seeing the outcome and that this collaboration will lead to many new research projects related to the Frank Scholten Archives.
Year(s) Of Engagement Activity	2021


Description	AEOLIAN Network workshop presentation
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	Presentation describing a project undertaken within the National Librarian of Scotland's Fellowship in Digital Scholarship programme for 2020-1.
Year(s) Of Engagement Activity	2021
URL	https://www.aeolian-network.net/events/workshop-1-employing-machine-learning-and-artificial-intellig...


Description	AI4 LAM online conference
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Aimed at professionals in the LAM (Libraries, Archives and Museums) sector, this was an online workshop teaching the use of several Visual AI tools and giving context to their application in this sector. Issues of attribution, bias and fairness were discussed as well as technical areas.
Year(s) Of Engagement Activity	2022
URL	https://sites.google.com/view/ai4lam/ai4lam-2022-virtual-event


Description	AI4LAM workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Introduce the use of visual AI for collections research, access and management. Using the example of collaborations between Oxford's Visual Geometry Group (VGG) and researchers and curators within the GLAM sector, the speaker will provide a hands-on introduction to VGG's open-source tools for visual search, classification, comparison and annotation.
Year(s) Of Engagement Activity	2021
URL	https://libereurope.eu/event/introduction-to-visual-ai-in-glams-workshop-series-on-applying-and-depl...


Description	AIUM 2021 Special Session Invited Speaker
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Invited speaker in the session titled "Deep Learning Applications for New Ultrasound Techniques". The talk was pre-recorded with live questions. The primary audience was medical physicists rather than medical image analysis experts.
Year(s) Of Engagement Activity	2021


Description	AV4D: Visual Learning of Sounds in Spaces
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Workshop at the European Conference on Computer Vision (ECCV).
Year(s) Of Engagement Activity	2022
URL	https://av4d.org


Description	AWS Human-Machine Collaboratory conference
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Industry/Business
Results and Impact	Hosted by the Amazon Web Services (AWS)-funded Human-Machine Collaboratory at Oxford, Giles Bergel gave two talks (one alongside Dan Schofield, another Visual AI ambassador) on Visual AI collaborations and research in fields ranging from primatology to cultural heritage and media studies.
Year(s) Of Engagement Activity	2022
URL	https://www.mpls.ox.ac.uk/innovation-and-business-partnerships/human-machine-collaboration


Description	Aberystwyth Bibliographical Group
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Industry/Business
Results and Impact	Presented work in tracing woodcut illustrations, their original woodblocks and copies throughout the surviving corpus of British ballads and chapbooks. He discussed how woodcuts in these forms of cheap print served as visual brands for particular titles, genres or producers of cheap print, and demonstrated some of the bibliographical uses of their identification. Showed how computer vision software can strongly support these researches, and may be further applied to printed images of all kinds.
Year(s) Of Engagement Activity	2021
URL	https://www.hugofox.com/community/aberystwyth-bibliographical-group-19783/reports-of-recent-meetings...


Description	Co-organiser of ASMUS2021, a MICCAI workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Advances in Simplifying Medical UltraSound (ASMUS) 2021 is an international workshop that provides a forum for research topics around ultrasound image computing and computer-assisted interventions and robotic systems that utilize ultrasound imaging. It was held in conjunction with MICCAI 2021 in virtual form. Accepted papers were selected based on their scientific contribution, via a double-blind process involving written reviews from at least two external reviewers in addition to a member of the committee. The published work includes reports across a wide range of methodology, research and clinical applications. Advanced deep learning approaches for anatomy recognition, segmentation, registration and skill assessment are the dominant topics, in addition to ultrasound-specific new approaches in augmented reality and remote assistance. Three invited speakers were included in the workshop, and live demos of technologies were given. The meeting had 80+ attendees.
Year(s) Of Engagement Activity	2021
URL	https://miccai-ultrasound.github.io/#/asmus21


Description	ConCode webinar
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Presentation will highlight some of the ways that cultural heritage collections are using computer vision (or visual AI) for collections management and research, focussing particularly on the work of the Oxford Visual Geometry Group and its collaborators.
Year(s) Of Engagement Activity	2021,2022
URL	https://www.youtube.com/watch?v=d4XaZ4bur6Q


Description	Deep Discoveries webinar
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Media (as a channel to the public)
Results and Impact	Discussed how computer vision excels at matching identical features within images and has made progress in broad classification tasks, though the middle ground remains challenging. Visual similarity, which is essential to human visual recognition, is challenging to conceptualise, measure and compute. Outlined some approaches to defining similarity in computational terms, drawing on the experience of the Visual Geometry Group (Oxford) in collaborating with cultural heritage researchers.
Year(s) Of Engagement Activity	2021
URL	https://www.eventbrite.co.uk/e/computer-vision-and-heritage-opportunities-for-research-and-engagemen...


Description	Digital Humanities Annual Conference - Tokyo
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	The project gave a paper and led a workshop teaching the use of Visual AI software tools for the study of printed illustrations.
Year(s) Of Engagement Activity	2022
URL	https://dh2022.adho.org/


Description	Digital Humanities Congress Sheffield
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Industry/Business
Results and Impact	Presentation of Visual AI collaborations on book history to a diverse audience of digital humanists to promote the sharing of knowledge, ideas and techniques within the digital humanities.
Year(s) Of Engagement Activity	2022
URL	https://www.dhi.ac.uk/dhc2022/


Description	Digital Humanities and Book History conference
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Project research, tools and collaborations presented to a digital humanities audience working in particular in book history, a field in which Visual AI has a high profile.
Year(s) Of Engagement Activity	2022
URL	https://dcsco-op.org/past-events/dhbh/


Description	Digital Humanities at Oxford Summer School
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Schools
Results and Impact	A presentation and two hands-on sessions on Visual AI tools and collaborations in Digital Humanities
Year(s) Of Engagement Activity	2022
URL	https://eng.ox.ac.uk/events/dhoxss-2022/


Description	Digitising, Cataloguing, Searching and Sharing the Medieval and Early-Modern Image: On-Going Projects & Different Methodologies
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Other audiences
Results and Impact	Presentation on digitising, Cataloguing, Searching and Sharing the Medieval and Early-Modern Image: On-Going Projects & Different Methodologies
Year(s) Of Engagement Activity	2021


Description	Distinguished Keynote Speaker in Biomedical and Health Data Science in two joint conferences of IEEE EMBS BHI and BSN 2021
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Keynote talk entitled "Simplifying interpretation and acquisition of ultrasound scans", delivered virtually. Abstract: With the increased availability of low-cost and handheld ultrasound probes, there is interest in simplifying interpretation and acquisition of ultrasound scans through deep-learning based analysis so that ultrasound can be used more widely in healthcare. However, this is not just "all about the algorithm", and successful innovation requires inter-disciplinary thinking and collaborations. In this talk I will overview progress in this area drawing on examples of my laboratory's experiences of working with partners on multi-modal ultrasound imaging, and building assistive algorithms and devices for pregnancy health assessment in high-income and low-and-middle-income country settings. Emerging topics in this area will also be discussed.
Year(s) Of Engagement Activity	2021


Description	Edinburgh CDCS Digitised Documents Series workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	Workshop to showcase the state of the art in Visual AI for cultural heritage and the digital humanities, and provide a hands-on introduction to some simple techniques for searching and classifying imagery in books, paintings, photographs and film.
Year(s) Of Engagement Activity	2022
URL	https://www.cdcs.ed.ac.uk/events/visual-ai-and-humanities-introduction


Description	Edinburgh CDCS workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Other audiences
Results and Impact	Workshop to showcase the state of the art in Visual AI for cultural heritage and the digital humanities, and provide a hands-on introduction to some simple techniques for searching and classifying imagery in books, paintings, photographs and film. Introduced participants to the study of bias within AI, in such controversial applications as facial recognition and automated image categorisation.
Year(s) Of Engagement Activity	2021
URL	https://www.cdcs.ed.ac.uk/events/workshop-chapbooks-national-library-scotland


Description	Fantastic Futures Conference
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Conference aiming to help participants discover basic concepts of artificial intelligence in the GLAM sector, concrete uses and practices of AI in the GLAM sector, and technologies and tools applicable to the GLAM sector's data and collections.
Year(s) Of Engagement Activity	2021
URL	https://www.bnf.fr/en/agendaEN/workshops-tutorials-les-futurs-fantastiques-3rd-conference-about-arti...


Description	Helping Computers See and Understand the World Around Us
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Schools
Results and Impact	Science Week Demonstration for Year 3 and Year 4 students at the Cutteslowe Primary School in Oxford
Year(s) Of Engagement Activity	2022
URL	https://www.cutteslowe.oxon.sch.uk/


Description	History of Printed Illustrations webinar
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Industry/Business
Results and Impact	Presentation, drawing on a recent collaboration with the National Library of Scotland on their chapbook collections, demonstrating how computer vision (or 'visual AI') can support the study of printed illustrations. Demonstrated free software developed for these purposes, discussed its strengths and weaknesses, and considered its overall place within the illustration researcher's toolbox.
Year(s) Of Engagement Activity	2021
URL	https://www.cphc.org.uk/events/2021/7/8/hopin-webinar-ly8r3


Description	ICDAR Hip2021
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Workshop to bring together researchers from various fields working on document image acquisition, restoration, analysis, indexing, and retrieval to make these documents accessible in digital libraries.
Year(s) Of Engagement Activity	2021
URL	https://blog.sbb.berlin/hip2021/


Description	IIIF Community Call
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Media (as a channel to the public)
Results and Impact	Community Call discussing the University of Oxford Visual Geometry Group's work with IIIF and Machine Learning
Year(s) Of Engagement Activity	2021
URL	https://www.youtube.com/watch?v=KXE3-LD6xxI&t=1s


Description	International Computer Vision Summer School
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	International Computer Vision Summer School
Year(s) Of Engagement Activity	2022
URL	https://iplab.dmi.unict.it/icvss2022/


Description	Learning 3D Geometry
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Undergraduate students
Results and Impact	Lecture in the computer vision course at the University of Amsterdam.
Year(s) Of Engagement Activity	2022


Description	Learning on Screen - BoB/TRilT Academic Engagement launch
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Industry/Business
Results and Impact	Project tools and collaborations advertised to researchers seeking to use one of the largest research databases of UK TV programmes, leading to follow up discussions.
Year(s) Of Engagement Activity	2022
URL	https://learningonscreen.ac.uk/guidance/bob-and-trilt-for-research/launch-event/


Description	London Rare Books Summer School - the Digital Book Historian's Toolkit
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Schools
Results and Impact	View of the landscape of digital research in book history, including bibliographic data and content management systems, data visualisation, systems for image sharing and annotation in libraries and archives, computer vision, and (semi-)automated collation. Instead of emphasising mastery of any particular technology, we encouraged computational thinking and digital experimentation to enhance historical research questions and information management.
Year(s) Of Engagement Activity	2021


Description	MIUA 2021 Conference - co-organiser
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	MIUA is a UK-based international conference for the communication of image processing and analysis research and its application to medical imaging and biomedicine. This was the 25th edition of the meeting which was held virtually. 40 papers were presented (27k downloads as of 09-03-2022). MIUA is the principal UK forum for communicating research progress within the community interested in image analysis applied to medicine and related biological science. The meeting is designed for the dissemination and discussion of research in medical image understanding and analysis, and aims to encourage the growth and raise the profile of this multi-disciplinary field by bringing together the various communities including among others:
Year(s) Of Engagement Activity	2021
URL	https://miua2021.com/


Description	Max Planck BibHerz Library Seminar: Reflections on the Digital Turn in the Humanities and the Sciences
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Seminar on how digital technologies have changed approaches to the discovery, study, and presentation of images; what impact the changing dynamic between the analogue and digital manifestation of the book or manuscript has on their working practices; and how this affected their use and questions that are asked or could be asked.
Year(s) Of Engagement Activity	2021
URL	https://www.biblhertz.it/3069990/seminar-series-reflections-on-the-digital-turn-in-the-humanities-an...


Description	NLS Digital Scholarship Workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Industry/Business
Results and Impact	15 attendees for an annual workshop, which sparked questions and ongoing discussions.
Year(s) Of Engagement Activity	2021


Description	National Academies roundtable on researcher access to data
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	The National Academies Data Reform Round Table was a by-invitation meeting that discussed some of the current challenges that researchers face in getting access to data for research under current data protection regulation. The Department for Digital, Culture, Media and Sport (DCMS) was consulting on reforming the UK's data protection regime, which formed part of a larger effort to implement the government's National Data Strategy, and specifically Mission 2 of that strategy: 'supporting a pro-growth and trusted data regime'. This issue affects researchers working in computer vision and medical image analysis, and this was part of the discussion. In terms of impact/outcome, the meeting output fed into a response that hopefully will have influence (how direct cannot be measured; it is too early to determine, but I selected this box in the next question for this reason).
Year(s) Of Engagement Activity	2021


Description	National Academies' party conference event speaker
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Policymakers/politicians
Results and Impact	Speaker on the (virtual) National Academies panel at the Liberal Democrat political party conference which focused on the theme of 'Becoming a "science superpower": will the UK be fit to tackle the next global crisis?'. Briefing: The panel discussions will address how the UK should approach the future, building resilience to future crises and achieving 'superpower' status. The panel will include leading experts representing the National Academies, as well as representatives from the political parties and a journalist Chair. Not aware of any direct impact (see next week) but these sessions are an important part of keeping an open and positive dialogue with MPs.
Year(s) Of Engagement Activity	2021


Description	National Librarian of Scotland's Lecture in Digital Scholarship
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Media (as a channel to the public)
Results and Impact	Introduced research on chapbooks using Visual AI and how machine vision can help others to understand printed heritage collections.
Year(s) Of Engagement Activity	2021
URL	https://www.youtube.com/watch?v=5jkq0iLzMvo&t=10s


Description	Neural Geometry and Rendering: Advances and the Common Objects in 3D Challenge?
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Workshop at the European Conference on Computer Vision (ECCV).
Year(s) Of Engagement Activity	2022
URL	https://ngr-co3d.github.io


Description	Office for National Statistics, Integrated Data Programme Advisory Group, Member
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Other audiences
Results and Impact	The Office for National Statistics Integrated Data Programme Advisory Group offers advice to the ONS on its programme aimed at sharing data for public good with other organisations. I was invited due to my role as Chair of the Royal Society PETs science policy work together with my research interest in health data science/medical image analysis.
Year(s) Of Engagement Activity	2021,2022


Description	OxML - Oxford Machine Learning Summer School
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Gave a lecture at the OxML summer school on Deep Learning.
Year(s) Of Engagement Activity	2021
URL	https://www.oxfordml.school


Description	Practical Applications of IIIF Seminar: Image Registration and IIIF
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Public/other audiences
Results and Impact	Discussing the methods, challenges and possibilities of Image Registration.
Year(s) Of Engagement Activity	2021
URL	https://www.iiconservation.org/content/practical-applications-iiif-seminar-1-image-registration-and-...


Description	Renaissance Society of America Day of Digital Learning
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	RSA DAY OF DIGITAL LEARNING. Featuring a varied menu of sessions involving hands-on, participatory work with digital tools and resources.
Year(s) Of Engagement Activity	2021
URL	https://rsaddl.hcommons.org/


Description	Renaissance Society of America Day of Digital Learning
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	An introduction to computer vision - the extraction of information from images - for the purposes of book and art history. Overview of the field, with particular reference to collaborative research performed by the Visual Geometry Group (VGG) at Oxford.
Year(s) Of Engagement Activity	2022
URL	https://rsa2022ddl.hcommons.org/main-page/rsa-ddl-2022-topics/


Description	Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group - Chair
Form Of Engagement Activity	A formal working group, expert panel or dialogue
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Policymakers/politicians
Results and Impact	Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group (policy report), Chair, 2017-19. Also Chair of follow-on to initial report, 2021-.
Year(s) Of Engagement Activity	2019,2020,2021,2022
URL	https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies


Description	Sight and Sound Workshop at the IEEE Conference on Computer Vision and Pattern Recognition, 2021
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Andrew Zisserman co-organized the Sight and Sound Workshop at CVPR 2021. This is the description of the workshop: While traditionally visual and audio data have been studied in isolation, researchers have increasingly been creating algorithms that learn from both modalities. This has produced many exciting developments in automatic lip-reading, multi-modal representation learning, and audio-visual action recognition. Since pretty much every internet video has an audio track, the prospect of learning from paired audio-visual data - either with new forms of unsupervised learning, or by simply incorporating sound data into existing vision algorithms - is appealing, and this workshop will cover recent advances in this direction. It will also touch on higher-level questions, such as what information sound conveys that vision doesn't, the merits of sound versus other "supplemental" modalities such as text and depth, and the relationship between visual motion and sound. We'll also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing.
Year(s) Of Engagement Activity	2021
URL	https://sightsound.org/2021/


Description	Sixth Form Schools Science Talk
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Schools
Results and Impact	Gave talk to lower sixth form students at Magdalen College School on my research. This was part of their lecture series related to the lower sixth form project which provides them with experience of researching a topic. Lots of interesting questions particularly about the global health angle of the research/potential impact and ethics of using AI. In fact the quality of questions was much higher than most technical audience ones! Teacher followup said there was good discussion afterwards.
Year(s) Of Engagement Activity	2022


Description	Summer School on Artificial Intelligence, India
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	Lectured at Summer School on "Recognizing Human Actions in Videos", followed by Q & A session.
Year(s) Of Engagement Activity	2021
URL	https://cvit.iiit.ac.in/summerschool2021/index.php


Description	The Sixth Annual Conference for Research Software Engineering
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	Research Software Engineers from other Universities got to learn about our methods and processes of developing software tools that are used widely all over the world.
Year(s) Of Engagement Activity	2022
URL	https://virtual.oxfordabstracts.com/#/event/3101/submission/70


Description	University of Stockholm Digital Humanities Now workshop
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Showcase new and ongoing research in the broad Digital Humanities field.
Year(s) Of Engagement Activity	2021
URL	https://su.powerinit.com/Data/Event/EventTemplates/2602/?EventId=879


Description	VGG Image Search Engine (VISE)
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Industry/Business
Results and Impact	Talk at the RKD Netherlands Institute for Art History. The RKD team have integrated our VISE image search engine software into their platform. At this event, all the contributors to the digital platform talked about their work and their software. Our VISE software was introduced to a wider international audience.
Year(s) Of Engagement Activity	2022
URL	https://rkd.nl/en/


Description	VisualAI Show and Tell 2021
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	Presented our visual annotation and visual search software to potentially interested researchers, some of whom enquired further and later adopted the tools in their research.
Year(s) Of Engagement Activity	2021


Description	VisualAI Show and Tell
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	The event was for the University of Edinburgh. It showcased the software developed by the VisualAI team with the aims of publicising the open source software produced in the project, and of attracting potential collaborators.
Year(s) Of Engagement Activity	2021
URL	https://www.robots.ox.ac.uk/~vgg/projects/visualai/events.html#ST15621


Description	VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments. The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'.
Year(s) Of Engagement Activity	2021
URL	https://www.robots.ox.ac.uk/~vgg/data/voxceleb/interspeech2021.html


Description	VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop 2022
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments. The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'.
Year(s) Of Engagement Activity	2022
URL	http://mm.kaist.ac.kr/datasets/voxceleb/voxsrc/interspeech2022.html


Description	What do you learn after Developing, Maintaining and Supporting Research Software for 6 years?
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Industry/Business
Results and Impact	Talk at the Vision, Graphics and Learning (VGL) research group in the Department of Computer Science, University of York. The PhD students and postdocs in the VGL group became aware of the software development methods and practices for creating research software tools used by millions all over the world.
Year(s) Of Engagement Activity	2022
URL	https://www.youtube.com/watch?v=8S0HbFX4HBM
