Visual AI: An Open World Interpretable Visual Transformer
Lead Research Organisation:
University of Oxford
Department Name: Engineering Science
Abstract
With the advent of deep learning and the availability of big data, it is now possible to train machine learning algorithms for a multitude of visual tasks, such as tagging personal image collections in the cloud, recognizing faces, and 3D shape scanning with phones. However, each of these tasks currently requires training a neural network on a very large image dataset specifically collected and labelled for that task. The resulting networks are good experts for the target task, but they only understand the 'closed world' experienced during training and can 'say' nothing useful about other content, nor can they be applied to other tasks without retraining, nor do they have an ability to explain their decisions or to recognise their limitations. Furthermore, current visual algorithms are usually 'single modal': they 'close their ears' to the other modalities (audio, text) that may be readily available.
The core objective of the Programme is to develop the next generation of audio-visual algorithms that do not have these limitations. We will carry out fundamental research to develop a Visual Transformer capable of visual analysis with the flexibility and interpretability of a human visual system, and aided by the other 'senses' - audio and text. It will be able to continually learn from raw data streams without requiring the traditional 'strong supervision' of a new dataset for each new task, and deliver and distill semantic and geometric information over a multitude of data types (for example, videos with audio, very large scale image and video datasets, and medical images with text records).
The Visual Transformer will be a key component of next generation AI, able to address multiple downstream audio-visual tasks, significantly superseding the current limitations of computer vision systems, and enabling new and far reaching applications.
A second objective addresses transfer and translation. We seek impact in a variety of other academic disciplines and in industry, which today greatly under-utilise the power of the latest computer vision ideas. We will target these disciplines to enable them to leapfrog from today's practice, dominated by manual review and highly interactive frame-by-frame analysis, to a new era where automated visual analytics of very large datasets becomes the norm. In short, our goal is to ensure that the newly developed methods are used by industry and academic researchers in other areas, and turned into products for societal and economic benefit. To this end, open source software, datasets, and demonstrators will be disseminated on the project website.
The ubiquity of digital images and videos means that every UK citizen may potentially benefit from the Programme research in different ways. One example is smart audio-visual glasses that can pay attention to a person talking by using their lip movements to mask out other ambient sounds. A second is an app that can answer visual questions (or retrieve matches) for text queries over large-scale audio-visual collections, such as a person's entire personal video library. A third is AI-guided medical screening that can aid a minimally trained healthcare professional to perform medical scans.
Planned Impact
The proposed programme encompasses new methodology and applied research in computer vision and other modalities (audio, text) that will enable analysis and search of image and video content while learning new things, with human-like flexibility and interpretability. These capabilities will encourage end user take up of computer vision technologies and commercial interest in embedding these technologies in products.
The Programme will have Economic and Societal impact by
1. Enabling UK industry to leverage AI in its activities, providing a key strategic advantage.
2. Developing new and improved computer vision technologies that require substantially less training data to solve problems and are thus suitable for commercialisation by a wide range of companies.
3. Enhancing the visual and audio capabilities and knowledge base of UK industries, including small companies.
4. Enhancing quality of life by improving, for instance, healthcare capabilities, surveillance, environmental monitoring, and the means of accessing and enjoying personal digital media.
5. Reducing the cost and risk of collecting manual annotations for deploying AI technology, especially for sensitive data such as medical records.
6. Collaborating directly with companies and organizations that we have already identified, and will work with over the course of the Programme.
7. Training the next generation of computer vision researchers who will be equipped to support the imaging needs of science, technology and wider society for the future.
Impact on Knowledge includes
1. Realisation of new approaches to essential computer vision technology, and the dissemination of research findings through publications, conference presentations, summer school teaching, and the distribution of open source software and image databases.
2. Sharing knowledge with industrial collaborators via Transfer and Application Projects (TAPs) and other activities leading to adoption of advanced computer vision methods across many disciplines of science, engineering and medicine that currently do not use them.
3. Communication of advances to a public audience through website articles, Show and Tell events, social and broadcast media, and other co-ordinated public understanding activities.
Organisations
- University of Oxford (Lead Research Organisation)
- Leiden University (Collaboration)
- University of Copenhagen (Collaboration)
- National Library of the Czech Republic (Collaboration)
- University of Oxford (Collaboration)
- National Library of Scotland (Collaboration)
- Netherlands Institute for Art History (Collaboration)
- Ca' Foscari University of Venice (Collaboration)
- National Consortium of Intelligent Medical Imaging (Collaboration)
- Plexalis Ltd (Project Partner)
- Intelligent Ultrasound (Project Partner)
- British Broadcasting Corporation (United Kingdom) (Project Partner)
- Samsung (South Korea) (Project Partner)
- Continental (Germany) (Project Partner)
- Toshiba (Japan) (Project Partner)
- Nielsen (Project Partner)
Publications
Zhao B
(2023)
Dataset Condensation with Distribution Matching
Zhang C
(2021)
Temporal Query Networks for Fine-grained Video Understanding
Zhang B
(2022)
Affinity Attention Graph Neural Network for Weakly Supervised Semantic Segmentation.
in IEEE transactions on pattern analysis and machine intelligence
Yeung PH
(2021)
Learning to map 2D ultrasound images into 3D space with minimal human annotation.
in Medical image analysis
Description | 1-on-1 Engineers and Policy Fellowship discussion |
Geographic Reach | National |
Policy Influence Type | Influenced training of practitioners or researchers |
URL | https://raeng.org.uk/policyfellowships |
Description | Chair of Royal Society Data Science Policy group leading to publication of a report "Science in the age of AI" |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Participation in a guidance/advisory committee |
Description | Royal Society National Academies Data Reform Round Table Consultation |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Description | Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group, Chair |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Participation in a guidance/advisory committee |
Impact | Quoting the aims from the report "We have three objectives for this report. Our first objective is that the use cases inspire those collecting and using data to consider the potential benefits of PETs for their own work, or in new collaborations with others. Second, for the evidence we present on barriers to adoption and standardisation to help inform policy decisions to encourage a marketplace for PETs. Finally, through our recommendations, we hope the UK will maximise the opportunity to be a global leader in PETs - both for data security and collaborative analysis - alongside emerging, coordinated efforts to implement PETs in other countries." |
URL | https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/From-Privacy-to-Part... |
Description | Royal Society Privacy Enhancing Technologies Working Group - policy report published (Chair) |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Impact | The report has contributed to wider discussion of data sharing between government departments, and a number of the recommendations have been followed up. It is well cited. A follow-on project is underway with the Alan Turing Institute which will report in 2022. The important message was to show that PETs are maturing as a technology and can be considered enablers to provide trusted sharing of data, moving the conversation away from security and from accepting only zero risk in sharing data. The work is relevant not only to my research area (health data science) but to many other data-driven sectors. |
URL | https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/privacy-enhancing-te... |
Description | Biomedical Research Centre |
Amount | £89,000,000 (GBP) |
Organisation | National Institute for Health Research |
Sector | Public |
Country | United Kingdom |
Start | 12/2022 |
End | 04/2027 |
Description | EPX0401861 Turing AI World Leading Researcher Fellowship Studentship |
Amount | £110,541 (GBP) |
Funding ID | EP/Y530517/1 |
Organisation | United Kingdom Research and Innovation |
Sector | Public |
Country | United Kingdom |
Start | 09/2023 |
End | 09/2028 |
Description | Envisioning Dante c.1472- c.1630 |
Amount | £805,620 (GBP) |
Funding ID | AH/W005220/1 |
Organisation | Arts & Humanities Research Council (AHRC) |
Sector | Public |
Country | United Kingdom |
Start | 08/2022 |
End | 09/2025 |
Description | Royal Society Research Professorship |
Amount | £1,400,000 (GBP) |
Funding ID | RSRP\R\241003 |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2024 |
End | 03/2029 |
Description | Royal Society Research Professorship Enhanced research Expenses |
Amount | £100,000 (GBP) |
Funding ID | RF\ERE\210331 |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 09/2021 |
End | 03/2024 |
Description | Studentship |
Amount | £154,725 (GBP) |
Organisation | |
Sector | Private |
Country | United States |
Start | 09/2021 |
End | 09/2025 |
Description | Toshiba 2021 |
Amount | $200,000 (USD) |
Organisation | Toshiba |
Sector | Private |
Country | Japan |
Start | 06/2021 |
End | 03/2023 |
Description | Toshiba 2023 |
Amount | £200,000 (GBP) |
Organisation | Toshiba |
Sector | Private |
Country | Japan |
Start | 04/2023 |
End | 04/2025 |
Description | Turing AI Fellowship: Ultra Sound Multi-Modal Video-based Human-Machine Collaboration |
Amount | £4,248,942 (GBP) |
Funding ID | EP/X040186/1 |
Organisation | United Kingdom Research and Innovation |
Sector | Public |
Country | United Kingdom |
Start | 09/2023 |
End | 09/2028 |
Title | CAIFE dataset and annotations |
Description | The CAIFE dataset is a large fetal echocardiography dataset consisting of freehand video and sweep video, collated from multiple hospitals. A subset of this dataset has been manually annotated by cardiac view, and a large subset automatically labelled. The generation of the dataset was funded by the COCHE project but the dataset is used by other video analysis projects as well. Those projects have contributed annotations to enrich the resource. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | No |
Impact | On-going |
Title | Coreferenced Image Narratives Dataset |
Description | Our Coreferenced Image Narratives (CIN) dataset contains 1880 images from the Localized Narratives dataset [1] that come with long-form text descriptions (narrations) and mouse traces. These images are originally a subset of the test and validation set of the Flickr30k dataset [2] . We annotated this subset with coreference chains and bounding boxes in the image that are linked with the textual coreference chains, and use them only for validation and testing. Note that we also include singletons (i.e., coreference chains of length one). [1] Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, Vittorio Ferrari; Connecting Vision and Language with Localized Narratives ; ECCV 2020. [2] Bryan A. Plummer, Liwei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, Svetlana Lazebnik; Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models ; IJCV 2017. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | The dataset allows extending and evaluating the abilities of the recent powerful large vision and language models. As it has been very recently published, there is only one publication from our group published in the top tier, NLP conference, EMNLP 2023 under the title "Semi-supervised multimodal coreference resolution in image narrations". |
URL | https://github.com/VICO-UoE/CIN |
Title | EPIC Fields: Marrying 3D Geometry and Video Understanding |
Description | We introduce EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Similar to other datasets for neural rendering, EPIC Fields removes the complex and expensive step of reconstructing cameras using photogrammetry, and allows researchers to focus on more interesting modeling problems. We illustrate the challenge of photogrammetry in egocentric videos and propose several technical innovations to address them. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Upcoming |
URL | https://epic-kitchens.github.io/epic-fields/ |
Title | EPIC-KITCHENS VISOR |
Description | We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. Data published under the Creative Commons Attribution-NonCommercial 4.0 International License. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | The dataset can be used for benchmarking hand and active-object segmentation in egocentric video, and the baseline code will be made publicly available. |
URL | https://data.bris.ac.uk/data/dataset/2v6cgv1x04ol22qp9rm9x2j6a7/ |
Title | Epic-Sounds: A Large-scale Dataset of Actions That Sound |
Description | We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos. We propose an annotation pipeline where annotators temporally label distinguishable audio segments and describe the action that could have caused this sound. We identify actions that can be discriminated purely from audio, through grouping these free-form descriptions of audio into classes. For actions that involve objects colliding, we collect human annotations of the materials of these objects (e.g. a glass object being placed on a wooden surface), which we verify from visual labels, discarding ambiguities. Overall, EPIC-SOUNDS includes 78.4k categorised segments of audible events and actions, distributed across 44 classes as well as 39.2k non-categorised segments. We train and evaluate two state-of-the-art audio recognition models on our dataset, highlighting the importance of audio-only labels and the limitations of current models to recognise actions that sound. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | A standard benchmark for testing audio-visual models. Already being cited in major publications |
URL | https://epic-kitchens.github.io/epic-sounds/ |
Title | Image Change dataset |
Description | We propose a scalable methodology for obtaining a large-scale change detection training dataset by leveraging existing object segmentation benchmarks. We introduce a novel co-attention-based architecture that implicitly determines correspondences between an image pair and finds changes in the form of bounding box predictions. We contribute four evaluation datasets that cover a variety of domains and transformations, including synthetic image changes, real surveillance images of a 3D scene, and synthetic 3D scenes with camera motion. We evaluate our model on these four datasets and demonstrate zero-shot generalisation as well as generalisation beyond the training transformations. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023. Future impact to be determined. |
URL | https://arxiv.org/pdf/2209.14341.pdf |
Title | Localizing Visual Sounds the Hard Way |
Description | The objective of this work is to localize sound sources that are visible in a video without using manual annotations. Our key technical contribution is to show that, by training the network to explicitly discriminate challenging image fragments, even for images that do contain the object emitting the sound, we can significantly boost the localization performance. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Localizing Visual Sounds the Hard Way Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman CVPR, 2021 |
URL | https://www.robots.ox.ac.uk/~vgg/research/lvs/ |
Title | PASS: An ImageNet replacement for self-supervised pretraining without humans |
Description | PASS is a large-scale image dataset that does not include any humans and which can be used for high-quality pretraining while significantly reducing privacy concerns. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | YM. Asano, C. Rupprecht, A. Zisserman, A. Vedaldi PASS: An ImageNet replacement for self-supervised pretraining without humans NeurIPS Dataset Track, 2021 |
URL | https://www.robots.ox.ac.uk/~vgg/data/pass/ |
Title | PULSE dataset and annotations |
Description | A multi-modal dataset consisting of fetal ultrasound video, gaze tracking data, probe movement data and sonographer audio for first, second and third trimester scans. Audio has been translated to text. A large subset of the ultrasound video is automatically annotated in terms of anatomy label (single label per frame). Manual annotation has been done on a smaller subset. This dataset was generated as part of the ERC Advanced Grant PULSE but has been used for research on UKRI projects which have also contributed some analysis methods for automatic annotation that have improved the value of the data set and annotations as a whole. The dataset is a private dataset. |
Type Of Material | Database/Collection of data |
Year Produced | 2019 |
Provided To Others? | No |
Impact | See outputs listed on the PULSE website and PURFECT webpages as examples. An ultrasound pre-trained model (PULSENet) has also been derived which is used as a backbone for other research. |
Title | Semantic Shift Benchmark |
Description | Following the success of modern deep learning systems on closed-set visual recognition tasks, a natural next challenge is open-set recognition (OSR) (Scheirer et al., 2013). In the closed-set setting, a model is tasked with recognizing a set of categories that remain the same during both the training and testing phases. We demonstrate that the ability of a classifier to make the 'none-of-the-above' decision is highly correlated with its accuracy on the closed-set classes. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Future impact to be determined |
URL | https://www.robots.ox.ac.uk/~vgg/research/osr/#ssb_suite |
Title | Video Person-Clustering Dataset A multi-modal TV-shows and movies dataset |
Description | VPCD contains multi-modal annotations (face, body and voice) for all primary and secondary characters from a range of diverse TV-shows and movies. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | A. Brown, V. Kalogeiton, A. Zisserman Face, Body, Voice: Video Person-Clustering with Multiple Modalities |
URL | https://www.robots.ox.ac.uk/~vgg/data/Video_Person_Clustering// |
Title | Video-text Alignment HTM-Align dataset |
Description | The objective is a temporal alignment network that ingests long term video sequences, and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Future impacts to be determined |
URL | https://www.robots.ox.ac.uk/~vgg/research/tan/ |
Description | NLS Chapbooks |
Organisation | National Library of Scotland |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We used our software to search and analyse the illustrations of the chapbooks. |
Collaborator Contribution | Partner provided chapbooks in large quantities. |
Impact | https://www.robots.ox.ac.uk/~vgg/research/chapbooks/ |
Start Year | 2020 |
Description | National Consortium of Intelligent Medical Imaging |
Organisation | National Consortium of Intelligent Medical Imaging |
Sector | Academic/University |
PI Contribution | A VisualAI postdoc (Jianbo Jiao) is providing expertise in building image-based deep learning models to assess COVID-19 deterioration for hospital-based patients. |
Collaborator Contribution | NCIMI is providing access to COVID19 data for a TAP project. |
Impact | An initial evaluation of predictive modelling was performed using available COVID-19 data. However, due to the small size of the dataset, and the fact that COVID-19 treatments have significantly improved and better patient pathways are in place, it was deemed not worth pursuing this work beyond the preliminary study. A report was written but has not been published. |
Start Year | 2021 |
Description | TAP VAI-02 1516 Project |
Organisation | University of Copenhagen |
Country | Denmark |
Sector | Academic/University |
PI Contribution | We created a visual search engine using images and metadata supplied by Matilde Malaspina at the University of Copenhagen and Barbara Tramelli at the University of Venice. |
Collaborator Contribution | Partner provided images and metadata. |
Impact | A talk at Venice Centre for Digital and Public Humanities (VeDPH) on 9th Dec. 2020 |
Start Year | 2020 |
Description | TAP-VAI-03 16cIllustration Project |
Organisation | Ca' Foscari University of Venice |
Country | Italy |
Sector | Academic/University |
PI Contribution | We created a visually searchable database (https://www.robots.ox.ac.uk/~vgg/research/16ci/lyon/) of 16th century illustrations printed in Lyon. |
Collaborator Contribution | Partner provided images and metadata. |
Impact | The researchers at Venice Centre for Digital and Public Humanities are using this visual search engine as a research support tool. |
Start Year | 2021 |
Description | TAP-VAI-04 Frank-Scholten Archive |
Organisation | Leiden University |
Country | Netherlands |
Sector | Academic/University |
PI Contribution | Using our VISE software, we matched each photograph in the Frank-Scholten image archive with its corresponding negative. |
Collaborator Contribution | They provided a dataset containing the photographs and negatives captured by Frank-Scholten. |
Impact | tbc |
Start Year | 2021 |
Description | TAP-VAI-08 Fish Pool Trajectory |
Organisation | University of Oxford |
Department | Department of Zoology |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We are developing tools and a workflow to detect and track a Picasso triggerfish moving in a fish tank as it searches for a food target. |
Collaborator Contribution | They provided a video dataset showing Picasso triggerfish in a fish pool. |
Impact | tbc |
Start Year | 2021 |
Description | TAP-VAI-09 Fish Tank Obstacles |
Organisation | University of Oxford |
Department | Department of Zoology |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We are developing tools and a workflow to detect and track Picasso triggerfish navigating through obstacles to reach a food target in a fish tank. |
Collaborator Contribution | They provided a video dataset showing Picasso triggerfish in a fish tank containing obstacles. |
Impact | tbc |
Start Year | 2021 |
Description | TAP-VAI-10 Czech National Library/ Czech Academy of Sciences |
Organisation | National Library of the Czech Republic |
Country | Czech Republic |
Sector | Public |
PI Contribution | We are providing technical support to the partner team for implementing our VISE software in their platform. |
Collaborator Contribution | They are using our software tool (VISE) |
Impact | tbc |
Start Year | 2021 |
Description | TAP-VAI-11 RKD |
Organisation | Netherlands Institute for Art History |
Country | Netherlands |
Sector | Public |
PI Contribution | We are providing technical support to the RKD for implementing visual image search feature in the public facing web portal and internal research using our VGG Image Search Engine (VISE) software (https://www.robots.ox.ac.uk/~vgg/software/vise/). |
Collaborator Contribution | The RKD provided millions of images and is now using our VISE software for visual search functionality. |
Impact | Not yet. |
Start Year | 2021 |
Title | Audio-visual synchronisation |
Description | The software performs audio-visual synchronisation, which requires a model to relate changes in the visual and audio streams. Prior work focused primarily on synchronising talking-head videos. In contrast, open-domain videos often contain only a small visual indication of the sound source, i.e. one that is sparse in space. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Paper in British Machine Vision Conference (BMVC), 2022. Future impacts to be determined. |
URL | https://iashin.ai/SparseSync |
Title | Audio-visual synchronisation - Synchformer |
Description | An audio-visual synchronization model: the inputs are the audio and visual streams of a video, and the output is the temporal offset. The approach is applicable to both dense and sparse (in time and space) audio-visual synchronization cues (e.g. a person talking (dense in time) or a dog barking (sparse in time)). A particular advantage of the model and training is that it decouples feature extraction from synchronization modeling through multi-modal segment-level contrastive pre-training. |
Type Of Technology | Software |
Year Produced | 2024 |
Open Source License? | Yes |
Impact | Paper at ICASSP 2024. Also won Amazon synchronization challenge, https://wacv2024-workshop-quality-iva.github.io/workshop-quality-iva/index.html#Competition Future impacts to be determined. |
URL | https://www.robots.ox.ac.uk/~vgg/research/synchformer/ |
Title | Auditory Slow-Fast |
Description | Recognising actions using the auditory signal only |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | Paper won an outstanding paper award at ICASSP 2021 - 3 papers selected out of 1400. Well-referenced (46 GitHub stars). In follow-up work by DeepMind [https://arxiv.org/pdf/2111.12124.pdf] this work is referred to as: "We find the Slowfast architecture is good at learning rich representations required by different domains", extending this work to speech and music audio. |
URL | https://github.com/ekazakos/auditory-slow-fast |
Title | EPIC Fields Code |
Description | This repository contains the pipeline for the dataset introduced in our paper, "EPIC Fields: Marrying 3D Geometry and Video Understanding." We aim to bridge the domains of 3D geometry and video understanding, leading to innovative advancements in both areas. |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | Upcoming |
URL | https://github.com/epic-kitchens/epic-Fields-code |
Title | Find Identical Images (FII) |
Description | Identical images have the same image dimension (i.e. image width, image height, number of colour channels) and same pixel value in all corresponding pixel locations. FII is a command line tool to find all identical images in a folder. It can also find images that are common in two folders. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | Follow Things Around |
Description | Software to track "things" (e.g. animals) in a video. The input is the video; the output is a text file specifying a bounding box of the 'thing' or 'things' in each frame. The method uses 'tracking by detection', meaning that no manual annotation is required on the video. Instead, a detector for the 'thing' is required (and detectors are available pre-trained for multiple classes of animals). Follow Things Around is provided as a Jupyter Notebook for Google Colab. It runs in a web browser, without the need for a GPU. A user can access their data for tracking on their Google Drive. |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | It is too early to say. |
URL | https://www.robots.ox.ac.uk/~vgg/software/follow-things-around/ |
Title | Generalised Visual Counting in Images |
Description | Our goal is to develop a generalised visual object counting system that augments humans' ability to recognise the number of objects in a visual scene. Specifically, generalised visual object counting refers to the problem of identifying the number of salient objects of an arbitrary semantic class in an image (i.e. open-world visual object counting), with an arbitrary number of instance "exemplars" provided by the end user to indicate the particular objects to be counted, i.e. from zero-shot to few-shot object counting. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Future impact to be determined. |
URL | https://arxiv.org/pdf/2208.13721.pdf |
Title | Generalized Category Discovery |
Description | We present a new setting: 'Generalized Category Discovery' and a method to tackle it. Our setting can be succinctly described as: given a dataset, a subset of which has class labels, categorize all unlabelled images in the dataset. The unlabelled images may come from labelled or novel classes. Our method leverages contrastively trained vision transformers to assign labels directly through clustering. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Future impact to be determined |
URL | https://github.com/prajwalkr/vtp#readme |
Title | Image Counterfeit Spotter |
Description | Counterfeit Spotter compares images of suspicious products with a reference image and confirms whether a product is real or fake within seconds, right in your browser. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Still receiving feedback and reporting |
URL | https://www.robots.ox.ac.uk/~vgg/software/image-compare/counterfeit-spotter/#usecases |
Title | ImageCompare |
Description | Image Compare is a lightweight, standalone and offline application to visually compare a pair of images and highlight their differences. This application can be used on desktop computers and mobile phones without requiring installation, as it runs entirely in a web browser. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | Lip Reading |
Description | To learn strong lip reading models that can recognise speech in silent videos. |
Type Of Technology | Software |
Year Produced | 2022 |
Impact | Research has shown that the best models achieve state-of-the-art results, outperforming prior work trained on public data by a significant margin, and even outperforming industrial models trained on orders of magnitude more data. We have also designed a Visual Speech Detection model on top of our lip reading system that obtains state-of-the-art results on this task and even outperforms several audio-visual baselines. |
URL | https://www.robots.ox.ac.uk/~vgg/research/vtp-for-lip-reading/ |
Title | List Annotator (LISA) |
Description | List Annotator (LISA) is a standalone and light-weight HTML/CSS/JavaScript based application to efficiently annotate a large list of images. LISA is an open source project developed and maintained by the Visual Geometry Group (VGG) and released under a license that grants its users the freedom to use it for any purpose. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | Motion Grouping |
Description | This software implements the model as described in the paper. It includes a pre-trained model and inference code to apply to downstream images, as well as the training code to train the model from scratch. It also includes code to evaluate and benchmark the results against existing datasets (DAVIS2016, FBMS59, SegTrackv2, MoCA). |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | This code accompanies the paper: Self-supervised Video Object Segmentation by Motion Grouping Charig Yang, Hala Lamdouar, Erika Lu, Andrew Zisserman, Weidi Xie. ICCV 2021 |
URL | https://oxris.ox.ac.uk/viewobject.html?id=1190260&cid=1 |
Title | VGG Image Annotator (VIA) |
Description | VGG Image Annotator is a simple and standalone manual annotation software for image, audio and video. VIA runs in a web browser and does not require any installation or setup. The complete VIA software fits in a single self-contained HTML page of less than 400 kilobytes that runs as an offline application in most modern web browsers. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | VGG Image Search Engine (VISE) |
Description | VGG Image Search Engine (VISE) is a free and open source software for visual search of a large number of images using an image as a search query. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | VGG Visual Tracker (VVT) |
Description | VGG Visual Tracker (VVT) is a tool for creating bounding box annotations on videos in a semi-automatic fashion, using class agnostic object trackers. VVT runs on modern web browsers (Chrome 65+, Firefox 60+, Safari 11+) and does not require any installation or setup. VVT is a variation of the VGG Image Annotator (VIA) v3 tool and uses the same data format. So, if you are already using VIA v3, the annotations are interoperable with your existing workflow. No changes required. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | Visual Analysis of Chapbooks |
Description | The chapbooks were produced cheaply to provide everyday reading material and were the most popular reading material for the masses [1]. This dataset has been made freely available by the National Library of Scotland (NLS). |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | Because they reduced printing costs, these woodcuts were reused across multiple chapbooks; identifying this reuse helped researchers pursue many related research questions using software tools based on computer vision. |
URL | https://data.nls.uk/data/digitised-collections/chapbooks-printed-in-scotland/ |
Title | m-bain/whisperX: v3.0.0 |
Description | batched inference with faster-whisper backend |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | This repository provides fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization. It has received 25 pull requests. |
URL | https://zenodo.org/record/7876369 |
Description | 'Humanist in the Loop: Computer Vision by Example for the Study of Early Printed Books', University of Helsinki Digital Humanities Seminar. |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | A presentation to a historical research group in Computational History who are experimenting in computer vision methods themselves. The presentation included comparison of our respective measures and discussion of common challenges and potential solutions. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.helsinki.fi/en/digital-humanities/teaching/digital-humanities-research-seminar |
Description | (1) VIA: Image and Video Annotation; (2) Image Comparator; (3) Image search and retrieval |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Industry/Business |
Results and Impact | Show and Tell Event at the Oxford Big Data Institute. Researchers at the Big Data Institute are now aware about our computer vision tools that can significantly improve their existing research workflow. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.bdi.ox.ac.uk/ |
Description | - How to study early printed books with computer vision: a practical introduction, University of Newcastle-upon-Tyne |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation on, and hands-on workshop with, Visual AI project software, and a showcase of successful collaborations in this domain. Outputs included plans for uptake and further development of software on external research projects. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.eventbrite.co.uk/e/how-to-study-early-printed-books-with-computer-vision-a-practical-int... |
Description | ACH talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Forum for conversations on an expansive definition of digital humanities in a broad array of subject areas, methods, and communities of practice. |
Year(s) Of Engagement Activity | 2021 |
URL | https://drive.google.com/file/d/1CN5CDWPf4cLTT1NY9gyP-JxxvsCdzRG-/view |
Description | AD-Manual Annotation of Radiology Images using VGG Image Annotator (VIA) online Course |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | The course website lists our manual annotation tool (VIA). This provides a lot of exposure to our software, which is available to the world as an open source tool. This tool significantly speeds up annotation work for professionals who need to annotate large volumes of visual data. |
Year(s) Of Engagement Activity | 2020 |
URL | https://folio47.wixsite.com/rp-course/radiology-preprocessor-workflow |
Description | AD-TAP Outcome Presentation for Leiden University |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | Presented the outcome of our collaboration with Leiden University. The team at Leiden University were extremely excited to see the results from our visual search engine. They said that they were "jumping like a child" after seeing the outcome and that this collaboration will lead to many new research projects related to the Frank Scholten Archives. |
Year(s) Of Engagement Activity | 2021 |
Description | AEOLIAN Network workshop presentation |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Presentation describing a project undertaken within the National Librarian of Scotland's Fellowship in Digital Scholarship programme for 2020-21. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.aeolian-network.net/events/workshop-1-employing-machine-learning-and-artificial-intellig... |
Description | AI4 LAM online conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Aimed at professionals in the LAM (Libraries, Archives and Museums) sector, this was an online workshop teaching the use of several Visual AI tools and giving context to their application in this sector. Issues of attribution, bias and fairness were discussed as well as technical areas. |
Year(s) Of Engagement Activity | 2022 |
URL | https://sites.google.com/view/ai4lam/ai4lam-2022-virtual-event |
Description | AI4LAM 2023 Annual Conference, Internet Archive Canada |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Conference for technical professionals and allied researchers in the Libraries, Archives and Museums sector, my participation included a presentation/hands on workshop and follow-on discussions, generating three separate requests for collaboration or more information on Visual AI project tools. |
Year(s) Of Engagement Activity | 2023 |
URL | https://ff2023.archive.org/pages/program/ |
Description | AI4LAM workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Introduced the use of visual AI for collections research, access and management. Using the example of collaborations between Oxford's Visual Geometry Group (VGG) and researchers and curators within the GLAM sector, the speaker provided a hands-on introduction to VGG's open-source tools for visual search, classification, comparison and annotation. |
Year(s) Of Engagement Activity | 2021 |
URL | https://libereurope.eu/event/introduction-to-visual-ai-in-glams-workshop-series-on-applying-and-depl... |
Description | AIUM 2021 Special Session Invited Speaker |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Invited speaker in Session with Title: Deep Learning Applications for New Ultrasound Techniques. Talk was pre-recorded with live questions. This primary audience was medical physicists rather than medical image analysis experts. |
Year(s) Of Engagement Activity | 2021 |
Description | AV4D: Visual Learning of Sounds in Spaces |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Workshop at the European Conference on Computer Vision (ECCV). |
Year(s) Of Engagement Activity | 2022 |
URL | https://av4d.org |
Description | AWS Human-Machine Collaboratory conference |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Industry/Business |
Results and Impact | Hosted by the Amazon Web Services (AWS)-funded Human-Machine Collaboratory at Oxford, Giles Bergel gave two talks (one alongside Dan Schofield, another Visual AI ambassador) on Visual AI collaborations and research in fields ranging from primatology to cultural heritage and media studies. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.mpls.ox.ac.uk/innovation-and-business-partnerships/human-machine-collaboration |
Description | Aberystwyth Bibliographical Group |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Presented work on tracing woodcut illustrations, their original woodblocks and copies throughout the surviving corpus of British ballads and chapbooks. Discussed how woodcuts in these forms of cheap print served as visual brands for particular titles, genres or producers of cheap print, and demonstrated some of the bibliographical uses of their identification. Showed how computer vision software can strongly support this research and may be further applied to printed images of all kinds. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.hugofox.com/community/aberystwyth-bibliographical-group-19783/reports-of-recent-meetings... |
Description | Co-chaired Royal Society-CAS Science and AI workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Workshop on Science and AI - I co-chaired and gave a talk. |
Year(s) Of Engagement Activity | 2023 |
Description | Co-organiser of ASMUS2021, a MICCAI workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Advances in Simplifying Medical UltraSound (ASMUS) 2021 is an international workshop that provides a forum for research topics around ultrasound image computing and computer-assisted interventions and robotic systems that utilize ultrasound imaging. It was held in conjunction with MICCAI 2021 in virtual form. Accepted papers were selected based on their scientific contribution, via a double-blind process involving written reviews from at least two external reviewers in addition to a member of the committee. The published work includes reports across a wide range of methodology, research and clinical applications. Advanced deep learning approaches for anatomy recognition, segmentation, registration and skill assessment are the dominant topics, in addition to ultrasound-specific new approaches in augmented reality and remote assistance. Three invited speakers were included in the workshop, and live demos of technologies were given. The meeting had 80+ attendees. |
Year(s) Of Engagement Activity | 2021 |
URL | https://miccai-ultrasound.github.io/#/asmus21 |
Description | Computational ethology and cultural evolution in wild chimpanzees, Department of Evolutionary Anthropology, University of Zurich |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation by Daniel Schofield as part of an Anthropology research symposium and conference. Introduced VGG tools to the Human Evolutionary Ecology Group, University of Zurich, and facilitated discussion of how visual AI tools could be used for human anthropological research. |
Year(s) Of Engagement Activity | 2023 |
Description | Computer vision for the investigation of ancient documents Saint-Étienne, 6-7 April 2023 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Two-day workshop on computer vision and historical documents, publicising work done by Visual AI project members, three of whom presented (myself, Abhishek Dutta and Prasanna Sridhar), leading to follow-up events in Oxford and (potentially) Milan involving fellow practitioners, and to enhancements to Visual AI open-source software tools (Image Comparator) made by the host organisation. |
Year(s) Of Engagement Activity | 2023 |
URL | https://ro2i.hypotheses.org/351 |
Description | ConCode webinar |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Presentation highlighting some of the ways that cultural heritage collections are using computer vision (or visual AI) for collections management and research, focussing particularly on the work of the Oxford Visual Geometry Group and its collaborators. |
Year(s) Of Engagement Activity | 2021,2022 |
URL | https://www.youtube.com/watch?v=d4XaZ4bur6Q |
Description | Conference of European National Librarians 'AI in Libraries' Webinar |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation by Horace Lee, with Giles Bergel, on Visual AI's WISE Image Search Engine to members of various national libraries in Europe. A Digital Curator from the British Library expressed her interest in using WISE for the British Library's image collection. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.cenl.org/network-group-ai-in-libraries-webinars-2023/ |
Description | Deep Discoveries webinar |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Discussed how computer vision excels at matching identical features within images and has made progress in broad classification tasks, though the middle ground remains challenging. Visual similarity, which is essential to human visual recognition, is challenging to conceptualise, measure and compute. Outlined some approaches to defining similarity in computational terms, drawing on the experience of the Visual Geometry Group (Oxford) in collaborating with cultural heritage researchers. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.eventbrite.co.uk/e/computer-vision-and-heritage-opportunities-for-research-and-engagemen... |
Description | Digital Humanities Annual Conference - Tokyo |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | The project gave a paper and led a workshop teaching the use of Visual AI software tools for the study of printed illustrations. |
Year(s) Of Engagement Activity | 2022 |
URL | https://dh2022.adho.org/ |
Description | Digital Humanities Congress Sheffield |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Presentation of Visual AI collaborations on book history to a diverse audience of digital humanists to promote the sharing of knowledge, ideas and techniques within the digital humanities. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.dhi.ac.uk/dhc2022/ |
Description | Digital Humanities and Book History conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Project research, tools and collaborations presented to a digital humanities audience working in particular in the field of book history, in which Visual AI has a high profile. |
Year(s) Of Engagement Activity | 2022 |
URL | https://dcsco-op.org/past-events/dhbh/ |
Description | Digital Humanities at Oxford Summer School |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Two separate presentations at a high-profile, diverse and well-attended event, both of which introduced participants to Visual AI tools and methods. One of the sessions included presentations from other Visual AI project members (Abhishek Dutta; David Pinto; and Prasanna Sridhar). There was some lively debate and some requests for further information about project software. |
Year(s) Of Engagement Activity | 2023 |
URL | https://web.cvent.com/event/58fc430e-5294-4919-a7a3-c2b14f81a059/websitePage:bc9d128c-098d-49e0-97c9... |
Description | Digital Humanities at Oxford Summer School |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | A presentation and two hands-on sessions on Visual AI tools and collaborations in Digital Humanities |
Year(s) Of Engagement Activity | 2022 |
URL | https://eng.ox.ac.uk/events/dhoxss-2022/ |
Description | Digitising, Cataloguing, Searching and Sharing the Medieval and Early-Modern Image: On-Going Projects & Different Methodologies |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Other audiences |
Results and Impact | Presentation on digitising, cataloguing, searching and sharing the medieval and early-modern image: on-going projects and different methodologies |
Year(s) Of Engagement Activity | 2021 |
Description | Distinguished Keynote Speaker in Biomedical and Health Data Science in two joint conferences of IEEE EMBS BHI and BSN 2021 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Keynote talk entitled 'Simplifying interpretation and acquisition of ultrasound scans', delivered virtually. Abstract: With the increased availability of low-cost and handheld ultrasound probes, there is interest in simplifying interpretation and acquisition of ultrasound scans through deep-learning based analysis so that ultrasound can be used more widely in healthcare. However, this is not just "all about the algorithm", and successful innovation requires inter-disciplinary thinking and collaborations. In this talk I will overview progress in this area, drawing on examples of my laboratory's experiences of working with partners on multi-modal ultrasound imaging, and building assistive algorithms and devices for pregnancy health assessment in high-income and low-and-middle-income country settings. Emerging topics in this area will also be discussed. |
Year(s) Of Engagement Activity | 2021 |
Description | Edinburgh CDCS Digitised Documents Series workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Workshop to showcase the state of the art in Visual AI for cultural heritage and the digital humanities, and provide a hands-on introduction to some simple techniques for searching and classifying imagery in books, paintings, photographs and film. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.cdcs.ed.ac.uk/events/visual-ai-and-humanities-introduction |
Description | Edinburgh CDCS workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | Workshop to showcase the state of the art in Visual AI for cultural heritage and the digital humanities, and provide a hands-on introduction to some simple techniques for searching and classifying imagery in books, paintings, photographs and film. Introduced participants to the study of bias within AI, in such controversial applications as facial recognition and automated image categorisation. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.cdcs.ed.ac.uk/events/workshop-chapbooks-national-library-scotland |
Description | Fantastic Futures Conference |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Conference aiming to help participants discover: basic concepts of artificial intelligence in the GLAM sector; concrete uses and practices of AI in the GLAM sector; and technologies and tools applicable to the GLAM sector's data and collections. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.bnf.fr/en/agendaEN/workshops-tutorials-les-futurs-fantastiques-3rd-conference-about-arti... |
Description | Helping Computers See and Understand the World Around Us |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Schools |
Results and Impact | Science Week Demonstration for Year 3 and Year 4 students at the Cutteslowe Primary School in Oxford |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.cutteslowe.oxon.sch.uk/ |
Description | History of Printed Illustrations webinar |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Presentation, drawing on a recent collaboration with the National Library of Scotland on their chapbook collections, demonstrating how computer vision (or 'visual AI') can support the study of printed illustrations. Demonstrated free software developed for these purposes; discussed its strengths and weaknesses; and considered its overall place within the illustration researcher's toolbox. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.cphc.org.uk/events/2021/7/8/hopin-webinar-ly8r3 |
Description | ICDAR Hip2021 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Workshop to bring together researchers from various fields working on document image acquisition, restoration, analysis, indexing, and retrieval to make these documents accessible in digital libraries. |
Year(s) Of Engagement Activity | 2021 |
URL | https://blog.sbb.berlin/hip2021/ |
Description | IIIF Community Call |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Community Call discussing the University of Oxford Visual Geometry Group's work with IIIF and Machine Learning |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.youtube.com/watch?v=KXE3-LD6xxI&t=1s |
Description | Indiana University Booklab |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Hands-on workshop teaching the use of Visual AI software outputs to an audience of digital humanities practitioners and students. As well as discussions, this led to an invitation to return for a longer period and to teach the tools in classroom and professional settings. |
Year(s) Of Engagement Activity | 2023 |
URL | https://booklab.indiana.edu/news-events/past-events/giles-bergel-2023.html |
Description | International Computer Vision Summer School |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | International Computer Vision Summer School |
Year(s) Of Engagement Activity | 2022 |
URL | https://iplab.dmi.unict.it/icvss2022/ |
Description | International Conference on Computer Vision, 2023 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman was one of the General Chairs who organized the International Conference on Computer Vision (ICCV) in Paris, France. ICCV is one of the three principal international computer vision conferences. Over 7,000 people registered to attend the conference. |
Year(s) Of Engagement Activity | 2023 |
URL | https://iccv2023.thecvf.com/ |
Description | Invited talk at University of British Columbia |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Invited talk at the University of British Columbia, Canada. The visit also included lab tours; possible follow-ups include writing a grant together and an exchange of students. |
Year(s) Of Engagement Activity | 2023 |
Description | Learning 3D Geometry |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Undergraduate students |
Results and Impact | Lecture in the computer vision course at the University of Amsterdam. |
Year(s) Of Engagement Activity | 2022 |
Description | Learning 3D Geometry: From Fusion to Generation |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Talk at the CVPR CV4MR Workshop on Computer Vision for Mixed Reality. |
Year(s) Of Engagement Activity | 2023 |
URL | https://cv4mr.github.io |
Description | Learning on Screen - BoB/TRilT Academic Engagement launch |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Project tools and collaborations advertised to researchers seeking to use one of the largest research databases of UK TV programmes, leading to follow up discussions. |
Year(s) Of Engagement Activity | 2022 |
URL | https://learningonscreen.ac.uk/guidance/bob-and-trilt-for-research/launch-event/ |
Description | Libraries Rewired: A CILIP Digital Transformation Event |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Abhishek Dutta and Prasanna Sridhar presented Visual AI software and demos to CILIP, the UK professional body for librarians and information professionals. The event raised awareness of Visual AI's work and attracted the interest of a number of IT suppliers to the sector, with whom the team are following up. |
Year(s) Of Engagement Activity | 2023 |
URL | https://librariesrewired.org.uk/ |
Description | London Rare Books Summer School - the Digital Book Historian's Toolkit |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Schools |
Results and Impact | An overview of the landscape of digital research in book history, including bibliographic data and content management systems, data visualisation, systems for image sharing and annotation in libraries and archives, computer vision, and (semi-)automated collation. Rather than emphasising mastery of any particular technology, we encouraged computational thinking and digital experimentation to enhance historical research questions and information management. |
Year(s) Of Engagement Activity | 2021 |
Description | MIUA 2021 Conference - co-organiser |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | MIUA is a UK-based international conference for the communication of image processing and analysis research and its application to medical imaging and biomedicine. This was the 25th edition of the meeting, which was held virtually; 40 papers were presented (27k downloads as of 09-03-2022). MIUA is the principal UK forum for communicating research progress within the community interested in image analysis applied to medicine and related biological science. The meeting is designed for the dissemination and discussion of research in medical image understanding and analysis, and aims to encourage the growth and raise the profile of this multi-disciplinary field by bringing together the various communities involved. |
Year(s) Of Engagement Activity | 2021 |
URL | https://miua2021.com/ |
Description | Max Planck BibHerz Library Seminar: Reflections on the Digital Turn in the Humanities and the Sciences |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Seminar on how digital technologies have changed approaches to the discovery, study, and presentation of images; what impact the changing dynamic between the analogue and digital manifestation of the book or manuscript has on their working practices; and how this affected their use and questions that are asked or could be asked. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.biblhertz.it/3069990/seminar-series-reflections-on-the-digital-turn-in-the-humanities-an... |
Description | NLS Digital Scholarship Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | 15 attendees for an annual workshop, which sparked questions and ongoing discussions. |
Year(s) Of Engagement Activity | 2021 |
Description | National Academies roundtable on researcher access to data |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | The National Academies Data Reform Round Table was a by-invitation meeting that discussed some of the current challenges researchers face in getting access to data for research due to current data protection regulation. The Department for Digital, Culture, Media and Sport (DCMS) was consulting on reforming the UK's data protection regime, part of a larger effort to implement the government's National Data Strategy, and specifically Mission 2 of that strategy: 'supporting a pro-growth and trusted data regime'. This issue affects researchers working in computer vision and medical image analysis, and this formed part of the discussion. In terms of impact/outcome, the meeting output fed into a response that will hopefully have influence, though it is too early to measure how direct that influence will be. |
Year(s) Of Engagement Activity | 2021 |
Description | National Academies' party conference event speaker |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Policymakers/politicians |
Results and Impact | Speaker on the (virtual) National Academies panel at the Liberal Democrat political party conference, which focused on the theme of 'Becoming a "science superpower": will the UK be fit to tackle the next global crisis?'. Briefing: the panel discussions addressed how the UK should approach the future, building resilience to future crises and achieving 'superpower' status. The panel included leading experts representing the National Academies, as well as representatives from the political parties and a journalist Chair. Not aware of any direct impact, but these sessions are an important part of keeping an open and positive dialogue with MPs. |
Year(s) Of Engagement Activity | 2021 |
Description | National Librarian of Scotland's Lecture in Digital Scholarship |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Introduced research on chapbooks using Visual AI and how machine vision can help others to understand printed heritage collections. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.youtube.com/watch?v=5jkq0iLzMvo&t=10s |
Description | Neural Geometry and Rendering: Advances and the Common Objects in 3D Challenge? |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Workshop at the European Conference on Computer Vision (ECCV). |
Year(s) Of Engagement Activity | 2022 |
URL | https://ngr-co3d.github.io |
Description | Office for National Statistics, Integrated Data Programme Advisory Group, Member, |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | The Office for National Statistics Integrated Data Programme Advisory Group offers advice to the ONS on its programme aimed at sharing data for public good with other organisations. I was invited due to my role as Chair of the Royal Society PETs science policy work, together with my research interest in health data science/medical image analysis. |
Year(s) Of Engagement Activity | 2021,2022 |
Description | OxML - Oxford Machine Learning Summer School |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Gave a lecture at the OxML summer school on Deep Learning. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.oxfordml.school |
Description | Practical Applications of IIIF Seminar: Image Registration and IIIF |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Discussing the methods, challenges and possibilities of Image Registration. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.iiconservation.org/content/practical-applications-iiif-seminar-1-image-registration-and-... |
Description | Presentation for British Museum Portable Antiquities Scheme |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Horace Lee gave a presentation to members of the British Museum Portable Antiquities Scheme (PAS) on the capabilities of Visual AI's WISE Image Search Engine and how it can be used in archaeology. This was part of an ongoing collaboration with the British Museum. |
Year(s) Of Engagement Activity | 2023 |
Description | Presentation to The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences symposium "Documenting, Understanding, Preserving Cultural Heritage. Humanities and Digital Technologies for Shaping the Future", Florence, Italy, July 2023 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation by Giles Bergel, Abhishek Dutta and Andrew Zisserman (Visual AI) and Rosario Carvalho et al on the Az Infinitum project, a collaboration with Visual AI that integrated the VGG Image Search Engine (VISE) software in a web application to allow the search of large collections of decorative Portuguese 'azulejo' tiles. The presentation and other outputs (a paper and web application) raised awareness of the use of Visual AI project software in the domains of art history, heritage science and digital humanities, impacting professionals in those domains and raising public interest in the use of computer vision to understand these historical materials. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.timemachine.eu/ltm-projects/az-infinitum-azulejo-indexation-and-referencing-system/ |
Description | Renaissance Society of America Day of Digital Learning |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | RSA Day of Digital Learning, featuring a varied menu of sessions involving hands-on, participatory work with digital tools and resources. |
Year(s) Of Engagement Activity | 2021 |
URL | https://rsaddl.hcommons.org/ |
Description | Renaissance Society of America Day of Digital Learning |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | An introduction to computer vision - the extraction of information from images - for the purposes of book and art history. Overview of the field, with particular reference to collaborative research performed by the Visual Geometry Group (VGG) at Oxford. |
Year(s) Of Engagement Activity | 2022 |
URL | https://rsa2022ddl.hcommons.org/main-page/rsa-ddl-2022-topics/ |
Description | Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group - Chair |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Policymakers/politicians |
Results and Impact | Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group (policy report), Chair, 2017-19. Also Chair of follow-on to initial report, 2021-. |
Year(s) Of Engagement Activity | 2019,2020,2021,2022 |
URL | https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies |
Description | Royal Society and DSIT Workshop on Science and AI Safety |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Royal Society and DSIT Workshop on Science and AI Safety, including a discussion meeting as well as a red-teaming activity with postgraduate students. The link below was a high-profile output from part of the event. I provided opening comments for the event (but organisation was led by the Royal Society team and DSIT). |
Year(s) Of Engagement Activity | 2023 |
URL | https://time.com/6328851/scientists-training-ai-safety/ |
Description | Royal Society and US National Academy of Sciences Forum on Researcher Access to Data |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Forum description: The pandemic has demonstrated that there is strong public benefit derived from researchers having prompt access to a variety of data sources, such as data from public and government bodies, as well as private companies (in particular, tech companies). There is also significant interest in how we connect and link the different data sources. The Forum will address the evolution of researcher access to data; best practices and lessons learned from fields that are on the forefront of data sharing (e.g., climate studies, astrophysics, biomedicine); and challenges related to pressing societal problems such as online information (and misinformation), modeling for pandemics, and using data in emergencies. |
Year(s) Of Engagement Activity | 2023 |
Description | Sight and Sound Workshop at the IEEE Conference on Computer Vision and Pattern Recognition, 2021 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman co-organized the Sight and Sound Workshop at CVPR 2021. This is the description of the workshop: While traditionally visual and audio data have been studied in isolation, researchers have increasingly been creating algorithms that learn from both modalities. This has produced many exciting developments in automatic lip-reading, multi-modal representation learning, and audio-visual action recognition. Since pretty much every internet video has an audio track, the prospect of learning from paired audio-visual data - either with new forms of unsupervised learning, or by simply incorporating sound data into existing vision algorithms - is appealing, and this workshop will cover recent advances in this direction. It will also touch on higher-level questions, such as what information sound conveys that vision doesn't, the merits of sound versus other "supplemental" modalities such as text and depth, and the relationship between visual motion and sound. We'll also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing. |
Year(s) Of Engagement Activity | 2021 |
URL | https://sightsound.org/2021/ |
Description | Sight and Sound Workshop at the IEEE Conference on Computer Vision and Pattern Recognition, 2023 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman co-organized the Sight and Sound Workshop at CVPR 2023. This is the description of the workshop: While traditionally visual and audio data have been studied in isolation, researchers have increasingly been creating algorithms that learn from both modalities. This has produced many exciting developments in automatic lip-reading, multi-modal representation learning, and audio-visual action recognition. Since pretty much every internet video has an audio track, the prospect of learning from paired audio-visual data - either with new forms of unsupervised learning, or by simply incorporating sound data into existing vision algorithms - is appealing, and this workshop will cover recent advances in this direction. It will also touch on higher-level questions, such as what information sound conveys that vision doesn't, the merits of sound versus other "supplemental" modalities such as text and depth, and the relationship between visual motion and sound. We'll also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing. |
Year(s) Of Engagement Activity | 2023 |
URL | https://sightsound.org/2023/ |
Description | Sixth Form Schools Science Talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | Gave a talk on my research to lower sixth form students at Magdalen College School. This was part of their lecture series related to the lower sixth form project, which gives them experience of researching a topic. Lots of interesting questions, particularly about the global health angle of the research, its potential impact, and the ethics of using AI; in fact the quality of questions was much higher than from most technical audiences! Teacher follow-up said there was good discussion afterwards. |
Year(s) Of Engagement Activity | 2022 |
Description | Summer School on Artificial Intelligence, India |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Lectured at the Summer School on "Recognizing Human Actions in Videos", followed by a Q&A session. |
Year(s) Of Engagement Activity | 2021 |
URL | https://cvit.iiit.ac.in/summerschool2021/index.php |
Description | Talk at the Machine Learning and Computer Vision Research Group at University of Bristol |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation by Abhishek Dutta and Prasanna Sridhar on Visual AI project tools and workflows, in particular annotation and model training ('Manual Annotation of Images and Video using VIA'), leading to requests for information and further plans. |
Year(s) Of Engagement Activity | 2023 |
URL | https://uob-mavi.github.io/people/ |
Description | Talk at the Staff Meeting of History of Science Museum in Oxford on computer vision for heritage collection management and research |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation by Giles Bergel and Abhishek Dutta on Visual AI collaborations with cultural heritage organisations (libraries, museums and galleries) including software demos allowing visual search of digital collections. The Digital Collections manager, and other HSM staff, made appointments for follow up meetings and inquiries have been made to the Museum's IT suppliers. |
Year(s) Of Engagement Activity | 2023 |
Description | The Sixth Annual Conference for Research Software Engineering |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Research Software Engineers from other universities learned about our methods and processes for developing software tools that are used widely all over the world. |
Year(s) Of Engagement Activity | 2022 |
URL | https://virtual.oxfordabstracts.com/#/event/3101/submission/70 |
Description | Understanding egocentric data in 3D |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Talk on understanding egocentric data in 3D at the Ego4D workshop at CVPR 2023. |
URL | https://ego4d-data.org/workshops/cvpr23 |
Year(s) Of Engagement Activity | 2023 |
Description | University of Illinois HRI Introduction to Computer Vision for Digital Humanists |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Online workshop delivering training in Visual AI software tools, leading to discussion of their utility and a follow-up call with a prominent digital humanist working on historical newspapers. |
Year(s) Of Engagement Activity | 2023 |
URL | https://mediaspace.illinois.edu/media/t/1_arib8duv/28379181 |
Description | University of Oxford Social Sciences Division 'Common Ground' seminar series: AI and Society, |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Dan Schofield gave an introduction to Visual AI for the University of Oxford Division of Social Sciences and engaged in discussions about ethics and how to foster collaborations between AI engineering teams and social science researchers in Oxford. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.socsci.ox.ac.uk/article/new-common-ground-seminar-series-starts-with-ai-and-society |
Description | University of Stockholm Digital Humanities Now workshop |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Showcased new and ongoing research in the broad Digital Humanities field. |
Year(s) Of Engagement Activity | 2021 |
URL | https://su.powerinit.com/Data/Event/EventTemplates/2602/?EventId=879 |
Description | VGG Image Search Engine (VISE) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Industry/Business |
Results and Impact | Talk at the RKD Netherlands Institute for Art History. The RKD team have integrated our VISE image search engine software into their platform. At this event, all the contributors to the digital platform talked about their work and their software, and our VISE software was introduced to a wider international audience. |
Year(s) Of Engagement Activity | 2022 |
URL | https://rkd.nl/en/ |
Description | VisuAI Show and Tell 2021 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Presented our visual annotation and visual search software to potentially interested researchers, some of whom enquired further and later adopted the tools in their research. |
Year(s) Of Engagement Activity | 2021 |
Description | Visual AI for ethology: chimpanzee behaviour analysis using deep learning, Department of Evolutionary Anthropology, University of Zurich |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Research presentation by Daniel Schofield to the University of Zurich Anthropology department, outlining computer vision applications for ethology as well as introducing Visual AI software. |
Year(s) Of Engagement Activity | 2023 |
Description | VisualAI Show and Tell |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Postgraduate students |
Results and Impact | The event was for the University of Edinburgh. It showcased the software developed by the VisualAI team with the aims of publicising the open source software produced in the project, and of attracting potential collaborators. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.robots.ox.ac.uk/~vgg/projects/visualai/events.html#ST15621 |
Description | VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments. The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.robots.ox.ac.uk/~vgg/data/voxceleb/interspeech2021.html |
Description | VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments. The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'. |
Year(s) Of Engagement Activity | 2023 |
URL | https://mmai.io/datasets/voxceleb/voxsrc/interspeech2023.html |
Description | VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop 2022 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments. The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'. |
Year(s) Of Engagement Activity | 2022 |
URL | http://mm.kaist.ac.kr/datasets/voxceleb/voxsrc/interspeech2022.html |
Description | What do you learn after Developing, Maintaining and Supporting Research Software for 6 years? |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Industry/Business |
Results and Impact | Talk at the Vision, Graphics and Learning (VGL) research group in the Department of Computer Science, University of York. The PhD students and postdocs in the VGL group learned about the software development methods and practices used to create research software tools that are used by millions of people worldwide. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.youtube.com/watch?v=8S0HbFX4HBM |
Description | WikiWorkshop presentation of WISE image search engine |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Abhishek Dutta, Horace Lee, Prasanna Sridhar and Andrew Zisserman presented WISE, a multimodal search engine running on over 50 million images from Wikimedia Commons. This led to a follow-up meeting with the Wikimedia Foundation about how WISE can help the foundation make its images searchable, including for finding harmful content. |
Year(s) Of Engagement Activity | 2023 |
URL | https://wikiworkshop.org/2023/# |
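A multimodal search engine of this kind typically ranks images by similarity between a query embedding and precomputed image embeddings in a joint text-image space. The sketch below is illustrative only, not WISE's actual code: it assumes such embeddings already exist and shows just the ranking step.

```python
# Illustrative sketch of embedding-based image retrieval (not WISE's actual code).
# Assumes images and text queries have been mapped into a shared embedding
# space by a joint vision-language model; only the ranking step is shown.
import numpy as np

def search(query_emb: np.ndarray, image_embs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k images most similar to the query (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb)
    M = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = M @ q                  # cosine similarity against every image
    return np.argsort(-scores)[:k]  # highest-scoring images first

# Toy example: three 2-d "image embeddings" and a query close to the first.
image_embs = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.7, 0.7]])
query = np.array([1.0, 0.1])
```

At the scale of 50 million images, the exhaustive matrix product above would be replaced by an approximate nearest-neighbour index, but the ranking criterion is the same.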
Description | Workshop on Studying the Images of Popular Prints: Methods and Theory, Catholic University of Valencia |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Two-day workshop on popular Spanish 'pliegos' and related materials, including a hands-on session teaching VGG tools applied to these materials by the Spanish Chapbooks project at Cambridge University, who were present. Outcomes included plans for further development of the Cambridge and other Spanish resources and use of the project software, and an invitation to speak to a similar project at the University of Geneva in 2025. |
Year(s) Of Engagement Activity | 2023 |
URL | http://biblioteca.cchs.csic.es/docs/Poster_Valencia_low.pdf |
Description | Workshop: Introduction to Visual AI for Behavioural Research, Department of Anthropology, University of Oxford. |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Hands-on workshop by Daniel Schofield for the University of Oxford Anthropology Department, introducing visual AI tools and core concepts for using computer vision in anthropological research. |
Year(s) Of Engagement Activity | 2023 |
Description | Workshop: Introduction to computer vision tools for primatology: How to annotate, detect and track, Kuching, Malaysia |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Two-hour workshop and presentation by Dan Schofield to attendees of the International Primatological Society, introducing Visual AI tools for primatological research. |
Year(s) Of Engagement Activity | 2023 |
URL | https://ipskuching.com/programme/ |
Description | A statistical learning perspective on reconstructing the 3D world |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Invited talk at the BrainWorlds Freiburg-Oxford Workshop. |
Year(s) Of Engagement Activity | 2023 |
URL | https://brainworlds.uni-freiburg.de |