Visual AI: An Open World Interpretable Visual Transformer

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

With the advent of deep learning and the availability of big data, it is now possible to train machine learning algorithms for a multitude of visual tasks, such as tagging personal image collections in the cloud, recognizing faces, and 3D shape scanning with phones. However, each of these tasks currently requires training a neural network on a very large image dataset specifically collected and labelled for that task. The resulting networks are good experts for the target task, but they only understand the 'closed world' experienced during training and can 'say' nothing useful about other content, nor can they be applied to other tasks without retraining, nor do they have an ability to explain their decisions or to recognise their limitations. Furthermore, current visual algorithms are usually 'single modal': they 'close their ears' to the other modalities (audio, text) that may be readily available.

The core objective of the Programme is to develop the next generation of audio-visual algorithms that do not have these limitations. We will carry out fundamental research to develop a Visual Transformer capable of visual analysis with the flexibility and interpretability of a human visual system, and aided by the other 'senses' - audio and text. It will be able to continually learn from raw data streams without requiring the traditional 'strong supervision' of a new dataset for each new task, and deliver and distill semantic and geometric information over a multitude of data types (for example, videos with audio, very large scale image and video datasets, and medical images with text records).

The Visual Transformer will be a key component of next generation AI, able to address multiple downstream audio-visual tasks, significantly superseding the current limitations of computer vision systems, and enabling new and far reaching applications.

A second objective addresses transfer and translation. We seek impact in a variety of other academic disciplines and industry which today greatly under-utilise the power of the latest computer vision ideas. We will target these disciplines to enable them to leapfrog the divide between what they use (or do not use) today which is dominated by manual review and highly interactive analysis frame-by-frame, to a new era where automated visual analytics of very large datasets becomes the norm. In short, our goal is to ensure that the newly developed methods are used by industry and academic researchers in other areas, and turned into products for societal and economic benefit. To this end open source software, datasets, and demonstrators will be disseminated on the project website.

The ubiquity of digital images and videos means that every UK citizen may potentially benefit from the Programme research in different ways. One example is smart audio-visual glasses, that can pay attention to a person talking by using their lip movements to mask out other ambient sounds. A second is an app that can answer visual questions (or retrieve matches) for text-queries over large scale audio-visual collections, such as a person's entire personal videos. A third is AI-guided medical screening, that can aid a minimally trained healthcare professional to perform medical scans.

Planned Impact

The proposed programme encompasses new methodology and applied research in computer vision and other modalities (audio, text) that will enable analysis and search of image and video content while learning new things, with human-like flexibility and interpretability. These capabilities will encourage end user take up of computer vision technologies and commercial interest in embedding these technologies in products.

The Programme will have Economic and Societal impact by
1. Enabling UK industry to leverage AI in their activities with a key strategic advantage.
2. Developing new and improved computer vision technologies that will require substantially less training data to solve problems and are thus suitable for commercialisation by a wide range of companies.
3. Enhancing the visual and audio capabilities and knowledge base of UK industries, including small ones.
4. Enhancing quality of life by improving, for instance, healthcare capabilities, surveillance, environmental monitoring, and the means of accessing and enjoying personal digital media.
5. Reducing the cost and risk of collecting manual annotations for deploying AI technology, especially for sensitive data such as medical records.
6. Collaborating directly with companies and organizations that we have already identified, and will work with over the course of the Programme.
7. Training the next generation of computer vision researchers who will be equipped to support the imaging needs of science, technology and wider society for the future.

Impact on Knowledge includes
1. Realisation of new approaches to essential computer vision technology, and the dissemination of research findings through publications, conference presentations, summer school teaching, and the distribution of open source software and image databases.
2. Sharing knowledge with industrial collaborators via Transfer and Application Projects (TAPs) and other activities leading to adoption of advanced computer vision methods across many disciplines of science, engineering and medicine that currently do not use them.
3. Communication of advances to a public audience through website articles, Show and Tell events, social and broadcast media, and other co-ordinated public understanding activities.
 
Description 1-on-1 Engineers and Policy Fellowship discussion
Geographic Reach National 
Policy Influence Type Influenced training of practitioners or researchers
URL https://raeng.org.uk/policyfellowships
 
Description Chair of Royal Society Data Science Policy group leading to publication of a report "Science in the age of AI"
Geographic Reach Multiple continents/international 
Policy Influence Type Participation in a guidance/advisory committee
 
Description Royal Society National Academies Data Reform Round Table Consultation
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
 
Description Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group, Chair
Geographic Reach Multiple continents/international 
Policy Influence Type Participation in a guidance/advisory committee
Impact Quoting the aims from the report "We have three objectives for this report. Our first objective is that the use cases inspire those collecting and using data to consider the potential benefits of PETs for their own work, or in new collaborations with others. Second, for the evidence we present on barriers to adoption and standardisation to help inform policy decisions to encourage a marketplace for PETs. Finally, through our recommendations, we hope the UK will maximise the opportunity to be a global leader in PETs - both for data security and collaborative analysis - alongside emerging, coordinated efforts to implement PETs in other countries."
URL https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/From-Privacy-to-Part...
 
Description Royal Society Privacy Enhancing Technologies Working Group - policy report published (Chair)
Geographic Reach National 
Policy Influence Type Participation in a guidance/advisory committee
Impact The report has contributed to wider discussion of data sharing between government departments, and a number of the recommendations have been followed up. It is well cited. A follow-on project is underway with the Alan Turing Institute which will report in 2022. The important message was to show that PETs are maturing as a technology and can be considered enablers of trusted data sharing, and to move the conversation away from security and accepting zero risk in sharing data. The work is relevant not only to my research area (health data science) but also to many other data-driven sectors.
URL https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/privacy-enhancing-te...
 
Description Biomedical Research Centre
Amount £89,000,000 (GBP)
Organisation National Institute for Health Research 
Sector Public
Country United Kingdom
Start 12/2022 
End 04/2027
 
Description EPX0401861 Turing AI World Leading Researcher Fellowship Studentship
Amount £110,541 (GBP)
Funding ID EP/Y530517/1 
Organisation United Kingdom Research and Innovation 
Sector Public
Country United Kingdom
Start 09/2023 
End 09/2028
 
Description Envisioning Dante c.1472- c.1630
Amount £805,620 (GBP)
Funding ID AH/W005220/1 
Organisation Arts & Humanities Research Council (AHRC) 
Sector Public
Country United Kingdom
Start 08/2022 
End 09/2025
 
Description Royal Society Research Professorship
Amount £1,400,000 (GBP)
Funding ID RSRP\R\241003 
Organisation The Royal Society 
Sector Charity/Non Profit
Country United Kingdom
Start 03/2024 
End 03/2029
 
Description Royal Society Research Professorship Enhanced research Expenses
Amount £100,000 (GBP)
Funding ID RF\ERE\210331 
Organisation The Royal Society 
Sector Charity/Non Profit
Country United Kingdom
Start 09/2021 
End 03/2024
 
Description Studentship
Amount £154,725 (GBP)
Organisation Facebook 
Sector Private
Country United States
Start 09/2021 
End 09/2025
 
Description Toshiba 2021
Amount $200,000 (USD)
Organisation Toshiba 
Sector Private
Country Japan
Start 06/2021 
End 03/2023
 
Description Toshiba 2023
Amount £200,000 (GBP)
Organisation Toshiba 
Sector Private
Country Japan
Start 04/2023 
End 04/2025
 
Description Turing AI Fellowship: Ultra Sound Multi-Modal Video-based Human-Machine Collaboration
Amount £4,248,942 (GBP)
Funding ID EP/X040186/1 
Organisation United Kingdom Research and Innovation 
Sector Public
Country United Kingdom
Start 09/2023 
End 09/2028
 
Title CAIFE dataset and annotations 
Description The CAIFE dataset is a large fetal echocardiography dataset consisting of freehand video and sweep video, collated from multiple hospitals. A subset of this dataset has been manually annotated by cardiac view, and a large subset automatically labelled. The generation of the dataset was funded by the COCHE project but the dataset is used by other video analysis projects as well. Those projects have contributed annotations to enrich the resource. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? No  
Impact On-going 
 
Title Coreferenced Image Narratives Dataset 
Description Our Coreferenced Image Narratives (CIN) dataset contains 1880 images from the Localized Narratives dataset [1] that come with long-form text descriptions (narrations) and mouse traces. These images are originally a subset of the test and validation set of the Flickr30k dataset [2] . We annotated this subset with coreference chains and bounding boxes in the image that are linked with the textual coreference chains, and use them only for validation and testing. Note that we also include singletons (i.e., coreference chains of length one). [1] Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, Vittorio Ferrari; Connecting Vision and Language with Localized Narratives ; ECCV 2020. [2] Bryan A. Plummer, Liwei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, Svetlana Lazebnik; Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models ; IJCV 2017. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact The dataset allows extending and evaluating the abilities of recent powerful large vision and language models. As it was published very recently, there is so far only one publication from our group, at the top-tier NLP conference EMNLP 2023, under the title "Semi-supervised multimodal coreference resolution in image narrations". 
URL https://github.com/VICO-UoE/CIN
 
Title EPIC Fields: Marrying 3D Geometry and Video Understanding 
Description We introduce EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Similar to other datasets for neural rendering, EPIC Fields removes the complex and expensive step of reconstructing cameras using photogrammetry, and allows researchers to focus on more interesting modeling problems. We illustrate the challenge of photogrammetry in egocentric videos and propose several technical innovations to address them. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact Upcoming 
URL https://epic-kitchens.github.io/epic-fields/
 
Title EPIC-KITCHENS VISOR 
Description We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. Data published under the Creative Commons Attribution-NonCommercial 4.0 International License. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact The dataset can be used for audio event detection and the baseline code will be made publicly available. 
URL https://data.bris.ac.uk/data/dataset/2v6cgv1x04ol22qp9rm9x2j6a7/
 
Title Epic-Sounds: A Large-scale Dataset of Actions That Sound 
Description We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos. We propose an annotation pipeline where annotators temporally label distinguishable audio segments and describe the action that could have caused this sound. We identify actions that can be discriminated purely from audio, through grouping these free-form descriptions of audio into classes. For actions that involve objects colliding, we collect human annotations of the materials of these objects (e.g. a glass object being placed on a wooden surface), which we verify from visual labels, discarding ambiguities. Overall, EPIC-SOUNDS includes 78.4k categorised segments of audible events and actions, distributed across 44 classes as well as 39.2k non-categorised segments. We train and evaluate two state-of-the-art audio recognition models on our dataset, highlighting the importance of audio-only labels and the limitations of current models to recognise actions that sound. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact A standard benchmark for testing audio-visual models. Already being cited in major publications 
URL https://epic-kitchens.github.io/epic-sounds/
 
Title Image Change dataset 
Description Propose a scalable methodology for obtaining a large-scale change detection training dataset by leveraging existing object segmentation benchmarks. Introduce a co-attention based novel architecture that is able to implicitly determine correspondences between an image pair and find changes in the form of bounding box predictions. Contribute four evaluation datasets that cover a variety of domains and transformations, including synthetic image changes, real surveillance images of a 3D scene, and synthetic 3D scenes with camera motion. Evaluate our model on these four datasets and demonstrate zero-shot and beyond training transformation generalization. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023. Future impact to be determined. 
URL https://arxiv.org/pdf/2209.14341.pdf
 
Title Localizing Visual Sounds the Hard Way 
Description The objective of this work is to localize sound sources that are visible in a video without using manual annotations. Our key technical contribution is to show that, by training the network to explicitly discriminate challenging image fragments, even for images that do contain the object emitting the sound, we can significantly boost the localization performance. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact Localizing Visual Sounds the Hard Way Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman CVPR, 2021 
URL https://www.robots.ox.ac.uk/~vgg/research/lvs/
 
Title PASS: An ImageNet replacement for self-supervised pretraining without humans 
Description PASS is a large-scale image dataset that does not include any humans and which can be used for high-quality pretraining while significantly reducing privacy concerns. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact YM. Asano, C. Rupprecht, A. Zisserman, A. Vedaldi PASS: An ImageNet replacement for self-supervised pretraining without humans NeurIPS Dataset Track, 2021 
URL https://www.robots.ox.ac.uk/~vgg/data/pass/
 
Title PULSE dataset and annotations 
Description A multi-modal dataset consisting of fetal ultrasound video, gaze tracking data, probe movement data and sonographer audio for first, second and third trimester scans. Audio has been translated to text. A large subset of the ultrasound video is automatically annotated in terms of anatomy label (single label per frame). Manual annotation has been done on a smaller subset. This dataset was generated as part of the ERC Advanced Grant PULSE but has been used for research on UKRI projects which have also contributed some analysis methods for automatic annotation that have improved the value of the data set and annotations as a whole. The dataset is a private dataset. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? No  
Impact See outputs listed on the PULSE website and PURFECT webpages as examples. An ultrasound pre-trained model (PULSENet) has also been derived which is used as a backbone for other research. 
 
Title Semantic Shift Benchmark 
Description Demonstrate that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes. Following the success of modern deep learning systems on closed-set visual recognition tasks, a natural next challenge is open-set recognition (OSR) (Scheirer et al., 2013). In the closed-set setting, a model is tasked with recognizing a set of categories that remain the same during both training and testing phases. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Future impact to be determined 
URL https://www.robots.ox.ac.uk/~vgg/research/osr/#ssb_suite
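The 'none-of-above' decision described in this entry can be illustrated with a common open-set baseline: reject an input when the classifier's maximum softmax probability falls below a threshold. The sketch below is an illustration only and is not taken from the benchmark itself; the function names and the threshold value of 0.5 are hypothetical.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_open_set(logits: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the predicted known class, or None for 'none-of-above'.

    Assumption: the threshold is a hypothetical value; in practice it would be
    tuned on held-out data.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None
```

A confident score vector such as `[5.0, 0.0, 0.0]` is assigned to a known class, whereas a flat one such as `[1.0, 1.0, 1.0]` is rejected as 'none-of-above'.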
 
Title Video Person-Clustering Dataset A multi-modal TV-shows and movies dataset 
Description VPCD contains multi-modal annotations (face, body and voice) for all primary and secondary characters from a range of diverse TV-shows and movies. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact A. Brown, V. Kalogeiton, A. Zisserman Face, Body, Voice: Video Person-Clustering with Multiple Modalities 
URL https://www.robots.ox.ac.uk/~vgg/data/Video_Person_Clustering//
 
Title Video-text Alignment HTM-Align dataset 
Description The objective is a temporal alignment network that ingests long term video sequences, and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Future impacts to be determined 
URL https://www.robots.ox.ac.uk/~vgg/research/tan/
 
Description NLS Chapbooks 
Organisation National Library of Scotland
Country United Kingdom 
Sector Academic/University 
PI Contribution We used our software to search and analyse the illustrations of the chapbooks.
Collaborator Contribution Partner provided chapbooks in large quantities.
Impact https://www.robots.ox.ac.uk/~vgg/research/chapbooks/
Start Year 2020
 
Description National Consortium of Intelligent Medical Imaging 
Organisation National Consortium of Intelligent Medical Imaging
Sector Academic/University 
PI Contribution A VisualAI postdoc (Jianbo Jiao) is providing expertise in building image-based deep learning models to assess COVID19 deterioration in hospital-based patients.
Collaborator Contribution NCIMI is providing access to COVID19 data for a TAP project.
Impact An initial evaluation of predictive modelling was performed using available COVID19 data. However, because the dataset was small, treatments for COVID19 patients have significantly improved, and better patient pathways are now in place, it was deemed not worth pursuing this work beyond the preliminary study. A report was written but has not been published.
Start Year 2021
 
Description TAP VAI-02 1516 Project 
Organisation University of Copenhagen
Country Denmark 
Sector Academic/University 
PI Contribution We created a visual search engine using images and metadata supplied by Matilde Malaspina at University of Copenhagen and Barbara Tramelli from University of Venice.
Collaborator Contribution Partner provided images and metadata.
Impact A talk at Venice Centre for Digital and Public Humanities (VeDPH) on 9th Dec. 2020
Start Year 2020
 
Description TAP-VAI-03 16cIllustration Project 
Organisation Ca' Foscari University of Venice
Country Italy 
Sector Academic/University 
PI Contribution We created a visually searchable database (https://www.robots.ox.ac.uk/~vgg/research/16ci/lyon/) of 16th century illustrations printed in Lyon.
Collaborator Contribution Partner provided images and metadata.
Impact The researchers at Venice Centre for Digital and Public Humanities are using this visual search engine as a research support tool.
Start Year 2021
 
Description TAP-VAI-04 Frank-Scholten Archive 
Organisation Leiden University
Country Netherlands 
Sector Academic/University 
PI Contribution Using our VISE software, we found a match between all the photographs and their corresponding negative in the Frank-Scholten image archive.
Collaborator Contribution They provide a dataset containing photographs and negatives captured by Frank-Scholten.
Impact tbc
Start Year 2021
 
Description TAP-VAI-08 Fish Pool Trajectory 
Organisation University of Oxford
Department Department of Zoology
Country United Kingdom 
Sector Academic/University 
PI Contribution We are developing tools and a workflow to detect and track a Picasso triggerfish moving in a fish tank to find the food target.
Collaborator Contribution They provide a video dataset showing Picasso triggerfish in a fish pool.
Impact tbc
Start Year 2021
 
Description TAP-VAI-09 Fish Tank Obstacles 
Organisation University of Oxford
Department Department of Zoology
Country United Kingdom 
Sector Academic/University 
PI Contribution We are developing tools and a workflow to detect and track Picasso triggerfish navigating through obstacles to reach a food target in a fish tank.
Collaborator Contribution They provide a video dataset showing Picasso triggerfish in a fish tank containing obstacles.
Impact tbc
Start Year 2021
 
Description TAP-VAI-10 Czech National Library/ Czech Academy of Sciences 
Organisation National Library of the Czech Republic
Country Czech Republic 
Sector Public 
PI Contribution We are providing technical support to the RKD team for implementing our VISE software in their platform.
Collaborator Contribution They are using our software tool (VISE)
Impact tbc
Start Year 2021
 
Description TAP-VAI-11 RKD 
Organisation Netherlands Institute for Art History
Country Netherlands 
Sector Public 
PI Contribution We are providing technical support to the RKD for implementing visual image search feature in the public facing web portal and internal research using our VGG Image Search Engine (VISE) software (https://www.robots.ox.ac.uk/~vgg/software/vise/).
Collaborator Contribution The RKD provides millions of images and is now using our VISE software for visual search functionality.
Impact Not yet.
Start Year 2021
 
Title Audio-visual synchronisation 
Description The software performs audio-visual synchronisation, which requires a model to relate changes in the visual and audio streams. Prior work focused primarily on the synchronisation of talking head videos. In contrast, open-domain videos often have only a small visual indication of the sound, i.e. the cue is sparse in space. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact Paper in British Machine Vision Conference (BMVC), 2022. Future impacts to be determined. 
URL https://iashin.ai/SparseSync
 
Title Audio-visual synchronisation - Synchformer 
Description An audio-visual synchronization model: the inputs are the audio and visual streams of a video, and the output is the temporal offset. The approach is applicable to both dense and sparse (in time and space) audio-visual synchronization cues (e.g. a person talking (dense in time) or a dog barking (sparse in time)). A particular advantage of the model and training is that it decouples feature extraction from synchronization modeling through multi-modal segment-level contrastive pre-training. 
Type Of Technology Software 
Year Produced 2024 
Open Source License? Yes  
Impact Paper at ICASSP 2024. Also won Amazon synchronization challenge, https://wacv2024-workshop-quality-iva.github.io/workshop-quality-iva/index.html#Competition Future impacts to be determined. 
URL https://www.robots.ox.ac.uk/~vgg/research/synchformer/
 
Title Auditory Slow-Fast 
Description Recognising actions using auditory signal only 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact Paper won outstanding paper at ICASSP 2021 - 3 papers selected out of 1400 papers. Well-referenced - 46 stars. In a followup work by Deepmind [https://arxiv.org/pdf/2111.12124.pdf] this work is referred to as: "We find the Slowfast architecture is good at learning rich representations required by different domains" extending this work to speech and music audio. 
URL https://github.com/ekazakos/auditory-slow-fast
 
Title EPIC Fields Code 
Description This section contains the pipeline for the dataset introduced in our paper, "EPIC Fields: Marrying 3D Geometry and Video Understanding." We aim to bridge the domains of 3D geometry and video understanding, leading to innovative advancements in both areas. 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact Upcoming 
URL https://github.com/epic-kitchens/epic-Fields-code
 
Title Find Identical Images (FII) 
Description Identical images have the same image dimension (i.e. image width, image height, number of colour channels) and same pixel value in all corresponding pixel locations. FII is a command line tool to find all identical images in a folder. It can also find images that are common in two folders. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact tbc 
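The definition of identical images given in this entry (same width, height and number of colour channels, and the same value at every pixel location) can be sketched as follows. This is a minimal illustration, not FII's actual implementation: the `Image` record, the function names and the use of a SHA-256 digest to group pixel data are all assumptions, and images are taken to be already decoded into raw pixel bytes.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Image:
    """Hypothetical in-memory image: dimensions plus raw decoded pixel bytes."""
    name: str
    width: int
    height: int
    channels: int
    pixels: bytes


def identity_key(img: Image) -> Tuple[int, int, int, str]:
    """Key that is equal for images with equal dimensions and pixel values."""
    digest = hashlib.sha256(img.pixels).hexdigest()
    return (img.width, img.height, img.channels, digest)


def find_identical(images: List[Image]) -> List[List[str]]:
    """Group the names of images in one folder that are identical."""
    groups = defaultdict(list)
    for img in images:
        groups[identity_key(img)].append(img.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def common_images(folder_a: List[Image], folder_b: List[Image]) -> List[str]:
    """Names of images in folder_a that also occur in folder_b."""
    keys_b = {identity_key(img) for img in folder_b}
    return sorted(img.name for img in folder_a if identity_key(img) in keys_b)
```

Including the dimensions in the key matters: two images can hold the same bytes yet differ in shape (for example 2x1 versus 1x2), and by the definition above they are not identical.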
 
Title Follow Things Around 
Description Software to track "things" (e.g. animals) in a video. The input is the video, the output is a text file specifying a bounding box of the `thing' or `things' in each frame. The method uses `tracking by detection', meaning that no manual annotation is required on the video. Instead a detector for the `thing' is required (and detectors are available pre-trained for multiple classes of animals). Follow Things Around is provided as a Jupyter Notebook for Google Colab. It runs in a web browser, without the need for a GPU. A user can access their data for tracking on their Google Drive. 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact It is too early to say. 
URL https://www.robots.ox.ac.uk/~vgg/software/follow-things-around/
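As an illustration of 'tracking by detection', the sketch below links per-frame detector outputs into tracks by greedily matching each new box to the previous frame's box with the highest overlap (intersection over union). This is a generic, simplified scheme and not necessarily the association method used by Follow Things Around; the function names and the `min_iou` threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def track(frames: List[List[Box]], min_iou: float = 0.3) -> List[Dict[int, Box]]:
    """Link detections across frames; returns one {track_id: box} dict per frame.

    Simplification: a track that is unmatched in one frame is ended.
    """
    next_id = 0
    previous: Dict[int, Box] = {}
    output: List[Dict[int, Box]] = []
    for detections in frames:
        current: Dict[int, Box] = {}
        # Score every (existing track, new detection) pair, best overlap first.
        pairs = sorted(
            ((iou(pbox, dbox), tid, di)
             for tid, pbox in previous.items()
             for di, dbox in enumerate(detections)),
            reverse=True,
        )
        used_tracks, used_dets = set(), set()
        for score, tid, di in pairs:
            if score < min_iou:
                break
            if tid in used_tracks or di in used_dets:
                continue
            current[tid] = detections[di]
            used_tracks.add(tid)
            used_dets.add(di)
        # Unmatched detections start new tracks.
        for di, dbox in enumerate(detections):
            if di not in used_dets:
                current[next_id] = dbox
                next_id += 1
        output.append(current)
        previous = current
    return output
```

The per-frame dictionaries correspond to the kind of output described above, a bounding box for each tracked 'thing' in each frame, which could then be written to a text file.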
 
Title Generalised Visual Counting in Images 
Description Our goal is to develop a generalised visual object counting system, that augments humans' ability for recognising the number of objects in a visual scene. Specifically, generalised visual object counting refers to the problem of identifying the number of the salient objects of arbitrary semantic class in an image (i.e. open-world visual object counting) with arbitrary number of instance "exemplars" provided by the end user, to refer to the particular objects to be counted, i.e. from zero-shot to few-shot object counting. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact Future impact to be determined. 
URL https://arxiv.org/pdf/2208.13721.pdf
 
Title Generalized Category Discovery 
Description We present a new setting: 'Generalized Category Discovery' and a method to tackle it. Our setting can be succinctly described as: given a dataset, a subset of which has class labels, categorize all unlabelled images in the dataset. The unlabelled images may come from labelled or novel classes. Our method leverages contrastively trained vision transformers to assign labels directly through clustering. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact Future impact to be determined 
URL https://github.com/prajwalkr/vtp#readme
 
Title Image Counterfeit Spotter 
Description Counterfeit Spotter compares images of suspicious products with a reference image and confirms whether each is real or fake within seconds, right in your browser. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact Still receiving feedback and reporting 
URL https://www.robots.ox.ac.uk/~vgg/software/image-compare/counterfeit-spotter/#usecases
 
Title ImageCompare 
Description Image Compare is a lightweight, standalone and offline application to visually compare a pair of images and highlight their differences. This application can be used on desktop computers and mobile phones without requiring installation as it runs entirely in a web browser. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact tbc 
 
Title Lip Reading 
Description To learn strong lip reading models that can recognise speech in silent videos. 
Type Of Technology Software 
Year Produced 2022 
Impact Research has shown that the best models achieve state-of-the-art results, outperforming prior work trained on public data by a significant margin, and even industrial models trained on orders of magnitude more data. We have also designed a Visual Speech Detection model on top of our lip reading system that obtains state-of-the-art results on this task and even outperforms several audio-visual baselines. 
URL https://www.robots.ox.ac.uk/~vgg/research/vtp-for-lip-reading/
 
Title List Annotator (LISA) 
Description List Annotator (LISA) is a standalone and light-weight HTML/CSS/JavaScript based application to efficiently annotate a large list of images. LISA is an open source project developed and maintained by the Visual Geometry Group (VGG) and released under a license that grants its users the freedom to use it for any purpose. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact tbc 
 
Title Motion Grouping 
Description This software implements the model as described in the paper. It includes a pre-trained model and inference code to apply to downstream images, as well as the training code to train the model from scratch. It also includes code to evaluate and benchmark the results against existing datasets (DAVIS2016, FBMS59, SegTrackv2, MoCA). 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact This code accompanies the paper: Self-supervised Video Object Segmentation by Motion Grouping Charig Yang, Hala Lamdouar, Erika Lu, Andrew Zisserman, Weidi Xie. ICCV 2021 
URL https://oxris.ox.ac.uk/viewobject.html?id=1190260&cid=1
 
Title VGG Image Annotator (VIA) 
Description VGG Image Annotator is a simple and standalone manual annotation software for image, audio and video. VIA runs in a web browser and does not require any installation or setup. The complete VIA software fits in a single self-contained HTML page of less than 400 kilobytes that runs as an offline application in most modern web browsers. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact tbc 
 
Title VGG Image Search Engine (VISE) 
Description VGG Image Search Engine (VISE) is a free and open source software for visual search of a large number of images using an image as a search query. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact tbc 
 
Title VGG Visual Tracker (VVT) 
Description VGG Visual Tracker (VVT) is a tool for creating bounding box annotations on videos in a semi-automatic fashion, using class agnostic object trackers. VVT runs on modern web browsers (Chrome 65+, Firefox 60+, Safari 11+) and does not require any installation or setup. VVT is a variation of the VGG Image Annotator (VIA) v3 tool and uses the same data format. So, if you are already using VIA v3, the annotations are interoperable with your existing workflow. No changes required. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact tbc 
 
Title Visual Analysis of Chapbooks 
Description The chapbooks were produced cheaply to create everyday reading material and were the most popular reading material for the masses [1]. This dataset has been made freely available by the National Library of Scotland (NLS) 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact To reduce printing costs, these woodcuts were reused across multiple chapbooks. Software tools based on computer vision have helped researchers pursue many related research questions. 
URL https://data.nls.uk/data/digitised-collections/chapbooks-printed-in-scotland/
 
Title m-bain/whisperX: v3.0.0 
Description batched inference with faster-whisper backend 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact This repository provides fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization. 25 pull requests 
URL https://zenodo.org/record/7876369
 
Description 'Humanist in the Loop: Computer Vision by Example for the Study of Early Printed Books', University of Helsinki Digital Humanities Seminar. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A presentation to a historical research group in Computational History who are experimenting in computer vision methods themselves. The presentation included comparison of our respective measures and discussion of common challenges and potential solutions.
Year(s) Of Engagement Activity 2023
URL https://www.helsinki.fi/en/digital-humanities/teaching/digital-humanities-research-seminar
 
Description (1) VIA: Image and Video Annotation; (2) Image Comparator; (3) Image search and retrieval 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact Show and Tell Event at the Oxford Big Data Institute. Researchers at the Big Data Institute are now aware of our computer vision tools, which can significantly improve their existing research workflow.
Year(s) Of Engagement Activity 2022
URL https://www.bdi.ox.ac.uk/
 
Description - How to study early printed books with computer vision: a practical introduction, University of Newcastle-upon-Tyne 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation on and hands-on workshop with Visual AI project software and showcase of successful collaborations in this domain. Outputs included plans for uptake and further development of software on external research projects.
Year(s) Of Engagement Activity 2023
URL https://www.eventbrite.co.uk/e/how-to-study-early-printed-books-with-computer-vision-a-practical-int...
 
Description ACH talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Forum for conversations on an expansive definition of digital humanities in a broad array of subject areas, methods, and communities of practice.
Year(s) Of Engagement Activity 2021
URL https://drive.google.com/file/d/1CN5CDWPf4cLTT1NY9gyP-JxxvsCdzRG-/view
 
Description AD-Manual Annotation of Radiology Images using VGG Image Annotator (VIA) online Course 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The course website lists our manual annotation tool (VIA). This provides a lot of exposure to our software, which is available to the world as an open source tool. This tool significantly speeds up annotation work for professions that need to annotate large volumes of visual data.
Year(s) Of Engagement Activity 2020
URL https://folio47.wixsite.com/rp-course/radiology-preprocessor-workflow
 
Description AD-TAP Outcome Presentation for Leiden University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Presented the outcome of our collaboration with Leiden University.
The team at Leiden University were extremely excited to see the results from our visual search engine. They said that they were "jumping like a child" after seeing the outcome and that this collaboration will lead to many new research projects related to the Frank Scholten Archives.
Year(s) Of Engagement Activity 2021
 
Description AEOLIAN Network workshop presentation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Presentation describing a project undertaken within the National Librarian of Scotland's Fellowship in Digital Scholarship programme for 2020-1.
Year(s) Of Engagement Activity 2021
URL https://www.aeolian-network.net/events/workshop-1-employing-machine-learning-and-artificial-intellig...
 
Description AI4 LAM online conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Aimed at professionals in the LAM (Libraries, Archives and Museums) sector, this was an online workshop teaching the use of several Visual AI tools and giving context to their application in this sector. Issues of attribution, bias and fairness were discussed as well as technical areas.
Year(s) Of Engagement Activity 2022
URL https://sites.google.com/view/ai4lam/ai4lam-2022-virtual-event
 
Description AI4LAM 2023 Annual Conference, Internet Archive Canada 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Conference for technical professionals and allied researchers in the Libraries, Archives and Museums sector. My participation included a presentation/hands-on workshop and follow-on discussions, generating three separate requests for collaboration or more information on Visual AI project tools.
Year(s) Of Engagement Activity 2023
URL https://ff2023.archive.org/pages/program/
 
Description AI4LAM workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Introduce the use of visual AI for collections research, access and management. Using the example of collaborations between Oxford's Visual Geometry Group (VGG) and researchers and curators within the GLAM sector, the speaker will provide a hands-on introduction to VGG's open-source tools for visual search, classification, comparison and annotation.
Year(s) Of Engagement Activity 2021
URL https://libereurope.eu/event/introduction-to-visual-ai-in-glams-workshop-series-on-applying-and-depl...
 
Description AIUM 2021 Special Session Invited Speaker 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited speaker in Session with Title: Deep Learning Applications for New Ultrasound Techniques. Talk was pre-recorded with live questions.
The primary audience was medical physicists rather than medical image analysis experts.
Year(s) Of Engagement Activity 2021
 
Description AV4D: Visual Learning of Sounds in Spaces 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Workshop at the European Conference on Computer Vision (ECCV).
Year(s) Of Engagement Activity 2022
URL https://av4d.org
 
Description AWS Human-Machine Collaboratory conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact Hosted by the Amazon Web Services (AWS)-funded Human-Machine Collaboratory at Oxford, Giles Bergel gave two talks (one alongside Dan Schofield, another Visual AI ambassador) on Visual AI collaborations and research in fields ranging from primatology to cultural heritage and media studies.
Year(s) Of Engagement Activity 2022
URL https://www.mpls.ox.ac.uk/innovation-and-business-partnerships/human-machine-collaboration
 
Description Aberystwyth Bibliographical Group 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Presented work in tracing woodcut illustrations, their original woodblocks and copies throughout the surviving corpus of British ballads and chapbooks. He discussed how woodcuts in these forms of cheap print served as visual brands for particular titles, genres or producers of cheap print, and demonstrated some of the bibliographical uses of their identification. Showed how computer vision software can strongly support these researches, and may be further applied to printed images of all kinds.
Year(s) Of Engagement Activity 2021
URL https://www.hugofox.com/community/aberystwyth-bibliographical-group-19783/reports-of-recent-meetings...
 
Description Co-chaired Royal Society-CAS Science and AI workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Workshop on Science and AI - I co-chaired and gave a talk.
Year(s) Of Engagement Activity 2023
 
Description Co-organiser of ASMUS2021, a MICCAI workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Advances in Simplifying Medical UltraSound (ASMUS) 2021 is an international workshop that provides a forum for research topics around ultrasound image computing and computer-assisted interventions and robotic systems that utilize ultrasound imaging. It was held in conjunction with MICCAI 2021 in virtual form.

Accepted papers were selected based on their scientific contribution, via a double-blind process involving written reviews from at least two external reviewers in addition to a member of the committee.
The published work includes reports across a wide range of methodology, research and clinical applications. Advanced deep learning approaches for anatomy recognition, segmentation, registration and skill assessment are the dominant topics, in addition to ultrasound-specific new approaches in augmented reality and remote assistance.
Three invited speakers were included in the workshop, and live demos of technologies were given. The meeting had 80+ attendees.
Year(s) Of Engagement Activity 2021
URL https://miccai-ultrasound.github.io/#/asmus21
 
Description Computational ethology and cultural evolution in wild chimpanzees, Department of Evolutionary Anthropology, University of Zurich 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation by Daniel Schofield as part of an Anthropology research symposium and conference. Introduced VGG tools to the Human Evolutionary Ecology Group, University of Zurich and facilitated discussion of how visual AI tools could be used for human anthropological research.
Year(s) Of Engagement Activity 2023
 
Description Computer vision for the investigation of ancient documents Saint-Étienne, 6-7 April 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Two-day workshop on computer vision and historical documents, publicising work done by Visual AI project members, three of whom presented (myself, Abhishek Dutta and Prasanna Sridhar), leading to follow-up events in Oxford and (potentially) Milan involving fellow-practitioners and to enhancements to Visual AI open-source software tools (Image Comparator) made by the host organisation.
Year(s) Of Engagement Activity 2023
URL https://ro2i.hypotheses.org/351
 
Description ConCode webinar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presentation will highlight some of the ways that cultural heritage collections are using computer vision (or visual AI) for collections management and research, focussing particularly on the work of the Oxford Visual Geometry Group and its collaborators.
Year(s) Of Engagement Activity 2021,2022
URL https://www.youtube.com/watch?v=d4XaZ4bur6Q
 
Description Conference of European National Librarians 'AI in Libraries' Webinar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation by Horace Lee, with Giles Bergel, on Visual AI's WISE Image Search Engine to members of various national libraries in Europe. A Digital Curator from the British Library expressed her interest in using WISE for the British Library's image collection.
Year(s) Of Engagement Activity 2023
URL https://www.cenl.org/network-group-ai-in-libraries-webinars-2023/
 
Description Deep Discoveries webinar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Discussed how computer vision excels at matching identical features within images and has made progress in broad classification tasks, though the middle ground remains challenging. Visual similarity, which is essential to human visual recognition, is challenging to conceptualise, measure and compute. Outlined some approaches to defining similarity in computational terms, drawing on the experience of the Visual Geometry Group (Oxford) in collaborating with cultural heritage researchers.
Year(s) Of Engagement Activity 2021
URL https://www.eventbrite.co.uk/e/computer-vision-and-heritage-opportunities-for-research-and-engagemen...
 
Description Digital Humanities Annual Conference - Tokyo 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact The project gave a paper and led a workshop teaching the use of Visual AI software tools for the study of printed illustrations.
Year(s) Of Engagement Activity 2022
URL https://dh2022.adho.org/
 
Description Digital Humanities Congress Sheffield 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Presentation of Visual AI collaborations on book history to a diverse audience of digital humanists to promote the sharing of knowledge, ideas and techniques within the digital humanities.
Year(s) Of Engagement Activity 2022
URL https://www.dhi.ac.uk/dhc2022/
 
Description Digital Humanities and Book History conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Project research, tools and collaborations presented to a digital humanities audience working in particular in the field of book history, in which Visual AI has a high profile.
Year(s) Of Engagement Activity 2022
URL https://dcsco-op.org/past-events/dhbh/
 
Description Digital Humanities at Oxford Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Two separate presentations at a high-profile, diverse and well-attended event, both of which introduced participants to Visual AI tools and methods. One of the sessions included presentations from other Visual AI project members (Abhishek Dutta, David Pinto and Prasanna Sridhar). There was some lively debate and some requests for further information about project software.
Year(s) Of Engagement Activity 2023
URL https://web.cvent.com/event/58fc430e-5294-4919-a7a3-c2b14f81a059/websitePage:bc9d128c-098d-49e0-97c9...
 
Description Digital Humanities at Oxford Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact A presentation and two hands-on sessions on Visual AI tools and collaborations in Digital Humanities
Year(s) Of Engagement Activity 2022
URL https://eng.ox.ac.uk/events/dhoxss-2022/
 
Description Digitising, Cataloguing, Searching and Sharing the Medieval and Early-Modern Image: On-Going Projects & Different Methodologies 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Presentation on digitising, Cataloguing, Searching and Sharing the Medieval and Early-Modern Image: On-Going Projects & Different Methodologies
Year(s) Of Engagement Activity 2021
 
Description Distinguished Keynote Speaker in Biomedical and Health Data Science in two joint conferences of IEEE EMBS BHI and BSN 2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote talk entitled: Simplifying interpretation and acquisition of ultrasound scans, delivered virtually.
Short Abstract: With the increased availability of low-cost and handheld ultrasound probes, there is interest in simplifying interpretation and acquisition of ultrasound scans through deep-learning based analysis so that ultrasound can be used more widely in healthcare. However, this is not just "all about the algorithm", and successful innovation requires inter-disciplinary thinking and collaborations.
In this talk I will overview progress in this area drawing on examples of my laboratory's experiences of working with partners on multi-modal ultrasound imaging, and building assistive algorithms and devices for pregnancy health assessment in high-income and low-and-middle-income country settings. Emerging topics in this area will also be discussed.
Year(s) Of Engagement Activity 2021
 
Description Edinburgh CDCS Digitised Documents Series workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Workshop to showcase the state of the art in Visual AI for cultural heritage and the digital humanities, and provide a hands-on introduction to some simple techniques for searching and classifying imagery in books, paintings, photographs and film.
Year(s) Of Engagement Activity 2022
URL https://www.cdcs.ed.ac.uk/events/visual-ai-and-humanities-introduction
 
Description Edinburgh CDCS workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Workshop to showcase the state of the art in Visual AI for cultural heritage and the digital humanities, and provide a hands-on introduction to some simple techniques for searching and classifying imagery in books, paintings, photographs and film. Introduced participants to the study of bias within AI, in such controversial applications as facial recognition and automated image categorisation.
Year(s) Of Engagement Activity 2021
URL https://www.cdcs.ed.ac.uk/events/workshop-chapbooks-national-library-scotland
 
Description Fantastic Futures Conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Conference aiming to help participants discover basic concepts of artificial intelligence in the GLAM sector, concrete uses and practices of AI in the GLAM sector, and technologies and tools applicable to the GLAM sector's data and collections.
Year(s) Of Engagement Activity 2021
URL https://www.bnf.fr/en/agendaEN/workshops-tutorials-les-futurs-fantastiques-3rd-conference-about-arti...
 
Description Helping Computers See and Understand the World Around Us 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Science Week Demonstration for Year 3 and Year 4 students at the Cutteslowe Primary School in Oxford
Year(s) Of Engagement Activity 2022
URL https://www.cutteslowe.oxon.sch.uk/
 
Description History of Printed Illustrations webinar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Presentation, drawing on a recent collaboration with the National Library of Scotland on their chapbook collections, demonstrating how computer vision (or 'visual AI') can support the study of printed illustrations. Demonstrated free software developed for these purposes, discussed its strengths and weaknesses, and considered its overall place within the illustration researcher's toolbox.
Year(s) Of Engagement Activity 2021
URL https://www.cphc.org.uk/events/2021/7/8/hopin-webinar-ly8r3
 
Description ICDAR Hip2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Workshop to bring together researchers from various fields working on document image acquisition, restoration, analysis, indexing, and retrieval to make these documents accessible in digital libraries.
Year(s) Of Engagement Activity 2021
URL https://blog.sbb.berlin/hip2021/
 
Description IIIF Community Call 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Community Call discussing the University of Oxford Visual Geometry Group's work with IIIF and Machine Learning
Year(s) Of Engagement Activity 2021
URL https://www.youtube.com/watch?v=KXE3-LD6xxI&t=1s
 
Description Indiana University Booklab 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Hands-on workshop teaching the use of Visual AI software outputs to an audience of digital humanities practitioners and students. As well as discussions, this led to an invitation to return for a longer period of time and to teach the tools in classroom and professional settings.
Year(s) Of Engagement Activity 2023
URL https://booklab.indiana.edu/news-events/past-events/giles-bergel-2023.html
 
Description International Computer Vision Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact International Computer Vision Summer School
Year(s) Of Engagement Activity 2022
URL https://iplab.dmi.unict.it/icvss2022/
 
Description International Conference on Computer Vision, 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman was one of the General Chairs who organized the International Conference on Computer Vision (ICCV) in Paris, France. ICCV is one of the three principal international computer vision conferences. Over 7000 registered to attend the conference.
Year(s) Of Engagement Activity 2023
URL https://iccv2023.thecvf.com/
 
Description Invited talk at University of British Columbia 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Invited talk at the University of British Columbia, Canada. The visit also included lab visits; possible follow-up includes writing a grant application together and an exchange of students.
Year(s) Of Engagement Activity 2023
 
Description Learning 3D Geometry 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Undergraduate students
Results and Impact Lecture in the computer vision course at the University of Amsterdam.
Year(s) Of Engagement Activity 2022
 
Description Learning 3D Geometry: From Fusion to Generation 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Talk at the CVPR CV4MR Workshop on Computer Vision for Mixed Reality.
Year(s) Of Engagement Activity 2023
URL https://cv4mr.github.io
 
Description Learning on Screen - BoB/TRilT Academic Engagement launch 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Project tools and collaborations advertised to researchers seeking to use one of the largest research databases of UK TV programmes, leading to follow up discussions.
Year(s) Of Engagement Activity 2022
URL https://learningonscreen.ac.uk/guidance/bob-and-trilt-for-research/launch-event/
 
Description Libraries Rewired: A CILIP Digital Transformation Event 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Abhishek Dutta and Prasanna Sridhar presented Visual AI software and demos to CILIP, the UK professional body for librarians and information professionals. The event raised awareness of Visual AI's work, and attracted the interest of a number of IT suppliers to the sector, with whom the team are following up.
Year(s) Of Engagement Activity 2023
URL https://librariesrewired.org.uk/
 
Description London Rare Books Summer School - the Digital Book Historian's Toolkit 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact View of the landscape of digital research in book history, including bibliographic data and content management systems, data visualisation, systems for image sharing and annotation in libraries and archives, computer vision, and (semi-)automated collation. Instead of emphasising mastery of any particular technology, we encouraged computational thinking and digital experimentation to enhance historical research questions and information management.
Year(s) Of Engagement Activity 2021
 
Description MIUA 2021 Conference - co-organiser 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact MIUA is a UK-based international conference for the communication of image processing and analysis research and its application to medical imaging and biomedicine. This was the 25th edition of the meeting which was held virtually. 40 papers were presented (27k downloads as of 09-03-2022). MIUA is the principal UK forum for communicating research progress within the community interested in image analysis applied to medicine and related biological science. The meeting is designed for the dissemination and discussion of research in medical image understanding and analysis, and aims to encourage the growth and raise the profile of this multi-disciplinary field by bringing together the various communities including among others:
Year(s) Of Engagement Activity 2021
URL https://miua2021.com/
 
Description Max Planck BibHerz Library Seminar: Reflections on the Digital Turn in the Humanities and the Sciences 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Seminar on how digital technologies have changed approaches to the discovery, study, and presentation of images; what impact the changing dynamic between the analogue and digital manifestation of the book or manuscript has on their working practices; and how this affected their use and questions that are asked or could be asked.
Year(s) Of Engagement Activity 2021
URL https://www.biblhertz.it/3069990/seminar-series-reflections-on-the-digital-turn-in-the-humanities-an...
 
Description NLS Digital Scholarship Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact 15 attendees for an annual workshop, which sparked questions and ongoing discussions.
Year(s) Of Engagement Activity 2021
 
Description National Academies roundtable on researcher access to data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The National Academies Data Reform Round Table was a by-invitation meeting that discussed some of the current challenges that researchers face in getting access to data for research due to current data protection regulation. The Department for Digital, Culture, Media and Sport (DCMS) was consulting on reforming the UK's data protection regime, which formed part of a larger effort to implement the government's National Data Strategy, and specifically Mission 2 of that strategy: 'supporting a pro-growth and trusted data regime'. This issue affects researchers working in computer vision and medical image analysis and this was part of the discussion.

In terms of impact/outcome, the meeting output fed into a response that hopefully will have influence (how direct cannot be measured and it is too early to determine, but I selected this box in the next question for this reason).
Year(s) Of Engagement Activity 2021
 
Description National Academies' party conference event speaker 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Speaker on the (virtual) National Academies panel at the Liberal Democrat political party conference which focused on the theme of 'Becoming a "science superpower": will the UK be fit to tackle the next global crisis?'.

Briefing: The panel discussions will address how the UK should approach the future, building resilience to future crises and achieving 'superpower' status. The panel will include leading experts representing the National Academies, as well as representatives from the political parties and a journalist Chair.

Not aware of any direct impact (see next week) but these sessions are an important part of keeping an open and positive dialogue with MPs.
Year(s) Of Engagement Activity 2021
 
Description National Librarian of Scotland's Lecture in Digital Scholarship 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Media (as a channel to the public)
Results and Impact Introduced research on chapbooks using Visual AI and how machine vision can help others to understand printed heritage collections.
Year(s) Of Engagement Activity 2021
URL https://www.youtube.com/watch?v=5jkq0iLzMvo&t=10s
 
Description Neural Geometry and Rendering: Advances and the Common Objects in 3D Challenge
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Workshop at the European Conference on Computer Vision (ECCV).
Year(s) Of Engagement Activity 2022
URL https://ngr-co3d.github.io
 
Description Office for National Statistics, Integrated Data Programme Advisory Group, Member
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact The Office for National Statistics Integrated Data Programme Advisory Group offers advice to the ONS on its programme aimed at sharing data for public good with other organisations. I was invited due to my role as Chair of the Royal Society PETs science policy work together with my research interest in health data science/medical image analysis.
Year(s) Of Engagement Activity 2021,2022
 
Description OxML - Oxford Machine Learning Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Gave a lecture at the OxML summer school on Deep Learning.
Year(s) Of Engagement Activity 2021
URL https://www.oxfordml.school
 
Description Practical Applications of IIIF Seminar: Image Registration and IIIF 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Public/other audiences
Results and Impact Discussing the methods, challenges and possibilities of Image Registration.
Year(s) Of Engagement Activity 2021
URL https://www.iiconservation.org/content/practical-applications-iiif-seminar-1-image-registration-and-...
 
Description Presentation for British Museum Portable Antiquities Scheme 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Horace Lee gave a presentation to members of the British Museum Portable Antiquities Scheme (PAS) on the capabilities of Visual AI's WISE Image Search Engine and how it can be used in archaeology. This was part of an ongoing collaboration with the British Museum.
Year(s) Of Engagement Activity 2023
 
Description Presentation to The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences symposium "Documenting, Understanding, Preserving Cultural Heritage. Humanities and Digital Technologies for Shaping the Future", Florence, Italy, July 2023 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation by Giles Bergel, Abhishek Dutta and Andrew Zisserman (Visual AI) and Rosario Carvalho et al on the Az Infinitum project, a collaboration with Visual AI that integrated the VGG Image Search Engine (VISE) software in a web application to allow the search of large collections of decorative Portuguese 'azulejo' tiles. The presentation and other outputs (a paper and web application) raised awareness of the use of Visual AI project software in the domains of art history, heritage science and digital humanities, impacting professionals in those domains and raising public interest in the use of computer vision to understand these historical materials.
Year(s) Of Engagement Activity 2023
URL https://www.timemachine.eu/ltm-projects/az-infinitum-azulejo-indexation-and-referencing-system/
 
Description Renaissance Society of America Day of Digital Learning 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact RSA DAY OF DIGITAL LEARNING. Featuring a varied menu of sessions involving hands-on, participatory work with digital tools and resources.
Year(s) Of Engagement Activity 2021
URL https://rsaddl.hcommons.org/
 
Description Renaissance Society of America Day of Digital Learning 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact An introduction to computer vision - the extraction of information from images - for the purposes of book and art history. Overview of the field, with particular reference to collaborative research performed by the Visual Geometry Group (VGG) at Oxford.
Year(s) Of Engagement Activity 2022
URL https://rsa2022ddl.hcommons.org/main-page/rsa-ddl-2022-topics/
 
Description Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group - Chair 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group (policy report), Chair, 2017-19. Also Chair of follow-on to initial report, 2021-.
Year(s) Of Engagement Activity 2019,2020,2021,2022
URL https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies
 
Description Royal Society and DSIT Workshop on Science and AI Safety 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Royal Society and DSIT Workshop on Science and AI Safety including discussion meeting as well as red teaming activity with postgraduate students. The link below was a high profile output from part of the event. I provided opening comments for the event (but organisation was led by the Royal Society team and DSIT).
Year(s) Of Engagement Activity 2023
URL https://time.com/6328851/scientists-training-ai-safety/
 
Description Royal Society and US National Academy of Sciences Forum on Researcher Access to Data 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Forum description: The pandemic has demonstrated that there is strong public benefit derived from researchers having prompt access to a variety of data sources, such as data from public and government bodies, as well as private companies (in particular, tech companies). There is also significant interest in how we connect and link the different data sources. The Forum will address the evolution of researcher access to data; best practices and lessons learned from fields that are on the forefront of data sharing (i.e., climate studies, astrophysics, biomedicine); and challenges related to pressing societal problems such as online information (and misinformation), modeling for pandemics, and using data in emergencies.
Year(s) Of Engagement Activity 2023
 
Description Sight and Sound Workshop at the IEEE Conference on Computer Vision and Pattern Recognition, 2021 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman co-organized the Sight and Sound Workshop at CVPR 2021. This is the description of the workshop: While traditionally visual and audio data have been studied in isolation, researchers have increasingly been creating algorithms that learn from both modalities. This has produced many exciting developments in automatic lip-reading, multi-modal representation learning, and audio-visual action recognition.

Since pretty much every internet video has an audio track, the prospect of learning from paired audio-visual data - either with new forms of unsupervised learning, or by simply incorporating sound data into existing vision algorithms - is appealing, and this workshop will cover recent advances in this direction. It will also touch on higher-level questions, such as what information sound conveys that vision doesn't, the merits of sound versus other "supplemental" modalities such as text and depth, and the relationship between visual motion and sound. We'll also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing.
Year(s) Of Engagement Activity 2021
URL https://sightsound.org/2021/
 
Description Sight and Sound Workshop at the IEEE Conference on Computer Vision and Pattern Recognition, 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman co-organized the Sight and Sound Workshop at CVPR 2023. This is the description of the workshop: While traditionally visual and audio data have been studied in isolation, researchers have increasingly been creating algorithms that learn from both modalities. This has produced many exciting developments in automatic lip-reading, multi-modal representation learning, and audio-visual action recognition. Since pretty much every internet video has an audio track, the prospect of learning from paired audio-visual data - either with new forms of unsupervised learning, or by simply incorporating sound data into existing vision algorithms - is appealing, and this workshop will cover recent advances in this direction. It will also touch on higher-level questions, such as what information sound conveys that vision doesn't, the merits of sound versus other "supplemental" modalities such as text and depth, and the relationship between visual motion and sound. We'll also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing.
Year(s) Of Engagement Activity 2023
URL https://sightsound.org/2023/
 
Description Sixth Form Schools Science Talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Gave talk to lower sixth form students at Magdalen College School on my research. This was part of their lecture series related to the lower sixth form project which provides them with experience of researching a topic. Lots of interesting questions particularly about the global health angle of the research/potential impact and ethics of using AI. In fact the quality of questions was much higher than most technical audience ones! Teacher followup said there was good discussion afterwards.
Year(s) Of Engagement Activity 2022
 
Description Summer School on Artificial Intelligence, India 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Lectured at Summer School on "Recognizing Human Actions in Videos", followed by Q & A session.
Year(s) Of Engagement Activity 2021
URL https://cvit.iiit.ac.in/summerschool2021/index.php
 
Description Talk at the Machine Learning and Computer Vision Research Group at University of Bristol 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Presentation by Abhishek Dutta and Prasanna Sridhar on Visual AI project tools and workflows, in particular annotation and model training ('Manual Annotation of Images and Video using VIA'), leading to requests for information and further plans.
Year(s) Of Engagement Activity 2023
URL https://uob-mavi.github.io/people/
 
Description Talk at the Staff Meeting of History of Science Museum in Oxford on computer vision for heritage collection management and research 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Presentation by Giles Bergel and Abhishek Dutta on Visual AI collaborations with cultural heritage organisations (libraries, museums and galleries) including software demos allowing visual search of digital collections. The Digital Collections manager, and other HSM staff, made appointments for follow up meetings and inquiries have been made to the Museum's IT suppliers.
Year(s) Of Engagement Activity 2023
 
Description The Sixth Annual Conference for Research Software Engineering 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Research Software Engineers from other Universities got to learn about our methods and processes of developing software tools that are used widely all over the world.
Year(s) Of Engagement Activity 2022
URL https://virtual.oxfordabstracts.com/#/event/3101/submission/70
 
Description Understanding egocentric data in 3D 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact https://ego4d-data.org/workshops/cvpr23
Year(s) Of Engagement Activity 2023
 
Description University of Illinois HRI Introduction to Computer Vision for Digital Humanists 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Online workshop delivering training in Visual AI software tools, leading to discussion on their utility and a follow-up call with a prominent digital humanist working on historical newspapers.
Year(s) Of Engagement Activity 2023
URL https://mediaspace.illinois.edu/media/t/1_arib8duv/28379181
 
Description University of Oxford Social Sciences Division 'Common Ground' seminar series: AI and Society
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Dan Schofield gave an introduction to Visual AI to the Social Sciences Division, Oxford, and engaged in discussions about ethics and how to foster collaborations between AI engineering teams and social science researchers in Oxford.
Year(s) Of Engagement Activity 2023
URL https://www.socsci.ox.ac.uk/article/new-common-ground-seminar-series-starts-with-ai-and-society
 
Description University of Stockholm Digital Humanities Now workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Showcase new and ongoing research in the broad Digital Humanities field.
Year(s) Of Engagement Activity 2021
URL https://su.powerinit.com/Data/Event/EventTemplates/2602/?EventId=879
 
Description VGG Image Search Engine (VISE) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact Talk at the RKD Netherlands Institute for Art History. The RKD team have integrated our VISE image search engine software into their platform. At this event, all the contributors to the digital platform talked about their work and their software. Our VISE software was introduced to a wider international audience.
Year(s) Of Engagement Activity 2022
URL https://rkd.nl/en/
 
Description VisuAI Show and Tell 2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Presented our visual annotation and visual search software to potentially interested researchers, some of whom enquired further and later adopted the tools in their research.
Year(s) Of Engagement Activity 2021
 
Description Visual AI for ethology: chimpanzee behaviour analysis using deep learning, Department of Evolutionary Anthropology, University of Zurich 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Research presentation by Daniel Schofield to the University of Zurich Anthropology department, outlining computer vision applications for ethology as well as introducing Visual AI software.
Year(s) Of Engagement Activity 2023
 
Description VisualAI Show and Tell 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact The event was for the University of Edinburgh. It showcased the software developed by the VisualAI team with the aims of publicising the open source software produced in the project, and of attracting potential collaborators.
Year(s) Of Engagement Activity 2021
URL https://www.robots.ox.ac.uk/~vgg/projects/visualai/events.html#ST15621
 
Description VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments. The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'.
Year(s) Of Engagement Activity 2021
URL https://www.robots.ox.ac.uk/~vgg/data/voxceleb/interspeech2021.html
 
Description VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments. The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'.
Year(s) Of Engagement Activity 2023
URL https://mmai.io/datasets/voxceleb/voxsrc/interspeech2023.html
 
Description VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop 2022 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments. The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'.
Year(s) Of Engagement Activity 2022
URL http://mm.kaist.ac.kr/datasets/voxceleb/voxsrc/interspeech2022.html
 
Description What do you learn after Developing, Maintaining and Supporting Research Software for 6 years? 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact Talk at the Vision, Graphics and Learning (VGL) research group in the Department of Computer Science, University of York. The PhD students and postdocs in the VGL group became aware of the software development methods and practices for creating research software tools used by millions all over the world.
Year(s) Of Engagement Activity 2022
URL https://www.youtube.com/watch?v=8S0HbFX4HBM
 
Description WikiWorkshop presentation of WISE image search engine 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Abhishek Dutta, Horace Lee, Prasanna Sridhar and Andrew Zisserman presented WISE, a multimodal search engine running on over 50 million images from Wikimedia Commons. This led to a follow-up meeting with the Wikimedia Foundation about how WISE can help the foundation to make images searchable, including to find harmful content.
Year(s) Of Engagement Activity 2023
URL https://wikiworkshop.org/2023/#
 
Description Workshop on Studying the Images of Popular Prints: Methods and Theory, Catholic University of Valencia 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Two-day workshop on popular Spanish 'pliegos' and related materials, including a hands-on session teaching VGG tools applied to these materials by the Spanish Chapbooks project at Cambridge University, who were present. Outcomes included plans for further development of Cambridge and other Spanish resources and use of project software, and an invitation to speak to a similar project at the University of Geneva in 2025.
Year(s) Of Engagement Activity 2023
URL http://biblioteca.cchs.csic.es/docs/Poster_Valencia_low.pdf
 
Description Workshop: Introduction to Visual AI for Behavioural Research, Department of Anthropology, University of Oxford. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Hands-on workshop by Daniel Schofield for the University of Oxford Anthropology Department, introducing Visual AI tools and core concepts for using computer vision in anthropological research.
Year(s) Of Engagement Activity 2023
 
Description Workshop: Introduction to computer vision tools for primatology: How to annotate, detect and track, Kuching, Malaysia 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Two-hour workshop and presentation by Dan Schofield to attendees of the International Primatological Society, introducing Visual AI tools for primatological research.
Year(s) Of Engagement Activity 2023
URL https://ipskuching.com/programme/
 
Description A statistical learning perspective on reconstructing the 3D world
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Invited talk at the BrainWorlds Freiburg-Oxford Workshop.
Year(s) Of Engagement Activity 2023
URL https://brainworlds.uni-freiburg.de