Seebibyte: Visual Search for the Era of Big Data

Lead Research Organisation: University of Oxford
Department Name: Engineering Science

Abstract

The Programme is organised into two themes.

Research theme one will develop new computer vision algorithms to enable efficient search and description of vast image and video datasets - for example, the entire video archive of the BBC. Our vision is that anything visual should be searchable, in the manner of a Google search of the web: a user specifies a query and results are returned immediately, irrespective of the size of the data. Such enabling capabilities will have widespread application both for general image/video search - consider how Google's web search has opened up new areas - and for designing customized search solutions.
A second aspect of theme 1 is to automatically extract detailed descriptions of the visual content. The aim here is to achieve human-like performance and beyond - for example, in recognizing configurations of parts and spatial layout, counting and delineating objects, or recognizing human actions and interactions in videos - significantly surpassing the current limitations of computer vision systems and enabling new and far-reaching applications. The new algorithms will learn automatically, building on recent breakthroughs in large-scale discriminative and deep machine learning. They will be capable of weakly supervised learning, for example from images and videos downloaded from the internet, and will require very little human supervision.

The second theme addresses transfer and translation. This also has two aspects. The first is to apply the new computer vision methodologies to 'non-natural' sensors and devices, such as ultrasound imaging and X-ray, which have different characteristics (noise, dimension, invariances) to the standard RGB channels captured by 'natural' cameras (iPhones, TV cameras). The second aspect is to seek impact in a variety of other disciplines and industries that today greatly under-utilise the power of the latest computer vision ideas. We will target these disciplines to enable them to leapfrog the divide between what they use (or do not use) today - dominated by manual review and highly interactive frame-by-frame analysis - and a new era in which automated, efficient sorting, detection and mensuration of very large datasets becomes the norm. In short, our goal is to ensure that the newly developed methods are used by academic researchers in other areas, and turned into products for societal and economic benefit. To this end, open source software, datasets, and demonstrators will be disseminated on the project website.

The ubiquity of digital imaging means that every UK citizen may potentially benefit from the Programme research in different ways. One example is an enhanced iPlayer that can search for where particular characters appear in a programme, or intelligently fast-forward to the next 'hugging' sequence. A second is wider deployment of lower-cost imaging solutions in healthcare delivery. A third, also motivated by healthcare, is the employment of new machine learning methods for validating targets for drug discovery based on microscopy images.

Planned Impact

The proposed programme encompasses new methodology and applied research in computer vision that will impact not only the imaging field but also other, non-imaging disciplines, and it will encourage end-user uptake of imaging technologies and commercial interest in embedding imaging technologies in products. These groups are the main beneficiaries of programme research.

We have carefully chosen members of our Programme Advisory Board (PAB) and User Group to represent a comprehensive and diverse range of academic and industry interests and expect them to challenge us to ensure that the impact of the Programme is realised. We will ensure that both the PAB and the User Group are constantly refreshed with appropriate representatives.

The Programme will have Economic and Societal impact by
1. Developing new and improved computer vision technologies for commercialisation by a wide range of companies;
2. Enhancing the Big Data capabilities and knowledge base of UK industries;
3. Enhancing quality of life by improving, for instance, healthcare capabilities, surveillance, environmental monitoring of roads, and new means of enjoying digital media in the home. Other engineering advances will aim to make a large impact "behind the scenes", for instance to underpin better understanding of biological effects at the individual cell level and characterisation of advanced materials;
4. Training the next generation of computer vision researchers, who will be equipped to support the imaging needs of science, technology and wider society for the future.

Impact on Knowledge includes
1. Realisation of new approaches to essential computer vision technology, and the dissemination of research findings through publications, conference presentations, and the distribution of open source software and image databases.
2. Sharing knowledge with collaborators via Transfer and Application Projects (TAPs) and other activities, leading to the adoption of advanced computer vision methods across many disciplines of science, engineering and medicine that currently do not use them.
3. Communication of advances to a public audience through website articles and other co-ordinated public understanding activities.

Publications

Rebuffi S (2018) Efficient parametrization of multi-domain deep neural networks in arXiv e-prints

Savochkina E (2022) First Trimester video Saliency Prediction using CLSTMU-NET with Stochastic Augmentation. in Proceedings. IEEE International Symposium on Biomedical Imaging

Vaze S (2020) Low-Memory CNNs Enabling Real-Time Ultrasound Segmentation Towards Mobile Deployment. in IEEE journal of biomedical and health informatics

Wang Y (2020) Differentiating Operator Skill during Routine Fetal Ultrasound Scanning using Probe Motion Tracking. in Medical ultrasound, and preterm, perinatal and paediatric image analysis

Wiles O (2018) Learning to Predict 3D Surfaces of Sculptures from Single and Multiple Views in International Journal of Computer Vision

 
Description Background: The first goal of Seebibyte was basic research: to develop the next generation of computer vision algorithms able to analyse, describe and search image and video content with human-like capabilities and far beyond. The second goal was to transfer the latest computer vision methods into other disciplines and industry, in order to support the growing number of research communities in the UK (and internationally) who are increasingly generating large and diverse image and video datasets but are limited by a lack of tools to interpret their data effectively and efficiently. This second goal was quite unusual, as it focused on solutions to unmet needs where the latest computer vision methods could have large impact.

How Seebibyte has met its original ambitions: Seebibyte has succeeded in both the basic research and the transfer and translation themes. In basic research, it has greatly increased the range of entities that can be described and searched for (and counted, and tracked). It has published academic papers in the principal international conference proceedings and journals, winning several conference prizes and, in one case, a journal prize. The PI and Co-Is have delivered invited talks and keynotes on Seebibyte research at multiple named lectures, conferences and summer schools.

In transfer and translation, it has successfully penetrated other disciplines, in particular Digital Humanities and medical image analysis, including both joint publications (see below) and citations to the Seebibyte software packages. Other dissemination outputs include: software, datasets, and engagement including Transfer and Application Projects (TAPs) with academics from other disciplines and with industry, Show & Tell days, targeted talks and workshops, and social media and web presence.

Most significant achievements:

The Seebibyte web pages at http://www.seebibyte.org/ are a public-facing record of many of the Programme Grant outputs, including demos, case studies, software, publications and research.

Basic research:

The Programme Grant has led to the publication of a substantial number of papers at the (very competitive) principal international computer vision and machine learning conferences and workshops (between 20 and 25 publications per year in 2018, 2019 and 2020). Several of these papers have been awarded best paper prizes at the conferences or in the journals where they appeared. Several papers published in 2018 have already been cited over a hundred times (Google Scholar).

In medical image analysis, Seebibyte researchers have contributed a number of methodological papers that exemplify rapid transfer of the latest computer vision methodologies to medical imaging, including RNNs for fetal heart ultrasound video analysis (oral presentation at the MICCAI 2017 medical image computing conference), and a best paper prize at FIMH 2017. Research on automated grading of spinal MRI was awarded a best paper prize at MICCAI 2016, and received a further prize when published in the clinical literature.

Transfer and Application Projects (TAPs):

These have been a core part of the Transfer and Translation theme, and a large number (more than 24) have been completed. Several have amply succeeded in reaching entirely new disciplines and have resulted in joint publications in high-prestige journals. Examples are:

- In Materials Science: "Crystal nucleation in metallic alloys using x-ray radiography and machine learning", E. Liotti, C. Arteta, A. Zisserman, A. Lui, V. Lempitsky and P. S. Grant, Science Advances, 2018.

- In Zoology: "Time-lapse imagery and volunteer classifications from the Zooniverse Penguin Watch project", F. M. Jones, C. Allen, C. Arteta, J. Arthur, C. Black, L. M. Emmerson, R. Freeman, G. Hines, C. J. Lintott, Z. Machácková, G. Miller, R. Simpson, C. Southwell, H. R. Torsey, A. Zisserman, T. Hart, Scientific Data, Volume 5, 2018.

- In Anthropology: "Chimpanzee face recognition from videos in the wild using deep learning", D. Schofield, A. Nagrani, A. Zisserman, M. Hayashi, T. Matsuzawa, D. Biro and S. Carvalho, Science Advances, 2019.

The work, particularly on penguins and chimpanzees, has had significant media attention, with articles in New Scientist, MIT Technology Review, TechXplore and many others. Many other TAPs were with Digital Humanities researchers, through the work of our Digital Humanities Ambassador.

Open source software packages:

The open source software released throughout Seebibyte has had a substantial impact: one package (VIA) has been used over two million times by industry and academia, and has been cited in over 250 publications (Google Scholar). The following packages were released, and are available from the web page http://www.seebibyte.org/software.html:
- VFF: Face Search in images and video
- SVT: Visual Tracker in videos
- VIA: Image and Video Annotator
- VIC: Image Classification engine
- VISE: Image Search engine
- Image Comparator
In each case the package includes online demonstrations, guides and walk-throughs for use and installation, and documentation.
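To illustrate how annotations produced with VIA are typically consumed downstream, the following is a minimal sketch that parses rectangular regions from a VIA-style JSON export. It assumes the VIA 2.x project schema (field names such as _via_img_metadata and shape_attributes); the exact layout should be checked against the export format of the VIA version in use.

```python
import json

def load_via_rectangles(path):
    """Parse rectangular regions from a VIA 2.x-style JSON export.

    Assumes the common VIA project layout: per-image entries holding
    'filename' and 'regions'; field names may differ in other versions.
    """
    with open(path) as f:
        project = json.load(f)
    # Project saves nest per-image entries under '_via_img_metadata';
    # plain annotation exports are the metadata dict itself.
    metadata = project.get("_via_img_metadata", project)
    boxes = {}
    for entry in metadata.values():
        rects = []
        for region in entry.get("regions", []):
            shape = region["shape_attributes"]
            if shape.get("name") == "rect":
                rects.append((shape["x"], shape["y"],
                              shape["width"], shape["height"]))
        boxes[entry["filename"]] = rects
    return boxes

if __name__ == "__main__":
    for name, rects in load_via_rectangles("via_project.json").items():
        print(name, len(rects), "boxes")
```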

Datasets released throughout the Programme Grant have also had a significant impact, with one (VoxCeleb) downloaded over four thousand times. They cover a diverse set of projects, including: lip reading datasets from a partnership with the BBC (the LRW and LRS2 datasets); a large-scale audio-visual dataset of human speech (the VoxCeleb dataset); counting penguins in the wild from a partnership with Zooniverse; and people looking at each other in videos.
Exploitation Route The release of software as open source has nurtured the development of an open source community around some of the software packages, which not only provides feedback but also contributes code to add new features and improve existing ones. This community has continued after the end of the Programme Grant.
Sectors Digital/Communication/Information Technologies (including Software), Healthcare, Culture, Heritage, Museums and Collections, Retail, Transport

URL http://www.seebibyte.org/
 
Description Beyond academia, Seebibyte has significantly improved the uptake of computer vision and visual AI in the cultural heritage sector. We have disseminated project outcomes to cultural heritage institutions such as the British Library, the National Archives and the Ashmolean Museum, and to companies serving the sector such as Montala, Preservica, Capture and Topfoto. These organisations now have a better understanding of computer vision and knowledge of Seebibyte project software and datasets. The British Library, Topfoto, the Ashmolean and others have benefited from VGG software and project personnel time in organising their digital collections, leading to savings in infrastructure and collection management costs. For example, the British Library recognised the value of the Seebibyte engagement with a Runner-Up award in its annual Labs competition for 2019-20, showcased at https://blogs.bl.uk/digital-scholarship/2020/10/bl-labs-public-award-runner-up-research-2019-automated-labelling-of-people-in-video-archives.html
First Year Of Impact 2018
Sector Healthcare, Culture, Heritage, Museums and Collections, Retail, Transport
Impact Types Cultural, Societal, Economic

 
Description AI and healthcare Research Briefing
Geographic Reach National 
Policy Influence Type Citation in other policy documents
Impact There are various applications of Artificial Intelligence (AI) in healthcare, such as helping clinicians to make decisions, monitoring patient health, and automating routine administrative tasks. This POSTnote gives an overview of these uses, and their potential impacts on the cost and quality of healthcare, and on the workforce. It summarises the challenges to wider adoption of AI in healthcare, including those relating to safety, privacy, data-sharing, trust, accountability and health inequalities. It also outlines some of the regulations relevant to AI, and how these may change. As healthcare is a devolved issue, policies on healthcare AI differ across the UK. This POSTnote focusses on regulations and policies relevant to England.
URL https://post.parliament.uk/research-briefings/post-pn-0637/
 
Description UKRI AI strategy
Geographic Reach National 
Policy Influence Type Citation in other policy documents
URL https://www.ukri.org/wp-content/uploads/2021/02/UKRI-120221-TransformingOurWorldWithAI.pdf
 
Description AWS Machine Learning Research Awards Program
Amount $225,000 (USD)
Organisation Amazon.com 
Sector Private
Country United States
Start 02/2018 
End 01/2020
 
Description An Agreement but not A Scheme
Amount $20,000 (USD)
Organisation Adobe Inc. 
Sector Private
Country United States
Start 07/2020 
End 06/2021
 
Description Big Data Science in Medicine and Healthcare
Amount £55,000 (GBP)
Organisation University of Oxford 
Department Oxford Martin School
Sector Academic/University
Country United Kingdom
Start 04/2017 
End 03/2020
 
Description By Agreement but not a Scheme
Amount $10,000 (USD)
Organisation Google 
Sector Private
Country United States
Start 04/2020 
End 03/2021
 
Description CALOPUS - Computer Assisted LOw-cost Point-of-care UltraSound
Amount £1,013,662 (GBP)
Funding ID EP/R013853/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 02/2018 
End 07/2022
 
Description EPSRC Centre for Doctoral Training in Health Data Science
Amount £6,640,406 (GBP)
Funding ID EP/S02428X/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 03/2019 
End 09/2027
 
Description ERC Advanced Grant
Amount € 2,500,000 (EUR)
Organisation European Research Council (ERC) 
Sector Public
Country Belgium
Start 11/2016 
End 10/2021
 
Description ERC Starting Grant
Amount € 1,500,000 (EUR)
Organisation European Research Council (ERC) 
Sector Public
Country Belgium
Start 08/2015 
End 09/2020
 
Description ERC-2020-COG
Amount € 184,947,839 (EUR)
Funding ID 101001212 
Organisation European Research Council (ERC) 
Sector Public
Country Belgium
Start 08/2021 
End 07/2026
 
Description End to End Translation of British Sign Language
Amount £971,921 (GBP)
Funding ID EP/R03298X/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2018 
End 06/2021
 
Description Fellowship for Oxford Student Arsha Nagrani
Amount $180,000 (USD)
Organisation Google 
Sector Private
Country United States
Start 10/2018 
End 09/2021
 
Description GCRF: Growing Research Capability Call
Amount £8,000,000 (GBP)
Funding ID MR/P027938/1 
Organisation Medical Research Council (MRC) 
Sector Public
Country United Kingdom
Start 10/2017 
End 09/2021
 
Description IARPA BAA-16-13
Amount $1,196,818 (USD)
Organisation Intelligence Advanced Research Projects Activity 
Sector Public
Country United States
Start 09/2017 
End 09/2021
 
Description Innovate UK Smart Grant
Amount £62,489 (GBP)
Funding ID 71653 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 11/2020 
End 04/2022
 
Description Research Collaboration relating to DNN-based Face Recognition for Surveillance
Amount £200,000 (GBP)
Organisation Toshiba 
Sector Private
Country Japan
Start 10/2017 
End 09/2019
 
Description Retiming People's Motion in Videos
Amount $10,000 (USD)
Organisation Google 
Sector Private
Country United States
Start 10/2019 
 
Description Scholarship for Andrea Vedaldi's Students
Amount £1,000,000 (GBP)
Organisation Facebook 
Sector Private
Country United States
Start 10/2018 
End 09/2025
 
Description Studentships for VGG Oxford
Amount £320,000 (GBP)
Organisation Google 
Sector Private
Country United States
Start 10/2018 
End 09/2022
 
Description The National Librarian of Scotland's Fellowship in Digital Scholarship
Amount £7,500 (GBP)
Organisation National Library of Scotland 
Sector Academic/University
Country United Kingdom
Start 12/2020 
End 03/2021
 
Description VAoSS
Amount £200,000 (GBP)
Organisation Nielsen Holdings PLC 
Sector Private
Country United States
Start 04/2019 
End 03/2021
 
Description Visual AI: An Open World Interpretable Visual Transformer
Amount £5,912,096 (GBP)
Funding ID EP/T028572/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 12/2020 
End 11/2025
 
Description Visual Recognition
Amount £745,090 (GBP)
Organisation Continental AG 
Sector Private
Country Germany
Start 11/2016 
End 12/2019
 
Description Visual Recognition II
Amount £877,887 (GBP)
Organisation Continental AG 
Sector Private
Country Germany
Start 01/2020 
End 12/2022
 
Title 3D Shape Attributes and the CMU-Oxford Sculpture Dataset 
Description The CMU-Oxford Sculpture dataset contains 143K images depicting 2197 works of art by 242 artists. Each image comes with 12 labels for each of the 3D Shape Attributes defined in our CVPR paper. We additionally provide sample MATLAB code that illustrates reading the data and evaluating a method. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact "We have shown that 3D attributes can be inferred directly from images at quite high quality. These attributes open a number of possibilities of applications and extensions. One immediate application is to use this system to complement metric reconstruction: shape attributes can serve as a top-down cue for driving reconstruction that works even on unknown objects. Another area of investigation is explic¬itly formulating our problem in terms of relative attributes: many of our attributes (e.g., planarity) are better modeled in relative terms. Finally, we plan to investigate which cues (e.g., texture, edges) are being used to infer these attributes." A publication titled "3D Shape Attributes" authored by D.F. Fouhey, A.Gupta and A.Zisserman resulted from this research and was presented at IEEE CVPR 2016. 
URL http://www.robots.ox.ac.uk/~vgg/data/sculptures/
 
Title BBC-Oxford Lip Reading Dataset 
Description The dataset consists of up to 1000 utterances of 500 different words, spoken by hundreds of different speakers. All videos are 29 frames (1.16 seconds) in length, and the word occurs in the middle of the video. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact Publications have resulted from this research and an award has been won: [1] J. S. Chung, A. Zisserman, Lip Reading in the Wild (Best Student Paper Award), Asian Conference on Computer Vision, 2016. [2] J. S. Chung, A. Zisserman, Out of time: automated lip sync in the wild, Workshop on Multi-view Lip-reading, ACCV, 2016.
URL http://www.robots.ox.ac.uk/~vgg/data/lip_reading/
 
Title Celebrity in Places Dataset 
Description The dataset contains over 38k images of celebrities in different types of scenes. There are 4611 celebrities and 16 places involved. The images were obtained using Google Image Search and verified by human annotation. 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact A publication has resulted from research based on this dataset: Y. Zhong, R. Arandjelovic, A. Zisserman, Faces in Places: Compound Query Retrieval, British Machine Vision Conference, 2016.
URL http://www.robots.ox.ac.uk/~vgg/data/celebrity_in_places/
 
Title Condensed Movies 
Description It is a large-scale video dataset, featuring clips from movies with detailed captions. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact A publication has resulted from this research: M. Bain, A. Nagrani, A. Brown, A. Zisserman, "Condensed Movies: Story Based Retrieval with Contextual Embeddings", ACCV 2020.
URL https://www.robots.ox.ac.uk/~vgg/data/condensed-movies/
 
Title Count, Crop and Recognise: Fine-Grained Recognition in the Wild 
Description The goal of this paper is to label all the animal individuals present in every frame of a video. Unlike previous methods that have principally concentrated on labelling face tracks, we aim to label individuals even when their faces are not visible. We make the following contributions: (i) we introduce a 'Count, Crop and Recognise' (CCR) multistage recognition process for frame level labelling. The Count and Recognise stages involve specialised CNNs for the task, and we show that this simple staging gives a substantial boost in performance; (ii) we compare the recall using frame based labelling to both face and body track based labelling, and demonstrate the advantage of frame based with CCR for the specified goal; (iii) we introduce a new dataset for chimpanzee recognition in the wild; and (iv) we apply a high-granularity visualisation technique to further understand the learned CNN features for the recognition of chimpanzee individuals. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact A publication has resulted from using this database: M. Bain, A. Nagrani, D. Schofield, A. Zisserman, Count, Crop and Recognise: Fine-Grained Recognition in the Wild, Workshop on Computer Vision for Wildlife Conservation, ICCV, 2019 (oral presentation).
URL https://www.robots.ox.ac.uk/~vgg/research/ccr/
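The staged design described above can be pictured as the following control flow. This is a minimal sketch, not the authors' released code: count_model and recognise_model are hypothetical stand-ins for the specialised CNNs named in the description.

```python
# Minimal sketch of the 'Count, Crop and Recognise' (CCR) staging.
# count_model and recognise_model are hypothetical stand-ins for the
# specialised CNNs described above.

def ccr_label_frame(frame, count_model, recognise_model):
    # Stage 1 (Count): detect how many animals are present, and where.
    boxes = count_model(frame)  # e.g. a list of (x, y, w, h) detections
    # Stage 2 (Crop): cut each detection out of the full-resolution frame,
    # so the recogniser sees fine-grained detail rather than the whole image.
    crops = [frame[y:y + h, x:x + w] for (x, y, w, h) in boxes]
    # Stage 3 (Recognise): identify the individual animal in each crop.
    return [recognise_model(crop) for crop in crops]
```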
 
Title LAEO-Net: revisiting people Looking At Each Other in videos 
Description Capturing the mutual gaze of people is essential for understanding and interpreting the social interactions between them. To this end, this paper addresses the problem of detecting people Looking At Each Other (LAEO) in video sequences. For this purpose, we propose LAEO-Net, a new deep CNN for determining LAEO in videos. In contrast to previous works, LAEO-Net takes spatio-temporal tracks as input and reasons about the whole track. It consists of three branches, one for each character's tracked head and one for their relative position. Moreover, we introduce two new LAEO datasets: UCO-LAEO and AVA-LAEO. A thorough experimental evaluation demonstrates the ability of LAEO-Net to successfully determine if two people are LAEO and the temporal window where it happens. Our model achieves state-of-the-art results on the existing TVHID-LAEO video dataset, significantly outperforming previous approaches. 
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact A publication has resulted from using this database: M. J. Marin-Jimenez, V. Kalogeiton, P. Medina-Suarez, A. Zisserman, LAEO-Net: revisiting people Looking At Each Other in videos, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
URL http://www.robots.ox.ac.uk/~vgg/research/laeonet/
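As an illustration of the three-branch structure described above (one branch per tracked head plus one for their relative position), the PyTorch sketch below wires up the branches and the fused binary decision. Layer sizes and depths are invented for brevity and do not reproduce the published architecture.

```python
import torch
import torch.nn as nn

class LAEOSketch(nn.Module):
    """Illustrative three-branch network in the spirit of LAEO-Net:
    a shared branch applied to each tracked head (spatio-temporal clips)
    and a branch for the heads' relative position. Sizes are made up."""

    def __init__(self):
        super().__init__()
        # Shared 3D-conv branch applied to each head track (T frames).
        self.head_branch = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())
        # 2D branch for the relative-position map of the two heads.
        self.pos_branch = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(16 + 16 + 8, 2)  # LAEO / not-LAEO

    def forward(self, head_a, head_b, pos_map):
        # head_a, head_b: (B, 3, T, H, W); pos_map: (B, 1, H, W)
        feats = torch.cat([self.head_branch(head_a),
                           self.head_branch(head_b),
                           self.pos_branch(pos_map)], dim=1)
        return self.classifier(feats)
```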
 
Title LAOFIW Dataset: Labeled Ancestral Origin Faces in the Wild 
Description LAOFIW is a dataset of 14,000 images divided into four equally sized classes: sub-Saharan Africa, East Asia, Indian subcontinent, Western Europe. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact A publication titled "Turning a Blind Eye: Explicit Removal of Biases and Variation from Deep Neural Network Embeddings " authored by M. Alvi, A. Zisserman, C. Nellaker resulted from this research and was presented at the Workshop on Bias Estimation in Face Analytics, ECCV 2018. 
URL http://www.robots.ox.ac.uk/~vgg/data/laofiw/
 
Title Lip Reading Sentences 3 (LRS3) Dataset 
Description The dataset consists of thousands of spoken sentences from TED and TEDx videos. There is no overlap between the videos used to create the test set and those used for the pre-train and trainval sets.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact A publication has resulted from this research: T. Afouras, J. S. Chung, A. Zisserman, LRS3-TED: a large-scale dataset for visual speech recognition.
URL http://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs3.html
 
Title MoCA - Moving Camouflaged Animals Dataset 
Description MoCA is the largest video dataset for camouflaged animal discovery. The dataset provides both bounding box annotations and motion type labels, with three main types of motion: locomotion, when the animal engages in a movement that leads to a significant change of its location within the scene; deformation, when the animal engages in a more delicate movement that only leads to a change in its pose while remaining in the same location; and still, when the animal remains still.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact Hala Lamdouar, Charig Yang, Weidi Xie, Andrew Zisserman "Betrayed by Motion: Camouflaged Object Discovery via Motion Segmentation", ACCV, 2020 
URL https://www.robots.ox.ac.uk/~vgg/data/MoCA/
 
Title THE SHERLOCK TV SERIES DATASET 
Description We provide data for all three episodes of Season 1 of the BBC TV series "Sherlock". Each episode is almost an hour long. The DVDs can be purchased online, for example from Amazon. Face detections, tracks, shots and ground truth annotation for character identity are provided in CSV format, together with a few example synchronisation frames for each episode.
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact In using images of actors to recognize characters, we make the following three contributions: we demonstrate that an automated semi-supervised learning approach is able to adapt from the actor's face to the character's face, including the face context of the hair; by building voice models for every character, using a CNN model pretrained on the VoxCeleb dataset, we provide a bridge between frontal faces (for which there is plenty of actor-level supervision) and profile faces (for which there is very little or none); and by combining face context and speaker identification, we are able to identify characters with partially occluded faces and extreme facial poses. A paper titled "From Benedict Cumberbatch to Sherlock Holmes: Character Identification in TV series without a Script", authored by Arsha Nagrani and Andrew Zisserman, resulted from this research and was presented at the British Machine Vision Conference, 2017.
URL http://www.robots.ox.ac.uk/~vgg/data/Sherlock/
 
Title Text Localisation Dataset 
Description This is a synthetically generated dataset, in which word instances are placed in natural scene images while taking into account the scene layout. The dataset consists of 800 thousand images with approximately 8 million synthetic word instances. Each text instance is annotated with its text string and word-level and character-level bounding boxes.
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact A publication has resulted from this research: A. Gupta, A. Vedaldi, A. Zisserman Synthetic Data for Text Localisation in Natural Images IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016 
URL http://www.robots.ox.ac.uk/~vgg/data/scenetext
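A heavily simplified picture of the generation idea - rendering a word onto a natural image and recording its bounding box - is sketched below using Pillow (version 8 or later for textbbox). The real pipeline additionally models scene layout (surfaces, depth) when choosing placement, which this toy version omits.

```python
from PIL import Image, ImageDraw, ImageFont

def place_word(image_path, word, xy=(40, 40)):
    """Render one word onto a natural image and return its bounding box.

    A toy version of synthetic text placement: the real pipeline also
    analyses scene layout before choosing a plausible location."""
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    bbox = draw.textbbox(xy, word, font=font)  # (x0, y0, x1, y1)
    draw.text(xy, word, fill=(255, 255, 0), font=font)
    return img, {"text": word, "word_bbox": bbox}
```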
 
Title The 'Celebrity Together' Dataset 
Description The 'Celebrity Together' dataset has 194k images containing 546k faces in total, covering 2622 labeled celebrities (the same identities as the VGGFace Dataset). 59% of the faces correspond to these 2622 celebrities, and the remaining faces are treated as 'unknown' people. The images in this dataset were obtained using Google Image Search and verified by human annotation. Further details of the dataset collection procedure are explained in the paper.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact This dataset contains images that have multiple labeled celebrities per image (see example images above). It therefore can be used as an evaluation benchmark for retrieving a set of identities. Namely, given a query of a set of identities (and one or several face images are provided for each identity), the system should return a ranked list of the dataset images, such that images containing all the query identities are ranked first, followed by images containing all but one, etc. A publication titled "Compact Deep Aggregation for Set Retrieval" authored by Y.Zhong, R. Arandjelovic and A. Zisserman resulted from this research and won the Best Paper Award at ECCV 2018. 
URL http://www.robots.ox.ac.uk/~vgg/data/celebrity_together/
 
Title The Oxford-BBC Lip Reading Sentences 2 (LRS2) Dataset 
Description The dataset consists of thousands of spoken sentences from BBC television. Each sentence is up to 100 characters in length. The training, validation and test sets are divided according to broadcast date.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact A publication has resulted from research on this dataset: T. Afouras, J. S. Chung, A. Senior, O. Vinyals, A. Zisserman, Deep Audio-Visual Speech Recognition.
URL http://www.robots.ox.ac.uk/~vgg/data/lip_reading/lrs2.html
 
Title VGG-Sound 
Description VGG-Sound is an audio-visual correspondence dataset consisting of short clips of audio sounds, extracted from videos uploaded to YouTube.
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact H. Chen, W. Xie, A. Vedaldi, A. Zisserman, VGG-Sound: A Large-scale Audio-Visual Dataset, ICASSP, 2020.
URL https://www.robots.ox.ac.uk/~vgg/data/vggsound/
 
Title VoxCeleb 2: A large scale audio-visual dataset of human speech 
Description VoxCeleb2 is an audio-visual dataset consisting of short clips of human speech, extracted from interview videos uploaded to YouTube, with over 7,000 speakers, over 1 million utterances, and over 2,000 hours of recording. VoxCeleb contains speech from speakers spanning a wide range of ethnicities, accents, professions and ages. All speaking face-tracks are captured "in the wild", with background chatter, laughter, overlapping speech, pose variation and different lighting conditions. Each segment is at least 3 seconds long.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact A publication titled "VoxCeleb2: Deep Speaker Recognition", authored by J. S. Chung, A. Nagrani and A. Zisserman, resulted from this research and was presented at Interspeech 2018.
URL http://www.robots.ox.ac.uk/~vgg/data/voxceleb/
 
Title VoxCeleb: a large-scale speaker identification dataset 
Description VoxCeleb contains over 100,000 utterances for 1,251 celebrities, extracted from videos uploaded to YouTube. The dataset is gender balanced, with 55% of the speakers male. The speakers span a wide range of different ethnicities, accents, professions and ages. There are no overlapping identities between development and test sets. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact "We provide a fully automated and scalable pipeline for audio data collection and use it to create a large-scale speaker identification dataset called VoxCeleb, with 1,251 speakers and over 100,000 utterances. In order to establish benchmark performance, we develop a novel CNN architecture with the ability to deal with variable length audio inputs, which out¬performs traditional state-of-the-art methods for both speaker identification and verification on this dataset." A publication titled "VoxCeleb: a large-scale speaker identification dataset " authored by Arsha Nagrani, Joon Son Chung and Andrew Zisserman resulted from this research and was presented at Interspeech 2017. 
URL http://www.robots.ox.ac.uk/~vgg/data/voxceleb/
 
Title VoxConverse 
Description VoxConverse is an audio-visual diarisation dataset consisting of over 50 hours of multispeaker clips of human speech, extracted from YouTube videos 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact The VoxCeleb Speaker Recognition Challenge (VoxSRC), held annually at Interspeech, is a speaker recognition challenge run on the VoxCeleb and VoxConverse datasets. VoxSRC consists of an online challenge and an accompanying workshop at Interspeech. J. S. Chung*, J. Huh*, A. Nagrani*, T. Afouras, A. Zisserman, Spot the conversation: speaker diarisation in the wild, ArXiv, 2020.
URL https://www.robots.ox.ac.uk/~vgg/data/voxconverse/
 
Title VoxMovies: Speaker recognition under different domains dataset 
Description VoxMovies is an audio dataset, containing utterances sourced from movies with varying emotion, accents and background noise. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact To benchmark performance of speaker recognition systems on this entirely new domain, VoxMovies contains a number of domain adaptation evaluation sets. A. Brown*, J. Huh*, A. Nagrani*, J. S. Chung, A. Zisserman Playing a Part: Speaker Verification at the Movies International Conference on Acoustics, Speech and Signal Processing, 2021 *Equal Contribution. 
URL https://www.robots.ox.ac.uk/~vgg/data/voxmovies/
 
Description 2017TAP1 - Dante Editions 
Organisation University of Manchester
Country United Kingdom 
Sector Academic/University 
PI Contribution This project will use the Seebibyte image software to undertake a preliminary investigation of the design features of early printed editions of Dante's Divine Comedy, published between 1472 and 1491, held in and digitized by the John Rylands Library, University of Manchester. By focusing on a single iconic literary text in the first twenty years of its print publication, Manchester can investigate the evolution of the page design, from the first editions which contain the text of the poem only, to later ones of increasing visual and navigational sophistication, as elements such as titles, author biographies, commentaries, rubrics, summaries, page numbers, illustrations, and devotional material are introduced into the object. The use of computer vision techniques will allow Manchester to approach these books and the study of Dante in an entirely new way and will add greatly to our knowledge of early modern book technologies and information design.
Collaborator Contribution Manchester will supply data for analysis.
Impact This collaboration is a cross disciplinary work between Visual Geometry and Humanities.
Start Year 2017
 
Description 2017TAP10 - Botanical Plates 
Organisation University of Hyderabad
Department Department of Plant Sciences
Country India 
Sector Academic/University 
PI Contribution 1. Manually annotate 291 plate images using the VIA tool; 2. Run a validation test against 281 images known to have been copied from earlier sources; 3. Compare all illustrations extracted from the 291 plates against images in the "secondary source" (the large image dataset obtained from biodiversitylibrary.org).
Collaborator Contribution The partner has yet to provide the dataset; the project has stalled.
Impact This project is a cross-disciplinary collaboration between computer vision and plant science.
Start Year 2017
 
Description 2017TAP11 - Traherne Digital Collator 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Seebibyte provided comparison software, capable of efficiently navigating very large image sets, allowing different visualizations of variants, and offering a range of tools for detailed study.
Collaborator Contribution The Oxford Trahern supplied the bibliographical and textual expertise needed to understand and define user specifications for digital collation.
Impact This software was initially developed for the Oxford Traherne project but is now being used by a variety of prestigious scholarly editing projects across the world. Julia J. Smith, General Editor of the Oxford Traherne project, says, "The collation of early modern texts therefore has been revolutionized by the Traherne Digital Collator which is incomparably easier and faster to use, and enables more detail and sophisticated textual analysis than has previously been possible."
Start Year 2017
 
Description 2017TAP14 - 15cBOOKTRADE 
Organisation University of Cambridge
Department Faculty of Modern and Medieval Languages
Country United Kingdom 
Sector Academic/University 
PI Contribution Seebibyte provided tools to track the reuse, copying, borrowing and selling of 15th-century illustrations found in certain editions and made with wooden blocks.
Collaborator Contribution The partner provided data to be analyzed.
Impact The Seebibyte researchers and the partner presented a talk based on this research at the conference Printing Revolution and Society 1450-1500: Fifty Years that Changed Europe. The visually searchable database is a research support tool for art historians, book historians, philologists and historians of visual and material culture. A journal article has resulted from this research: The Use and Reuse of Printed Illustrations in 15th-Century Printed Venetian Editions. http://doi.org/10.30687/978-88-6969-332-8.
Start Year 2017
 
Description 2017TAP14 - 15cBOOKTRADE 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution Seebibyte provided tools to track the reuse, copying, borrowing and selling of 15th-century illustrations found in certain editions and made with wooden blocks.
Collaborator Contribution The partner provided data to be analyzed.
Impact The Seebibyte researchers and the partner presented a talk based on this research at the conference Printing Revolution and Society 1450-1500: Fifty Years that Changed Europe. The visually searchable database is a research support tool for art historians, book historians, philologists and historians of visual and material culture. A journal article has resulted from this research: The Use and Reuse of Printed Illustrations in 15th-Century Printed Venetian Editions. http://doi.org/10.30687/978-88-6969-332-8.
Start Year 2017
 
Description 2017TAP2 - Visual Design 
Organisation University of Leeds
Department School of Languages, Cultures and Societies
Country United Kingdom 
Sector Academic/University 
PI Contribution This project looks at how graphic resources are used in the wild - in specific text genres and locales (languages / cultures / regions). Rather than doing so on the basis of hand-picked examples, intended to illustrate a particular phenomenon, it allows us to ask whether a particular feature or combination of features is found in a particular document. More significantly, it allows us to ask whether the frequency of features varies across corpora of documents - i.e. whether a given feature is more or less common in a given genre or locale.
Collaborator Contribution Leeds will provide data for the project.
Impact This project is a cross disciplinary collaboration between computer vision and Arts and Humanities.
Start Year 2017
 
Description 2017TAP3 - DigiPal (Text) 
Organisation King's College London
Country United Kingdom 
Sector Academic/University 
PI Contribution The project has two main objectives: to develop a tool to automatically count lines on a medieval manuscript page and to test the potential for image segmentation of phrases (and possibly even letter-forms) on a corpus of medieval Scottish charters written in Latin.
Collaborator Contribution KCL will supply data to be analysed.
Impact This project is a cross-disciplinary collaboration between computer vision and humanities.
Start Year 2017
 
Description 2017TAP4 - DigiPal (Tiling)
Organisation King's College London
Country United Kingdom 
Sector Academic/University 
PI Contribution This project will develop a tool to analyse thousands of images of medieval manuscript and sort them according to agreed criteria (e.g. 'does the image contain an illustration?'). The objective is to eliminate material that is not relevant to researchers and to automatically detect the regions of images which are of interest.
Collaborator Contribution KCL will provide images to be analysed.
Impact This project is a cross-disciplinary collaboration between computer vision and humanities.
Start Year 2017
 
Description 2017TAP5 - 19C Books (Matcher) 
Organisation University of Sheffield
Country United Kingdom 
Sector Academic/University 
PI Contribution Rather than matching a number of illustrations with one specific illustration, it is hoped that by using machine learning, clusters of matches can be found without the need to provide the software with visual attributes of one illustration but to be able to attribute different visual attributes to different clusters of illustrations. By doing this, it will allow the researcher to get to know more about their data and has the potential to lead to unexpected clusters of matches that can initiate further research. Researchers with substantial datasets may not always have particular illustrations in mind that they wish to find matches for. Using machine learning in this way will allow researchers to ask more general questions about their data and provide further lines of enquiry.
Collaborator Contribution Sheffield provided dataset for the project.
Impact This project is a cross-disciplinary collaboration between computer vision and humanities.
Start Year 2017
 
Description 2017TAP6 - 19C Books (Classifier) 
Organisation University of Sheffield
Country United Kingdom 
Sector Academic/University 
PI Contribution It is the main objective of this project to use machine learning in order to be able to identify the main print processes that were used to produce illustrations in the eighteenth and nineteenth centuries. Rather than focussing upon the iconographic details of the illustration, the aim is to understand the style of the illustration and whether machine learning techniques are a viable way in which to classify style and method as opposed to visual content.
Collaborator Contribution Sheffield will provide dataset.
Impact This project is a cross-disciplinary collaboration between computer vision and humanities.
Start Year 2017
 
Description 2017TAP7 - Cylinder Seals 
Organisation University of Oxford
Department Faculty of Oriental Studies
Country United Kingdom 
Sector Academic/University 
PI Contribution This project will seek to answer the question: why has it proven almost impossible to find any matches between physical seals preserved in collections and seal impressions left on tablets or other clay objects? A number of hypotheses readily present themselves. Were seals continuously re-carved so that the number of possible matches is almost nil? Were those seals used to seal documents and objects deposited differently from those worn as amulets and jewellery? Or have more matches not been found simply because the data has been published in a way that does not facilitate answering this question? None of these questions can be answered without fundamentally changing the way seals and seal impressions are ordered, published, and studied. And none of them can be answered through studies of single seals or small collections, they can only be addressed through a large-scale project relying on innovative, data-driven, and, for the most part, computational analysis.
Collaborator Contribution The Faculty of Oriental Studies will provide dataset.
Impact This project is a cross-disciplinary collaboration between computer vision and humanities.
Start Year 2017
 
Description 2017TAP8 - Fleuron (Matcher) 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution 'Fleuron' was created by automatically extracting images of printers' ornaments and small illustrations from Eighteenth-Century Collections Online (ECCO), a database of 36 million images of pages from eighteenth-century books. Approximately 1.6 million images were extracted, consisting chiefly of printers' ornaments, arrangements of ornamental type, small illustrations, and diagrams. Some extraneous material such as library stamps and chunks of text were extracted, but most of these were filtered out at an early stage. The extracted images have all of the metadata associated with the original images supplied by ECCO, i.e.: the author and date of the book, the place of publication, the printer(s) and/or publishers(s), the genre and language of the book. Image matching will also help us to remove any remaining extraneous material in the database (i.e. images falsely identified as non-textual material).
Collaborator Contribution Cambridge will provide the dataset.
Impact This project is a cross-disciplinary collaboration between computer vision and humanities.
Start Year 2017
 
Description 2017TAP9 - Fleuron (Classifier) 
Organisation University of Cambridge
Country United Kingdom 
Sector Academic/University 
PI Contribution 'Fleuron' was created by automatically extracting images of printers' ornaments and small illustrations from Eighteenth-Century Collections Online (ECCO), a database of 36 million images of pages from eighteenth-century books. Approximately 1.6 million images were extracted, consisting chiefly of printers' ornaments, arrangements of ornamental type, small illustrations, and diagrams. Some extraneous material such as library stamps and chunks of text were extracted, but most of these were filtered out at an early stage. Currently, the keyword searches available to users of 'Fleuron' do not allow the subject matter of the images to be discovered. The keyword searches are useful for the study of ornaments owned particular printers or use in works by particular authors, but they do not significantly advance the study of the ornaments for their own sake (other than by speeding up the process of browsing). Classification would allow users to find particular types of images within the database, and to investigate the history of certain images and themes.
Collaborator Contribution Cambridge will provide the dataset.
Impact This project is a cross-disciplinary collaboration between computer vision and humanities.
Start Year 2017
 
Description 2018TAP2 - British Library Broadcast News 
Organisation The British Library
Country United Kingdom 
Sector Public 
PI Contribution The goals of the project are: 1. to identify people in the BL's Broadcast News system (Feb 2019); 2. (under discussion) to identify stories within the news. Our contributions: 1. made possible the automatic identification of famous people in BL videos; 2. created a web page to retrieve video frames by person name; 3. made possible the retrieval of frames specifying multiple person names.
Collaborator Contribution BL supplied BBC news videos.
Impact 1. We were runners-up at the BL Labs Awards for 2019. 2. Possible paper submission to ECCV 2020 (Andrew Brown).
Start Year 2018
 
Description 2019TAP2 - Chimpanzee Tracking 
Organisation University of Oxford
Department School of Anthropology and Museum Ethnography
Country United Kingdom 
Sector Academic/University 
PI Contribution Our contributions: 1. Automatic identification of chimpanzees in the wild; 2. Automatic tracking of animal faces and bodies, which can be generalized to other animals (already tested with elephants, and it works); 3. A web-based automatic video-data ingestion system.
Collaborator Contribution The partner provided a large amount of videos of chimpanzees.
Impact two publications have resulted from this research: Count, Crop and Recognise: Fine-Grained Recognition in the Wild M.Bain, A,Nagrani, D.Schofield, A.Zisserman. Workshop on Computer Vision for Wildlife Conservation, ICCV 2019. Chimpanzee face recognition from videos in the wild using deep learning D.Schofield, A.Nagrani, A.Zisserman, M.Hayashi, T.Matsuzawa, D.Biro, S.Carvalho Science advances, Volumn 5, Number 9, 2019
Start Year 2019
 
Description 2019TAP3 - Ashmolean Digital Archive 
Organisation University of Oxford
Department Ashmolean Museum
Country United Kingdom 
Sector Academic/University 
PI Contribution Our contribution: The Ashmolean Museum has a large collection of digital photographs. Using our visual search tools, we analyzed a subset of 262,007 images from this collection and found a large number of exact and near-duplicate images. We reported these results to the team at the Ashmolean Museum, which is currently undertaking a migration of these digital assets to a new Digital Asset Management System (DAMS).
Collaborator Contribution The Ashmolean provided their collection of digital photographs.
Impact The team responsible for migration of digital assets at the Ashmolean Museum found our results very surprising -- in the sense that they did not expect such a large number of exact duplicates in their digital archive -- and extremely valuable for their migration process. They have also shown an interest in applying such analysis to other subsets of their digital archive and in exploring additional new projects such as corruption detection, automatic classification and labelling, etc.
Start Year 2019
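For intuition, exact and near-duplicate detection can be pictured with a simple perceptual hash, as sketched below. The visual search tools actually used rely on learned image descriptors, so this average-hash version is only a conceptual stand-in.

```python
from PIL import Image
import numpy as np

def average_hash(path, size=8):
    """64-bit average hash: a tiny grayscale thumbnail thresholded at its
    mean. A conceptual stand-in for the learned descriptors actually used."""
    img = Image.open(path).convert("L").resize((size, size))
    pixels = np.asarray(img, dtype=np.float32)
    return (pixels > pixels.mean()).flatten()

def is_near_duplicate(path_a, path_b, max_bits=5):
    # A small Hamming distance between hashes suggests a (near) duplicate.
    return (average_hash(path_a) != average_hash(path_b)).sum() <= max_bits
```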
 
Description Graphene Defect Detection 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution We provided the software and algorithms.
Collaborator Contribution The partner provided the dataset and interpretation of the computed analysis.
Impact Project paper is in Progress.
Start Year 2016
 
Description Micrograph Defect Detection 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution We provided the software and algorithms.
Collaborator Contribution The partner provided the dataset and interpretation of the computed analysis.
Impact Software has been given to the collaborator.
Start Year 2016
 
Description NLS Chapbooks 
Organisation National Library of Scotland
Country United Kingdom 
Sector Academic/University 
PI Contribution We used our software to search and analyse the illustrations of the chapbooks.
Collaborator Contribution The partner provided chapbooks in large quantities.
Impact https://www.robots.ox.ac.uk/~vgg/research/chapbooks/
Start Year 2020
 
Description National Consortium of Intelligent Medical Imaging 
Organisation National Consortium of Intelligent Medical Imaging
Sector Academic/University 
PI Contribution A VisualAI postdoc (Jianbo Jiao) is providing expertise in building image-based deep learning models to assess COVID-19 deterioration for hospital patients.
Collaborator Contribution NCIMI is providing access to COVID-19 data for a TAP project.
Impact An initial evaluation of predictive modelling was performed using the available COVID-19 data. However, due to the small size of the data, and the fact that COVID-19 treatments have significantly improved and better patient pathways are in place, it was deemed not worth pursuing this work beyond the preliminary study. A report was written but has not been published.
Start Year 2021
 
Description Penguin Counting 
Organisation University of Oxford
Country United Kingdom 
Sector Academic/University 
PI Contribution We provided the software and algorithms.
Collaborator Contribution The collaborator provided the dataset and specialised analysis methods.
Impact Paper: Counting in the Wild, by Carlos Arteta, Victor Lempitsky and Andrew Zisserman. This collaboration is between the Information Engineering and Zoology disciplines.
Start Year 2016
 
Description TAP VAI-02 1516 Project 
Organisation University of Copenhagen
Country Denmark 
Sector Academic/University 
PI Contribution We created a visual search engine using images and metadata supplied by Matilde Malaspina at the University of Copenhagen and Barbara Tramelli from the University of Venice.
Collaborator Contribution Partner provided images and metadata.
Impact A talk at the Venice Centre for Digital and Public Humanities (VeDPH) on 9 December 2020.
Start Year 2020
 
Description Video Recognition from the Dashboard 
Organisation Continental AG
Country Germany 
Sector Private 
PI Contribution Working with research engineers to develop recognition in road scenes and human gestures.
Collaborator Contribution Supplying data.
Impact N/A
Start Year 2016
 
Title Adaptive Text Recognition through Visual Matching 
Description In this work, our objective is to address the problems of generalization and flexibility for text recognition in documents. We introduce a new model that exploits the repetitive nature of characters in languages, and decouples the visual representation learning and linguistic modelling stages. By doing this, we turn text recognition into a shape matching problem, and thereby achieve generalization in appearance and flexibility in classes. We evaluate the new model on both synthetic and real datasets across different alphabets and show that it can handle challenges that traditional architectures are not able to solve without expensive retraining, including: (i) it can generalize to unseen fonts without new exemplars from them; (ii) it can flexibly change the number of classes, simply by changing the exemplars provided; and (iii) it can generalize to new languages and new characters that it has not been trained for by providing a new glyph set. We show significant improvements over state-of-the-art models for all these cases.
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact C.Zhang, A.Gupta, A.Zisserman Adaptive Text Recognition through Visual Matching In ECCV, 2020 
URL https://www.robots.ox.ac.uk/~vgg/research/FontAdaptor20/
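The decoupling described above reduces recognition to matching character positions against a glyph exemplar set. A minimal cosine-similarity sketch of that matching step follows; the feature arrays are assumed to come from the learned visual encoder, which is abstracted away here.

```python
import numpy as np

def read_by_matching(char_feats, glyph_feats, alphabet):
    """Sketch of recognition as shape matching: each character position
    in a text line is assigned the glyph exemplar it most resembles.

    char_feats: (T, D) features of T character slots in the line image.
    glyph_feats: (K, D) features of K exemplar glyphs (one per class).
    Swapping in a new glyph set changes the classes with no retraining."""
    a = char_feats / np.linalg.norm(char_feats, axis=1, keepdims=True)
    b = glyph_feats / np.linalg.norm(glyph_feats, axis=1, keepdims=True)
    sim = a @ b.T  # (T, K) cosine similarities
    return "".join(alphabet[i] for i in sim.argmax(axis=1))
```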
 
Title Automatic grading of Spine MRI (SpineNet_v2) 
Description To combat back pain, we've developed SpineNet, a computer vision-based system to automatically perform a wide range of radiological gradings in spinal magnetic resonance imaging. It is robust and has been validated across several different datasets, showing performance comparable to clinical radiologists. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact R. Windsor, A. Jamaludin, T. Kadir, A. Zisserman. A Convolutional Approach to Vertebrae Detection and Labelling in Whole Spine MRI. MICCAI 2020. A. Jamaludin, T. Kadir and A. Zisserman. SpineNet: Automated classification and evidence visualization in spinal MRIs. Medical Image Analysis, 41 (Supplement C): 63-73, 2017.
URL http://zeus.robots.ox.ac.uk/spinenet/2/demo.html
 
Title Automatically Discovering and Learning New Visual Categories with Ranking Statistics 
Description We tackle the problem of discovering novel classes in an image collection given labelled examples of other classes. This setting is similar to semi-supervised learning, but significantly harder because there are no labelled examples for the new classes. The challenge, then, is to leverage the information contained in the labelled images in order to learn a general-purpose clustering model and use the latter to identify the new classes in the unlabelled data. In this work we address this problem by combining three ideas: (1) we suggest that the common approach of bootstrapping an image representation using the labeled data only introduces an unwanted bias, and that this can be avoided by using self-supervised learning to train the representation from scratch on the union of labelled and unlabelled data; (2) we use rank statistics to transfer the model's knowledge of the labelled classes to the problem of clustering the unlabelled images; and, (3) we train the data representation by optimizing a joint objective function on the labelled and unlabelled subsets of the data, improving both the supervised classification of the labelled data, and the clustering of the unlabelled data. We evaluate our approach on standard classification benchmarks and outperform current methods for novel category discovery by a significant margin. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact A publication has resulted from this: AUTOMATICALLY DISCOVERING AND LEARNING NEW VISUAL CATEGORIES WITH RANKING STATISTICS, ICLR 2020 
URL http://www.robots.ox.ac.uk/~vgg/research/auto_novel/
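As a concrete illustration of the rank-statistics idea, the sketch below generates a pairwise pseudo-label by testing whether two embeddings share the same top-k most activated feature dimensions (the value of k and the set-equality test are our assumptions for illustration):

    import torch

    def same_class(f1, f2, k=5):
        # Pseudo-label: True if the top-k ranked dimensions coincide as sets.
        top1 = set(torch.topk(f1, k).indices.tolist())
        top2 = set(torch.topk(f2, k).indices.tolist())
        return top1 == top2

    a, b = torch.randn(512), torch.randn(512)
    print(same_class(a, b))  # used as a binary target when training the clustering head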
 
Title Class-Agnostic Counting 
Description Our General Matching Network (GMN), pretrained on video data, can count objects such as windows or columns specified by an exemplar patch, without additional training. Output heat maps indicate the locations of the counted objects, including in images unseen during training. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact A publication has resulted from this research: Class-Agnostic Counting. Erika Lu, Weidi Xie, and Andrew Zisserman. Asian Conference on Computer Vision (ACCV), 2018. 
URL http://www.robots.ox.ac.uk/~vgg/publications/
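The core matching step can be pictured as correlating exemplar features against image features to produce the heat map; the sketch below shows that step only (shapes, the sigmoid and the naive thresholded count are illustrative assumptions, whereas the actual GMN adds learned adaptation and regression stages):

    import torch
    import torch.nn.functional as F

    img_feats = torch.randn(1, 64, 50, 50)   # (B, C, H, W) backbone features of the image
    exemplar = torch.randn(1, 64, 7, 7)      # features of the exemplar patch to count

    heat = torch.sigmoid(F.conv2d(img_feats, exemplar, padding=3))  # correlation heat map
    print((heat > 0.9).sum().item())         # naive count from thresholded responses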
 
Title Convnet Human Action Recognition 
Description This is a model to recognize human actions in video. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact A publication has resulted from this research: Convolutional Two-Stream Network Fusion for Video Action Recognition. C. Feichtenhofer, A. Pinz, A. Zisserman, CVPR, 2016. 
URL http://www.robots.ox.ac.uk/~vgg/software/two_stream_action/
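For readers unfamiliar with two-stream models, the baseline fusion that the paper improves upon can be sketched as averaging the class posteriors of an appearance (RGB) stream and an optical-flow stream; the paper's contribution is to replace this late fusion with learned spatio-temporal fusion inside the network:

    import torch

    def late_fusion(rgb_logits, flow_logits):
        # Average the softmax scores of the two streams (baseline only).
        return (rgb_logits.softmax(dim=1) + flow_logits.softmax(dim=1)) / 2

    rgb, flow = torch.randn(1, 101), torch.randn(1, 101)  # e.g. UCF-101 class scores
    print(late_fusion(rgb, flow).argmax(dim=1))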
 
Title Convnet Keypoint Detection 
Description It is a model based on convolutional neural networks that automatically detects keypoints (such as the head, elbows and ankles) in a photograph of a human body. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact A paper has resulted from this research: V. Belagiannis, A. Zisserman. Recurrent Human Pose Estimation. arXiv:1605.02914. 
URL http://www.robots.ox.ac.uk/~vgg/software/keypoint_detection/
 
Title Convnet text spotting 
Description This is a model based on convolutional neural networks that automatically detects English text in natural images. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact A publication has resulted from this research: A. Gupta, A. Vedaldi, A. Zisserman. Synthetic Data for Text Localisation in Natural Images. IEEE Conference on Computer Vision and Pattern Recognition, 2016. 
URL http://www.robots.ox.ac.uk/~vgg/software/textspot/
 
Title Deep Lip Reading: A comparison of models and an online application 
Description The goal of this work is to develop state-of-the-art models for lip reading -- visual speech recognition. We develop three architectures and compare their accuracy and training times: (i) a recurrent model using LSTMs; (ii) a fully convolutional model; and (iii) the recently proposed transformer model. The recurrent and fully convolutional models are trained with a Connectionist Temporal Classification loss and use an explicit language model for decoding, while the transformer is a sequence-to-sequence model. Our best performing model improves the state-of-the-art word error rate on the challenging BBC-Oxford Lip Reading Sentences 2 (LRS2) benchmark dataset by over 20 percent. As a further contribution we investigate the fully convolutional model when used for online (real-time) lip reading of continuous speech, and show that it achieves high performance with low latency. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact T. Afouras, J. S. Chung, A. Zisserman. Deep Lip Reading: A comparison of models and an online application. INTERSPEECH, 2018. 
URL https://www.robots.ox.ac.uk/~vgg/research/deep_lip_reading/
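As an illustration of how the recurrent and fully convolutional variants are trained, the sketch below wires model outputs into a Connectionist Temporal Classification loss using PyTorch's nn.CTCLoss (sequence lengths, vocabulary size and batch size are placeholder assumptions):

    import torch
    import torch.nn as nn

    T, B, V = 75, 4, 40                      # time steps, batch size, vocab incl. blank
    log_probs = torch.randn(T, B, V, requires_grad=True).log_softmax(2)
    targets = torch.randint(1, V, (B, 20))   # character targets (0 reserved for blank)

    ctc = nn.CTCLoss(blank=0)
    loss = ctc(log_probs, targets,
               input_lengths=torch.full((B,), T, dtype=torch.long),
               target_lengths=torch.full((B,), 20, dtype=torch.long))
    loss.backward()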
 
Title EPIC-Fusion: Audio-Visual Temporal Binding for Egocentric Action Recognition 
Description We focus on multi-modal fusion for egocentric action recognition, and propose a novel architecture for multi-modal temporal-binding, i.e. the combination of modalities within a range of temporal offsets. We train the architecture with three modalities - RGB, Flow and Audio - and combine them with mid-level fusion alongside sparse temporal sampling of fused representations. In contrast with previous works, modalities are fused before temporal aggregation, with shared modality and fusion weights over time. Our proposed architecture is trained end-to-end, outperforming individual modalities as well as late-fusion of modalities. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact We demonstrate the importance of audio in egocentric vision, on a per-class basis, for identifying actions as well as interacting objects. Our method achieves state-of-the-art results on both the seen and unseen test sets of EPIC-Kitchens, the largest egocentric dataset, on all metrics of the public leaderboard. 
URL https://ekazakos.github.io/TBN/
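A minimal sketch of mid-level temporal binding follows, assuming illustrative feature dimensions: per-time-step features from the three modalities are fused before any temporal aggregation, in contrast to late fusion of per-modality predictions:

    import torch
    import torch.nn as nn

    rgb, flow, audio = (torch.randn(4, 3, 256) for _ in range(3))  # (batch, time, dim)
    fused = torch.cat([rgb, flow, audio], dim=2)            # bind modalities per step
    clip_repr = nn.Linear(3 * 256, 512)(fused).mean(dim=1)  # then aggregate over time
    print(clip_repr.shape)  # torch.Size([4, 512])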
 
Title Find Identical Images (FII) 
Description Identical images have the same dimensions (i.e. image width, image height and number of colour channels) and the same pixel value at every corresponding pixel location. FII is a command line tool that finds all identical images in a folder. It can also find images that are common to two folders. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact tbc 
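The equality test FII describes can be stated in a few lines of Python; this sketch is the definition itself rather than FII's implementation (file paths are placeholders):

    import numpy as np
    from PIL import Image

    def identical(path_a, path_b):
        a = np.asarray(Image.open(path_a))
        b = np.asarray(Image.open(path_b))
        # Same dimensions/channels and the same value at every pixel location.
        return a.shape == b.shape and np.array_equal(a, b)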
 
Title Image Comparator (IMCOMP) 
Description Image Comparator (or, IMCOMP) is a web application to automatically compare a pair of images using geometric and photometric transformations. It is an open source project maintained by the Visual Geometry Group. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Here are some features of IMCOMP:
-Available as an online tool that can be accessed from any modern web browser
-A large number of visualizations are available to help users spot the difference between two images
-Supports photometric transformation to compensate for colour differences between two images
-Supports different types of geometric transformations (e.g. similarity, affine, thin-plate spline, etc.) to enable comparison of images containing many types of deformations
-Results can be saved as an image
URL http://www.robots.ox.ac.uk/~vgg/software/imcomp/
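To illustrate the geometric part of such a comparison, the sketch below aligns one image to another by estimating a similarity transform from matched keypoints using OpenCV (the feature detector, matcher and file names are our illustrative choices, not necessarily those used by IMCOMP):

    import cv2
    import numpy as np

    img1 = cv2.imread("a.png", cv2.IMREAD_GRAYSCALE)  # placeholder paths
    img2 = cv2.imread("b.png", cv2.IMREAD_GRAYSCALE)
    orb = cv2.ORB_create()
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    src = np.float32([k1[m.queryIdx].pt for m in matches])
    dst = np.float32([k2[m.trainIdx].pt for m in matches])
    M, _ = cv2.estimateAffinePartial2D(src, dst)         # similarity transform
    aligned = cv2.warpAffine(img1, M, img2.shape[::-1])  # warp for pixelwise comparison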
 
Title Lip Synchronisation 
Description This is an audio-to-video synchronisation network which can be used for audio-visual synchronisation tasks including: (1) removing temporal lags between the audio and visual streams in a video, and (2) determining who is speaking amongst multiple faces in a video. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact A publication has resulted from this research: J. S. Chung, A. Zisserman Out of time: automated lip sync in the wild Workshop on Multi-view Lip-reading, ACCV, 2016 
URL http://www.robots.ox.ac.uk/~vgg/software/lipsync/
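The synchronisation decision can be illustrated as a sliding-window search over embeddings from the two streams (a SyncNet-style sketch; the window size, similarity measure and shapes are assumptions for illustration):

    import torch
    import torch.nn.functional as F

    def best_offset(audio_emb, video_emb, max_shift=15):
        # audio_emb, video_emb: (T, D) per-frame embeddings from the two streams.
        scores = []
        for s in range(-max_shift, max_shift + 1):
            a = audio_emb[max(0, s):min(len(audio_emb), len(video_emb) + s)]
            v = video_emb[max(0, -s):max(0, -s) + len(a)]
            scores.append(F.cosine_similarity(a, v, dim=1).mean())
        return int(torch.tensor(scores).argmax()) - max_shift  # lag in frames

    print(best_offset(torch.randn(50, 128), torch.randn(50, 128)))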
 
Title List Annotator (LISA) 
Description List Annotator (LISA) is a standalone and light-weight HTML/CSS/JavaScript based application to efficiently annotate a large list of images. LISA is an open source project developed and maintained by the Visual Geometry Group (VGG) and released under a license that grants its users the freedom to use it for any purpose. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact tbc 
 
Title MatConvNet 
Description MatConvNet is a MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision applications. It is simple, efficient, and can run and learn state-of-the-art CNNs. Many pre-trained CNNs for image classification, segmentation, face recognition, and text detection are available. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The MatConvNet toolbox is widely used in research conducted by the Visual Geometry Group at the University of Oxford, including text spotting, penguin counting and human action recognition. Andrea Vedaldi has taught this software at the following summer schools: Medical Imaging Summer School (MISS), Favignana (Sicily), 2016 ((Somewhat) Advanced Convolutional Neural Networks; Understanding CNNs using visualisation and transformation analysis; all video lectures from the summer school); and the iV&L Net Training School 2016, Malta. 
URL http://www.vlfeat.org/matconvnet/
 
Title Seebibyte Visual Tracker (SVT) 
Description SVT is a tool to track multiple objects in a video. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact This software does not require any training or fine-tuning, and all the components needed to track objects in a video are included. 
URL http://seebibyte.org/
 
Title Seeing wake words: Audio-visual Keyword Spotting 
Description The goal of this work is to automatically determine whether and when a word of interest is spoken by a talking face, with or without the audio. We propose a zero-shot method suitable for 'in the wild' videos. Our key contributions are: (1) a novel convolutional architecture, KWS-Net, that uses a similarity map intermediate representation to separate the task into (i) sequence matching, and (ii) pattern detection, to decide whether the word is there and when; (2) we demonstrate that if audio is available, visual keyword spotting improves the performance both for a clean and noisy audio signal. Finally, (3) we show that our method generalises to other languages, specifically French and German, and achieves a comparable performance to English with less language specific data, by fine-tuning the network pre-trained on English. The method exceeds the performance of the previous state-of-the-art visual keyword spotting architecture when trained and tested on the same benchmark, and also that of a state-of-the-art lip reading method. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Seeing wake words: Audio-visual Keyword Spotting. Liliane Momeni, Triantafyllos Afouras, Themos Stafylakis, Samuel Albanie, Andrew Zisserman. 
URL https://arxiv.org/abs/2009.01225
 
Title Self-Supervised Learning of Audio-Visual Objects from Video 
Description Our objective is to transform a video into a set of discrete audio-visual objects using self-supervised learning. To this end we introduce a model that uses attention to localize and group sound sources, and optical flow to aggregate information over time. We demonstrate the effectiveness of the audio-visual object embeddings that our model learns by using them for four downstream speech-oriented tasks: (a) multi-speaker sound source separation, (b) localizing and tracking speakers, (c) correcting misaligned audio-visual data, and (d) active speaker detection. Using our representation, these tasks can be solved entirely by training on unlabeled video, without the aid of object detectors. We also demonstrate the generality of our method by applying it to non-human speakers, including cartoons and puppets. Our model significantly outperforms other self-supervised approaches, and obtains performance competitive with methods that use supervised face detection. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact T. Afouras, A. Owens, J. S. Chung, A. Zisserman. Self-Supervised Learning of Audio-Visual Objects from Video. European Conference on Computer Vision, 2020. 
URL https://www.robots.ox.ac.uk/~vgg/research/avobjects/
 
Title Self-supervised Co-training for Video Representation Learning 
Description The objective of this paper is visual-only self-supervised video representation learning. We make the following contributions: (i) we investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive Estimation (InfoNCE) training, showing that this form of supervised contrastive learning leads to a clear improvement in performance; (ii) we propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss, exploiting the complementary information from different views, RGB streams and optical flow, of the same data source by using one view to obtain positive class samples for the other; (iii) we thoroughly evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval. In both cases, the proposed approach demonstrates state-of-the-art or comparable performance with other self-supervised approaches, whilst being significantly more efficient to train, i.e. requiring far less training data to achieve similar performance. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Self-supervised Co-training for Video Representation Learning. Tengda Han, Weidi Xie, Andrew Zisserman. NeurIPS, 2020. 
URL https://www.robots.ox.ac.uk/~vgg/research/CoCLR/
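For reference, the instance-level InfoNCE loss that the co-training scheme builds on can be sketched as follows (temperature and batch size are illustrative; the paper's contribution is how positives are mined across RGB and flow views, which is omitted here):

    import torch
    import torch.nn.functional as F

    def info_nce(q, k, temperature=0.07):
        # q, k: (B, D) embeddings of two views; positives lie on the diagonal.
        q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
        logits = q @ k.T / temperature
        return F.cross_entropy(logits, torch.arange(q.size(0)))

    loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))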
 
Title Self-supervised Learning for Video Correspondence Flow 
Description The objective of this paper is self-supervised learning of feature embeddings that are suitable for matching correspondences along the videos, which we term correspondence flow. By leveraging the natural spatial-temporal coherence in videos, we propose to train a "pointer" that reconstructs a target frame by copying pixels from a reference frame. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact We make the following contributions: First, we introduce a simple information bottleneck that forces the model to learn robust features for correspondence matching, preventing it from learning trivial solutions, e.g. matching based on low-level colour information. Second, to tackle the challenges of tracker drift due to complex object deformations, illumination changes and occlusions, we propose to train a recursive model over long temporal windows with scheduled sampling and cycle consistency. Third, we achieve state-of-the-art performance on the DAVIS 2017 video segmentation and JHMDB keypoint tracking tasks, outperforming all previous self-supervised learning approaches by a significant margin. Fourth, in order to shed light on the potential of self-supervised learning for video correspondence flow, we probe the upper bound by training on additional data, i.e. more diverse videos, further demonstrating significant improvements on video segmentation. 
URL https://arxiv.org/abs/1905.00875
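The "pointer" can be pictured as attention from each target pixel over reference pixels, followed by copying the attended colours; the sketch below shows that reconstruction step only (flattened spatial maps and scaled dot-product attention are our illustrative simplifications):

    import torch
    import torch.nn.functional as F

    def copy_from_reference(tgt_feat, ref_feat, ref_rgb):
        # tgt_feat, ref_feat: (HW, D) embeddings; ref_rgb: (HW, 3) reference colours.
        attn = F.softmax(tgt_feat @ ref_feat.T / tgt_feat.size(1) ** 0.5, dim=1)
        return attn @ ref_rgb  # (HW, 3) reconstructed target colours

    recon = copy_from_reference(torch.randn(64, 32), torch.randn(64, 32), torch.rand(64, 3))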
 
Title Self-supervised Learning from Watching Faces 
Description FAb-Net is a self-supervised framework that learns a face embedding encoding facial attributes such as head pose, expression and facial landmarks. It is trained in a self-supervised manner by leveraging video data: given two frames from the same face track, FAb-Net learns to generate the target frame from a source frame. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact A presentation resulted from this research: Self-supervised learning of a facial attribute embedding from video. Wiles, O.*, Koepke, A.S.*, Zisserman, A. In BMVC, 2018. (Oral) 
URL http://www.robots.ox.ac.uk/~vgg/publications/
 
Title Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval 
Description Optimising a ranking-based metric, such as Average Precision (AP), is notoriously challenging due to the fact that it is non-differentiable, and hence cannot be optimised directly using gradient-descent methods. To this end, we introduce an objective that optimises instead a smoothed approximation of AP, coined Smooth-AP. Smooth-AP is a plug-and-play objective function that allows for end-to-end training of deep networks with a simple and elegant implementation. We also present an analysis for why directly optimising the ranking-based metric of AP offers benefits over other deep metric learning losses. We apply Smooth-AP to standard retrieval benchmarks: Stanford Online Products and VehicleID, and also evaluate on larger-scale datasets: INaturalist for fine-grained category retrieval, and VGGFace2 and IJB-C for face retrieval. In all cases, we improve the performance over the state-of-the-art, especially for larger-scale datasets, thus demonstrating the effectiveness and scalability of Smooth-AP to real-world scenarios. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval. Andrew Brown, Weidi Xie, Vicky Kalogeiton, Andrew Zisserman. European Conference on Computer Vision (ECCV), 2020. 
URL https://www.robots.ox.ac.uk/~vgg/research/smooth-ap/
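The central relaxation can be sketched in a few lines: the Heaviside step used when counting how many items outrank a given item is replaced by a temperature-scaled sigmoid, making the soft ranks, and hence AP, differentiable (a simplified single-query version in our own notation; the paper handles mini-batches of queries):

    import torch

    def smooth_ap_loss(scores, labels, tau=0.01):
        # scores: (N,) query-to-gallery similarities; labels: (N,) 1 for positives.
        diff = scores.unsqueeze(1) - scores.unsqueeze(0)
        sg = torch.sigmoid(diff / tau)                        # soft "j outranks i"
        rank_all = 1 + sg.sum(dim=0) - sg.diagonal()          # soft rank in full list
        pos = labels.bool()
        sg_pos = sg[pos][:, pos]
        rank_pos = 1 + sg_pos.sum(dim=0) - sg_pos.diagonal()  # soft rank among positives
        return 1 - (rank_pos / rank_all[pos]).mean()          # 1 - Smooth-AP

    loss = smooth_ap_loss(torch.randn(16), torch.randint(0, 2, (16,)).float())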
 
Title Use What You Have: Video retrieval using representations from collaborative experts 
Description Our goal is to condense the multi-modal, extremely high dimensional information from videos into a single, compact video representation for the task of video retrieval using free-form text queries, where the degree of specificity is open-ended. For this we exploit existing knowledge in the form of pre-trained semantic embeddings which include `general' features such as motion, appearance, and scene features from visual content. We also explore the use of more `specific' cues from ASR and OCR which are intermittently available for videos and find that these signals remain challenging to use effectively for retrieval. We propose a collaborative experts model to aggregate information from these different pre-trained experts and assess our approach empirically on five retrieval benchmarks: MSR-VTT, LSMDC, MSVD, DiDeMo, and ActivityNet. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact A poster was presented at the BMVA 2019. 
URL https://arxiv.org/abs/1907.13487
 
Title Utterance-level Aggregation for Speaker Recognition in the Wild 
Description The objective of this paper is speaker recognition 'in the wild' - where utterances may be of variable length and may also contain irrelevant signals. Crucial elements in the design of deep networks for this task are the type of trunk (frame level) network, and the method of temporal aggregation. We propose a powerful speaker recognition deep network, using a 'thin-ResNet' trunk architecture, and a dictionary-based NetVLAD or GhostVLAD layer to aggregate features across time, that can be trained end-to-end. We show that our network achieves state-of-the-art performance by a significant margin on the VoxCeleb1 test set for speaker recognition, whilst requiring fewer parameters than previous methods. We also investigate the effect of utterance length on performance, and conclude that for 'in the wild' data, a longer length is beneficial. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Two publications have resulted from this software: W. Xie, A. Nagrani, J. S. Chung, A. Zisserman. Utterance-level Aggregation for Speaker Recognition in the Wild. International Conference on Acoustics, Speech, and Signal Processing (Oral), 2019. Y. Zhong, R. Arandjelovic, A. Zisserman. GhostVLAD for set-based face recognition. Asian Conference on Computer Vision, Dec 2018. 
URL http://www.robots.ox.ac.uk/~vgg/research/speakerID/
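A compact sketch of NetVLAD-style aggregation over frame features follows (dimensions and cluster count are illustrative assumptions; the released network adds the thin-ResNet trunk, and GhostVLAD additionally learns 'ghost' clusters whose contributions are discarded):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class NetVLAD(nn.Module):
        def __init__(self, dim=256, clusters=8):
            super().__init__()
            self.centroids = nn.Parameter(torch.randn(clusters, dim))
            self.assign = nn.Linear(dim, clusters)

        def forward(self, x):                    # x: (B, T, D) frame-level features
            a = F.softmax(self.assign(x), dim=2) # soft assignment to clusters
            resid = x.unsqueeze(2) - self.centroids      # (B, T, K, D) residuals
            vlad = (a.unsqueeze(3) * resid).sum(dim=1)   # aggregate over time
            return F.normalize(vlad.flatten(1), dim=1)   # fixed-length utterance vector

    print(NetVLAD()(torch.randn(2, 100, 256)).shape)  # torch.Size([2, 2048])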
 
Title VGG Annotation Search and Annotator (VASA) 
Description VGG Annotation Search and Annotator (VASA) is a variation of the VGG Image Annotator (VIA) v2 tool augmented with search capabilities. VASA runs in a web browser and does not require any installation or setup. The complete VASA software fits in a single self-contained HTML page of less than 900 kilobytes that runs as an offline application in most modern web browsers. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact No notable impact yet as the software is new. 
 
Title VGG Face Finder (VFF) 
Description VFF is a web application that serves as a web engine for performing searches for faces over a user-defined image dataset. It is based on the original application created by VGG to perform visual searches over a large dataset of images from BBC News. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact Features:
-Performs queries by entering a text or an image
-Automatically downloads training images from Google
-Performs automatic training, classification and ranking of results
-Automatically caches query results
-Provides a user management interface
-Allows further query refinement
-Enables users to create curated queries using their own training images
-Enables users to create queries based on the metadata of their images
-Is capable of data ingestion, i.e., users can search their own dataset and define their own metadata
-Can be executed with GPU support
October 2018: The new VFF v1.1 now uses a more accurate CNN for face feature extraction. 
URL http://seebibyte.org/
 
Title VGG Image Annotator 
Description VGG Image Annotator is a standalone application, with which you can define regions in an image and create a textual description of those regions. Such image regions and descriptions are useful for supervised training of learning algorithms. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The VIA tool has been employed to annotate a large volume of scanned images of 15th-century books in the Faculty of Medieval and Modern Languages at the University of Oxford for the 15th Century Booktrade project (http://15cbooktrade.ox.ac.uk/). 
URL http://www.robots.ox.ac.uk/~vgg/software/via/
 
Title VGG Image Annotator (VIA) 
Description VGG Image Annotator (VIA) is an image annotation tool that can be used to define regions in an image and create textual descriptions of those regions. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Open Source License? Yes  
Impact Here is a list of some salient features of VIA:
-Based solely on HTML, CSS and JavaScript (no external JavaScript libraries)
-Can be used offline (full application in a single HTML file of size < 400 KB)
-Requires nothing more than a modern web browser (tested on Firefox, Chrome and Safari)
-Supported region shapes: rectangle, circle, ellipse, polygon, point and polyline
-Import/export of region data in CSV and JSON file formats
-Supports bulk update of annotations in image grid view
-Quick update of annotations using the on-image annotation editor
-Keyboard shortcuts to speed up annotation
URL http://seebibyte.org/
 
Title VGG Image Annotator (VIA) 
Description VGG Image Annotator (VIA) is a simple, standalone manual annotation tool for image, audio and video. VIA runs in a web browser and does not require any installation or setup. The complete VIA software fits in a single self-contained HTML page of less than 400 kilobytes that runs as an offline application in most modern web browsers. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact tbc 
 
Title VGG Image Annotator (VIA) Version 3 
Description VGG Image Annotator (VIA) is a simple, standalone manual annotation tool for image, audio and video. VIA runs in a web browser and does not require any installation or setup. The complete VIA software fits in a single self-contained HTML page of less than 400 kilobytes that runs as an offline application in most modern web browsers. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact (19 Feb. 2019) Dr. Abhishek Dutta was awarded the University of Oxford MPLS Early Career Impact Award for developing "the VGG Image Annotator - a widely used open source manual image annotation software application". 'The VIA Annotation Software for Images, Audio and Video' was presented at the 27th ACM international conference on Multimedia, Oct 2019. 
URL https://dl.acm.org/doi/10.1145/3343031.3350535
 
Title VGG Image Classification (VIC) Engine 
Description VIC is a web application that serves as a web engine for performing image classification queries over a user-defined image dataset. It is based on the original application created by VGG to perform visual searches over a large dataset of images from BBC News. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Open Source License? Yes  
Impact This software performs the following functions:
-Performs queries by entering a text or an image
-Automatically downloads training images from Google
-Performs automatic training, classification and ranking of results
-Automatically caches query results
-Provides a user management interface
-Allows further query refinement
-Enables users to create curated queries using their own training images
-Is capable of data ingestion, i.e., users can search their own dataset and define their own metadata
URL http://www.robots.ox.ac.uk/~vgg/software/vic/
 
Title VGG Image Search Engine (VISE) 
Description VGG Image Search Engine (VISE) is a free and open source software for visual search of a large number of images using an image as a search query. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact tbc 
 
Title VGG Image Search Engine (VISE) 
Description VISE is a tool that can be used to search a large dataset for images that match any part of a given image. 
Type Of Technology Webtool/Application 
Year Produced 2017 
Open Source License? Yes  
Impact This standalone application can be used to make a large collection of images searchable by using image regions as a query. 
URL http://www.robots.ox.ac.uk/~vgg/software/vise/
 
Title VGG Text Search (VTS) Engine 
Description The VGG Text Search (VTS) Engine is an open source project developed at the Visual Geometry Group and released under the BSD-2-Clause license. VTS is a web application that serves as a web engine for performing searches for text strings over a user-defined image dataset. It is based on the original application created by VGG to perform visual searches over a large dataset of images from BBC News. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact Y. Liu, Z. Wang, H. Jin, I. Wassell. Synthetically Supervised Feature Learning for Scene Text Recognition. European Conference on Computer Vision (ECCV), 2018. 
URL https://www.robots.ox.ac.uk/~vgg/software/vts/
 
Title Video Representation Learning by Dense Predictive Coding 
Description The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions: First, we introduce the Dense Predictive Coding (DPC) framework for self-supervised representation learning on videos. This learns a dense encoding of spatio-temporal blocks by recurrently predicting future representations; Second, we propose a curriculum training scheme to predict further into the future with progressively less temporal context. This encourages the model to only encode slowly varying spatial-temporal signals, therefore leading to semantic representations; Third, we evaluate the approach by first training the DPC model on the Kinetics-400 dataset with self-supervised learning, and then finetuning the representation on a downstream task, i.e. action recognition. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact With a single stream (RGB only), DPC-pretrained representations achieve state-of-the-art self-supervised performance on both UCF101 (75.7% top-1 accuracy) and HMDB51 (35.7% top-1 accuracy), outperforming all previous learning methods by a significant margin and approaching the performance of a baseline pre-trained on ImageNet. A publication has resulted from this software: Tengda Han, Weidi Xie, Andrew Zisserman. Video Representation Learning by Dense Predictive Coding (Oral Presentation). Workshop on Large-scale Holistic Video Understanding, ICCV, 2019. 
URL http://www.robots.ox.ac.uk/~vgg/research/DPC/
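The predict-then-contrast objective at the heart of DPC can be sketched as follows (the dense spatial grid and curriculum schedule are omitted; module sizes are placeholder assumptions):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    enc = nn.Linear(512, 256)                  # stand-in for the video block encoder
    agg = nn.GRU(256, 256, batch_first=True)   # aggregates past context

    ctx = enc(torch.randn(8, 5, 512))          # embeddings of 5 past blocks
    future = enc(torch.randn(8, 512))          # embedding of the true next block
    out, _ = agg(ctx)
    pred = out[:, -1]                          # predicted next-block embedding
    logits = pred @ future.T                   # (8, 8); positives on the diagonal
    loss = F.cross_entropy(logits, torch.arange(8))
    loss.backward()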
 
Title You Said That 
Description The software provides a method for generating a video of a talking face using deep learning. The method takes still images of the target face and an audio speech segment as inputs, and generates a video of the target face lip-synced with the audio. The method runs in real time and is applicable to faces and audio not seen at training time. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact A publication has resulted from this research: J. S. Chung, A. Jamaludin, A. Zisserman. You said that? British Machine Vision Conference, 2017. 
URL http://www.robots.ox.ac.uk/~vgg/publications/
 
Company Name GROUND TRUTH LABS LTD 
Description Ground Truth Labs' (GTL) mission is to enable the creation of shared digital histology cohorts, to support the discovery of new disease relevant features, and to make these advancements available to clinicians and scientists. By enabling a level of collaboration that is not possible today, GTL will help to advance the quality of diagnostic reporting, especially in specialty disciplines and for rare diseases. With a robust and reliable technology platform, GTL provides the underpinnings for a rapid regulatory approval necessary to help patients. This technology is paired with a business model that rewards cohort creation. GTL offers a platform for building digital cohorts and AI tools to enable quantitative reporting. 
Year Established 2019 
Impact The company has just been set up and there are no notable impacts to report.
Website https://groundtruthlabs.com/
 
Description 15cILLUSTRATION: Visual Search and Manual Image Annotation - Dutta 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Gave a presentation on using our tools to assist research on the history of books in the 15th century.

New researchers became familiar with our tools, and one audience member is now collaborating with us on an existing interdisciplinary research project.
Year(s) Of Engagement Activity 2019
 
Description 2021 AIUM AI Summit invited speaker 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The American Institute for Ultrasound in Medicine held an invitation-only summit on AI to consider how the organisation should position itself as a leader in AI for ultrasound. I was an invited speaker at the two-day remotely held meeting and participated in discussions on the second day related to shaping the AIUM's next steps.
Year(s) Of Engagement Activity 2021
 
Description AI4LAM 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Special interest group in AI for Libraries, Arts and Museums
Year(s) Of Engagement Activity 2020
URL https://sites.google.com/view/ai4lam
 
Description AI@Oxford 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact A unique opportunity to see the state-of-the-art in artificial intelligence and machine learning at one of the world's great universities, and meet Oxford's AI experts one-to-one. This event will show you the reality of AI today: what is possible and where the technology is going.
Year(s) Of Engagement Activity 2018
URL https://www.mpls.ox.ac.uk/upcoming-events/artificial-intelligence-oxford
 
Description AIUM 2021 Special Session Invited Speaker 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited speaker in Session with Title: Deep Learning Applications for New Ultrasound Techniques. Talk was pre-recorded with live questions.
This primary audience was medical physicists rather than medical image analysis experts.
Year(s) Of Engagement Activity 2021
 
Description AVinDH workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact DH is the largest Digital Humanities conference, and attracts a largely academic audience, at all levels. It's diverse, and gives a good sense of what people are up to in all fields of the humanities that involve computers.
Year(s) Of Engagement Activity 2017
URL https://avindhsig.wordpress.com/workshop-2017-montreal/
 
Description Adversarial Machine Learning in Computer Vision CVPR 2020 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Abstract: Although computer vision models have achieved advanced performance on various recognition tasks in recent years, they are known to be vulnerable to adversarial examples. The existence of adversarial examples reveals that current computer vision models perform differently from the human visual system, and on the other hand provides opportunities for understanding and improving these models.
In this workshop, we will focus on recent research and future directions for adversarial machine learning in computer vision. We aim to bring experts from the computer vision, machine learning and security communities together to highlight the recent progress in this area, as well as to discuss the benefits of integrating recent progress in adversarial machine learning into general computer vision tasks. Specifically, we seek to study adversarial machine learning not only for enhancing model robustness against adversarial attacks, but also as a guide to diagnose/explain the limitations of current computer vision models and potential improvement strategies. We hope this workshop can shed light on bridging the gap between the human visual system and computer vision systems, and chart out cross-community collaborations, including the computer vision, machine learning and security communities.
Topics include but are not limited to:
-Adversarial attacks against computer vision tasks
-Real world attacks against computer vision systems
-Improving model robustness against adversarial attacks
-Theoretical understanding of adversarial machine learning
-Applying adversarial machine learning to diagnosing/explaining computer vision models
-Improving the performance of general tasks via adversarial machine learning (e.g. generative models, image captioning, image recognition)
Year(s) Of Engagement Activity 2020
URL https://adv-workshop-2020.github.io/
 
Description Automated Tagging of Image and Video Collections using Face Recognition 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact This was a presentation to British Library staff working on data collection and information/digital curation.
Year(s) Of Engagement Activity 2018
 
Description Automated Tagging of Image and Video Collections using Face Recognition at the event No Time to Wait! 3 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The presenter demonstrated how to use Seebibyte-developed face recognition software to tag image and video collections. The event was attended by 80-100 people from the international open media, open standards and digital audiovisual preservation communities, and attracted considerable publicity on social media.
Year(s) Of Engagement Activity 2018
 
Description Automated tagging of the BFI archive using face recognition 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact This was a presentation to BFI which was attended by 20 BFI staff working on data collections and information/digital curation.
Year(s) Of Engagement Activity 2018
 
Description BL GLAM Machine Learning Meetup 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact It was a meetup for galleries, libraries, archives and museums (GLAM) IT staff, researchers and suppliers.
Year(s) Of Engagement Activity 2018
 
Description Blocks, Plates Stones Conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Possibly the first-ever conference on printing surfaces (blocks, plates and stones) dealing with historical research, conservation issues and artistic possibilities with collections.
Year(s) Of Engagement Activity 2017
URL https://www.ies.sas.ac.uk/events/conferences/previous-conferences/blocks-plates-stones-conference
 
Description Blocks, Plates Stones ECR training day 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Training day for ECRs in printing history
Year(s) Of Engagement Activity 2017
URL http://www.academia.edu/33139617/CALL_FOR_APPLICATIONS_ECR_Training_Day_Using_Historical_Matrices_an...
 
Description Bodleian Conservators 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Bodleian conservators work on books, prints, photographs, papyri and other media. They often take digital pictures for the purposes of recording condition or analysis.
Year(s) Of Engagement Activity 2017
 
Description Bodleian Digital Scholarship Research Uncovered lecture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact Bodleian Centre for Digital Scholarship hosts a lecture series, open to all.
Year(s) Of Engagement Activity 2017
 
Description British Library Digital Labs Symposium 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Showcase of the British Library's digital collections and projects, in the form of presentations and posters.
Year(s) Of Engagement Activity 2017
URL http://blogs.bl.uk/digital-scholarship/2017/09/bl-labs-symposium-2017-mon-30-oct-book-your-place-now...
 
Description CERL Seminar - Visual Approaches to Cultural Heritage 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact CERL is the largest forum for European libraries. Our researcher engaged in Q&A and networked with potential collaborators who expressed interest in using our software in their research.
Year(s) Of Engagement Activity 2018
URL https://www.cerl.org/services/seminars/powerpoint_presentations_zurich
 
Description CVPR 2018 Tutorial on Interpretable Machine Learning for Computer Vision 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Co-organised a CVPR 2018 Tutorial on Interpretable Machine Learning for Computer Vision.
Year(s) Of Engagement Activity 2018
URL https://interpretablevision.github.io/index_cvpr2018.html
 
Description Co-organiser of ASMUS2020, a MICCAI workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Advances in Simplifying Medical UltraSound (ASMUS) 2020 is an international workshop that provides a forum for research topics around ultrasound image computing and computer-assisted interventions and robotic systems that utilize ultrasound imaging. It was held in conjunction with MICCAI 2020 in virtual form.

The nineteen accepted papers were selected based on their scientific contribution, via a double-blind process involving written reviews from at least two external reviewers in addition to a member of the committee.
The published work includes reports across a wide range of methodology, research and clinical applications. Advanced deep learning approaches for anatomy recognition, segmentation, registration and skill assessment are the dominant topics, in addition to ultrasound-specific new approaches in augmented reality and remote assistance. An interesting trend revealed by these papers is the merging of ultrasound probe and surgical instrument localization with robotically assisted guidance to produce increasingly intelligent systems that learn from expert labels and incorporate domain knowledge to enable increasingly sophisticated automation and fine-grained control.
Two invited speakers were included in the workshop and the meeting had 80+ attendees.
Year(s) Of Engagement Activity 2020
URL https://sites.google.com/view/asmus2020
 
Description Co-organiser of ASMUS2021, a MICCAI workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Advances in Simplifying Medical UltraSound (ASMUS) 2021 is an international workshop that provides a forum for research topics around ultrasound image computing and computer-assisted interventions and robotic systems that utilize ultrasound imaging. It was held in conjunction with MICCAI 2021 in virtual form.

Accepted papers were selected based on their scientific contribution, via a double-blind process involving written reviews from at least two external reviewers in addition to a member of the committee.
The published work includes reports across a wide range of methodology, research and clinical applications. Advanced deep learning approaches for anatomy recognition, segmentation, registration and skill assessment are the dominant topics, in addition to ultrasound-specific new approaches in augmented reality and remote assistance.
Three invited speakers were included in the workshop, and live demos of technologies were given. The meeting had 80+ attendees.
Year(s) Of Engagement Activity 2021
URL https://miccai-ultrasound.github.io/#/asmus21
 
Description Collections Trust Dynamic Collections conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact The national museums umbrella organisation event
Year(s) Of Engagement Activity 2020
URL https://collectionstrust.org.uk/blog/collections-trust-conference-debrief/
 
Description Distinguished Keynote Speaker in Biomedical and Health Data Science in two joint conferences of IEEE EMBS BHI and BSN 2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Keynote talk entitled 'Simplifying interpretation and acquisition of ultrasound scans', delivered virtually.
Abstract: With the increased availability of low-cost and handheld ultrasound probes, there is interest in simplifying the interpretation and acquisition of ultrasound scans through deep-learning-based analysis, so that ultrasound can be used more widely in healthcare. However, this is not just "all about the algorithm", and successful innovation requires inter-disciplinary thinking and collaborations. In this talk I will overview progress in this area, drawing on examples of my laboratory's experiences of working with partners on multi-modal ultrasound imaging, and of building assistive algorithms and devices for pregnancy health assessment in high-income and low-and-middle-income country settings. Emerging topics in this area will also be discussed.
Year(s) Of Engagement Activity 2021
 
Description Eastern European Machine Learning Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The summer school comprises lectures and practical sessions conducted by renowned experts in different areas of artificial intelligence. Andrew Zisserman taught on "Self-supervised Learning"
Year(s) Of Engagement Activity 2019
URL https://www.eeml.eu/previous-editions/eeml19
 
Description Edinburgh CDCS workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Digital Humanities training event
Year(s) Of Engagement Activity 2020
URL https://www.cdcs.ed.ac.uk/events/workshop-chapbooks-national-library-scotland
 
Description European Conference on Computer Vision (ECCV) 2020 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrea Vedaldi is co-organising the European Conference on Computer Vision 2020 as program chair.
ECCV is one of the top three international conferences in the area.
We projected an attendance of more than 5,000 individuals.
The organisation is a two-year effort, which is why this entry is listed this year.
Year(s) Of Engagement Activity 2019,2020
URL http://eccv2020.eu
 
Description Exploring new TAP project opportunities - Dutta Nepal 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Media (as a channel to the public)
Results and Impact Presented the opportunities of collaborations with Seebibyte in the form of Transfer and Application Project (TAP) to heads of different departments at Kantipur Media Group (Nepal).
Year(s) Of Engagement Activity 2019
 
Description ICCV 2019 Tutorial on Interpretable Machine Learning for Computer Vision 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Co-organised a tutorial on Interpretable ML at ICCV 2019.
Year(s) Of Engagement Activity 2019
URL https://interpretablevision.github.io
 
Description ICCV 2019 Workshop on Neural Architects 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Co-organised a ICCV'19 Workshop on Neural Architects.
Year(s) Of Engagement Activity 2019
URL https://neuralarchitects.org
 
Description ISUOG 2020 AI in Ultrasound lecture 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited lecture on AI in Ultrasound in the AI symposium of the International Society of Ultrasound in Obstetrics and Gynecology annual international conference.
Year(s) Of Engagement Activity 2020
 
Description Iberian books workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Small workshop for digital humanities project
Year(s) Of Engagement Activity 2017
 
Description Inspirational Engineer Talk - University of Cambridge 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Invited talk on my career and research given in at in invited lecture at the Department of Engineering, University of Cambridge.
Year(s) Of Engagement Activity 2019
 
Description International Series of Online Research Software Events (SORSE) - AD 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Our manual annotation tool and our experience in developing this tool were shared with a wide range of audience.
Year(s) Of Engagement Activity 2020
 
Description Introducing Traherne: an open source software tool for digital visual collation - Dutta 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact International audience learned about our open source software tools.
Year(s) Of Engagement Activity 2019
 
Description Invited talk at CVPR 2017 workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Gave an invited talk as part of a workshop at CVPR 2017 (Hawaii) which aimed to give computer vision researchers an overview of problems and state-of-the-art research in medical image analysis. There was a little follow-up, but discussion was quite passive (we might have been competing with the weather on the last day of the meeting!).
Year(s) Of Engagement Activity 2017
 
Description Invited talk at CVPR 2019 workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited talk given at CVPR 2019 workshop
Year(s) Of Engagement Activity 2019
 
Description Invited talk at International Ultrasonics Symposium 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Invited keynote talk in the inaugural session on machine learning in ultrasonics. The session was packed, reflecting interest not only in my group's work but in machine learning more broadly.
Year(s) Of Engagement Activity 2017
 
Description Jianbo Jiao interviewed in Computer Vision News 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Dr Jiao was interviewed regarding his MICCAI paper for Computer Vision News.
Year(s) Of Engagement Activity 2020
URL https://www.rsipvision.com/MICCAI2020-Wednesday/6/
 
Description Keynote speaker 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Over 230 Academics and Industry Experts attended MEIbioeng 16 to meet, share, debate and learn from their peers.

The annual conference supported the discussion of newly developing Biomedical Engineering research areas alongside established work that contribute towards the common goal of improving human health and well-being via development of new healthcare technologies.
Year(s) Of Engagement Activity 2016
URL http://www.ibme.ox.ac.uk/news-events/events/meibioeng-16
 
Description London Rare Books School 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact LRBS is a summer school aimed at academics, librarians, art historians and others interested in rare books and special collections. The presentation was part of a week-long intensive course on historical printing surfaces that included hands-on printing, metal-casting and other skills as well as lectures and library work.
Year(s) Of Engagement Activity 2018
URL https://www.ies.sas.ac.uk/study-training/study-weeks/london-rare-books-school/blocks-and-plates-towa...
 
Description MISS Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Invited lecturer at international summer school.
Year(s) Of Engagement Activity 2016
 
Description MIUA 2021 Conference - co-organiser 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact MIUA is a UK-based international conference for the communication of image processing and analysis research and its application to medical imaging and biomedicine. This was the 25th edition of the meeting, which was held virtually; 40 papers were presented (27k downloads as of 09-03-2022). MIUA is the principal UK forum for communicating research progress within the community interested in image analysis applied to medicine and related biological science. The meeting is designed for the dissemination and discussion of research in medical image understanding and analysis, and aims to encourage the growth and raise the profile of this multi-disciplinary field by bringing together its various communities.
Year(s) Of Engagement Activity 2021
URL https://miua2021.com/
 
Description MIUA2020 Conference - co-organiser 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact MIUA is a UK-based international conference for the communication of image processing and analysis research and its application to medical imaging and biomedicine. This was the 24th edition of the meeting, which was held virtually.
MIUA is the principal UK forum for communicating research progress within the community interested in image analysis applied to medicine and related biological science. The meeting is designed for the dissemination and discussion of research in medical image understanding and analysis, and aims to encourage the growth and raise the profile of this multi-disciplinary field by bringing together its various communities.
Year(s) Of Engagement Activity 2020
URL https://miua2020.com/
 
Description McGill University Library 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact This was a private presentation to Special Collections librarians, library IT staff and a couple of academics interested in some collections of early printed material, and rare printers' woodblocks.
Year(s) Of Engagement Activity 2017
 
Description Meeting at British Library 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact Demonstrated how to use our software to British Library curators and other staff.
Year(s) Of Engagement Activity 2018
 
Description Microsoft Postgraduate Summer School 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Invited talk at Microsoft Summer school
Year(s) Of Engagement Activity 2016
 
Description NIHR Point of Care Ultrasound workshop (Birmingham) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact I gave a talk at a clinical workshop looking at how point-of-care ultrasound might be introduced outside of traditional hospital uses. The meeting aimed to educate clinical practitioners working in primary care to think about how ultrasound might be used in the future, and to encourage them to consider being part of trials of new innovations.
Year(s) Of Engagement Activity 2019
 
Description NLS Digital Scholarship Seminar 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Internal training event in digital scholarship
Year(s) Of Engagement Activity 2020
 
Description National Academies roundtable on researcher access to data 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The National Academies Data Reform Round Table was a by-invitation meeting that discussed some of the current challenges researchers face in getting access to data for research under current data protection regulation. The Department for Digital, Culture, Media and Sport (DCMS) was consulting on reforming the UK's data protection regime, which formed part of a larger effort to implement the government's National Data Strategy, and specifically Mission 2 of that strategy: 'supporting a pro-growth and trusted data regime'. This issue affects researchers working in computer vision and medical image analysis, and this was part of the discussion.

In terms of impact/outcome, the meeting output fed into a response that will hopefully have influence (how direct an influence cannot yet be measured, but I selected this box in the next question for that reason).
Year(s) Of Engagement Activity 2021
 
Description Newcastle University Humanities Research Institute 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Digital Humanities training event
Year(s) Of Engagement Activity 2020
URL https://www.ncl.ac.uk/nuhri/events/item/computervisionandthedigitalhumanities.html
 
Description Oxford Digital Humanities Summer School 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact OXDHSS is the second-largest digital humanities summer school in the world and the largest in Europe. Now based in Engineering Science (through the OeRC), it attracts c.250 students to Oxford to take one of several week-long courses, together with lectures and posters. We presented on the general 'Introduction to Digital Humanities' course, which is the biggest and broadest, and is intended to give an introduction to the field(s) for managers, librarians, IT staff or academics who are interested in knowing more or in getting their institution involved.
Year(s) Of Engagement Activity 2017
URL http://digital.humanities.ox.ac.uk/dhoxss/2017/
 
Description Oxford Digital Humanities Summer School 2018 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The Summer School offers training to anyone with an interest in using digital technologies in the Humanities, including academics at all career stages, students, project managers, and people who work in IT, libraries, and cultural heritage. Delegates select one week-long workshop, supplementing their training with expert guest lectures and a busy social programme.
Year(s) Of Engagement Activity 2018
URL https://digital.humanities.ox.ac.uk/dhoxss
 
Description Oxford Humanities Division Poster Showcase 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact This was a showcase of posters run by the Training Officer of the University's Humanities Division, aimed particularly at ECRs.
Year(s) Of Engagement Activity 2018
 
Description Oxford Humanities Division Poster Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact This was a showcase of posters run by the Training Officer of the University's Humanities Division, aimed particularly at ECRs.
Year(s) Of Engagement Activity 2017
 
Description Oxford Humanities Research Fair 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Presented Seebibyte software.
Year(s) Of Engagement Activity 2020
 
Description Oxford Humanities Research Fair 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact This was a Bodleian event on resources for humanities postgraduates.
Year(s) Of Engagement Activity 2020
URL https://libguides.bodleian.ox.ac.uk/humanities/HumanitiesResearchFair
 
Description Oxford Traherne Project Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact The meeting was attended by around 20 book historians and researchers. The editorial team of the Oxford Traherne project were impressed with our digital collator software and said that it has the potential to revolutionise the field of collation and scholarly editing in general.
Year(s) Of Engagement Activity 2018
 
Description PRAIRIE AI summer school, France 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The PRAIRIE AI summer school comprises lectures and practical sessions conducted by renowned experts in different areas of artificial intelligence.

Andrew Zisserman taught on "Self-supervised Learning"
Year(s) Of Engagement Activity 2018
URL https://project.inria.fr/paiss/
 
Description PRAIRIE Summer School talk (Paris) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Gave a talk, with questions afterwards, at the postgraduate summer school, which was attended largely by computer science students interested in AI rather than in healthcare specifically.
Year(s) Of Engagement Activity 2019
 
Description Plantin-Moretus Museum, Antwerp 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Presented Seebibyte software.
Year(s) Of Engagement Activity 2020
 
Description Plantin-Moretus Museum, Antwerp 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Third sector organisations
Results and Impact Briefing on VGG's work in printing history.
Year(s) Of Engagement Activity 2020
URL https://www.museumplantinmoretus.be/en
 
Description Presentation at Oxfordshire Creative Industries Showcase event 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Demonstrated software developed by the Seebibyte Project at the Oxfordshire Creative Industries Showcase event. Visitors were amazed and excited to see what our software tools were capable of doing; some even suggested novel applications for our tools that we had not thought of before.
Year(s) Of Engagement Activity 2019
 
Description Presentation at VGG Group Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact New students joining the VGG were introduced to our manual annotation tools through the presentation.
Year(s) Of Engagement Activity 2020
 
Description Presentation at the British Library Labs Symposium 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We presented on Automatic Detection and Identification of people in British Library archive videos.
Year(s) Of Engagement Activity 2019
 
Description Presentation at the Digital Humanities Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Presented Seebibyte software at Digital Humanities Summer School 2019.
Many in the audience are now using our software tools for their research.
Year(s) Of Engagement Activity 2019
 
Description Presentation at the EPSRC Swindon Office 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Policymakers/politicians
Results and Impact Presented Image Matching and Face Recognition tools to EPSRC project officers and other employees. EPSRC staff were enthusiastic to learn about our software tools, and many asked questions reflecting their interest in using the tools for their own work.
Year(s) Of Engagement Activity 2019
 
Description Presentation to BBC Visitors 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact The team from BBC learned about our image and video annotation tool VIA and discussed the potential ways to use this tool in their research.
Year(s) Of Engagement Activity 2020
 
Description Presentation to BBC visitors 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Media (as a channel to the public)
Results and Impact We presented to the BBC team how, using our visual search tools, we analysed a subset of 262,007 images from an Ashmolean collection and found a large number of exact and near-duplicate images. We informed the BBC team that these results were already being used by the Ashmolean team to improve the management of their digital assets.
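For readers unfamiliar with this kind of analysis, the sketch below illustrates one common way to flag exact and near-duplicate images: comparing compact perceptual hashes. It is a hypothetical Python/NumPy toy, not the VGG visual-search software used in the work above; the images are synthetic arrays.

# Illustrative sketch only: a minimal average-hash near-duplicate check.
# This is NOT the VGG visual-search pipeline; the images are synthetic.
import numpy as np

def average_hash(image: np.ndarray, hash_size: int = 8) -> np.ndarray:
    """Pool a grayscale image into hash_size x hash_size block means and
    threshold each block against the overall mean, giving a 64-bit fingerprint."""
    h, w = image.shape
    small = image[: h - h % hash_size, : w - w % hash_size]
    small = small.reshape(hash_size, h // hash_size, hash_size, w // hash_size).mean(axis=(1, 3))
    return (small > small.mean()).flatten()

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
img = rng.random((256, 256))
near_dup = np.clip(img + rng.normal(0, 0.02, img.shape), 0, 1)  # slightly corrupted copy
other = rng.random((256, 256))                                   # unrelated image

print(hamming(average_hash(img), average_hash(near_dup)))  # small distance -> near duplicate
print(hamming(average_hash(img), average_hash(other)))     # large distance -> distinct image

A small Hamming distance between hashes suggests a near duplicate; in practice such fingerprints are indexed so that a collection of hundreds of thousands of images can be deduplicated without comparing raw pixels pairwise.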
Year(s) Of Engagement Activity 2020
 
Description Presentation to Digital Viewing Visitors - AD 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Industry/Business
Results and Impact The team from Digital Viewing learned about our VIA Image and Video Annotator and made plans for using the tool for related activities.
Year(s) Of Engagement Activity 2020
 
Description Presentation to IBME Oxford - AD 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Our VIA tools for the manual annotation of images and video were introduced to the medical imaging research community and aroused interest in their possible use in research.
Year(s) Of Engagement Activity 2020
 
Description Printing Revolution and Society 1450 - 1500 -- Fifty Years that Changed Europe 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact PRINTING REVOLUTION AND SOCIETY 1450-1500 - Fifty Years that Changed Europe was an international conference attended by around 100 academics, researchers, librarians and historians. New researchers are now preparing to contribute images and annotations to the 15cILLUSTRATION website.
Year(s) Of Engagement Activity 2018
URL http://15cbooktrade.ox.ac.uk/printing-revolution-and-society-conference-video-recordings/
 
Description Queen Elizabeth Prize schools event at the Science Museum 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact I was a panel member, along with the awardees of the 2017 Queen Elizabeth Prize (Eric Fossum, Michael Tompsett and Nobukazu Teranishi), discussing with a schools audience their inventions related to digital sensors/imaging and how the digital imaging world has changed. The event was held at the Science Museum. My invitation stemmed from involvement in the nominations panel for the QEP as well as from my research interest in digital image analysis.
Year(s) Of Engagement Activity 2017
 
Description RSA DDL 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Large international conference
Year(s) Of Engagement Activity 2020
URL https://www.rsa.org/page/DayDigitalLearning2020
 
Description Rank Prize Symposium on Challenges to Achieving Capacity in Nonlinear Optical Networks 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Co-organised a Rank Prize Symposium on Challenges to Achieving Capacity in Nonlinear Optical Networks.
Year(s) Of Engagement Activity 2018
URL http://www.rankprize.org/index.php/symposia/optoelectronics
 
Description Royal Society Digital Archive workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The Royal Society has recently finished a project to digitise part of its archive and journal backlist: this event, organised by Louisiane Ferlier, was aimed at encouraging researchers to use the archive and at helping the Society's archivists gather ideas. The event was attended by historians of science and archivists.
Year(s) Of Engagement Activity 2018
URL https://blogs.royalsociety.org/publishing/digitising-the-royal-society-journals/
 
Description Royal Society Digital Archive workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact This was a workshop organised by the MPLS Division of the University of Oxford for the Lloyds Foundation on how to digitise their archive.
Year(s) Of Engagement Activity 2018
 
Description Royal Society-Chinese Academy of Sciences AI policy dialogue Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Speaker at an international, by-invitation Royal Society and Chinese Academy of Sciences AI policy dialogue workshop held in September 2020. A short write-up on the meeting is being prepared and will be available in 2021.
Year(s) Of Engagement Activity 2020
 
Description Royal Society/Government Chief Scientific Advisors meeting discussing PETs 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Dinner discussion about privacy-enhancing technologies (PETs) and their potential short-term uses across government departments. I presented an overview of the policy report that I chaired.
Year(s) Of Engagement Activity 2019
 
Description SEAHA 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact SEAHA is a doctoral training partnership in heritage science whose members are the University of Oxford, the University of Brighton, and UCL. Their annual conference is their main plenary gathering, attended by 100+ members of the consortium (students, their supervisors, researchers and professional staff), with exhibits from companies and organisations. The backgrounds of attendees range from art conservation to materials science, or a mixture of the two.
Year(s) Of Engagement Activity 2017
URL http://www.seaha-cdt.ac.uk/activities/events/seaha17/
 
Description Samsung Satellite Symposium, European Congress in Radiology
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The talk was part of a lunch symposium presenting the latest research in AI applied to radiology.
Year(s) Of Engagement Activity 2017
 
Description School talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Talk at Headington School as the keynote speaker for their Year of Science.
Year(s) Of Engagement Activity 2017
 
Description Seebibyte Show and Tell Roadshow 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact The event showcased the software developed by the Seebibyte team with the aim of attracting potential collaborators, and the talks resulted in discussions about multiple new TAPs.
Year(s) Of Engagement Activity 2020
 
Description Seebibyte Show and Tell at University of Leeds 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact Demonstrated Seebibyte software to the audience.
Year(s) Of Engagement Activity 2020
 
Description Seebibyte Visual Tracker, VIA Annotation Software for Images and Video - Dutta 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Presented and demonstrated the Seebibyte Visual Tracker and the VIA annotation software for images and video. The audience was introduced to our software tools, and we found new collaborators for TAPs.
Year(s) Of Engagement Activity 2019
 
Description Seminar: Reflections on the Digital Turn in the Humanities and the Sciences 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Research seminar in digital tools for art history
Year(s) Of Engagement Activity 2020
URL https://www.biblhertz.it/3069990/seminar-series-reflections-on-the-digital-turn-in-the-humanities-an...
 
Description Seminars series organized by Venice Centre for Digital and Public Humanities (VeDPH), Department of Humanities, Ca' Foscari University Venice - AD 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact Introduced our visual search engine software to an international humanities audience.
Year(s) Of Engagement Activity 2020
 
Description Show and Tell Event - Computer Vision Software - 14 June 2016 (Oxford) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact A main aim of the Seebibyte Project is to transfer the latest computer vision methods into other disciplines and industry. We want the software developed in this project to be taken up and used widely by people working in industry and other academic disciplines, and are organizing regular Show and Tell events to demonstrate new software developed by project researchers. A main outcome from these events will be new inter-disciplinary collaborations. As a first step, Transfer and Application Projects (TAPs) are developed with new collaborators.

This first Show and Tell event was restricted to participants from the University of Oxford only, in particular researchers from the Department of Engineering Science, the Department of Earth Sciences and the Department of Materials. Future events will also target external participants, including from industry. The June 14 event focused on four topics: 1) Counting; 2) Landmark Detection (KeyPoint Detection); 3) Segmentation (Region Labelling); and 4) Text Spotting. Further information for each of the topics - including the event presentations and new software demos - is available on the event webpage (www.seebibyte.org/June14.html). The event received positive feedback from participants and has resulted in several new TAPs being completed. It is anticipated that some of these will lead to new collaborations.
Year(s) Of Engagement Activity 2016
URL http://www.seebibyte.org/June14.html
 
Description Show and Tell Event - Computer Vision Software - 14 June 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Postgraduate students
Results and Impact The purpose of the event was to demonstrate software for recognising faces and tracking objects in videos across potentially large datasets.
Year(s) Of Engagement Activity 2018
URL http://seebibyte.org/
 
Description Show and Tell Event - Computer Vision Software - 15 June 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact The purpose of the Show and Tell was to demonstrate software for searching, annotating and categorising images in (potentially) large datasets. The software is open source and was made available following the meeting.
Year(s) Of Engagement Activity 2017
URL http://www.seebibyte.org
 
Description Sight and Sound Workshop at CVPR 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman co-organized the Sight and Sound Workshop at CVPR 2018.

This is the description of the workshop: In recent years, there have been many advances in learning from visual and auditory data. While traditionally these modalities have been studied in isolation, researchers have increasingly been creating algorithms that learn from both modalities. This has produced many exciting developments in automatic lip-reading, multi-modal representation learning, and audio-visual action recognition.

Since pretty much every video has an audio track, the prospect of learning from paired audio-visual data - either with new forms of unsupervised learning, or by simply incorporating sound data into existing vision algorithms - is intuitively appealing, and this workshop will cover recent advances in this direction. But it will also touch on higher-level questions, such as what information sound conveys that vision doesn't, the merits of sound versus other "supplemental" modalities such as text and depth, and the relationship between visual motion and sound. We'll also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing.
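To make the idea of learning from paired audio-visual data concrete, here is a minimal, hypothetical NumPy sketch of a contrastive (InfoNCE-style) objective in which each clip's visual embedding must pick out its own audio track from among the other clips in a batch. The embeddings are random stand-ins, not the output of any model discussed at the workshop.

# Hedged toy sketch of an audio-visual contrastive objective; all
# embeddings are synthetic stand-ins rather than real model outputs.
import numpy as np

rng = np.random.default_rng(0)
batch, dim = 8, 128
shared = rng.normal(size=(batch, dim))
visual = shared + 0.1 * rng.normal(size=(batch, dim))  # clip i's visual embedding
audio = shared + 0.1 * rng.normal(size=(batch, dim))   # clip i's audio embedding

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

v, a = l2_normalize(visual), l2_normalize(audio)
logits = v @ a.T / 0.07    # similarity of every visual clip to every audio clip
labels = np.arange(batch)  # the matching audio for clip i sits on the diagonal

# Cross-entropy over each row: a low loss means the model (here, our toy
# embeddings) correctly pairs each video with its own soundtrack.
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -log_probs[labels, labels].mean()
print(f"contrastive loss: {loss:.3f}")

Because every video carries its own audio track, the pairing itself supplies the supervision; no manual labels are needed.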
Year(s) Of Engagement Activity 2018
URL http://sightsound.org/2018/
 
Description Sight and Sound Workshop at CVPR 2019 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman co-organized the Sight and Sound Workshop at CVPR 2019. This is the description of the workshop: In recent years, there have been many advances in learning from visual and auditory data. While traditionally these modalities have been studied in isolation, researchers have increasingly been creating algorithms that learn from both modalities. This has produced many exciting developments in automatic lip-reading, multi-modal representation learning, and audio-visual action recognition.

Since pretty much every internet video has an audio track, the prospect of learning from paired audio-visual data - either with new forms of unsupervised learning, or by simply incorporating sound data into existing vision algorithms - is intuitively appealing, and this workshop will cover recent advances in this direction. But it will also touch on higher-level questions, such as what information sound conveys that vision doesn't, the merits of sound versus other "supplemental" modalities such as text and depth, and the relationship between visual motion and sound. We'll also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing.
Year(s) Of Engagement Activity 2019
URL http://sightsound.org/
 
Description Sight and Sound Workshop at the IEEE Conference on Computer Vision and Pattern Recognition 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman co-organized the Sight and Sound Workshop at CVPR 2020. This is the description of the workshop: In recent years, there have been many advances in learning from visual and auditory data. While traditionally these modalities have been studied in isolation, researchers have increasingly been creating algorithms that learn from both modalities. This has produced many exciting developments in automatic lip-reading, multi-modal representation learning, and audio-visual action recognition. Since pretty much every internet video has an audio track, the prospect of learning from paired audio-visual data - either with new forms of unsupervised learning, or by simply incorporating sound data into existing vision algorithms - is intuitively appealing, and this workshop will cover recent advances in this direction. But it will also touch on higher-level questions, such as what information sound conveys that vision doesn't, the merits of sound versus other "supplemental" modalities such as text and depth, and the relationship between visual motion and sound. We'll also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing.
Year(s) Of Engagement Activity 2020
URL http://sightsound.org/
 
Description Speaker - International Women in Engineering Day 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact Secondary school girls from a number of local schools visited the department to see different areas of engineering and to do some simple engineering-related activities. I gave a short talk at tea on some emerging areas of engineering ('wacky engineering') and spoke a little about my own research and field. Feedback from the schools was positive for the whole event.
Year(s) Of Engagement Activity 2017
 
Description Special Interest Ultrasound Group Annual meeting (Oslo) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact This was the national ultrasound special-interest group's annual meeting in Oslo, with participants from medical and NDE backgrounds, primarily industry focussed. I was one of two overseas guest speakers at the two-day event.
Year(s) Of Engagement Activity 2019
 
Description St Hughs College Donors Dinner 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Supporters
Results and Impact Short talk as part of college donors dinner event which sparked questions and discussions afterwards.
Year(s) Of Engagement Activity 2020
 
Description Stockholm DH Now workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Other audiences
Results and Impact Digital Humanities training event
Year(s) Of Engagement Activity 2020
URL https://su.powerinit.com/Data/Event/EventTemplates/2602/?EventId=879
 
Description Talk (AI@Oxford) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Speaker in a healthcare session of the AI@Oxford conference, with discussions with some participants afterwards.
Year(s) Of Engagement Activity 2019
 
Description Talk (AI@Oxford) on Audio-Visual AI 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact Speaker at the two-day AI@Oxford conference.
Year(s) Of Engagement Activity 2019
URL https://innovation.ox.ac.uk/innovation-news/events/aioxford-conference/
 
Description Teaching in Summer School ICVSS 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact This International Computer Vision Summer School aims to provide both an objective and clear overview and an in-depth analysis of the state-of-the-art research in Computer Vision and Machine Learning. The participants benefited from direct interaction and discussions with world leaders in Computer Vision.
Year(s) Of Engagement Activity 2015
URL http://iplab.dmi.unict.it/icvss2015/
 
Description Teaching in Summer School MISS 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The Medical Imaging Summer School is the largest summer school in its field. Around 200 students attended the school and received training in the science and technology of medical imaging. Students expressed interest in future research in the area.
Year(s) Of Engagement Activity 2016
URL http://iplab.dmi.unict.it/miss/index.html
 
Description Teaching in Summer School iV&L 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The iV&L Training School aims at bringing together Vision and Language researchers and to provide the opportunity for cross-disciplinary teaching and learning. Over 80 students attended the summer school and received training in deep learning across two disciplines, Computer Vision and Natural Language Processing. Students expressed interest in future research in the area.
Year(s) Of Engagement Activity 2016
URL http://ivl-net.eu/ivl-net-training-school-2016/
 
Description The 2017 IEEE-EURASIP Summer School on Signal Processing (S3P-2017) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The 2017 IEEE-EURASIP Summer School on Signal Processing (S3P-2017) is the 5th edition of a successful series, organized by the IEEE SPS Italy Chapter and the National Telecommunications and Information Technologies Group - GTTI, with the sponsorship of IEEE (S3P program) and EURASIP (Seasonal School Co-Sponsorship agreement). S3P-2017 represents a stimulating environment where top international scientists in signal processing and related disciplines share their ideas on fundamental and ground-breaking methodologies in the field. It provides PhD students and researchers with a unique networking opportunity and the possibility of interacting with leading scientists.

The theme of this 5th edition is "Signal Processing meets Deep Learning". Deep machine learning is changing the rules in the signal and multimedia processing field. On the other hand, signal processing methods and tools are fundamental for machine learning. Time for these worlds to meet.
Year(s) Of Engagement Activity 2017
URL http://www.grip.unina.it/s3p2017/
 
Description The End-of-End-to-End: A Video Understanding Pentathlon (a workshop at the IEEE Conference on Computer Vision and Pattern Recognition) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman co-organized the "End-of-End-to-End: A Video Understanding Pentathlon" at CVPR 2020.

This is the description of the workshop: Convolutional neural networks have yielded unprecedented progress on a wide range of image-centric benchmarks, driven through a combination of well-annotated datasets and end-to-end training. However, naively extending this approach from images to higher-level video understanding tasks quickly becomes prohibitive with respect to the computation and data annotation required to jointly train multi-modal high-capacity models. An attractive alternative is to repurpose collections of existing pretrained models as "experts", offering representations which have been specialised for semantically relevant machine perception tasks. In addition to efficacy, this approach offers a second key advantage---it encourages researchers without access to industrial computing clusters to contribute towards questions of fundamental importance to video understanding: How should temporal information be used to maximum effect? How best to exploit complementary and redundant signals across different modalities? How can models be designed that function robustly across different video domains? To stimulate research into these questions, we are hosting a challenge that focuses on learning from videos and language with experts: making available a diverse collection of carefully curated visual and audio pre-extracted features across a set of five influential video datasets as part of a "pentathlon" of video understanding.
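For illustration, the following hypothetical NumPy sketch shows the flavour of the "experts" setup the description refers to: per-video features pre-extracted by frozen specialist models are fused (here by simple concatenation; entrants typically learn a weighted fusion) and ranked against a text-query embedding. The expert names and all features below are invented stand-ins, not the challenge's actual feature set.

# Illustrative sketch, not the challenge code: rank videos against a text
# query using concatenated pre-extracted "expert" features. All data is random.
import numpy as np

rng = np.random.default_rng(0)
n_videos = 5
experts = {  # hypothetical per-video features from frozen specialist models
    "appearance": rng.normal(size=(n_videos, 512)),
    "audio": rng.normal(size=(n_videos, 128)),
    "speech": rng.normal(size=(n_videos, 300)),
}

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Fuse experts by simple concatenation; real entries learn a weighted fusion.
video_embeddings = normalize(np.concatenate([normalize(f) for f in experts.values()], axis=1))

query = normalize(rng.normal(size=video_embeddings.shape[1]))  # stand-in text embedding
scores = video_embeddings @ query
print("ranked videos:", np.argsort(-scores))

The appeal of this design is that no expensive end-to-end training is required: the heavy feature extraction is done once, offline, so researchers without industrial compute can still compete.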
Year(s) Of Engagement Activity 2020
URL https://www.robots.ox.ac.uk/~vgg/challenges/video-pentathlon/
 
Description Traherne Digital Collator - Dutta 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other audiences
Results and Impact We reported our progress with the Traherne Digital Collator to the members of the Traherne Project, who are now using this software tool for their research.
Year(s) Of Engagement Activity 2019
 
Description Turing Institute Computer Vision and History seminar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact Seminar for a putative special interest group
Year(s) Of Engagement Activity 2020
URL https://computervision4digitalheritage.github.io/sig/
 
Description UCL Digital Humanities Seminar 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Department of Information Studies research seminar
Year(s) Of Engagement Activity 2017
 
Description University of Reading Department of Typography and Graphic Communication 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact This was a one-day workshop to develop a large AHRC funding application on historical printing, to be led by Prof. Rob Banham, based in the world's leading department of typography.
The event was attended by typographers, designers, design historians, and University research support staff.
Year(s) Of Engagement Activity 2018
URL http://www.reading.ac.uk/typography/typ-homepage.aspx
 
Description VGG Web Search Engines 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact This was a presentation to Continental AG delegates interested in AI/CV research. Plans were made for future related activities.
Year(s) Of Engagement Activity 2018
 
Description VeDPH Summer School 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Digital Humanities paper and workshop
Year(s) Of Engagement Activity 2020
URL https://vedph.github.io/summercamp/
 
Description Video: helping with hearing 
Form Of Engagement Activity Engagement focused website, blog or social media channel
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Media (as a channel to the public)
Results and Impact This is an online video as part of the OxfordAI outreach. See

https://www.research.ox.ac.uk/Article/2018-11-08-video-helping-with-hearing

The description is: "Can AI modelling assist people with hearing difficulties? Discover how #OxfordAI could help by isolating voices in noisy environments. We talk to DPhil student Triantafyllos Afouras from the Visual Geometry Group in Oxford's Department of Engineering Science."
Year(s) Of Engagement Activity 2018
URL https://www.research.ox.ac.uk/Article/2018-11-08-video-helping-with-hearing
 
Description Visual Search of BBC News at the event Artificial Intelligence @ Oxford - A One-Day Expo 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact The event was attended by about 100 people internationally, mainly from academia, industry, commerce and government, with an interest in AI.
Video: https://www.youtube.com/watch?v=9ZKGL0QDLpk
Year(s) Of Engagement Activity 2018
URL https://ori.ox.ac.uk/artificial-intelligence-oxford-a-one-day-expo-27th-march-2018/
 
Description VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments.

The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'.
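As a concrete illustration of the verification protocol described above, the following minimal Python sketch thresholds the cosine similarity between two speaker embeddings. The embeddings and the threshold value are synthetic stand-ins, not the challenge baseline.

# Minimal sketch of speaker verification: decide whether two utterances come
# from the same person by thresholding the cosine similarity of their speaker
# embeddings. Real systems extract embeddings with a trained network.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def same_speaker(emb1, emb2, threshold=0.7):  # threshold is an invented value
    return cosine(emb1, emb2) >= threshold

rng = np.random.default_rng(0)
speaker = rng.normal(size=256)
utt1 = speaker + 0.3 * rng.normal(size=256)  # two utterances by the same speaker
utt2 = speaker + 0.3 * rng.normal(size=256)
other = rng.normal(size=256)                 # an utterance by someone else

print(same_speaker(utt1, utt2))   # True: high similarity
print(same_speaker(utt1, other))  # False: low similarity

Diarisation builds on the same embeddings: audio is cut into short segments, and segments whose embeddings cluster together are attributed to one speaker, answering 'who spoke when'.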
Year(s) Of Engagement Activity 2020
URL https://www.robots.ox.ac.uk/~vgg/data/voxceleb/interspeech2020.html
 
Description VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments.
Year(s) Of Engagement Activity 2019
URL http://www.robots.ox.ac.uk/~vgg/data/voxceleb/interspeech2019.html
 
Description Workshop on language and vision at CVPR 2019 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact The interaction between language and vision, despite seeing traction as of late, is still largely unexplored. This is a particularly relevant topic to the vision community because humans routinely perform tasks which involve both modalities. We do so largely without even noticing. Every time you ask for an object, ask someone to imagine a scene, or describe what you're seeing, you're performing a task which bridges a linguistic and a visual representation. The importance of vision-language interaction can also be seen by the numerous approaches that often cross domains, such as the popularity of image grammars. More concretely, we've recently seen a renewed interest in one-shot learning for object and event models. Humans go further than this using our linguistic abilities; we perform zero-shot learning without seeing a single example. You can recognize a picture of a zebra after hearing the description "horse-like animal with black and white stripes" without ever having seen one.
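(To make the zebra example concrete, here is a toy, hypothetical NumPy sketch of attribute-based zero-shot recognition, in which a class never seen in training is identified purely from a linguistic description; the attribute vocabulary and scores below are invented.)

# Toy zero-shot recognition sketch: match predicted visual attributes
# against linguistic class descriptions. Everything here is invented.
import numpy as np

attributes = ["horse_like", "striped", "black_and_white", "has_trunk"]
descriptions = {                      # classes defined only by language
    "zebra":    np.array([1, 1, 1, 0]),
    "horse":    np.array([1, 0, 0, 0]),
    "elephant": np.array([0, 0, 0, 1]),
}

# Pretend an attribute classifier ran on an image and emitted these scores.
predicted_attributes = np.array([0.9, 0.8, 0.85, 0.05])

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

scores = {cls: cosine(desc.astype(float), predicted_attributes)
          for cls, desc in descriptions.items()}
print(max(scores, key=scores.get))   # "zebra", without any zebra training images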

Furthermore, integrating language with vision brings with it the possibility of expanding the horizons and tasks of the vision community. We have seen significant growth in image and video-to-text tasks but many other potential applications of such integration - answering questions, dialog systems, and grounded language acquisition - remain largely unexplored. Going beyond such novel tasks, language can make a deeper contribution to vision: it provides a prism through which to understand the world. A major difference between human and machine vision is that humans form a coherent and global understanding of a scene. This process is facilitated by our ability to affect our perception with high-level knowledge which provides resilience in the face of errors from low-level perception. It also provides a framework through which one can learn about the world: language can be used to describe many phenomena succinctly thereby helping filter out irrelevant details.

Topics covered (non-exhaustive):

language as a mechanism to structure and reason about visual perception,
language as a learning bias to aid vision in both machines and humans,
novel tasks which combine language and vision,
dialogue as means of sharing knowledge about visual perception,
stories as means of abstraction,
transfer learning across language and vision,
understanding the relationship between language and vision in humans,
reasoning visually about language problems,
visual captioning, dialogue, and question-answering,
visual synthesis from language,
sequence learning towards bridging vision and language,
joint video and language alignment and parsing, and
video sentiment analysis.
Year(s) Of Engagement Activity 2019
URL http://languageandvision.com/
 
Description Workshops at the Conference on Computer Vision and Pattern Recognition (CVPR) 2019 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Workshop chair for the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019. We selected and coordinated 90 international workshops.
Year(s) Of Engagement Activity 2018,2019
URL http://cvpr2019.thecvf.com/program/workshops