Visual AI: An Open World Interpretable Visual Transformer
Lead Research Organisation:
University of Oxford
Department Name: Engineering Science
Abstract
With the advent of deep learning and the availability of big data, it is now possible to train machine learning algorithms for a multitude of visual tasks, such as tagging personal image collections in the cloud, recognizing faces, and 3D shape scanning with phones. However, each of these tasks currently requires training a neural network on a very large image dataset specifically collected and labelled for that task. The resulting networks are good experts for the target task, but they only understand the 'closed world' experienced during training and can 'say' nothing useful about other content, nor can they be applied to other tasks without retraining, nor do they have an ability to explain their decisions or to recognise their limitations. Furthermore, current visual algorithms are usually 'single modal': they 'close their ears' to the other modalities (audio, text) that may be readily available.
The core objective of the Programme is to develop the next generation of audio-visual algorithms that do not have these limitations. We will carry out fundamental research to develop a Visual Transformer capable of visual analysis with the flexibility and interpretability of a human visual system, and aided by the other 'senses' - audio and text. It will be able to continually learn from raw data streams without requiring the traditional 'strong supervision' of a new dataset for each new task, and deliver and distill semantic and geometric information over a multitude of data types (for example, videos with audio, very large scale image and video datasets, and medical images with text records).
The Visual Transformer will be a key component of next generation AI, able to address multiple downstream audio-visual tasks, significantly superseding the current limitations of computer vision systems, and enabling new and far reaching applications.
A second objective addresses transfer and translation. We seek impact in a variety of other academic disciplines and in industry, which today greatly under-utilise the power of the latest computer vision ideas. We will target these disciplines to enable them to leapfrog from today's practice, dominated by manual review and highly interactive frame-by-frame analysis, to a new era where automated visual analytics of very large datasets becomes the norm. In short, our goal is to ensure that the newly developed methods are used by industry and academic researchers in other areas, and turned into products for societal and economic benefit. To this end, open source software, datasets, and demonstrators will be disseminated on the project website.
The ubiquity of digital images and videos means that every UK citizen may potentially benefit from the Programme research in different ways. One example is smart audio-visual glasses that can pay attention to a person talking by using their lip movements to mask out other ambient sounds. A second is an app that can answer visual questions (or retrieve matches) for text queries over large-scale audio-visual collections, such as a person's entire personal video library. A third is AI-guided medical screening that can aid a minimally trained healthcare professional to perform medical scans.
Planned Impact
The proposed programme encompasses new methodology and applied research in computer vision and other modalities (audio, text) that will enable analysis and search of image and video content while learning new things, with human-like flexibility and interpretability. These capabilities will encourage end user take up of computer vision technologies and commercial interest in embedding these technologies in products.
The Programme will have Economic and Societal impact by
1. Enabling UK industry to leverage AI in its activities, providing a key strategic advantage.
2. Developing new and improved computer vision technologies that require substantially less training data to solve problems and are thus suitable for commercialisation by a wide range of companies.
3. Enhancing the visual and audio capabilities and knowledge base of UK industries, including small companies.
4. Enhancing quality of life by improving, for instance, healthcare capabilities, surveillance, environmental monitoring, and the means of accessing and enjoying personal digital media.
5. Reducing the cost and risk of collecting manual annotations for deploying AI technology, especially for sensitive data such as medical records.
6. Collaborating directly with companies and organizations that we have already identified, and will work with over the course of the Programme.
7. Training the next generation of computer vision researchers who will be equipped to support the imaging needs of science, technology and wider society for the future.
Impact on Knowledge includes
1. Realisation of new approaches to essential computer vision technology, and the dissemination of research findings through publications, conference presentations, summer school teaching, and the distribution of open source software and image databases.
2. Sharing knowledge with industrial collaborators via Transfer and Application Projects (TAPs) and other activities leading to adoption of advanced computer vision methods across many disciplines of science, engineering and medicine that currently do not use them.
3. Communication of advances to a public audience through website articles, Show and Tell events, social and broadcast media, and other co-ordinated public understanding activities.
Organisations
- University of Oxford (Lead Research Organisation)
- Leiden University (Collaboration)
- University of Copenhagen (Collaboration)
- National Library of the Czech Republic (Collaboration)
- University of Oxford (Collaboration)
- National Library of Scotland (Collaboration)
- Netherlands Institute for Art History (Collaboration)
- Ca' Foscari University of Venice (Collaboration)
- National Consortium of Intelligent Medical Imaging (Collaboration)
- Plexalis Ltd (Project Partner)
- Intelligent Ultrasound (Project Partner)
- British Broadcasting Corporation (United Kingdom) (Project Partner)
- Samsung (South Korea) (Project Partner)
- Continental (Germany) (Project Partner)
- Toshiba (Japan) (Project Partner)
- Nielsen (Project Partner)
Publications
Zhao B
(2023)
Dataset Condensation with Distribution Matching
Zhang C
(2021)
Temporal Query Networks for Fine-grained Video Understanding
Zhang B
(2022)
Affinity Attention Graph Neural Network for Weakly Supervised Semantic Segmentation.
in IEEE transactions on pattern analysis and machine intelligence
Yeung PH
(2021)
Learning to map 2D ultrasound images into 3D space with minimal human annotation.
in Medical image analysis
Description | 1-on-1 Engineers and Policy Fellowship discussion |
Geographic Reach | National |
Policy Influence Type | Influenced training of practitioners or researchers |
URL | https://raeng.org.uk/policyfellowships |
Description | Chair of Royal Society Data Science Policy group leading to publication of a report "Science in the age of AI" |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Participation in a guidance/advisory committee |
Description | Royal Society National Academies Data Reform Round Table Consultation |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Description | Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group, Chair |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Participation in a guidance/advisory committee |
Impact | Quoting the aims from the report "We have three objectives for this report. Our first objective is that the use cases inspire those collecting and using data to consider the potential benefits of PETs for their own work, or in new collaborations with others. Second, for the evidence we present on barriers to adoption and standardisation to help inform policy decisions to encourage a marketplace for PETs. Finally, through our recommendations, we hope the UK will maximise the opportunity to be a global leader in PETs - both for data security and collaborative analysis - alongside emerging, coordinated efforts to implement PETs in other countries." |
URL | https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/From-Privacy-to-Part... |
Description | Royal Society Privacy Enhancing Technologies Working Group - policy report published (Chair) |
Geographic Reach | National |
Policy Influence Type | Participation in a guidance/advisory committee |
Impact | The report has contributed to wider discussion of data sharing between government departments, and a number of the recommendations have been followed up. It is well cited. A follow-on project is underway with the Alan Turing Institute which will report in 2022. The important message was to show that PETs are maturing as a technology and can be considered enablers to provide trusted sharing of data, moving the conversation away from security and from accepting only zero risk in sharing data. The work is relevant not only to my research area (health data science) but to many other data-driven sectors. |
URL | https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies/privacy-enhancing-te... |
Description | Biomedical Research Centre |
Amount | £89,000,000 (GBP) |
Organisation | National Institute for Health Research |
Sector | Public |
Country | United Kingdom |
Start | 12/2022 |
End | 04/2027 |
Description | EPX0401861 Turing AI World Leading Researcher Fellowship Studentship |
Amount | £110,541 (GBP) |
Funding ID | EP/Y530517/1 |
Organisation | United Kingdom Research and Innovation |
Sector | Public |
Country | United Kingdom |
Start | 09/2023 |
End | 09/2028 |
Description | Envisioning Dante c.1472- c.1630 |
Amount | £805,620 (GBP) |
Funding ID | AH/W005220/1 |
Organisation | Arts & Humanities Research Council (AHRC) |
Sector | Public |
Country | United Kingdom |
Start | 08/2022 |
End | 09/2025 |
Description | Royal Society Research Professorship |
Amount | £1,400,000 (GBP) |
Funding ID | RSRP\R\241003 |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 03/2024 |
End | 03/2029 |
Description | Royal Society Research Professorship Enhanced research Expenses |
Amount | £100,000 (GBP) |
Funding ID | RF\ERE\210331 |
Organisation | The Royal Society |
Sector | Charity/Non Profit |
Country | United Kingdom |
Start | 09/2021 |
End | 03/2024 |
Description | Studentship |
Amount | £154,725 (GBP) |
Organisation | |
Sector | Private |
Country | United States |
Start | 09/2021 |
End | 09/2025 |
Description | Toshiba 2021 |
Amount | $200,000 (USD) |
Organisation | Toshiba |
Sector | Private |
Country | Japan |
Start | 06/2021 |
End | 03/2023 |
Description | Toshiba 2023 |
Amount | £200,000 (GBP) |
Organisation | Toshiba |
Sector | Private |
Country | Japan |
Start | 04/2023 |
End | 04/2025 |
Description | Turing AI Fellowship: Ultra Sound Multi-Modal Video-based Human-Machine Collaboration |
Amount | £4,248,942 (GBP) |
Funding ID | EP/X040186/1 |
Organisation | United Kingdom Research and Innovation |
Sector | Public |
Country | United Kingdom |
Start | 09/2023 |
End | 09/2028 |
Title | CAIFE dataset and annotations |
Description | The CAIFE dataset is a large fetal echocardiography dataset consisting of freehand video and sweep video, collated from multiple hospitals. A subset of this dataset has been manually annotated by cardiac view, and a large subset automatically labelled. The generation of the dataset was funded by the COCHE project but the dataset is used by other video analysis projects as well. Those projects have contributed annotations to enrich the resource. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | No |
Impact | On-going |
Title | Coreferenced Image Narratives Dataset |
Description | Our Coreferenced Image Narratives (CIN) dataset contains 1880 images from the Localized Narratives dataset [1] that come with long-form text descriptions (narrations) and mouse traces. These images are originally a subset of the test and validation set of the Flickr30k dataset [2] . We annotated this subset with coreference chains and bounding boxes in the image that are linked with the textual coreference chains, and use them only for validation and testing. Note that we also include singletons (i.e., coreference chains of length one). [1] Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, Vittorio Ferrari; Connecting Vision and Language with Localized Narratives ; ECCV 2020. [2] Bryan A. Plummer, Liwei Wang, Chris M. Cervantes, Juan C. Caicedo, Julia Hockenmaier, Svetlana Lazebnik; Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models ; IJCV 2017. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | The dataset allows extending and evaluating the abilities of the recent powerful large vision and language models. As it has been very recently published, there is only one publication from our group published in the top tier, NLP conference, EMNLP 2023 under the title "Semi-supervised multimodal coreference resolution in image narrations". |
URL | https://github.com/VICO-UoE/CIN |
Title | EPIC Fields: Marrying 3D Geometry and Video Understanding |
Description | We introduce EPIC Fields, an augmentation of EPIC-KITCHENS with 3D camera information. Similar to other datasets for neural rendering, EPIC Fields removes the complex and expensive step of reconstructing cameras using photogrammetry, and allows researchers to focus on more interesting modeling problems. We illustrate the challenge of photogrammetry in egocentric videos and propose several technical innovations to address them. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Upcoming |
URL | https://epic-kitchens.github.io/epic-fields/ |
Title | EPIC-KITCHENS VISOR |
Description | We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for segmenting hands and active objects in egocentric video. VISOR annotates videos from EPIC-KITCHENS, which comes with a new set of challenges not encountered in current video segmentation datasets. Specifically, we need to ensure both short- and long-term consistency of pixel-level annotations as objects undergo transformative interactions, e.g. an onion is peeled, diced and cooked - where we aim to obtain accurate pixel-level annotations of the peel, onion pieces, chopping board, knife, pan, as well as the acting hands. VISOR introduces an annotation pipeline, AI-powered in parts, for scalability and quality. Data published under the Creative Commons Attribution-NonCommercial 4.0 International License. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | The dataset can be used for benchmarking hand and active-object segmentation in egocentric video, and the baseline code will be made publicly available. |
URL | https://data.bris.ac.uk/data/dataset/2v6cgv1x04ol22qp9rm9x2j6a7/ |
Title | Epic-Sounds: A Large-scale Dataset of Actions That Sound |
Description | We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos. We propose an annotation pipeline where annotators temporally label distinguishable audio segments and describe the action that could have caused this sound. We identify actions that can be discriminated purely from audio, through grouping these free-form descriptions of audio into classes. For actions that involve objects colliding, we collect human annotations of the materials of these objects (e.g. a glass object being placed on a wooden surface), which we verify from visual labels, discarding ambiguities. Overall, EPIC-SOUNDS includes 78.4k categorised segments of audible events and actions, distributed across 44 classes as well as 39.2k non-categorised segments. We train and evaluate two state-of-the-art audio recognition models on our dataset, highlighting the importance of audio-only labels and the limitations of current models to recognise actions that sound. |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | A standard benchmark for testing audio-visual models. Already being cited in major publications |
URL | https://epic-kitchens.github.io/epic-sounds/ |
Title | Image Change dataset |
Description | We propose a scalable methodology for obtaining a large-scale change detection training dataset by leveraging existing object segmentation benchmarks. We introduce a novel co-attention-based architecture that implicitly determines correspondences between an image pair and finds changes in the form of bounding box predictions. We contribute four evaluation datasets that cover a variety of domains and transformations, including synthetic image changes, real surveillance images of a 3D scene, and synthetic 3D scenes with camera motion. We evaluate our model on these four datasets and demonstrate zero-shot generalisation as well as generalisation beyond the training transformations. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023. Future impact to be determined. |
URL | https://arxiv.org/pdf/2209.14341.pdf |
Title | Localizing Visual Sounds the Hard Way |
Description | The objective of this work is to localize sound sources that are visible in a video without using manual annotations. Our key technical contribution is to show that, by training the network to explicitly discriminate challenging image fragments, even for images that do contain the object emitting the sound, we can significantly boost the localization performance. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Localizing Visual Sounds the Hard Way Honglie Chen, Weidi Xie, Triantafyllos Afouras, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman CVPR, 2021 |
URL | https://www.robots.ox.ac.uk/~vgg/research/lvs/ |
Title | PASS: An ImageNet replacement for self-supervised pretraining without humans |
Description | PASS is a large-scale image dataset that does not include any humans and which can be used for high-quality pretraining while significantly reducing privacy concerns. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | YM. Asano, C. Rupprecht, A. Zisserman, A. Vedaldi PASS: An ImageNet replacement for self-supervised pretraining without humans NeurIPS Dataset Track, 2021 |
URL | https://www.robots.ox.ac.uk/~vgg/data/pass/ |
Title | PULSE dataset and annotations |
Description | A multi-modal dataset consisting of fetal ultrasound video, gaze tracking data, probe movement data and sonographer audio for first, second and third trimester scans. Audio has been translated to text. A large subset of the ultrasound video is automatically annotated in terms of anatomy label (single label per frame). Manual annotation has been done on a smaller subset. This dataset was generated as part of the ERC Advanced Grant PULSE but has been used for research on UKRI projects which have also contributed some analysis methods for automatic annotation that have improved the value of the data set and annotations as a whole. The dataset is a private dataset. |
Type Of Material | Database/Collection of data |
Year Produced | 2019 |
Provided To Others? | No |
Impact | See outputs listed on the PULSE website and PURFECT webpages as examples. An ultrasound pre-trained model (PULSENet) has also been derived which is used as a backbone for other research. |
Title | Semantic Shift Benchmark |
Description | Following the success of modern deep learning systems on closed-set visual recognition tasks, a natural next challenge is open-set recognition (OSR) (Scheirer et al., 2013). In the closed-set setting, a model is tasked with recognizing a set of categories that remain the same during both the training and testing phases. We demonstrate that the ability of a classifier to make the 'none-of-the-above' decision is highly correlated with its accuracy on the closed-set classes. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Future impact to be determined |
URL | https://www.robots.ox.ac.uk/~vgg/research/osr/#ssb_suite |
Title | Video Person-Clustering Dataset A multi-modal TV-shows and movies dataset |
Description | VPCD contains multi-modal annotations (face, body and voice) for all primary and secondary characters from a range of diverse TV-shows and movies. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | A. Brown, V. Kalogeiton, A. Zisserman Face, Body, Voice: Video Person-Clustering with Multiple Modalities |
URL | https://www.robots.ox.ac.uk/~vgg/data/Video_Person_Clustering// |
Title | Video-text Alignment HTM-Align dataset |
Description | The objective is a temporal alignment network that ingests long term video sequences, and associated text sentences, in order to: (1) determine if a sentence is alignable with the video; and (2) if it is alignable, then determine its alignment. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Future impacts to be determined |
URL | https://www.robots.ox.ac.uk/~vgg/research/tan/ |
Description | NLS Chapbooks |
Organisation | National Library of Scotland |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We used our software to search and analyse the illustrations of the chapbooks. |
Collaborator Contribution | Partner provided chapbooks in large quantities. |
Impact | https://www.robots.ox.ac.uk/~vgg/research/chapbooks/ |
Start Year | 2020 |
Description | National Consortium of Intelligent Medical Imaging |
Organisation | National Consortium of Intelligent Medical Imaging |
Sector | Academic/University |
PI Contribution | A VisualAI postdoc (Jianbo Jiao) is providing expertise in building image-based deep learning models to assess COVID-19 deterioration for hospital-based patients. |
Collaborator Contribution | NCIMI is providing access to COVID19 data for a TAP project. |
Impact | An initial evaluation of predictive modelling was performed using available COVID-19 data. However, due to the small size of the dataset, and the fact that COVID-19 treatments have significantly improved and better patient pathways are in place, it was deemed not worth pursuing this work beyond the preliminary study. A report was written but has not been published. |
Start Year | 2021 |
Description | TAP VAI-02 1516 Project |
Organisation | University of Copenhagen |
Country | Denmark |
Sector | Academic/University |
PI Contribution | We created a visual search engine using images and metadata supplied by Matilde Malaspina at the University of Copenhagen and Barbara Tramelli at the University of Venice. |
Collaborator Contribution | Partner provided images and metadata. |
Impact | A talk at Venice Centre for Digital and Public Humanities (VeDPH) on 9th Dec. 2020 |
Start Year | 2020 |
Description | TAP-VAI-03 16cIllustration Project |
Organisation | Ca' Foscari University of Venice |
Country | Italy |
Sector | Academic/University |
PI Contribution | We created a visually searchable database (https://www.robots.ox.ac.uk/~vgg/research/16ci/lyon/) of 16th century illustrations printed in Lyon. |
Collaborator Contribution | Partner provided images and metadata. |
Impact | The researchers at Venice Centre for Digital and Public Humanities are using this visual search engine as a research support tool. |
Start Year | 2021 |
Description | TAP-VAI-04 Frank-Scholten Archive |
Organisation | Leiden University |
Country | Netherlands |
Sector | Academic/University |
PI Contribution | Using our VISE software, we matched each photograph in the Frank-Scholten image archive with its corresponding negative. |
Collaborator Contribution | They provided a dataset containing the photographs and negatives captured by Frank-Scholten. |
Impact | tbc |
Start Year | 2021 |
Description | TAP-VAI-08 Fish Pool Trajectory |
Organisation | University of Oxford |
Department | Department of Zoology |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We are developing tools and a workflow to detect and track a Picasso triggerfish moving in a fish tank as it searches for a food target. |
Collaborator Contribution | They provided a video dataset showing Picasso triggerfish in a fish pool. |
Impact | tbc |
Start Year | 2021 |
Description | TAP-VAI-09 Fish Tank Obstacles |
Organisation | University of Oxford |
Department | Department of Zoology |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | We are developing tools and a workflow to detect and track Picasso triggerfish navigating through obstacles to reach a food target in a fish tank. |
Collaborator Contribution | They provided a video dataset showing Picasso triggerfish in a fish tank containing obstacles. |
Impact | tbc |
Start Year | 2021 |
Description | TAP-VAI-10 Czech National Library/ Czech Academy of Sciences |
Organisation | National Library of the Czech Republic |
Country | Czech Republic |
Sector | Public |
PI Contribution | We are providing technical support to the partner team for implementing our VISE software in their platform. |
Collaborator Contribution | They are using our software tool (VISE) |
Impact | tbc |
Start Year | 2021 |
Description | TAP-VAI-11 RKD |
Organisation | Netherlands Institute for Art History |
Country | Netherlands |
Sector | Public |
PI Contribution | We are providing technical support to the RKD for implementing visual image search feature in the public facing web portal and internal research using our VGG Image Search Engine (VISE) software (https://www.robots.ox.ac.uk/~vgg/software/vise/). |
Collaborator Contribution | The RKD provided millions of images and is now using our VISE software for visual search functionality. |
Impact | Not yet. |
Start Year | 2021 |
Title | Audio-visual synchronisation |
Description | The software performs audio-visual synchronisation, which requires a model to relate changes in the visual and audio streams. Prior work focused primarily on synchronising talking-head videos. In contrast, open-domain videos often contain only a small visual indication of the sound source, i.e. one that is sparse in space. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Paper in British Machine Vision Conference (BMVC), 2022. Future impacts to be determined. |
URL | https://iashin.ai/SparseSync |
Title | Audio-visual synchronisation - Synchformer |
Description | An audio-visual synchronization model: the inputs are the audio and visual streams of a video, and the output is the temporal offset. The approach is applicable to both dense and sparse (in time and space) audio-visual synchronization cues (e.g. a person talking (dense in time) or a dog barking (sparse in time)). A particular advantage of the model and training is that it decouples feature extraction from synchronization modeling through multi-modal segment-level contrastive pre-training. |
Type Of Technology | Software |
Year Produced | 2024 |
Open Source License? | Yes |
Impact | Paper at ICASSP 2024. Also won Amazon synchronization challenge, https://wacv2024-workshop-quality-iva.github.io/workshop-quality-iva/index.html#Competition Future impacts to be determined. |
URL | https://www.robots.ox.ac.uk/~vgg/research/synchformer/ |
Title | Auditory Slow-Fast |
Description | Recognising actions using the auditory signal only |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | Paper won an outstanding paper award at ICASSP 2021 - 3 papers selected out of 1400. Well-referenced (46 GitHub stars). In follow-up work by DeepMind [https://arxiv.org/pdf/2111.12124.pdf] this work is referred to as: "We find the Slowfast architecture is good at learning rich representations required by different domains", extending this work to speech and music audio. |
URL | https://github.com/ekazakos/auditory-slow-fast |
Title | EPIC Fields Code |
Description | This repository contains the pipeline for the dataset introduced in our paper, "EPIC Fields: Marrying 3D Geometry and Video Understanding." We aim to bridge the domains of 3D geometry and video understanding, leading to innovative advancements in both areas. |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | Upcoming |
URL | https://github.com/epic-kitchens/epic-Fields-code |
Title | Find Identical Images (FII) |
Description | Identical images have the same image dimension (i.e. image width, image height, number of colour channels) and same pixel value in all corresponding pixel locations. FII is a command line tool to find all identical images in a folder. It can also find images that are common in two folders. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | Follow Things Around |
Description | Software to track "things" (e.g. animals) in a video. The input is the video; the output is a text file specifying a bounding box of the 'thing' or 'things' in each frame. The method uses 'tracking by detection', meaning that no manual annotation is required on the video. Instead, a detector for the 'thing' is required (and detectors are available pre-trained for multiple classes of animals). Follow Things Around is provided as a Jupyter Notebook for Google Colab. It runs in a web browser, without the need for a GPU. A user can access their data for tracking on their Google Drive. |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | It is too early to say. |
URL | https://www.robots.ox.ac.uk/~vgg/software/follow-things-around/ |
Title | Generalised Visual Counting in Images |
Description | Our goal is to develop a generalised visual object counting system that augments humans' ability to recognise the number of objects in a visual scene. Specifically, generalised visual object counting refers to the problem of identifying the number of salient objects of an arbitrary semantic class in an image (i.e. open-world visual object counting), with an arbitrary number of instance "exemplars" provided by the end user to indicate the particular objects to be counted, i.e. from zero-shot to few-shot object counting. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Future impact to be determined. |
URL | https://arxiv.org/pdf/2208.13721.pdf |
Title | Generalized Category Discovery |
Description | We present a new setting: 'Generalized Category Discovery' and a method to tackle it. Our setting can be succinctly described as: given a dataset, a subset of which has class labels, categorize all unlabelled images in the dataset. The unlabelled images may come from labelled or novel classes. Our method leverages contrastively trained vision transformers to assign labels directly through clustering. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Future impact to be determined |
URL | https://github.com/prajwalkr/vtp#readme |
Title | Image Counterfeit Spotter |
Description | Counterfeit Spotter compares images of suspicious products with a reference image and confirms whether a product is real or fake within seconds, right in your browser. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Still receiving feedback and reporting |
URL | https://www.robots.ox.ac.uk/~vgg/software/image-compare/counterfeit-spotter/#usecases |
Title | ImageCompare |
Description | Image Compare is a lightweight, standalone and offline application to visually compare a pair of images and highlight their differences. This application can be used on desktop computers and mobile phones without requiring installation, as it runs entirely in a web browser. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | Lip Reading |
Description | To learn strong lip reading models that can recognise speech in silent videos. |
Type Of Technology | Software |
Year Produced | 2022 |
Impact | Research has shown that the best models achieve state-of-the-art results, outperforming prior work trained on public data by a significant margin, and even outperforming industrial models trained on orders of magnitude more data. We have also designed a Visual Speech Detection model on top of our lip reading system that obtains state-of-the-art results on this task and even outperforms several audio-visual baselines. |
URL | https://www.robots.ox.ac.uk/~vgg/research/vtp-for-lip-reading/ |
Title | List Annotator (LISA) |
Description | List Annotator (LISA) is a standalone and light-weight HTML/CSS/JavaScript based application to efficiently annotate a large list of images. LISA is an open source project developed and maintained by the Visual Geometry Group (VGG) and released under a license that grants its users the freedom to use it for any purpose. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | Motion Grouping |
Description | This software implements the model as described in the paper. It includes a pre-trained model and inference code to apply to downstream images, as well as the training code to train the model from scratch. It also includes code to evaluate and benchmark the results against existing datasets (DAVIS2016, FBMS59, SegTrackv2, MoCA). |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | This code accompanies the paper: Self-supervised Video Object Segmentation by Motion Grouping Charig Yang, Hala Lamdouar, Erika Lu, Andrew Zisserman, Weidi Xie. ICCV 2021 |
URL | https://oxris.ox.ac.uk/viewobject.html?id=1190260&cid=1 |
Title | VGG Image Annotator (VIA) |
Description | VGG Image Annotator is a simple and standalone manual annotation software for image, audio and video. VIA runs in a web browser and does not require any installation or setup. The complete VIA software fits in a single self-contained HTML page of less than 400 kilobytes that runs as an offline application in most modern web browsers. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | VGG Image Search Engine (VISE) |
Description | VGG Image Search Engine (VISE) is a free and open source software for visual search of a large number of images using an image as a search query. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | VGG Visual Tracker (VVT) |
Description | VGG Visual Tracker (VVT) is a tool for creating bounding box annotations on videos in a semi-automatic fashion, using class agnostic object trackers. VVT runs on modern web browsers (Chrome 65+, Firefox 60+, Safari 11+) and does not require any installation or setup. VVT is a variation of the VGG Image Annotator (VIA) v3 tool and uses the same data format. So, if you are already using VIA v3, the annotations are interoperable with your existing workflow. No changes required. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | tbc |
Title | Visual Analysis of Chapbooks |
Description | The chapbooks were produced cheaply to provide everyday reading material and were the most popular reading material for the masses [1]. This dataset has been made freely available by the National Library of Scotland (NLS). |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | Because they reduced printing costs, these woodcuts were reused across multiple chapbooks; identifying this reuse helped researchers pursue many related research questions using software tools based on computer vision. |
URL | https://data.nls.uk/data/digitised-collections/chapbooks-printed-in-scotland/ |
Title | m-bain/whisperX: v3.0.0 |
Description | batched inference with faster-whisper backend |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | This repository provides fast automatic speech recognition (70x realtime with large-v2) with word-level timestamps and speaker diarization. It has received 25 pull requests. |
URL | https://zenodo.org/record/7876369 |
Description | 'Humanist in the Loop: Computer Vision by Example for the Study of Early Printed Books', University of Helsinki Digital Humanities Seminar. |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | A presentation to a historical research group in Computational History who are experimenting in computer vision methods themselves. The presentation included comparison of our respective measures and discussion of common challenges and potential solutions. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.helsinki.fi/en/digital-humanities/teaching/digital-humanities-research-seminar |
Description | (1) VIA: Image and Video Annotation; (2) Image Comparator; (3) Image search and retrieval |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Industry/Business |
Results and Impact | Show and Tell Event at the Oxford Big Data Institute. Researchers at the Big Data Institute are now aware about our computer vision tools that can significantly improve their existing research workflow. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.bdi.ox.ac.uk/ |
Description | - How to study early printed books with computer vision: a practical introduction, University of Newcastle-upon-Tyne |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation on, and hands-on workshop with, Visual AI project software, and a showcase of successful collaborations in this domain. Outputs included plans for uptake and further development of software on external research projects. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.eventbrite.co.uk/e/how-to-study-early-printed-books-with-computer-vision-a-practical-int... |
Description | ACH talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Forum for conversations on an expansive definition of digital humanities in a broad array of subject areas, methods, and communities of practice. |
Year(s) Of Engagement Activity | 2021 |
URL | https://drive.google.com/file/d/1CN5CDWPf4cLTT1NY9gyP-JxxvsCdzRG-/view |
Description | AD-Manual Annotation of Radiology Images using VGG Image Annotator (VIA) online Course |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | The course website lists our manual annotation tool (VIA). This provides a lot of exposure to our software, which is available to the world as an open source tool. This tool significantly speeds up annotation work for professionals who need to annotate large volumes of visual data. |
Year(s) Of Engagement Activity | 2020 |
URL | https://folio47.wixsite.com/rp-course/radiology-preprocessor-workflow |
Description | AD-TAP Outcome Presentation for Leiden University |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | Presented the outcome of our collaboration with Leiden University. The team at Leiden University were extremely excited to see the results from our visual search engine. They said that they were "jumping like a child" after seeing the outcome and that this collaboration will lead to many new research projects related to the Frank Scholten Archives. |
Year(s) Of Engagement Activity | 2021 |
Description | AEOLIAN Network workshop presentation |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Presentation describing a project undertaken within the National Librarian of Scotland's Fellowship in Digital Scholarship programme for 2020-21. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.aeolian-network.net/events/workshop-1-employing-machine-learning-and-artificial-intellig... |
Description | AI4 LAM online conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Aimed at professionals in the LAM (Libraries, Archives and Museums) sector, this was an online workshop teaching the use of several Visual AI tools and giving context to their application in this sector. Issues of attribution, bias and fairness were discussed as well as technical areas. |
Year(s) Of Engagement Activity | 2022 |
URL | https://sites.google.com/view/ai4lam/ai4lam-2022-virtual-event |
Description | AI4LAM 2023 Annual Conference, Internet Archive Canada |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Conference for technical professionals and allied researchers in the Libraries, Archives and Museums sector, my participation included a presentation/hands on workshop and follow-on discussions, generating three separate requests for collaboration or more information on Visual AI project tools. |
Year(s) Of Engagement Activity | 2023 |
URL | https://ff2023.archive.org/pages/program/ |
Description | AI4LAM workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Introduced the use of visual AI for collections research, access and management. Using the example of collaborations between Oxford's Visual Geometry Group (VGG) and researchers and curators within the GLAM sector, the speaker provided a hands-on introduction to VGG's open-source tools for visual search, classification, comparison and annotation. |
Year(s) Of Engagement Activity | 2021 |
URL | https://libereurope.eu/event/introduction-to-visual-ai-in-glams-workshop-series-on-applying-and-depl... |
Description | AIUM 2021 Special Session Invited Speaker |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Invited speaker in Session with Title: Deep Learning Applications for New Ultrasound Techniques. Talk was pre-recorded with live questions. This primary audience was medical physicists rather than medical image analysis experts. |
Year(s) Of Engagement Activity | 2021 |
Description | AV4D: Visual Learning of Sounds in Spaces |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Workshop at the European Conference on Computer Vision (ECCV). |
Year(s) Of Engagement Activity | 2022 |
URL | https://av4d.org |
Description | AWS Human-Machine Collaboratory conference |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Industry/Business |
Results and Impact | Hosted by the Amazon Web Services (AWS)-funded Human-Machine Collaboratory at Oxford, Giles Bergel gave two talks (one alongside Dan Schofield, another Visual AI ambassador) on Visual AI collaborations and research in fields ranging from primatology to cultural heritage and media studies. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.mpls.ox.ac.uk/innovation-and-business-partnerships/human-machine-collaboration |
Description | Aberystwyth Bibliographical Group |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Presented work on tracing woodcut illustrations, their original woodblocks and copies throughout the surviving corpus of British ballads and chapbooks. Discussed how woodcuts in these forms of cheap print served as visual brands for particular titles, genres or producers of cheap print, and demonstrated some of the bibliographical uses of their identification. Showed how computer vision software can strongly support this research and may be further applied to printed images of all kinds. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.hugofox.com/community/aberystwyth-bibliographical-group-19783/reports-of-recent-meetings... |
Description | Co-chaired Royal Society-CAS Science and AI workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Workshop on Science and AI - I co-chaired and gave a talk. |
Year(s) Of Engagement Activity | 2023 |
Description | Co-organiser of ASMUS2021, a MICCAI workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Advances in Simplifying Medical UltraSound (ASMUS) 2021 is an international workshop that provides a forum for research topics around ultrasound image computing and computer-assisted interventions and robotic systems that utilize ultrasound imaging. It was held in conjunction with MICCAI 2021 in virtual form. Accepted papers were selected based on their scientific contribution, via a double-blind process involving written reviews from at least two external reviewers in addition to a member of the committee. The published work includes reports across a wide range of methodology, research and clinical applications. Advanced deep learning approaches for anatomy recognition, segmentation, registration and skill assessment are the dominant topics, in addition to ultrasound-specific new approaches in augmented reality and remote assistance. Three invited speakers were included in the workshop, and live demos of technologies were given. The meeting had 80+ attendees. |
Year(s) Of Engagement Activity | 2021 |
URL | https://miccai-ultrasound.github.io/#/asmus21 |
Description | Computational ethology and cultural evolution in wild chimpanzees, Department of Evolutionary Anthropology, University of Zurich |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation by Daniel Schofield as part of an Anthropology research symposium and conference. Introduced VGG tools to the Human Evolutionary Ecology Group, University of Zurich, and facilitated discussion of how visual AI tools could be used for human anthropological research. |
Year(s) Of Engagement Activity | 2023 |
Description | Computer vision for the investigation of ancient documents Saint-Étienne, 6-7 April 2023 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Two-day workshop on computer vision and historical documents, publicising work done by Visual AI project members, three of whom presented (myself, Abhishek Dutta and Prasanna Sridhar), leading to follow-up events in Oxford and (potentially) Milan involving fellow practitioners, and to enhancements to Visual AI open-source software tools (Image Comparator) made by the host organisation. |
Year(s) Of Engagement Activity | 2023 |
URL | https://ro2i.hypotheses.org/351 |
Description | ConCode webinar |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Presentation highlighting some of the ways that cultural heritage collections are using computer vision (or visual AI) for collections management and research, focussing particularly on the work of the Oxford Visual Geometry Group and its collaborators. |
Year(s) Of Engagement Activity | 2021,2022 |
URL | https://www.youtube.com/watch?v=d4XaZ4bur6Q |
Description | Conference of European National Librarians 'AI in Libraries' Webinar |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation by Horace Lee, with Giles Bergel, on Visual AI's WISE Image Search Engine to members of various national libraries in Europe. A Digital Curator from the British Library expressed her interest in using WISE for the British Library's image collection. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.cenl.org/network-group-ai-in-libraries-webinars-2023/ |
Description | Deep Discoveries webinar |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Discussed how computer vision excels at matching identical features within images and has made progress in broad classification tasks, though the middle ground remains challenging. Visual similarity, which is essential to human visual recognition, is challenging to conceptualise, measure and compute. Outlined some approaches to defining similarity in computational terms, drawing on the experience of the Visual Geometry Group (Oxford) in collaborating with cultural heritage researchers. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.eventbrite.co.uk/e/computer-vision-and-heritage-opportunities-for-research-and-engagemen... |
Description | Digital Humanities Annual Conference - Tokyo |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | The project gave a paper and led a workshop teaching the use of Visual AI software tools for the study of printed illustrations. |
Year(s) Of Engagement Activity | 2022 |
URL | https://dh2022.adho.org/ |
Description | Digital Humanities Congress Sheffield |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Presentation of Visual AI collaborations on book history to a diverse audience of digital humanists to promote the sharing of knowledge, ideas and techniques within the digital humanities. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.dhi.ac.uk/dhc2022/ |
Description | Digital Humanities and Book History conference |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Project research, tools and collaborations presented to a digital humanities audience working in particular in the field of book history, in which Visual AI has a high profile. |
Year(s) Of Engagement Activity | 2022 |
URL | https://dcsco-op.org/past-events/dhbh/ |
Description | Digital Humanities at Oxford Summer School |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Two separate presentations at a high-profile, diverse and well-attended event, both of which introduced participants to Visual AI tools and methods. One of the sessions included presentations from other Visual AI project members (Abhishek Dutta; David Pinto; and Prasanna Sridhar). There was some lively debate and some requests for further information about project software. |
Year(s) Of Engagement Activity | 2023 |
URL | https://web.cvent.com/event/58fc430e-5294-4919-a7a3-c2b14f81a059/websitePage:bc9d128c-098d-49e0-97c9... |
Description | Digital Humanities at Oxford Summer School |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | A presentation and two hands-on sessions on Visual AI tools and collaborations in Digital Humanities |
Year(s) Of Engagement Activity | 2022 |
URL | https://eng.ox.ac.uk/events/dhoxss-2022/ |
Description | Digitising, Cataloguing, Searching and Sharing the Medieval and Early-Modern Image: On-Going Projects & Different Methodologies |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Other audiences |
Results and Impact | Presentation on digitising, cataloguing, searching and sharing the medieval and early-modern image: on-going projects and different methodologies |
Year(s) Of Engagement Activity | 2021 |
Description | Distinguished Keynote Speaker in Biomedical and Health Data Science in two joint conferences of IEEE EMBS BHI and BSN 2021 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Keynote talk entitled 'Simplifying interpretation and acquisition of ultrasound scans', delivered virtually. Abstract: With the increased availability of low-cost and handheld ultrasound probes, there is interest in simplifying interpretation and acquisition of ultrasound scans through deep-learning based analysis so that ultrasound can be used more widely in healthcare. However, this is not just "all about the algorithm", and successful innovation requires inter-disciplinary thinking and collaborations. In this talk I will overview progress in this area, drawing on examples of my laboratory's experiences of working with partners on multi-modal ultrasound imaging, and building assistive algorithms and devices for pregnancy health assessment in high-income and low-and-middle-income country settings. Emerging topics in this area will also be discussed. |
Year(s) Of Engagement Activity | 2021 |
Description | Edinburgh CDCS Digitised Documents Series workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Workshop to showcase the state of the art in Visual AI for cultural heritage and the digital humanities, and provide a hands-on introduction to some simple techniques for searching and classifying imagery in books, paintings, photographs and film. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.cdcs.ed.ac.uk/events/visual-ai-and-humanities-introduction |
Description | Edinburgh CDCS workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | Workshop to showcase the state of the art in Visual AI for cultural heritage and the digital humanities, and provide a hands-on introduction to some simple techniques for searching and classifying imagery in books, paintings, photographs and film. Introduced participants to the study of bias within AI, in such controversial applications as facial recognition and automated image categorisation. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.cdcs.ed.ac.uk/events/workshop-chapbooks-national-library-scotland |
Description | Fantastic Futures Conference |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Conference aiming to help participants discover: basic concepts of artificial intelligence in the GLAM sector; concrete uses and practices of AI in the GLAM sector; and technologies and tools applicable to the GLAM sector's data and collections. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.bnf.fr/en/agendaEN/workshops-tutorials-les-futurs-fantastiques-3rd-conference-about-arti... |
Description | Helping Computers See and Understand the World Around Us |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Schools |
Results and Impact | Science Week Demonstration for Year 3 and Year 4 students at the Cutteslowe Primary School in Oxford |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.cutteslowe.oxon.sch.uk/ |
Description | History of Printed Illustrations webinar |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Presentation, drawing on a recent collaboration with the National Library of Scotland on their chapbook collections, demonstrating how computer vision (or 'visual AI') can support the study of printed illustrations. Demonstrated free software developed for these purposes; discussed its strengths and weaknesses; and considered its overall place within the illustration researcher's toolbox. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.cphc.org.uk/events/2021/7/8/hopin-webinar-ly8r3 |
Description | ICDAR Hip2021 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Workshop to bring together researchers from various fields working on document image acquisition, restoration, analysis, indexing, and retrieval to make these documents accessible in digital libraries. |
Year(s) Of Engagement Activity | 2021 |
URL | https://blog.sbb.berlin/hip2021/ |
Description | IIIF Community Call |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Community Call discussing the University of Oxford Visual Geometry Group's work with IIIF and Machine Learning |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.youtube.com/watch?v=KXE3-LD6xxI&t=1s |
Description | Indiana University Booklab |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Hands-on workshop teaching the use of Visual AI software outputs to an audience of digital humanities practitioners and students. As well as discussions, this led to an invitation to return for a longer period and to teach the tools in classroom and professional settings. |
Year(s) Of Engagement Activity | 2023 |
URL | https://booklab.indiana.edu/news-events/past-events/giles-bergel-2023.html |
Description | International Computer Vision Summer School |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | International Computer Vision Summer School |
Year(s) Of Engagement Activity | 2022 |
URL | https://iplab.dmi.unict.it/icvss2022/ |
Description | International Conference on Computer Vision, 2023 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman was one of the General Chairs who organized the International Conference on Computer Vision (ICCV) in Paris, France. ICCV is one of the three principal international computer vision conferences. Over 7,000 people registered to attend the conference. |
Year(s) Of Engagement Activity | 2023 |
URL | https://iccv2023.thecvf.com/ |
Description | Invited talk at University of British Columbia |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Invited talk at the University of British Columbia, Canada. The visit also included lab tours; possible follow-ups include writing a grant together and an exchange of students. |
Year(s) Of Engagement Activity | 2023 |
Description | Learning 3D Geometry |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Undergraduate students |
Results and Impact | Lecture in the computer vision course at the University of Amsterdam. |
Year(s) Of Engagement Activity | 2022 |
Description | Learning 3D Geometry: From Fusion to Generation |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Talk at the CVPR CV4MR Workshop on Computer Vision for Mixed Reality. |
Year(s) Of Engagement Activity | 2023 |
URL | https://cv4mr.github.io |
Description | Learning on Screen - BoB/TRilT Academic Engagement launch |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | Project tools and collaborations advertised to researchers seeking to use one of the largest research databases of UK TV programmes, leading to follow up discussions. |
Year(s) Of Engagement Activity | 2022 |
URL | https://learningonscreen.ac.uk/guidance/bob-and-trilt-for-research/launch-event/ |
Description | Libraries Rewired: A CILIP Digital Transformation Event |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Abhishek Dutta and Prasanna Sridhar presented Visual AI software and demos to CILIP, the UK professional body for librarians and information professionals. The event raised awareness of Visual AI's work and attracted the interest of a number of IT suppliers to the sector, with whom the team are following up. |
Year(s) Of Engagement Activity | 2023 |
URL | https://librariesrewired.org.uk/ |
Description | London Rare Books Summer School - the Digital Book Historian's Toolkit |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Schools |
Results and Impact | An overview of the landscape of digital research in book history, including bibliographic data and content management systems, data visualisation, systems for image sharing and annotation in libraries and archives, computer vision, and (semi-)automated collation. Rather than emphasising mastery of any particular technology, we encouraged computational thinking and digital experimentation to enhance historical research questions and information management. |
Year(s) Of Engagement Activity | 2021 |
Description | MIUA 2021 Conference - co-organiser |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | MIUA is a UK-based international conference for the communication of image processing and analysis research and its application to medical imaging and biomedicine. This was the 25th edition of the meeting, which was held virtually; 40 papers were presented (27k downloads as of 09-03-2022). MIUA is the principal UK forum for communicating research progress within the community interested in image analysis applied to medicine and related biological science. The meeting is designed for the dissemination and discussion of research in medical image understanding and analysis, and aims to encourage the growth and raise the profile of this multi-disciplinary field by bringing together the various communities involved. |
Year(s) Of Engagement Activity | 2021 |
URL | https://miua2021.com/ |
Description | Max Planck BibHerz Library Seminar: Reflections on the Digital Turn in the Humanities and the Sciences |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Seminar on how digital technologies have changed approaches to the discovery, study, and presentation of images; what impact the changing dynamic between the analogue and digital manifestation of the book or manuscript has on their working practices; and how this affected their use and questions that are asked or could be asked. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.biblhertz.it/3069990/seminar-series-reflections-on-the-digital-turn-in-the-humanities-an... |
Description | NLS Digital Scholarship Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Industry/Business |
Results and Impact | 15 attendees for an annual workshop, which sparked questions and ongoing discussions. |
Year(s) Of Engagement Activity | 2021 |
Description | National Academies roundtable on researcher access to data |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | The National Academies Data Reform Round Table was a by-invitation meeting that discussed some of the current challenges researchers face in getting access to data for research due to current data protection regulation. The Department for Digital, Culture, Media and Sport (DCMS) was consulting on reforming the UK's data protection regime, part of a larger effort to implement the government's National Data Strategy, and specifically Mission 2 of that strategy: 'supporting a pro-growth and trusted data regime'. This issue affects researchers working in computer vision and medical image analysis, and this formed part of the discussion. In terms of impact/outcome, the meeting output fed into a response that will hopefully have influence, though it is too early to measure how direct that influence will be. |
Year(s) Of Engagement Activity | 2021 |
Description | National Academies' party conference event speaker |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Policymakers/politicians |
Results and Impact | Speaker on the (virtual) National Academies panel at the Liberal Democrat political party conference, which focused on the theme of 'Becoming a "science superpower": will the UK be fit to tackle the next global crisis?'. Briefing: the panel discussions addressed how the UK should approach the future, building resilience to future crises and achieving 'superpower' status. The panel included leading experts representing the National Academies, as well as representatives from the political parties and a journalist Chair. Not aware of any direct impact, but these sessions are an important part of keeping an open and positive dialogue with MPs. |
Year(s) Of Engagement Activity | 2021 |
Description | National Librarian of Scotland's Lecture in Digital Scholarship |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Media (as a channel to the public) |
Results and Impact | Introduced research on chapbooks using Visual AI and how machine vision can help others to understand printed heritage collections. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.youtube.com/watch?v=5jkq0iLzMvo&t=10s |
Description | Neural Geometry and Rendering: Advances and the Common Objects in 3D Challenge? |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Workshop at the European Conference on Computer Vision (ECCV). |
Year(s) Of Engagement Activity | 2022 |
URL | https://ngr-co3d.github.io |
Description | Office for National Statistics, Integrated Data Programme Advisory Group, Member, |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | The Office for National Statistics Integrated Data Programme Advisory Group offers advice to the ONS on its programme aimed at sharing data for public good with other organisations. I was invited due to my role as Chair of the Royal Society PETs science policy work, together with my research interest in health data science/medical image analysis. |
Year(s) Of Engagement Activity | 2021,2022 |
Description | OxML - Oxford Machine Learning Summer School |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Gave a lecture at the OxML summer school on Deep Learning. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.oxfordml.school |
Description | Practical Applications of IIIF Seminar: Image Registration and IIIF |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Public/other audiences |
Results and Impact | Discussing the methods, challenges and possibilities of Image Registration. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.iiconservation.org/content/practical-applications-iiif-seminar-1-image-registration-and-... |
Description | Presentation for British Museum Portable Antiquities Scheme |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Horace Lee gave a presentation to members of the British Museum Portable Antiquities Scheme (PAS) on the capabilities of Visual AI's WISE Image Search Engine and how it can be used in archaeology. This was part of an ongoing collaboration with the British Museum. |
Year(s) Of Engagement Activity | 2023 |
Description | Presentation to The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences symposium "Documenting, Understanding, Preserving Cultural Heritage. Humanities and Digital Technologies for Shaping the Future", Florence, Italy, July 2023 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation by Giles Bergel, Abhishek Dutta and Andrew Zisserman (Visual AI) and Rosario Carvalho et al on the Az Infinitum project, a collaboration with Visual AI that integrated the VGG Image Search Engine (VISE) software in a web application to allow the search of large collections of decorative Portuguese 'azulejo' tiles. The presentation and other outputs (a paper and web application) raised awareness of the use of Visual AI project software in the domains of art history, heritage science and digital humanities, impacting professionals in those domains and raising public interest in the use of computer vision to understand these historical materials. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.timemachine.eu/ltm-projects/az-infinitum-azulejo-indexation-and-referencing-system/ |
Description | Renaissance Society of America Day of Digital Learning |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | RSA Day of Digital Learning, featuring a varied menu of sessions involving hands-on, participatory work with digital tools and resources. |
Year(s) Of Engagement Activity | 2021 |
URL | https://rsaddl.hcommons.org/ |
Description | Renaissance Society of America Day of Digital Learning |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | An introduction to computer vision - the extraction of information from images - for the purposes of book and art history. Overview of the field, with particular reference to collaborative research performed by the Visual Geometry Group (VGG) at Oxford. |
Year(s) Of Engagement Activity | 2022 |
URL | https://rsa2022ddl.hcommons.org/main-page/rsa-ddl-2022-topics/ |
Description | Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group - Chair |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Policymakers/politicians |
Results and Impact | Royal Society Privacy Enhancing Technologies (PETs) Policy Working Group (policy report), Chair, 2017-19. Also Chair of follow-on to initial report, 2021-. |
Year(s) Of Engagement Activity | 2019,2020,2021,2022 |
URL | https://royalsociety.org/-/media/policy/projects/privacy-enhancing-technologies |
Description | Royal Society and DSIT Workshop on Science and AI Safety |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Royal Society and DSIT Workshop on Science and AI Safety, including a discussion meeting as well as a red-teaming activity with postgraduate students. The link below was a high-profile output from part of the event. I provided opening comments for the event (but organisation was led by the Royal Society team and DSIT). |
Year(s) Of Engagement Activity | 2023 |
URL | https://time.com/6328851/scientists-training-ai-safety/ |
Description | Royal Society and US National Academy of Sciences Forum on Researcher Access to Data |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Forum description: The pandemic has demonstrated that there is strong public benefit derived from researchers having prompt access to a variety of data sources, such as data from public and government bodies, as well as private companies (in particular, tech companies). There is also significant interest in how we connect and link the different data sources. The Forum will address the evolution of researcher access to data; best practices and lessons learned from fields that are on the forefront of data sharing (e.g., climate studies, astrophysics, biomedicine); and challenges related to pressing societal problems such as online information (and misinformation), modeling for pandemics, and using data in emergencies. |
Year(s) Of Engagement Activity | 2023 |
Description | Sight and Sound Workshop at the IEEE Conference on Computer Vision and Pattern Recognition, 2021 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman co-organized the Sight and Sound Workshop at CVPR 2021. This is the description of the workshop: While traditionally visual and audio data have been studied in isolation, researchers have increasingly been creating algorithms that learn from both modalities. This has produced many exciting developments in automatic lip-reading, multi-modal representation learning, and audio-visual action recognition. Since pretty much every internet video has an audio track, the prospect of learning from paired audio-visual data - either with new forms of unsupervised learning, or by simply incorporating sound data into existing vision algorithms - is appealing, and this workshop will cover recent advances in this direction. It will also touch on higher-level questions, such as what information sound conveys that vision doesn't, the merits of sound versus other "supplemental" modalities such as text and depth, and the relationship between visual motion and sound. We'll also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing. |
Year(s) Of Engagement Activity | 2021 |
URL | https://sightsound.org/2021/ |
Description | Sight and Sound Workshop at the IEEE Conference on Computer Vision and Pattern Recognition, 2023 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman co-organized the Sight and Sound Workshop at CVPR 2023. This is the description of the workshop: While traditionally visual and audio data have been studied in isolation, researchers have increasingly been creating algorithms that learn from both modalities. This has produced many exciting developments in automatic lip-reading, multi-modal representation learning, and audio-visual action recognition. Since pretty much every internet video has an audio track, the prospect of learning from paired audio-visual data - either with new forms of unsupervised learning, or by simply incorporating sound data into existing vision algorithms - is appealing, and this workshop will cover recent advances in this direction. It will also touch on higher-level questions, such as what information sound conveys that vision doesn't, the merits of sound versus other "supplemental" modalities such as text and depth, and the relationship between visual motion and sound. We'll also discuss how these techniques are being used to create new audio-visual applications, such as in the fields of speech processing and video editing. |
Year(s) Of Engagement Activity | 2023 |
URL | https://sightsound.org/2023/ |
Description | Sixth Form Schools Science Talk |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Schools |
Results and Impact | Gave a talk on my research to lower sixth form students at Magdalen College School. This was part of their lecture series related to the lower sixth form project, which gives them experience of researching a topic. Lots of interesting questions, particularly about the global health angle of the research, its potential impact, and the ethics of using AI; in fact the quality of questions was much higher than from most technical audiences! Teacher follow-up said there was good discussion afterwards. |
Year(s) Of Engagement Activity | 2022 |
Description | Summer School on Artificial Intelligence, India |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Lectured at the Summer School on "Recognizing Human Actions in Videos", followed by a Q&A session. |
Year(s) Of Engagement Activity | 2021 |
URL | https://cvit.iiit.ac.in/summerschool2021/index.php |
Description | Talk at the Machine Learning and Computer Vision Research Group at University of Bristol |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation by Abhishek Dutta and Prasanna Sridhar on Visual AI project tools and workflows, in particular annotation and model training ('Manual Annotation of Images and Video using VIA'), leading to requests for information and further plans. |
Year(s) Of Engagement Activity | 2023 |
URL | https://uob-mavi.github.io/people/ |
Description | Talk at the Staff Meeting of History of Science Museum in Oxford on computer vision for heritage collection management and research |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation by Giles Bergel and Abhishek Dutta on Visual AI collaborations with cultural heritage organisations (libraries, museums and galleries) including software demos allowing visual search of digital collections. The Digital Collections manager, and other HSM staff, made appointments for follow up meetings and inquiries have been made to the Museum's IT suppliers. |
Year(s) Of Engagement Activity | 2023 |
Description | The Sixth Annual Conference for Research Software Engineering |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Research Software Engineers from other universities learned about our methods and processes for developing software tools that are used widely all over the world. |
Year(s) Of Engagement Activity | 2022 |
URL | https://virtual.oxfordabstracts.com/#/event/3101/submission/70 |
Description | Understanding egocentric data in 3D |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Talk on understanding egocentric data in 3D at the Ego4D workshop at CVPR 2023. |
URL | https://ego4d-data.org/workshops/cvpr23 |
Year(s) Of Engagement Activity | 2023 |
Description | University of Illinois HRI Introduction to Computer Vision for Digital Humanists |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Online workshop delivering training in Visual AI software tools, leading to discussion of their utility and a follow-up call with a prominent digital humanist working on historical newspapers. |
Year(s) Of Engagement Activity | 2023 |
URL | https://mediaspace.illinois.edu/media/t/1_arib8duv/28379181 |
Description | University of Oxford Social Sciences Division 'Common Ground' seminar series: AI and Society, |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Dan Schofield gave an introduction to Visual AI for the University of Oxford Division of Social Sciences and engaged in discussions about ethics and how to foster collaborations between AI engineering teams and social science researchers in Oxford. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.socsci.ox.ac.uk/article/new-common-ground-seminar-series-starts-with-ai-and-society |
Description | University of Stockholm Digital Humanities Now workshop |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Showcased new and ongoing research in the broad Digital Humanities field. |
Year(s) Of Engagement Activity | 2021 |
URL | https://su.powerinit.com/Data/Event/EventTemplates/2602/?EventId=879 |
Description | VGG Image Search Engine (VISE) |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Industry/Business |
Results and Impact | Talk at the RKD Netherlands Institute for Art History. The RKD team have integrated our VISE image search engine software into their platform. At this event, all the contributors to the digital platform talked about their work and their software, and our VISE software was introduced to a wider international audience. |
Year(s) Of Engagement Activity | 2022 |
URL | https://rkd.nl/en/ |
Description | VisuAI Show and Tell 2021 |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Presented our visual annotation and visual search software to potentially interested researchers, some of whom enquired further and later adopted the tools in their research. |
Year(s) Of Engagement Activity | 2021 |
Description | Visual AI for ethology: chimpanzee behaviour analysis using deep learning, Department of Evolutionary Anthropology, University of Zurich |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Research presentation by Daniel Schofield to the University of Zurich Anthropology department, outlining computer vision applications for ethology as well as introducing Visual AI software. |
Year(s) Of Engagement Activity | 2023 |
Description | VisualAI Show and Tell |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Postgraduate students |
Results and Impact | The event was for the University of Edinburgh. It showcased the software developed by the VisualAI team with the aims of publicising the open source software produced in the project, and of attracting potential collaborators. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.robots.ox.ac.uk/~vgg/projects/visualai/events.html#ST15621 |
Description | VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments. The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'. |
Year(s) Of Engagement Activity | 2021 |
URL | https://www.robots.ox.ac.uk/~vgg/data/voxceleb/interspeech2021.html |
Description | VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments. The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'. |
Year(s) Of Engagement Activity | 2023 |
URL | https://mmai.io/datasets/voxceleb/voxsrc/interspeech2023.html |
Description | VoxCeleb Speaker Recognition Challenge (VoxSRC) Workshop 2022 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | Andrew Zisserman co-organized the VoxCeleb Speaker Recognition Challenge (VoxSRC) and workshop. The purpose of the challenge was to "probe how well current methods can recognize speakers from speech obtained 'in the wild'." It was based on the VoxCeleb dataset obtained from YouTube videos of celebrity interviews, and consisting of audio from both professionally edited and red carpet interviews as well as more casual conversational audio in which background noise, laughter, and other artefacts are observed in a range of recording environments. The challenge consisted of both speaker verification and speaker diarisation tracks. The task of speaker verification is to determine whether two samples of speech are from the same person, while speaker diarization involves the more general task of breaking up multi-speaker audio into homogenous single speaker segments, effectively solving 'who spoke when'. |
Year(s) Of Engagement Activity | 2022 |
URL | http://mm.kaist.ac.kr/datasets/voxceleb/voxsrc/interspeech2022.html |
Description | What do you learn after Developing, Maintaining and Supporting Research Software for 6 years? |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Industry/Business |
Results and Impact | Talk at the Vision, Graphics and Learning (VGL) research group in the Department of Computer Science, University of York. The PhD students and postdocs in the VGL group learned about the software development methods and practices used to create research software tools that are used by millions of people worldwide. |
Year(s) Of Engagement Activity | 2022 |
URL | https://www.youtube.com/watch?v=8S0HbFX4HBM |
Description | WikiWorkshop presentation of WISE image search engine |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Abhishek Dutta, Horace Lee, Prasanna Sridhar and Andrew Zisserman presented WISE, a multimodal search engine running on over 50 million images from Wikimedia Commons. This led to a follow-up meeting with the Wikimedia Foundation about how WISE can help the foundation make its images searchable, including for finding harmful content. |
Year(s) Of Engagement Activity | 2023 |
URL | https://wikiworkshop.org/2023/# |
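A multimodal search engine of this kind typically ranks images by similarity between a query embedding and precomputed image embeddings in a joint text-image space. The sketch below is illustrative only, not WISE's actual code: it assumes such embeddings already exist and shows just the ranking step.

```python
# Illustrative sketch of embedding-based image retrieval (not WISE's actual code).
# Assumes images and text queries have been mapped into a shared embedding
# space by a joint vision-language model; only the ranking step is shown.
import numpy as np

def search(query_emb: np.ndarray, image_embs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k images most similar to the query (cosine similarity)."""
    q = query_emb / np.linalg.norm(query_emb)
    M = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    scores = M @ q                  # cosine similarity against every image
    return np.argsort(-scores)[:k]  # highest-scoring images first

# Toy example: three 2-d "image embeddings" and a query close to the first.
image_embs = np.array([[1.0, 0.0],
                       [0.0, 1.0],
                       [0.7, 0.7]])
query = np.array([1.0, 0.1])
```

At the scale of 50 million images, the exhaustive matrix product above would be replaced by an approximate nearest-neighbour index, but the ranking criterion is the same.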
Description | Workshop on Studying the Images of Popular Prints: Methods and Theory, Catholic University of Valencia |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Other audiences |
Results and Impact | Two-day workshop on popular Spanish 'pliegos' and related materials, including a hands-on session teaching VGG tools applied to these materials by the Spanish Chapbooks project at Cambridge University, who were present. Outcomes included plans for further development of the Cambridge and other Spanish resources and use of the project software, and an invitation to speak to a similar project at the University of Geneva in 2025. |
Year(s) Of Engagement Activity | 2023 |
URL | http://biblioteca.cchs.csic.es/docs/Poster_Valencia_low.pdf |
Description | Workshop: Introduction to Visual AI for Behavioural Research, Department of Anthropology, University of Oxford. |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Professional Practitioners |
Results and Impact | Hands-on workshop by Daniel Schofield for the University of Oxford Anthropology Department, introducing visual AI tools and core concepts for using computer vision in anthropological research. |
Year(s) Of Engagement Activity | 2023 |
Description | Workshop: Introduction to computer vision tools for primatology: How to annotate, detect and track, Kuching, Malaysia |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Two-hour workshop and presentation by Dan Schofield to attendees of the International Primatological Society, introducing Visual AI tools for primatological research. |
Year(s) Of Engagement Activity | 2023 |
URL | https://ipskuching.com/programme/ |
Description | A statistical learning perspective on reconstructing the 3D world |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | Invited talk at the BrainWorlds Freiburg-Oxford Workshop. |
Year(s) Of Engagement Activity | 2023 |
URL | https://brainworlds.uni-freiburg.de |