Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language (2024)
Bibliographic Information
Digital Object Identifier: http://dx.doi.org/10.1109/cvpr52733.2024.01246
Publication URI: http://dx.doi.org/10.1109/cvpr52733.2024.01246
Type: Conference/Paper/Proceeding/Abstract