Visits to University of California, Berkeley, Stanford University, and SRI International

Lead Research Organisation: King's College London
Department Name: Institute of Telecommunications

Abstract

During his visit to the Department of Statistics, University of California, Berkeley, Prof. Cvetkovic will be hosted by Prof. Yu, in the Statistical Machine Learning group. He will focus on the current activities of Prof. Yu's group, which can be broadly described as statistical machine learning theory, methodologies, and algorithms for solving high-dimensional data problems. Particular problems covered include sparse modelling (e.g. Lasso, compressed sensing), structured sparsity, analysis and methods for spectral clustering, and applications to data from a diverse range of interdisciplinary areas, from neuroscience to social networks. During this visit, Prof. Cvetkovic and Prof. Yu will set forth directions for collaboration on problems in learning in high dimensions, leading to a research grant proposal.

During his previous EPSRC project, EP/D053005/1, Prof. Cvetkovic, in collaboration with Prof. Sollich, Department of Mathematics, King's College London, and Prof. Yu, developed an unorthodox approach to robust speech recognition in high-dimensional spaces of acoustic waveforms of speech. Dr. Horacio Franco, Director of the Speech Technology and Research Laboratory of SRI International, who in 2010 won a major DARPA award for solving the problem of the sensitivity of automatic speech recognition systems to additive noise, finds this approach groundbreaking and has expressed a strong interest in exploring avenues for collaboration.
The purpose of this visit would be to investigate ways in which the approach developed by Prof. Cvetkovic and his collaborators can be brought closer to practice and based on that investigate the directions of long-term collaboration and possible joint grant proposals between SRI International, King's College London, and UC Berkeley.

At King's College, Prof. Cvetkovic has commenced work on a new multichannel audio technology, supported by EPSRC grant EP/F001142/1. The project produced a considerable publication volume and patent portfolio. A visit to one of the world-leading centres for music and acoustics technologies, such as CCRMA, would be very beneficial for taking advantage of this momentum to penetrate the field, which is still a new application area for Prof. Cvetkovic, at a deeper level, expand its scope, establish collaborations, and inform future grant proposals. At CCRMA, Prof. Cvetkovic will be interacting primarily with Prof. Julius Smith, working on multichannel audio technologies and other signal processing problems in audio and acoustics. Recent work by Prof. Cvetkovic complements a large body of work by Prof. Smith on ultra-fast rendition of multichannel audio using digital waveguide networks (DWNs). This is an area of significant academic interest, requiring interdisciplinary approaches at the interface of signal processing, acoustics, psychoacoustics, and computer science, as well as of great relevance to virtual reality and gaming applications. While this would be the area of initial focus, at CCRMA there are several other ongoing projects which are closely related to Prof. Cvetkovic's research or research in the Institute of Telecommunications at King's (Mobile Phone Orchestra, Sound in Space, Music in Virtual Worlds), as well as projects which could provide valuable inspiration for possible collaborative projects between the Department of Music and the Institute of Telecommunications at King's and CCRMA (Sound Waves on the Internet for Real-time Echoes, and the Historical Recordings). Finally, most of the largest companies which are potential licensees of Prof. Cvetkovic's audio technology, such as DTS, Dolby, and Microsoft, are based on the west coast of the US. The presence of Prof. Cvetkovic at CCRMA would accelerate the exploration of licensing opportunities, as these and other relevant companies frequently visit CCRMA, and are situated in the Bay Area or not too far from it.

Planned Impact

Visit to the Department of Statistics, University of California, Berkeley

The main purpose of the visit of Prof. Cvetkovic to the Department of Statistics, University of California, Berkeley, is to study emerging information sciences techniques and thus facilitate his engagement in cutting edge research in this field. His subsequent research in this field is meant to have a significant theoretical component, and immediate beneficiaries of this work would be other researchers in signal processing, and statistics and applied probability -- two areas which EPSRC intends to increase its investments in.

The techniques for statistical inference and kernel methods which Prof. Cvetkovic will be focusing on at Berkeley are applicable to broad and diverse types of data, from social networks to neuroscience, so many segments of science, industry and society would benefit from this work. One particular application which will be considered is cardiac arrhythmia detection and classification. The ultimate benefit of this work would be improved healthcare and societal well-being, as ventricular fibrillation (VF) is a leading cause of death in the western world and existing methods for VF detection are not sufficiently reliable. Other beneficiaries include pharmaceutical companies which develop medications for treating different forms of arrhythmia, biomedical equipment companies which manufacture defibrillators, and other researchers in biomedical signal processing and pharmacology.

Another application which will be considered is automatic speech recognition. The approach developed by Prof. Cvetkovic and Prof. Sollich in collaboration with Prof. Yu is still very novel and original, so results of this work will be of interest to the research community working on speech recognition. Beneficiaries of practical speech recognition systems built around these ideas are discussed in the following.

Visit to Speech Technology and Research (STAR) Laboratory, SRI International

Automatic speech recognition plays an important role in a wide variety of applications, ranging from collecting military intelligence, through assisted living and medical record transcription, to various customer service systems. Beneficiaries therefore include the military, healthcare, and service industries. Reliable and accurate automatic speech recognition systems contribute towards improving national security, providing better healthcare and reducing its cost, and making the IT infrastructure function seamlessly while appearing invisible. This work therefore addresses several important challenges within Digital Economy, Healthcare, and Global Uncertainties themes, as defined by EPSRC.

Visit to the Center for Computer Research in Music and Acoustics (CCRMA), Stanford University

The first and immediate line of beneficiaries of the work done during the visit to CCRMA are researchers in the fields of signal processing, music and acoustics technologies. Companies producing systems for multichannel audio, including home stereo systems, game devices, and sound mixing consoles, make another line of direct beneficiaries. Through their products, people working in creative industries which involve sound recording, production and reproduction will also benefit from this work. Finally, the ultimate beneficiary is the general public, which will be able to enjoy superior audio quality at a lower price. Thus, the proposed activity will advance relevant science, lead to technological developments which would have a positive impact on industry and commerce, and contribute towards enhancing the quality of life of the general public.

Publications

De Sena E (2015) Efficient Synthesis of Room Acoustics via Scattering Delay Networks in IEEE/ACM Transactions on Audio, Speech, and Language Processing

De Sena E (2020) Localization Uncertainty in Time-Amplitude Stereophonic Reproduction in IEEE/ACM Transactions on Audio, Speech, and Language Processing

 
Title Circular Breathing 
Description A live performance by Reeps One at Somerset House. 
Type Of Art Performance (Music, Dance, Drama, etc) 
Year Produced 2018 
Impact There were no notable impacts. The performance was a pilot performance of Reeps One that uses our software for live sound spatialisation, which demonstrated that our technology is robust and provides unprecedented control of real time soundscape designs. 
 
Title Ouroboros 
Description An immersive 3D audio-visual installation 
Type Of Art Composition/Score 
Year Produced 2017 
Impact Demonstration and first public display of the audio technology created on the projects funded by the associated awards. 
URL http://pantar.com/portfolio/ouroboros/
 
Title Ouroboros 
Description Audio visual poem by Ali Hossaini 
Type Of Art Artistic/Creative Exhibition 
Year Produced 2017 
Impact Installation at Guildhall Art Gallery, June - July 2018. 
 
Title Philosophy Shop 
Description Immersive avant garde play at RADA that uses our VST plugin for sound spatialisation. 
Type Of Art Performance (Music, Dance, Drama, etc) 
Year Produced 2019 
Impact There are no notable impacts yet, but this play demonstrates the robustness and effectiveness of my sound technology in sound design in theatre settings. 
URL https://www.rada.ac.uk/whats-on/the-philosophy-shop/
 
Title Pigment Channel 
Description A VR experience. 
Type Of Art Artistic/Creative Exhibition 
Year Produced 2018 
Impact No notable impact yet, but this product demonstrated the effectiveness of my sound technology for creating spatial sound experiences in the context of VR. 
URL http://patrickmorgan.co.uk/v-a-project.html
 
Description During his visit to UC Berkeley, the PI was hosted by Prof. Yu, in the Statistical Machine Learning group. He focused on the current activities of Prof. Yu's group, which can be broadly described as statistical machine learning theory, methodologies, and algorithms for solving high-dimensional data problems. At the same time, the PI visited the speech group at ICSI weekly, to keep abreast of developments in speech recognition; he and Prof. Yu had previously collaborated on speech recognition, a collaboration which they planned to continue and expand.

At SRI, the PI was investigating ways in which the approach to robust automatic speech recognition, which he and his CoI developed within the project funded through EPSRC award EP/D053005/1, could be brought closer to practice, and based on that set forth the directions of long-term collaboration and possible joint grant proposals between SRI, King's College London, and UC Berkeley.

At CCRMA, the PI was working on multichannel audio technologies, and signal processing problems in audio and acoustics. The particular problem of initial focus was digital waveguide networks (DWNs) for ultra-fast real-time rendition of multichannel audio. At CCRMA there are also several other ongoing projects which are closely related to the PI's research, and these provided inspiration for collaborative research at the interface of (audio) signal processing and the humanities (e.g. reconstruction of acoustic spaces of historical venues), or music (composition with 3D sound effects), or even neuroscience (understanding neural mechanisms governing music perception).

The visit to UC Berkeley was very beneficial in terms of enabling the PI to gain a wider perspective, and where needed in-depth knowledge, of state-of-the-art developments in statistical machine learning relevant to his work. The visit to UC Berkeley, including ICSI, and the visit to SRI, enabled setting forth directions of collaborative research on robust speech recognition. A grant proposal in this domain, with UC Berkeley and SRI as project partners, committing considerable resources, has been submitted to EPSRC.

Tangible outputs of the visit to Stanford include a collaborative journal paper which will be submitted to IEEE Transactions on Audio, Speech, and Language Processing by the end of 2014, and a joint tutorial on multichannel surround sound systems, to be presented at ICASSP 2015.
Exploitation Route The purpose of this award was to support the PI's visits to UC Berkeley, SRI, and Stanford. The project is not expected to produce any findings, but rather aims to enable the PI to learn new techniques, and facilitate collaborations with international centres of excellence. These aims have been accomplished, and the collaborative research facilitated by this grant is likely to have an impact in several sectors, as indicated below.
Sectors Aerospace, Defence and Marine; Creative Economy; Digital/Communication/Information Technologies (including Software); Electronics; Healthcare; Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections; Security and Diplomacy

 
Description This award was an Overseas Travel Grant. Its aim was to support visits of the PI to:
• Department of Statistics, University of California, Berkeley, for six months.
• Speech Technology & Research Laboratory, SRI International, for one month.
• Center for Computer Research in Music and Acoustics, Stanford University, for three months.
The objectives of these visits were to enable the PI to:
• establish and develop long-term collaborations with internationally leading centres of excellence,
• have a concentrated activity on studying new techniques at the interface of signal processing, statistical inference and machine learning,
• broaden the scope of his current research developed with recent EPSRC support and bring its results closer to practice,
• accelerate commercial exploitation of the intellectual property generated during EPSRC supported projects.
Each of the individual visits was intended to accomplish two or more of these objectives. Considering the nature of the project, there are no key scientific findings, but all objectives have been met. An account of specific accomplishments is provided under section: RCUK Key Findings.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Cultural

 
Description Cultural Institute Award
Amount £25,800 (GBP)
Organisation King's College London 
Sector Academic/University
Country United Kingdom
Start 06/2016 
End 06/2017
 
Description Cultural Institute Award
Amount £8,708 (GBP)
Organisation King's College London 
Sector Academic/University
Country United Kingdom
Start 02/2018 
End 06/2018
 
Description Impact Acceleration Award
Amount £6,000 (GBP)
Organisation King's College London 
Sector Academic/University
Country United Kingdom
Start 11/2015 
End 06/2016
 
Description Impact Acceleration Award
Amount £38,548 (GBP)
Organisation King's College London 
Sector Academic/University
Country United Kingdom
Start 03/2018 
End 10/2018
 
Description Impact Acceleration Award Rapid
Amount £10,000 (GBP)
Organisation King's College London 
Sector Academic/University
Country United Kingdom
Start 02/2018 
End 06/2018
 
Description Responsive Mode
Amount £1,402,097 (GBP)
Funding ID EP/R012067/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 04/2018 
End 03/2021
 
Title Sound Spatialisation Software 
Description Software for dynamic spatialisation of sound sources in a dynamically changing environment, e.g. rendition of a VR audio content, that is compatible with multichannel and binaural rendering. 
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? No  
Impact No impact has been generated yet, but the tool will be useful in psychoacoustics and audiology research where it pertains to spatial hearing, and is expected to evolve into commercial software for immersive audio content creation for applications such as AR/VR and professional music mixing.
 
Description 59 Productions 
Organisation 59 Productions
Country United Kingdom 
Sector Private 
PI Contribution Soundscape design for a multimedia art installation centred around a performance of pianist Yuja Wang.
Collaborator Contribution Design and production of a multimedia art installation centred around a performance of pianist Yuja Wang.
Impact A multimedia art installation centred around a performance of pianist Yuja Wang.
Start Year 2016
 
Description Edinburgh 
Organisation University of Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution Intellectual input.
Collaborator Contribution Intellectual input.
Impact Joint EPSRC project.
Start Year 2016
 
Description Fidelio Arts 
Organisation Fidelio Arts Ltd
Country United Kingdom 
Sector Private 
PI Contribution Soundscape design for a multimedia art installation centred around a performance of pianist Yuja Wang.
Collaborator Contribution Organisation and management of the project, multimedia art installation centred around a performance of pianist Yuja Wang, including time of the pianist.
Impact A multimedia art installation centred around a performance of Yuja Wang, a pianist represented by Fidelio Arts and presently one of the leading classical pianists.
Start Year 2016
 
Description Institute of Sound Recording, University of Surrey 
Organisation University of Surrey
Country United Kingdom 
Sector Academic/University 
PI Contribution Expertise, intellectual input.
Collaborator Contribution Expertise, intellectual input.
Impact Joint publications, grant proposal, and further development and deployment of the audio technology developed with the relevant EPSRC project in art projects and installations.
Start Year 2016
 
Description METU 
Organisation Middle East Technical University
Department Institute of Marine Sciences
Country Turkey 
Sector Academic/University 
PI Contribution Expertise, intellectual input.
Collaborator Contribution Expertise, intellectual input.
Impact Joint publications. Development of the audio technology developed on the relevant EPSRC project and its deployment in art projects and installations.
Start Year 2012
 
Description SRI 
Organisation SRI International (inc)
Country United States 
Sector Charity/Non Profit 
PI Contribution Exchange of ideas and technical discussions.
Collaborator Contribution Exchange of ideas and technical discussions. They co-sponsored one visit of Prof. Cvetkovic in 2012, and hosted him for 4 months (full or part time) at the lab in 2014.
Impact We formulated a grant proposal, submitted to EPSRC, with SRI as a formal partner. It is a multidisciplinary project involving signal processing, statistics, and machine learning, applied to a problem in speech technologies.
Start Year 2012
 
Description Stanford 
Organisation Stanford University
Country United States 
Sector Academic/University 
PI Contribution Collaborative research.
Collaborator Contribution Collaborative research.
Impact A joint tutorial on multichannel surround systems, to be presented at ICASSP 2015. A joint paper to be submitted to IEEE Transactions on Audio, Speech, and Language Processing.
Start Year 2013
 
Description UC Berkeley 
Organisation University of California, Berkeley
Country United States 
Sector Academic/University 
PI Contribution Collaboration on several joint publications, and on formulating a grant proposal to continue collaboration on robust speech recognition.
Collaborator Contribution Collaboration on several joint publications, and on formulating a grant proposal to continue collaboration on robust speech recognition. They also co-sponsored a visit of Prof. Cvetkovic in 2012, and hosted him for 7 months in 2013.
Impact Two conference papers, and one journal paper. A grant proposal on robust speech recognition was formulated jointly, in which UC Berkeley appears as a formal partner. It is a collaborative project at the interface between signal processing, statistics, and machine learning, addressing a problem in speech technologies.
Start Year 2007
 
Title SDN iPhone app 
Description The iPhone app aims at delivering the auditory illusion of being in the middle of a virtual rectangular room. This is achieved by means of the scattering delay network (SDN) technology, together with binaural reproduction technique. The app is capable of simulating the acoustics of the room in real time thanks to the extremely low computational complexity of the SDN method, while at the same time delivering important perceptual cues in an accurate manner. The app uses the iPhone gyroscope in order to track the movement of the listener's head and adjusts the simulation accordingly. 
Type Of Technology Webtool/Application 
Year Produced 2015 
Impact The app was sent to several companies to spur their interest in commercial exploitation of the intellectual property arising from relevant EPSRC projects. Dolby has made several visits to King's College and is presently evaluating our technology.
 
Title Sound Spatialisation Software 
Description Software for dynamic spatialisation of sound sources in a dynamically changing environment, e.g. rendition of a VR audio content, that is compatible with multichannel and binaural rendering. 
Type Of Technology Software 
Year Produced 2017 
Impact No impact has been generated yet, but the software will provide the basis for two commercial product prototypes: 1) a VST plugin for professional sound mixing, 2) a VR audio plugin for creating audio content in VR environments.
 
Title Unity asset for audio content creation. 
Description The Unity asset implements my audio technology within the Unity VR/gaming platform for audio content creation.
Type Of Technology Software 
Year Produced 2019 
Impact No impacts yet; we are in the final development stages.
 
Title VST plugin for sound spatialisation 
Description The software implements room acoustics and sound source spatialisation. 
Type Of Technology Software 
Year Produced 2018 
Impact So far, several public events, as listed in my portfolio, have used the plugin for content creation.
 
Description 2015 Summer Science Exhibition of the Royal Society. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Our scattering delay network (SDN) technology was showcased during the 2015 Summer Science Exhibition, the flagship event of the Royal Society for science communication to the public. The event, lasting a week, had an attendance of about 15,000 people, in addition to two gala nights with the fellows of the Royal Society. The demonstration was part of the stand "Sound Scape Interaction in a 3D World" organised by a consortium of European universities led by Imperial College London. The demonstration consisted of a rotating platform called "Sound Hunter".
Visitors wore headphones while standing on the rotating platform and their task was to rotate the platform until a sound source auralised through the headphones was perceived to be in front of them. The SDN was used in cases where users chose to locate the sound source while in a reverberant room.
Year(s) Of Engagement Activity 2015
URL http://sse.royalsociety.org/2015