Visits to University of California, Berkeley, Stanford University, and SRI International

Lead Research Organisation: King's College London

Department Name: Institute of Telecommunications

Abstract

During his visit to the Department of Statistics, University of California, Berkeley, Prof. Cvetkovic will be hosted by Prof. Yu in the Statistical Machine Learning group. He will focus on the current research topics of Prof. Yu's group, which can be broadly described as statistical machine learning theory, methodologies, and algorithms for solving high-dimensional data problems. Particular problems covered include sparse modelling (e.g. Lasso, compressed sensing), structured sparsity, analysis and methods for spectral clustering, and applications to data from a diverse range of interdisciplinary areas, from neuroscience to social networks. During this visit, Prof. Cvetkovic and Prof. Yu will set out directions for collaboration on problems in high-dimensional learning, leading to a research grant proposal.

During his previous EPSRC project, EP/D053005/1, Prof. Cvetkovic, in collaboration with Prof. Sollich (Department of Mathematics, King's College London) and Prof. Yu, developed an unorthodox approach to robust speech recognition in the high-dimensional space of speech acoustic waveforms. Dr. Horacio Franco, Director of the Speech Technology and Research Laboratory of SRI International, who in 2010 won a major DARPA award for addressing the sensitivity of automatic speech recognition systems to additive noise, regards this approach as groundbreaking and has expressed strong interest in exploring avenues for collaboration.
The purpose of the visit to SRI would be to investigate how the approach developed by Prof. Cvetkovic and his collaborators can be brought closer to practice and, on that basis, to explore directions for long-term collaboration and possible joint grant proposals between SRI International, King's College London, and UC Berkeley.

At King's College, Prof. Cvetkovic has commenced work on a new multichannel audio technology, supported by EPSRC grant EP/F001142/1. The project has produced a considerable volume of publications and a patent portfolio. A visit to one of the world's leading centres for music and acoustics technologies, such as CCRMA, would be very beneficial for building on this momentum: it would allow Prof. Cvetkovic to engage more deeply with the field, which is still a new application area for him, to expand the scope of his work, to establish collaborations, and to inform future grant proposals. At CCRMA, Prof. Cvetkovic will interact primarily with Prof. Julius Smith, working on multichannel audio technologies and other signal processing problems in audio and acoustics. Recent work by Prof. Cvetkovic complements Prof. Smith's extensive work on ultra-fast rendition of multichannel audio using digital waveguide networks (DWNs). This area is of significant academic interest, requiring interdisciplinary approaches at the interface of signal processing, acoustics, psychoacoustics, and computer science, and is also of great relevance to virtual reality and gaming applications. While this would be the area of initial focus, several other ongoing projects at CCRMA are closely related to Prof. Cvetkovic's research or to research in the Institute of Telecommunications at King's (Mobile Phone Orchestra, Sound in Space, Music in Virtual Worlds), and further projects could provide valuable inspiration for collaborations between CCRMA and the Department of Music and the Institute of Telecommunications at King's (Sound Waves on the Internet for Real-time Echoes, and the Historical Recordings). Finally, most of the largest companies that are potential licensees of Prof. Cvetkovic's audio technology, such as DTS, Dolby, and Microsoft, are based on the US West Coast. Prof. Cvetkovic's presence at CCRMA would accelerate the exploration of licensing opportunities, since these and other relevant companies frequently visit CCRMA and are located in or near the Bay Area.

Planned Impact

Visit to the Department of Statistics, University of California, Berkeley

The main purpose of Prof. Cvetkovic's visit to the Department of Statistics, University of California, Berkeley, is to study emerging information science techniques and thus facilitate his engagement in cutting-edge research in this field. His subsequent research in this field is meant to have a significant theoretical component, and the immediate beneficiaries of this work would be other researchers in signal processing and in statistics and applied probability, two areas in which EPSRC intends to increase its investment.

The techniques for statistical inference and kernel methods on which Prof. Cvetkovic will focus at Berkeley are applicable to broad and diverse types of data, from social networks to neuroscience, so many segments of science, industry and society would benefit from this work. One particular application to be considered is cardiac arrhythmia detection and classification. The ultimate benefit of this work would be improved healthcare and societal well-being, as ventricular fibrillation (VF) is a leading cause of death in the western world and existing methods for VF detection are not sufficiently reliable. Other beneficiaries include pharmaceutical companies that develop medications for treating different forms of arrhythmia, biomedical equipment companies that manufacture defibrillators, and researchers in biomedical signal processing and pharmacology.

Another application to be considered is automatic speech recognition. The approach developed by Prof. Cvetkovic and Prof. Sollich in collaboration with Prof. Yu is still highly novel, so the results of this work will be of interest to the speech recognition research community. The beneficiaries of practical speech recognition systems built around these ideas are discussed below.

Visit to Speech Technology and Research (STAR) Laboratory, SRI International

Automatic speech recognition plays an important role in a wide variety of applications, ranging from collecting military intelligence, through assisted living and medical record transcription, to various customer service systems. Beneficiaries therefore include the military, healthcare, and service industries. Reliable and accurate automatic speech recognition systems contribute towards improving national security, providing better and cheaper healthcare, and making the IT infrastructure function seamlessly while remaining invisible to its users. This work therefore addresses several important challenges within the Digital Economy, Healthcare, and Global Uncertainties themes, as defined by EPSRC.

Visit to the Center for Computer Research in Music and Acoustics (CCRMA), Stanford University

The first and most immediate beneficiaries of the work done during the visit to CCRMA are researchers in the fields of signal processing and music and acoustics technologies. Companies producing systems for multichannel audio, including home stereo systems, game devices, and sound mixing consoles, form a second group of direct beneficiaries. Through their products, people working in creative industries that involve sound recording, production and reproduction will also benefit from this work. Finally, the ultimate beneficiary is the general public, who will be able to enjoy superior audio quality at a lower price. Thus, the proposed activity will advance relevant science, lead to technological developments that would have a positive impact on industry and commerce, and contribute towards enhancing the quality of life of the general public.

Funded Value:

£21,058

Funded Period:

Mar 13 - Mar 14

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/K034626/1

Principal Investigator:

Zoran Cvetkovic

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Digital Signal Processing (100%)

Organisations

People	ORCID iD
Zoran Cvetkovic (Principal Investigator)

Publications

De Sena E (2015) Efficient Synthesis of Room Acoustics via Scattering Delay Networks in IEEE/ACM Transactions on Audio, Speech, and Language Processing

De Sena E (2020) Localization Uncertainty in Time-Amplitude Stereophonic Reproduction in IEEE/ACM Transactions on Audio, Speech, and Language Processing

Erdem E (2019) Perceptual Soundfield Reconstruction in Three Dimensions via Sound Field Extrapolation

Hacihabiboglu H (2017) Perceptual Spatial Audio Recording, Simulation, and Rendering: An overview of spatial-audio techniques based on psychoacoustics in IEEE Signal Processing Magazine

Artistic and Creative Products
Key Findings
Impact Summary
Further Funding
Research Tools and Methods
Collaboration
Software and Technical Products
Engagement Activities


Title	Circular Breathing
Description	A live performance by Reeps One at Somerset House.
Type Of Art	Performance (Music, Dance, Drama, etc)
Year Produced	2018
Impact	There were no notable impacts. The performance was a pilot performance by Reeps One using our software for live sound spatialisation; it demonstrated that our technology is robust and provides unprecedented control over real-time soundscape design.


Title	Ouroboros
Description	An immersive 3D audio-visual installation
Type Of Art	Composition/Score
Year Produced	2017
Impact	Demonstration and first public display of the audio technology created on the projects funded by the associated awards.
URL	http://pantar.com/portfolio/ouroboros/


Title	Ouroboros
Description	Audio visual poem by Ali Hossaini
Type Of Art	Artistic/Creative Exhibition
Year Produced	2017
Impact	Installation at Guildhall Art Gallery, June - July 2018.


Title	Philosophy Shop
Description	Immersive avant-garde play at RADA that uses our VST plugin for sound spatialisation.
Type Of Art	Performance (Music, Dance, Drama, etc)
Year Produced	2019
Impact	There are no notable impacts yet, but this play demonstrates the robustness and effectiveness of my sound technology for sound design in theatre settings.
URL	https://www.rada.ac.uk/whats-on/the-philosophy-shop/


Title	Pigment Channel
Description	A VR experience.
Type Of Art	Artistic/Creative Exhibition
Year Produced	2018
Impact	No notable impact yet, but this product demonstrated the effectiveness of my sound technology for creating spatial sound experiences in the context of VR.
URL	http://patrickmorgan.co.uk/v-a-project.html


Description	During his visit to UC Berkeley, the PI was hosted by Prof. Yu in the Statistical Machine Learning group. He focused on the current research topics of Prof. Yu's group, which can be broadly described as statistical machine learning theory, methodologies, and algorithms for solving high-dimensional data problems. In parallel, the PI visited the speech group at ICSI (International Computer Science Institute) weekly to keep abreast of developments in speech recognition; he and Prof. Yu had previously collaborated on speech recognition and planned to continue and expand this work. At SRI, the PI investigated how the approach to robust automatic speech recognition that he and his co-investigator (CoI) developed within the project funded through EPSRC award EP/D053005/1 could be brought closer to practice and, on that basis, set out directions for long-term collaboration and possible joint grant proposals between SRI, King's College London, and UC Berkeley. At CCRMA, the PI worked on multichannel audio technologies and signal processing problems in audio and acoustics. The initial focus was on digital waveguide networks (DWNs) for ultra-fast, real-time rendition of multichannel audio. Several other ongoing projects at CCRMA are closely related to the PI's research, and these provided inspiration for collaborative research at the interface of (audio) signal processing and the humanities (e.g. reconstruction of the acoustic spaces of historical venues), music (composition with 3D sound effects), or even neuroscience (understanding the neural mechanisms governing music perception). The visit to UC Berkeley was very beneficial, enabling the PI to gain a wider perspective on, and where needed in-depth knowledge of, state-of-the-art developments in statistical machine learning relevant to his work.
The visit to UC Berkeley, including ICSI, and the visit to SRI enabled the PI and his hosts to set out directions for collaborative research on robust speech recognition. A grant proposal in this domain, with UC Berkeley and SRI as project partners committing considerable resources, has been submitted to EPSRC. Tangible outputs of the visit to Stanford include a collaborative journal paper, to be submitted to IEEE Transactions on Audio, Speech, and Language Processing by the end of 2014, and a joint tutorial on multichannel surround sound systems, to be presented at ICASSP 2015.
Exploitation Route	The purpose of this award was to support the PI's visits to UC Berkeley, SRI, and Stanford. The project was not expected to produce findings as such; rather, it aimed to enable the PI to learn new techniques and to facilitate collaborations with international centres of excellence. These aims have been accomplished, and the collaborative research facilitated by this grant is likely to have an impact on several sectors, as indicated below.
Sectors	Aerospace, Defence and Marine; Creative Economy; Digital/Communication/Information Technologies (including Software); Electronics; Healthcare; Leisure Activities, including Sports, Recreation and Tourism; Culture, Heritage, Museums and Collections; Security and Diplomacy


Description	This award was an Overseas Travel Grant. Its aim was to support visits of the PI to:
• the Department of Statistics, University of California, Berkeley, for six months;
• the Speech Technology & Research Laboratory, SRI International, for one month;
• the Center for Computer Research in Music and Acoustics, Stanford University, for three months.
The objectives of these visits were to enable the PI to:
• establish and develop long-term collaborations with internationally leading centres of excellence;
• concentrate on studying new techniques at the interface of signal processing, statistical inference and machine learning;
• broaden the scope of his current research developed with recent EPSRC support and bring its results closer to practice;
• accelerate commercial exploitation of the intellectual property generated during EPSRC-supported projects.
Each of the individual visits was intended to accomplish two or more of these objectives. Given the nature of the project, there are no key scientific findings, but all objectives have been met. An account of specific accomplishments is provided under the section RCUK Key Findings.
First Year Of Impact	2017
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Cultural


Description	Cultural Institute Award
Amount	£25,800 (GBP)
Organisation	King's College London
Sector	Academic/University
Country	United Kingdom
Start	05/2016
End	06/2017


Description	Cultural Institute Award
Amount	£8,708 (GBP)
Organisation	King's College London
Sector	Academic/University
Country	United Kingdom
Start	02/2018
End	06/2018


Description	Impact Acceleration Award
Amount	£6,000 (GBP)
Organisation	King's College London
Sector	Academic/University
Country	United Kingdom
Start	11/2015
End	06/2016


Description	Impact Acceleration Award
Amount	£38,548 (GBP)
Organisation	King's College London
Sector	Academic/University
Country	United Kingdom
Start	03/2018
End	10/2018


Description	Impact Acceleration Award Rapid
Amount	£10,000 (GBP)
Organisation	King's College London
Sector	Academic/University
Country	United Kingdom
Start	02/2018
End	06/2018


Description	Responsive Mode
Amount	£1,402,097 (GBP)
Funding ID	EP/R012067/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	03/2018
End	03/2021


Title	Sound Spatialisation Software
Description	Software, compatible with both multichannel and binaural rendering, for dynamic spatialisation of sound sources in a changing environment, e.g. rendition of VR audio content.
Type Of Material	Improvements to research infrastructure
Year Produced	2017
Provided To Others?	No
Impact	No impact has been generated yet, but the tool will be useful in psychoacoustics and audiology research on spatial hearing, and is expected to evolve into commercial software for immersive audio content creation for applications such as AR/VR and professional music mixing.


Description	59 Productions
Organisation	59 Productions
Country	United Kingdom
Sector	Private
PI Contribution	Soundscape design for a multimedia art installation centred around a performance of pianist Yuja Wang.
Collaborator Contribution	Design and production of a multimedia art installation centred around a performance of pianist Yuja Wang.
Impact	A multimedia art installation centred around a performance of pianist Yuja Wang.
Start Year	2016


Description	Edinburgh
Organisation	University of Edinburgh
Country	United Kingdom
Sector	Academic/University
PI Contribution	Intellectual input.
Collaborator Contribution	Intellectual input.
Impact	Joint EPSRC project.
Start Year	2016


Description	Fidelio Arts
Organisation	Fidelio Arts Ltd
Country	United Kingdom
Sector	Private
PI Contribution	Soundscape design for a multimedia art installation centred around a performance of pianist Yuja Wang.
Collaborator Contribution	Organisation and management of the project, multimedia art installation centred around a performance of pianist Yuja Wang, including time of the pianist.
Impact	A multimedia art installation centred around a performance by Yuja Wang, a pianist represented by Fidelio Arts and currently one of the leading classical pianists.
Start Year	2016


Description	Institute of Sound Recording, University of Surrey
Organisation	University of Surrey
Country	United Kingdom
Sector	Academic/University
PI Contribution	Expertise, intellectual input.
Collaborator Contribution	Expertise, intellectual input.
Impact	Joint publications, a grant proposal, and further development and deployment of the audio technology developed within the relevant EPSRC project in art projects and installations.
Start Year	2016


Description	METU
Organisation	Middle East Technical University
Department	Institute of Marine Sciences
Country	Turkey
Sector	Academic/University
PI Contribution	Expertise, intellectual input.
Collaborator Contribution	Expertise, intellectual input.
Impact	Joint publications. Further development of the audio technology created in the relevant EPSRC project and its deployment in art projects and installations.
Start Year	2012


Description	SRI
Organisation	SRI International (inc)
Country	United States
Sector	Charity/Non Profit
PI Contribution	Exchange of ideas and technical discussions.
Collaborator Contribution	Exchange of ideas and technical discussions. They co-sponsored one visit by Prof. Cvetkovic in 2012 and hosted him at the lab for 4 months (full or part time) in 2014.
Impact	We formulated a grant proposal, submitted to EPSRC, with SRI as a formal partner. It is a multidisciplinary project involving signal processing, statistics, and machine learning, applied to a problem in speech technologies.
Start Year	2012


Description	Stanford
Organisation	Stanford University
Country	United States
Sector	Academic/University
PI Contribution	Collaborative research.
Collaborator Contribution	Collaborative research.
Impact	A joint tutorial on multichannel surround sound systems, to be presented at ICASSP 2015. A joint paper to be submitted to IEEE Transactions on Audio, Speech, and Language Processing.
Start Year	2013


Description	UC Berkeley
Organisation	University of California, Berkeley
Country	United States
Sector	Academic/University
PI Contribution	Collaboration on several joint publications, and on formulating a grant proposal to continue collaboration on robust speech recognition.
Collaborator Contribution	Collaboration on several joint publications, and on formulating a grant proposal to continue collaboration on robust speech recognition. They also co-sponsored a visit by Prof. Cvetkovic in 2012 and hosted him for 7 months in 2013.
Impact	Two conference papers and one journal paper. A grant proposal on robust speech recognition was formulated jointly, with UC Berkeley as a formal partner. It is a collaborative project at the interface between signal processing, statistics, and machine learning, addressing a problem in speech technologies.
Start Year	2007


Title	SDN iPhone app
Description	The iPhone app aims to deliver the auditory illusion of being in the middle of a virtual rectangular room. This is achieved by combining scattering delay network (SDN) technology with binaural reproduction. Thanks to the extremely low computational complexity of the SDN method, the app can simulate the acoustics of the room in real time while delivering important perceptual cues accurately. The app uses the iPhone gyroscope to track the movement of the listener's head and adjusts the simulation accordingly.
Type Of Technology	Webtool/Application
Year Produced	2015
Impact	The app was sent to several companies to spur their interest in commercial exploitation of the intellectual property arising from relevant EPSRC projects. Dolby has made several visits to King's College and is presently evaluating our technology.


Title	Sound Spatialisation Software
Description	Software, compatible with both multichannel and binaural rendering, for dynamic spatialisation of sound sources in a changing environment, e.g. rendition of VR audio content.
Type Of Technology	Software
Year Produced	2017
Impact	No impact has been generated yet, but the software will provide the basis for two commercial product prototypes: (1) a VST plugin for professional sound mixing and (2) a VR audio plugin for creating audio content in VR environments.


Title	Unity asset for audio content creation.
Description	The Unity asset implements my audio technology within the Unity VR/gaming platform for audio content creation.
Type Of Technology	Software
Year Produced	2019
Impact	No impact yet; the asset is in the final stages of development.


Title	VST plugin for sound spatialisation
Description	The software implements room acoustics and sound source spatialisation.
Type Of Technology	Software
Year Produced	2018
Impact	So far, the plugin has been used for content creation in several public events, as listed in my portfolio.


Description	2015 Summer Science Exhibition of the Royal Society.
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Our scattering delay network (SDN) technology was showcased during the 2015 Summer Science Exhibition, the Royal Society's flagship event for communicating science to the public. The week-long event was attended by about 15,000 people, in addition to two gala evenings with Fellows of the Royal Society. The demonstration was part of the stand "Sound Scape Interaction in a 3D World", organised by a consortium of European universities led by Imperial College London. It consisted of a rotating platform called "Sound Hunter". Visitors wore headphones while standing on the platform, and their task was to rotate it until a sound source auralised through the headphones was perceived to be in front of them. SDN was used when users chose to locate the sound source in a reverberant room.
Year(s) Of Engagement Activity	2015
URL	http://sse.royalsociety.org/2015
