Towards visually-driven speech enhancement for cognitively-inspired multi-modal hearing-aid devices (AV-COGHEAR)

Lead Research Organisation: University of Stirling
Department Name: Computing Science and Mathematics

Abstract

Current commercial hearing aids use a number of sophisticated enhancement techniques to try to improve the quality of speech signals. However, today's best aids fail to work well in many everyday situations. In particular, they fail in busy social situations where there are many competing speech sources, and they fail when the speaker is too far from the listener and the speech is swamped by noise. We have identified an opportunity to solve this problem by building hearing aids that can 'see'.

This ambitious project aims to develop a new generation of hearing aid technology that extracts speech from noise by using a camera to see what the talker is saying. The wearer of the device will be able to focus their hearing on a target talker and the device will filter out competing sound. This ability, which is beyond that of current technology, has the potential to improve the quality of life of the millions suffering from hearing loss (over 10m in the UK alone).

Our approach is consistent with normal hearing. Listeners naturally combine information from both their ears and eyes: we use our eyes to help us hear. When listening to speech, our eyes follow the movements of the face and mouth, and a sophisticated, multi-stage process uses this information to separate speech from noise and fill in any gaps. Our hearing aid will act in much the same way. It will exploit visual information from a camera (e.g. using a Google Glass-like system), and novel algorithms for intelligently combining audio and visual information, in order to improve speech quality and intelligibility in real-world noisy environments.

The project is bringing together a critical mass of researchers with the complementary expertise necessary to make the audio-visual hearing-aid possible. The project will combine new contrasting approaches to audio-visual speech enhancement that have been developed by the Cognitive Computing group at Stirling and the Speech and Hearing Group at Sheffield. The Stirling approach uses the visual signal to filter out noise; whereas the Sheffield approach uses the visual signal to fill in 'gaps' in the speech. The vision processing needed to track a speaker's lip and face movement will use a revolutionary 'bar code' representation developed by the Psychology Division at Stirling. The MRC Institute of Hearing Research (IHR) will provide the expertise needed to evaluate the approach on real hearing loss sufferers. Phonak AG, a leading international hearing aid manufacturer, will provide the advice and guidance necessary to maximise potential for industrial impact.

The project has been designed as a series of four workpackages that address the key research challenges related to each component of the device's design. These challenges have been identified by preliminary work at Sheffield and Stirling. They include: developing improved techniques for visually-driven audio analysis; designing better metrics for weighting audio and visual evidence; and developing techniques for optimally combining the noise-filtering and gap-filling approaches. A further key challenge is that, for a hearing aid to be effective, the processing cannot delay the signal by more than 10 ms.

In the final year of the project a fully integrated software prototype will be clinically evaluated using listening tests with hearing-impaired volunteers in a range of noisy, reverberant environments. Evaluation will use a new purpose-built speech corpus designed specifically for testing this new class of multimodal device. The project's clinical research partner, the Scottish Section of MRC IHR, will provide advice on experimental design and analysis throughout the trials. Industry leader Phonak AG will provide advice and technical support for benchmarking real-time hearing devices. The final clinically-tested prototype will be made available to the whole hearing community as a testbed for further research, development, evaluation and benchmarking.

Planned Impact

This multi-disciplinary project has been designed to have impact beyond the academic environment:

*Sufferers of hearing loss*

The aim of the proposal is to demonstrate a totally new class of hearing device that, by using visual input, is able to deliver an unparalleled level of speech intelligibility in noisy situations where current audio-only hearing aids are known to fail. By supplying the enabling research for this technology, the proposal has potential for significant long-term societal impact. Reduced ability to understand speech in noise is one of the most debilitating symptoms of hearing loss. An effective hearing device would improve the quality of life of millions of hearing loss sufferers (over 10m in the UK alone [1], receiving around £300M of treatment from the NHS annually [2]). Notably, even mild age-related hearing loss (something which affects us all) can cause speech to become hard to understand in situations where many people are speaking at the same time (e.g., social gatherings), or where speech is heard from a distance and degraded by reverberation (e.g., classrooms). Even a small improvement in performance could be the difference that allows someone to continue in their job (e.g., a teacher in a noisy classroom) or to remain socially active, avoiding potential isolation and depression. Note that, as the device will perform the visual processing that is routinely exploited in speech perception, it will be of particular benefit to hearing loss sufferers who are also visually impaired.

*The hearing aid industry*

A new class of audio-visual (AV) hearing aid would have impact on the hearing aid industry itself: demand for AV aids would rapidly displace inferior audio-only devices. There are clear precedents for hearing science rapidly transforming hearing technology, e.g., multiple microphone processing and frequency compression have been commercialised to great effect. We foresee AV processing as the next step forward. Previous barriers to AV processing are falling: reliable wireless technology frees the computation from having to be performed on the device itself; wearable computing devices are becoming sufficiently powerful to perform real-time face tracking and feature extraction. AV aids will also impact on industry standards for hearing aid evaluation and clinical standards for hearing loss assessment. Plans for realising these industrial impacts (including through an International Workshop, AV hearing device Challenge/Competition and open-source dissemination) are detailed in the Pathways to Impact document.

*Applications beyond hearing aids*

The project has potential impact in other speech processing applications, including:

-Cochlear implants (CI). CI users have even more severe problems coping with noise. With further research the technologies we are proposing could be used directly in CI signal processing.

-Telecommunications. Here we imagine video signals captured at the transmission end being used to filter and enhance acoustic signals arriving at the receiver end. Note that this could be useful in teleconferencing, built into conventional audio-only receivers, or for people with visual impairment who are unable to see visual cues directly.

-Speech enhancement for normal hearing. AV speech intelligibility enhancement may also be useful for listeners with no hearing loss, e.g., where ear defenders are being worn - factories, emergency response, military, etc.

[1] http://www.patient.co.uk/doctor/deafness-in-adults
[2] http://www.publications.parliament.uk/pa/cm201415/cmhansrd/cm140624/text/140624w0002.htm#140624w0002.htm_wqn4

Publications

 
Title The Grid Audio-Visual Lombard Speech Corpus 
Description Lombard Grid is a bi-view audiovisual Lombard speech corpus which can be used to support joint computational/behavioral studies in speech perception. The corpus includes 54 talkers, with 100 utterances per talker (50 Lombard and 50 plain utterances). The dataset follows the same sentence format as the audiovisual Grid corpus and can thus be considered an extension of that corpus; the sentence sets used in the Lombard Grid corpus are, however, unique and were not used in the Grid corpus. It offers two synchronised views of the talkers (front and side) to facilitate analysis of speech from different angles; a bespoke head-mounted camera system was used to collect both front and profile views. Statistics: 54 talkers (30 female, 24 male); 5,400 utterances, each with audio, front video and side video (16,200 files in total); 50% Lombard utterances and 50% plain reference utterances. The dataset is described in detail in: Najwa Alghamdi, Steve Maddock, Ricard Marxer, Jon Barker and Guy J. Brown, "A corpus of audio-visual Lombard speech with frontal and profile views", The Journal of the Acoustical Society of America 143, EL523 (2018); the paper is available via White Rose Research Online. Notes on filenaming. Filename format: SPKR_COND_UTTERANCE.wav|.mov, e.g. s8_p_sbbi9p.wav, where SPKR = s1 to s55; COND = l or p (l = Lombard, p = plain, i.e. non-Lombard); UTTERANCE = 6-character Grid utterance code, e.g. 'pgag6a', meaning 'place green at g 6 again'. Metadata format: SPKR = s1 to s55; SESSION = 1 or 2; INDEX = 1 to 10 (ordering of the recording blocks); SUBINDEX = 1 to 10 (ordering of the utterance within a 10-utterance block); COND = l or p (l = Lombard, p = plain); UTTERANCE = 6-character Grid utterance code. If a sentence was spoken incorrectly the filename carries a _WRONG_TRANS suffix, e.g. s8_2_38_8_r_lrwizp_WRONG_lrbizp.wav, where TRANS is the Grid utterance code for what was actually said. A minimal filename-parsing sketch follows this record.
Type Of Art Film/Video/Animation 
Year Produced 2018 
URL https://zenodo.org/record/3736464
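The following is a minimal, illustrative Python sketch of how filenames in the basic format above could be parsed; the regular expression and helper names are assumptions based solely on the naming notes in this record, not code shipped with the corpus.

```python
# Hypothetical helper for parsing basic Lombard Grid filenames of the form
# SPKR_COND_UTTERANCE.wav|.mov described above (metadata-style names with
# SESSION/INDEX/SUBINDEX/_WRONG_ fields carry extra underscore-separated
# parts and are not handled here).
import re

FILENAME_RE = re.compile(
    r"^(?P<speaker>s\d{1,2})_(?P<cond>[lp])_(?P<utt>[a-z0-9]{6})\.(?:wav|mov)$"
)
COND_NAMES = {"l": "Lombard", "p": "plain"}

def parse_lombard_grid_name(filename: str) -> dict:
    """Return speaker, condition and utterance code for a basic corpus filename."""
    m = FILENAME_RE.match(filename)
    if not m:
        raise ValueError(f"Unrecognised Lombard Grid filename: {filename}")
    info = m.groupdict()
    info["condition"] = COND_NAMES[info.pop("cond")]
    return info

print(parse_lombard_grid_name("s8_p_sbbi9p.wav"))
# -> {'speaker': 's8', 'utt': 'sbbi9p', 'condition': 'plain'}
```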
 
Description As part of WP1 (multimodal feature extraction), WP2 (cognitively inspired multimodal speech modelling), and WP3 (intelligent process and filter selection), we have successfully demonstrated our project hypothesis. Specifically, our preliminary findings show it is possible to combine visual and acoustic inputs to produce a multimodal hearing device that is able to significantly boost speech intelligibility in everyday listening environments in which traditional audio-only hearing devices prove ineffective. We have developed and validated a novel lip-reading driven, audio-visual (AV) speech enhancement system [1] that significantly outperforms benchmark audio-only approaches at low signal-to-noise ratios (SNRs). However, consistent with our cognitive hypothesis, visual cues were found to be relatively less effective for speech enhancement at high SNRs, i.e. in low levels of background noise. We therefore extended [1] and developed a more cognitively-inspired, context-aware AV approach [2] that contextually utilises visual and noisy audio features, more effectively accounting for different noisy conditions. Our approach estimates clean audio features without requiring any prior SNR estimation, which significantly reduces computational complexity.

Our proposed AV enhancement approach integrates a convolutional neural network (CNN) and a long short-term memory (LSTM) network that learn to switch between visual-only (V-only), audio-only (A-only) and combined AV cues at low, high and moderate SNR levels, respectively. For the AV switching module, we developed an enhanced visually-derived Wiener filter (EVWF) for noisy-speech filtering, and evaluated it under dynamic real-world scenarios (including cafe, street, bus and pedestrian noise) at SNR levels ranging from low (-12 dB) to high (12 dB), using the benchmark Grid and CHiME3 corpora. For objective testing, perceptual evaluation of speech quality (PESQ) was used to assess the quality of the restored speech; for subjective testing, the standard mean opinion score (MOS) method was used. Comparative experimental results demonstrated the superior performance of our context-aware AV approach over a range of benchmark methods, including A-only, V-only, spectral subtraction (SS) and log-minimum mean square error (LMMSE) based speech enhancement, at both low and high SNRs. These preliminary findings demonstrate the capability of our proposed AV approach to deal with spectro-temporal variations in real-world noisy environments by contextually exploiting the complementary strengths of audio and visual cues. In conclusion, our deep learning-driven AV framework is posited as a benchmark resource for the multi-modal speech processing and machine learning communities.
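To make the filtering step concrete, the following is a minimal, illustrative sketch of a visually-derived Wiener filter in the spirit described above: clean filterbank features (assumed here to be supplied by a separate lip-reading network) are mapped back to a full-resolution power spectrum via the pseudo-inverse of the mel filterbank matrix, and a Wiener gain is applied to the noisy STFT. The use of librosa, the parameter values and the placeholder clean-feature estimate are assumptions for illustration, not the project's EVWF implementation.

```python
# Minimal sketch of a visually-derived Wiener filter (illustrative only).
import numpy as np
import librosa

def visually_derived_wiener(noisy, clean_fbank=None, sr=16000, n_fft=512,
                            hop=128, n_mels=23):
    """Enhance `noisy` speech given clean filterbank features predicted from video."""
    stft = librosa.stft(noisy, n_fft=n_fft, hop_length=hop)
    noisy_power = np.abs(stft) ** 2                                  # (freq_bins, frames)

    mel_fbank = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)  # (mels, freq_bins)
    if clean_fbank is None:
        # Placeholder: in the real system a lip-reading network would predict these
        # features from the video; here we simply attenuate the noisy filterbank.
        clean_fbank = 0.5 * (mel_fbank @ noisy_power)

    # "Inverse filterbank" step: map the low-dimensional filterbank estimate back
    # to a full-resolution power spectrum via the pseudo-inverse of the mel matrix.
    clean_power = np.maximum(np.linalg.pinv(mel_fbank) @ clean_fbank, 1e-10)

    noise_power = np.maximum(noisy_power - clean_power, 1e-10)
    wiener_gain = clean_power / (clean_power + noise_power)         # per-bin Wiener gain

    return librosa.istft(wiener_gain * stft, hop_length=hop, length=len(noisy))
```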

As part of the project's key joint objective of combining the contrasting approaches to speech enhancement previously developed at Stirling (filtering-based) and Sheffield (mask estimation-based), we have developed a novel deep neural network (DNN) based AV mask estimation model [3][4]. The proposed model contextually integrates the temporal dynamics of both audio and noise-immune visual features for improved mask estimation and speech separation. For optimal AV feature extraction and ideal binary mask (IBM) estimation, a hybrid DNN architecture is exploited to leverage the complementary strengths of stacked LSTM and convolutional LSTM networks. Comparative simulation results in terms of speech quality and intelligibility demonstrate significant performance improvement of our proposed AV model over benchmark A-only and V-only mask estimation approaches, for both speaker-dependent and speaker-independent scenarios.
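As a point of reference for the mask-estimation route, the sketch below shows the standard ideal binary mask construction and how an estimated mask is applied to a noisy mixture; the 0 dB local-SNR threshold and STFT settings are common defaults assumed for illustration, not the project's exact training configuration.

```python
# Ideal binary mask (IBM) construction and mask application (illustrative defaults).
import numpy as np
import librosa

def ideal_binary_mask(clean, noise, n_fft=512, hop=128, threshold_db=0.0):
    """1 where the local SNR of a time-frequency unit exceeds threshold_db, else 0."""
    S = np.abs(librosa.stft(clean, n_fft=n_fft, hop_length=hop)) ** 2
    N = np.abs(librosa.stft(noise, n_fft=n_fft, hop_length=hop)) ** 2
    local_snr_db = 10.0 * np.log10(np.maximum(S, 1e-12) / np.maximum(N, 1e-12))
    return (local_snr_db > threshold_db).astype(np.float32)

def apply_mask(mixture, mask, n_fft=512, hop=128):
    """Apply an estimated (or ideal) mask to the noisy mixture and resynthesise."""
    stft = librosa.stft(mixture, n_fft=n_fft, hop_length=hop)
    return librosa.istft(mask * stft, hop_length=hop, length=len(mixture))
```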

To address real-time evaluation and testing challenges, as part of WP4 (full evaluation and testing), we recorded two custom noisy AV corpora: one with speech spoken in real noise backgrounds [4], and the other with Lombard speech induced by headphone presentation of noise [7]. This has enabled us to test, for the first time, our proposed AV speech processing algorithms (both filtering-based and mask estimation-based) in real noisy environments. Specifically, we employ the benchmark AV Grid and CHiME3 corpora for training our approaches, and our newly-developed real-noisy AV (ASPIRE) corpus for testing [4]. In addition to PESQ and MOS, clinical tests are conducted using the speech recognition threshold (SRT) to quantify the intelligibility of speech in noise. We conducted pilot testing with 10 subjects and observed that, even without any familiarisation with Grid sentences, subjects were able to identify Grid keywords with high accuracy. For example, using our visually-driven Wiener filtering approach, with subjects asked to report keywords (e.g., 'Blue', 'F', '2' in Grid utterances), the percentage of keywords correctly identified was around 90% and 100% respectively at SNRs of -3 dB and -6 dB. We also conducted tests in which subjects were asked to write out the whole sentence; here too, subjects were able to report the keywords accurately. The obtained results were used to estimate SRTs and psychometric functions for five different speech enhancement algorithms (A-only EVWF, V-only EVWF, AV-EVWF, spectral subtraction, and LMMSE). Results are being submitted to journals and conferences [4][5].
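For readers unfamiliar with SRT estimation, the following is a small illustrative sketch of fitting a logistic psychometric function to keyword-correct scores and reading off the SNR at 50% correct; the data points and starting values are hypothetical and are not the pilot results reported above.

```python
# Illustrative SRT estimation by fitting a logistic psychometric function.
import numpy as np
from scipy.optimize import curve_fit

def psychometric(snr_db, srt, slope):
    """Proportion of keywords correct as a logistic function of SNR (dB)."""
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - srt)))

snr_db = np.array([-12.0, -9.0, -6.0, -3.0, 0.0, 3.0])        # test SNRs (hypothetical)
p_correct = np.array([0.15, 0.35, 0.60, 0.85, 0.95, 0.98])     # keyword scores (hypothetical)

(srt, slope), _ = curve_fit(psychometric, snr_db, p_correct, p0=[-6.0, 1.0])
print(f"Estimated SRT (50% correct): {srt:.1f} dB SNR, slope {slope:.2f} per dB")
```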

Looking ahead, the multimodal nature of speech presents both opportunities and challenges for hearing and speech researchers. Real-time implementation of future AV hearing aids will demand high data rates, low latency, low computational complexity and strong security, which constitute some of the major bottlenecks to successful deployment of multi-modal hearing aids. To address these challenges, we have proposed a highly innovative framework integrating 5G Cloud Radio Access Network (C-RAN), Internet of Things (IoT) and strong privacy algorithms, to fully benefit from the complementary strengths and possibilities offered by these emerging technologies. Specifically, our proposed 5G-IoT enabled, secure AV hearing-aid framework [6] aims to transmit encrypted (compressed) AV information and receive enhanced (encrypted) speech, reconstructed using our contextual AV algorithms, in real time. This is designed to address cybersecurity threats such as eavesdropping and attacks on location privacy. For the security implementation, a novel real-time, lightweight (chaotic) AV encryption algorithm has been developed. To offload computational complexity and address real-time learning and optimisation issues, the framework runs deep learning and AV big data optimisation processes in the background, in the Cloud. Specifically, to enable real-time speech enhancement, secure AV information in the Cloud is used to filter noisy audio using our AV deep learning and analytical acoustic modelling approaches. The effectiveness and security of our AV hearing-aid framework have been extensively evaluated using widely known security metrics. Critical analysis, in terms of both speech enhancement and AV encryption, demonstrates the potential of our proposed technology to achieve high-quality speech reconstruction and secure, mobile AV hearing-aid communication.
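As a generic illustration of the kind of lightweight chaotic stream encryption referred to above, the sketch below XORs an AV frame payload with a keystream generated from a logistic map. This is a textbook-style example with hypothetical parameters, not the algorithm developed in [6], and it is not intended for production security use.

```python
# Illustrative logistic-map chaotic keystream XOR cipher (not for production use).
import numpy as np

def logistic_keystream(n_bytes, x0=0.731, r=3.9999, burn_in=1000):
    """Generate n_bytes of keystream from the logistic map x <- r*x*(1-x)."""
    x = x0
    for _ in range(burn_in):          # discard the transient so the orbit is well mixed
        x = r * x * (1.0 - x)
    out = np.empty(n_bytes, dtype=np.uint8)
    for i in range(n_bytes):
        x = r * x * (1.0 - x)
        out[i] = int(x * 256) & 0xFF  # quantise the chaotic state to one byte
    return out

def xor_cipher(payload: bytes, key=(0.731, 3.9999)) -> bytes:
    """Encrypt or decrypt (symmetric) an AV frame payload with the chaotic keystream."""
    ks = logistic_keystream(len(payload), x0=key[0], r=key[1])
    return bytes(np.frombuffer(payload, dtype=np.uint8) ^ ks)

frame = b"compressed AV frame bytes"
assert xor_cipher(xor_cipher(frame)) == frame   # applying the cipher twice recovers the payload
```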

Although our AV-COGHEAR pilot project is currently focussed on software implementation, throughout all stages of proof-of-concept development we have aimed to deliver causal AV algorithms with potential for future implementation on real systems, thus distinguishing between the demands of real-time operation (which is a function of computational power) and the stricter demands of low latency (<10 ms for hearing aids, which sets constraints on window sizes, etc.). The project's clinical research partner, the Scottish Section of the Institute of Hearing Research (IHR), has continued to provide insightful advice on AV experimental design and intelligibility analysis throughout the trials. The industrial partner, Phonak AG, is continuing to provide expert advice and technical support for benchmarking our contextual AV algorithms against relevant state-of-the-art, real-time hearing-device functionality. Collaborative findings are continuing to be submitted [4][5].
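To make the latency constraint concrete, the short sketch below computes the algorithmic delay implied by different analysis window lengths at a 16 kHz sampling rate (an assumed rate for illustration): a frame-based enhancer delays the signal by at least one analysis window, so the window length largely determines whether the 10 ms budget can be met.

```python
# Back-of-the-envelope check of the <10 ms algorithmic latency budget.
SR = 16000  # Hz (assumed sampling rate for illustration)

def frame_latency_ms(window_samples, sample_rate=SR):
    """Minimum delay introduced by one analysis window, in milliseconds."""
    return 1000.0 * window_samples / sample_rate

for win in (128, 160, 256, 512):
    ms = frame_latency_ms(win)
    print(f"{win:4d}-sample window at {SR} Hz -> {ms:5.1f} ms "
          f"({'within' if ms < 10 else 'exceeds'} the 10 ms budget)")
```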

In conclusion, we believe our proposed 5G-IoT enabled, secure AV hearing-aid framework is an effective and feasible solution that represents a step change in multimodal hearing aid design. Our ongoing/follow-on work, as part of industry co-funded PhD projects (with collaborators at Edinburgh Napier, Wolverhampton and Sheffield Universities, IHR Glasgow and Phonak/Sonova), is aimed at conducting more extensive evaluation, optimisation and benchmarking, including with other state-of-the-art speech enhancement and lightweight encryption approaches. This will lead to a future, privacy-assured, real-time AV prototype implementation, serving as an integrated open-source testbed for further research, development and commercialisation opportunities.

An invited proposal, building on the outcomes of AV-COGHEAR, was submitted in August 2019 to the EPSRC Transformative Healthcare Technologies 2050 Call, and was approved following final interviews in February 2020. The programme grant is led by Prof Hussain, in collaboration with six partner Universities (Edinburgh, Heriot-Watt, Glasgow, Nottingham, Wolverhampton and Manchester) and a strong User Group comprising the global hearing-aid manufacturer Phonak, clinical researchers and end-user engagement partners (including Deaf Scotland and Action on Hearing Loss). The programme grant aims to enhance hearing aid uptake by ambitiously developing and implementing real-time software and hardware prototypes of 5G-IoT enabled, personalised, privacy-preserving AV hearing aids, including a transformative feature to reduce end-user cognitive load/listening effort.

Key publications (2018-19) (total project-related publications to date: 64)

[1] Adeel A., Gogate M., Hussain A. and Whitmer W.M., "A Novel Lip-Reading Driven Deep Learning Approach for Speech Enhancement", IEEE Transactions on Emerging Topics in Computational Intelligence, (accepted), 2019

[2] Adeel A., Gogate M. and Hussain A., "Novel Deep Learning-based Contextual, Multimodal Speech Enhancement in Real-World Environments", (Elsevier) Information Fusion, in press, 2019

[3] Gogate M., Adeel A., Marxer R., Barker J., and Hussain A. "Deep Neural Network-driven Speaker Independent, Audio-Visual Mask Estimation for Speech Separation", In Proceedings of the 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018). Hyderabad, India.

[4] Gogate M., Adeel A., Dashtipour K., Marxer R., Barker J., Whitmer W.M. and Hussain A., "Speaker Independent Audio-Visual Mask Estimation for Speech Separation in Real Noisy Environments", The Journal of the Acoustical Society of America, 2019 (in preparation)

[5] Adeel A., Gogate M., Dashtipour K., Whitmer W.M. and Hussain A., "Speech recognition threshold (SRT) based intelligibility assessment of audio-visual speech enhancement algorithms in real-noisy environments", 10th International Conference on Brain Inspired Cognitive Systems (BICS 2019), Springer LNCS/LNAI Proceedings, 2019 (in preparation).

[6] Adeel A., Ahmed J., Larijani H. and Hussain A., "Real-time multi-modal chaotic encryption for next-generation 5G-IoT enabled and Lip-Reading driven Hearing-Aids", (Springer) Cognitive Computation, (accepted), 2019

[7] Alghamdi, N., Maddock, S., Marxer, R., Barker, J. and Brown, G. "A corpus of audio-visual Lombard speech with frontal and profile views". The Journal of the Acoustical Society of America, 143 (6), EL523-EL529, 2018

[8] Barker, J., Watanabe, S., Vincent, E. and Trmal, J. The fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, task and baselines. In Proceedings of the 19th Annual Conference of the International Speech Communication Association (INTERSPEECH 2018). Hyderabad, India, 2018
Exploitation Route The innovative AV-COGHEAR concept of contextual multi-modal data fusion and learning is already being exploited in a number of other applications, including natural language processing and cybersecurity. In the latter work, multi-lingual sentiments, emotions and opinions, including deception and sarcasm, are contextually extracted from natural language text, audio and visual modalities. Results are being published in leading conferences and journals. Spin-out research grant proposals exploiting this innovative research strand are also being submitted. Examples include a new EPSRC grant proposal (under preparation, led by Edinburgh Napier, with Prof Hussain as CI) being submitted to the 2019 Academic Centres of Excellence in Cyber Security Research (ACE-CSR) Call, which aims to conduct world-class research into innovative context-aware and multi-modal data science approaches for addressing priority cybersecurity challenges of contemporary society. Another newly submitted (£1M) EPSRC grant proposal (led by Edinburgh Napier, with Prof Hussain as Lead CI, in collaboration with Glasgow, Exeter, and global industrial partners) aims to exploit contextual learning in IoT networks for enabling smart city applications.

Numerous research papers are continuing to be published in collaboration with national and international researchers, exploiting AV-COGHEAR ideas based on context-aware, cognitive multi-modal analytics, in a range of challenging real-world applications.

A follow-on, large EPSRC programme grant, radically building on the outcomes of AV-COGHEAR, has been funded and is due to start in 2020. This aims to transform the hearing aid landscape in 2050 and beyond, and enable the UK to become a world-leader in multi-modal hearing aid research.
Sectors Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Education,Electronics,Energy,Environment,Financial Services, and Management Consultancy,Healthcare,Government, Democracy and Justice,Manufacturing, including Industrial Biotechnology,Culture, Heritage, Museums and Collections,Retail,Security and Diplomacy,Transport

URL https://cogbid.github.io/cogavhearing/
 
Description The award led to one patent, numerous publications, and the successful organisation of an International Workshop on "Challenges for Hearing Assistive Technology (CHAT)" as part of the 2017 Interspeech Conference. It also led to renewed interest from the global industry manufacturer Sonova, who provided substantial support as a partner on a successful follow-on EPSRC programme grant (COG-MHEAR) (Grant ref. EP/T021063/1).
First Year Of Impact 2019
Sector Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Electronics,Healthcare,Government, Democracy and Justice,Manufacturing, including Industrial Biotechnology,Pharmaceuticals and Medical Biotechnology,Security and Diplomacy,Other
Impact Types Societal,Economic,Policy & public services

 
Description Artificial Intelligence (AI) - powered dashboard for Covid-19 related public sentiment and opinion mining in social media platforms
Amount £135,104 (GBP)
Funding ID COV/NAP/20/07 
Organisation Chief Scientist Office 
Sector Public
Country United Kingdom
Start 05/2020 
End 10/2020
 
Description COG-MHEAR: Towards cognitively-inspired 5G-IoT enabled, multi-modal Hearing Aids
Amount £3,259,000 (GBP)
Funding ID EP/T021063/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 03/2021 
End 02/2025
 
Description Natural Language Generation for Low-resource Domains
Amount £416,848 (GBP)
Funding ID EP/T024917/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 03/2021 
End 02/2024
 
Description Next-generation, cognitively-inspired Big-Data predictive analytics, visualization & gamification driven healthcare Mobile Apps
Amount £12,500 (GBP)
Organisation Digital Health Institute (DHI) 
Sector Private
Country United Kingdom
Start 11/2015 
End 01/2016
 
Description Transforming approaches to improving hearing aid technology
Amount £418,262 (GBP)
Funding ID EP/M026981/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 10/2015 
End 09/2018
 
Title A Novel Enhanced Visually-Derived Wiener Filtering Approach for Speech Enhancement 
Description A novel two-stage enhanced visually-derived Wiener filtering (EVWF) approach for speech enhancement has been developed. The first stage utilises a neural network-based, data-driven approach to approximate clean audio features using temporal visual-only features (lip reading). The second stage proposes the novel use of an inverse filterbank (FB) transformation in place of the cubic spline interpolation employed in state-of-the-art visually-derived Wiener filtering (VWF) approaches. The novel EVWF has demonstrated significantly enhanced capability to estimate clean high-dimensional audio power spectra from low-dimensional visually-derived audio filterbank features, compared to the state-of-the-art VWF approach.
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact The developed technology was further evaluated as part of a follow-on PhD project (led by the AV-COGHEAR PI) in reverberant domestic environments with multiple real-world sound sources (using the benchmark CHiME2 and audio-visual GRID corpora). This further demonstrated that the proposed EVWF method was more reliable than both state-of-the-art visually-derived Wiener filtering and audio-only speech enhancement methods, such as spectral subtraction and log-minimum mean-square error, with significant performance improvements demonstrated in terms of quantitative and qualitative speech enhancement measures. The new technology has been widely used and cited by other researchers as a benchmark resource.
URL https://ieeexplore.ieee.org/abstract/document/8825842
 
Title Cochleanet: A robust language-independent audio-visual model for speech enhancement 
Description A novel language-, noise- and speaker-independent audio-visual (AV) deep neural network (DNN) architecture, termed CochleaNet, was developed for causal or real-time speech enhancement (SE). The model jointly exploits noisy acoustic cues and noise robust visual cues to focus on the desired speaker and improve speech intelligibility. The proposed SE framework is evaluated using a first of its kind AV binaural speech corpus, ASPIRE, recorded in real noisy environments, including cafeteria and restaurant settings. We demonstrate superior performance of our approach in terms of both objective measures and subjective listening tests, over state-of-the-art SE approaches, including recent DNN based SE models. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Our developed AV speech enhancement approach has been widely used and cited by researchers worldwide as a benchmark resource. It is continuing to be utilized for real-time AV prototype development for future multi-modal hearing-aids, as part of a follow-on EPSRC funded programme grant (COG-MHEAR). 
URL https://www.sciencedirect.com/science/article/pii/S1566253520302475
 
Title Contextual AV fusion for different noisy and environmental conditions 
Description Preliminary simulation results with our developed audio-visual (AV) model demonstrated that at high signal-to-noise ratios (SNRs), visual cues may be less helpful for AV noise 'filtering' based speech enhancement algorithms. This reflects the human cognitive tendency to resort to lip-reading only in very high levels of background noise. Our developed AV fusion approach is cognitively inspired in that it contextually learns and switches between audio and visual cues depending on the noise and environmental conditions.
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Our developed AV speech enhancement approach has been widely used and cited by researchers worldwide as a benchmark resource. It was further evaluated as part of a follow-on PhD project (led by the PI). This demonstrated significant performance improvement under white noise at low SNRs, compared to conventional speech enhancement methods (SS and LMMSE). At high SNRs, for the case of uncorrelated white noise, the LMMSE approach was found to outperform our AV approach (with a mean opinion score (MOS) of approximately 4.5 vs 3). Further performance evaluation in real noisy environments suggested that even at high SNRs, the proposed approach performed comparably to conventional speech enhancement approaches. This limitation of our AV approach led us to propose the development of a more optimal, context-aware AV system that can effectively account for different noisy environmental conditions. This work is continuing to be progressed as part of a follow-on EPSRC funded programme grant (COG-MHEAR).
URL https://www.sciencedirect.com/science/article/pii/S1566253518306018
 
Title Contextual Multimodal Switching For Speech Enhancement in Real-World Environments 
Description The proposed method is an extension of our previously developed and validated novel lip-reading driven, audio-visual (AV) speech enhancement system, which significantly outperforms benchmark audio-only approaches at low signal-to-noise ratios (SNRs). However, consistent with our cognitive hypothesis, visual cues were found to be relatively less effective for speech enhancement at high SNRs, or in low levels of background noise. Thus, we further extended our previous work and developed a more cognitively-inspired, context-aware AV approach that contextually utilises both visual and noisy audio features, and thus more effectively accounts for different noisy conditions. The developed context-aware AV approach estimates clean audio features, without requiring any prior SNR estimation, which significantly reduces computational complexity. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact We are exploiting the proposed approach in a humanoid robot that interacts autonomously and naturally in the dynamic environment of a public shopping mall, providing an engaging and entertaining experience to the general public. It is also applicable in extremely noisy environments where ear defenders are worn, such as emergency and disaster response and battlefield environments, air traffic control towers (to improve communication and reduce the risk of accidents) and cargo trains (to address driver distraction). These applications are currently being explored as part of collaborative, industry co-funded PhD projects and follow-on joint grant proposals.
 
Title Deep neural network (DNN) based AV mask estimation model 
Description The proposed audio-visual (AV) mask estimation model contextually integrates the temporal dynamics of both audio and noise-immune visual features for improved mask estimation and speech separation. For optimal AV feature extraction and ideal binary mask (IBM) estimation, a hybrid DNN architecture is exploited to leverage the complementary strengths of stacked LSTM and convolutional LSTM networks. Comparative simulation results in terms of speech quality and intelligibility demonstrate significant performance improvement of our proposed AV mask estimation model compared to benchmark audio-only and visual-only mask estimation approaches, for both speaker-dependent and speaker-independent scenarios.
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact The proposed model has been widely used and cited by researchers as an alternative approach to conventional filtering-based methods, through its contextual integration of contrasting approaches in order to leverage their complementary strengths.
URL https://www.isca-speech.org/archive/interspeech_2018/gogate18_interspeech.html
 
Title Multi-modal Speech Enhancement Demonstrator Tool 
Description We developed the world's first open web-based demonstrator tool that shows how recordings of speech in noisy environments can be multi-modally processed to remove background noise and make the speech easier to hear. The demonstrator tool works for sound only, as well as video recordings, and enables researchers to develop innovative multi-modal speech and natural language communication applications. Users can listen to sample recordings and upload their own personal (noisy) videos or audio files to hear the difference after audio-visual processing using a deep neural network model. No uploaded data is stored. User data is erased as soon as the web page is refreshed or closed. 
Type Of Material Improvements to research infrastructure 
Year Produced 2023 
Provided To Others? Yes  
Impact This innovative demonstrator tool was showcased at an international workshop organised as part of the 2022 IEEE Engineering in Medicine and Biology Society Conference (EMBC) in Glasgow, 11-15 July. Around 40 Workshop participants (including clinical, academic and industry researchers) were provided with an interactive hands-on demonstration of the audio-visual speech enhancement tool. The tool demonstrated, for the first time, the technical feasibility of developing audio-visual algorithms that can enhance speech quality and intelligibility, with the aid of video input and low-latency combination of audio and visual speech information. This served to educate participants and demonstrated the potential of such transformative tools to extract salient information from the pattern of the speaker's lip movements and to contextually employ this information as an additional input to speech enhancement algorithms, in future multi-modal communications and hearing assistive technology applications. 
URL https://demo.cogmhear.org/
 
Title Novel Audio-visual SRT for quantifying the intelligibility of speech in noise 
Description The existing Speech Recognition Threshold (SRT) Testing method for quantifying the intelligibility of speech in noise, was exploited for clinical testing of our developed audio-visual (AV) speech enhancement framework. 
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? Yes  
Impact The SRT testing method was further utilized, as a benchmark approach, as part of a follow-on PhD project (led by the PI) to evaluate intelligibility/cognitive load benefits. This served to demonstrate the potential of multi-modal (audio-visual) hearing aids, of the kind posited by our EPSRC AV-COGHEAR project, that are now being further developed as part of the follow-on EPSRC programme grant (COG-MHEAR). 
 
Title Real-Time Lightweight Chaotic Encryption for 5G IoT Enabled Lip-Reading Driven Hearing-Aid 
Description The real-time implementation of audio-visual (AV) hearing-aid demands high data-rate, low latency, low computational complexity, and high security, which are some of the major bottlenecks to the successful deployment of such next-generation multi-modal hearing-aids. To address these challenges, we propose an innovative framework, integrating 5G Cloud-Radio Access Network (C-RAN), Internet of Things (IoT), and strong privacy algorithms to fully benefit from the possibilities these complementary technologies have to offer. The proposed 5G-IoT enabled, secure AV hearing-aid framework transmits encrypted (compressed) AV information and receives enhanced (encrypted) reconstructed speech in real-time. This is aimed at addressing cybersecurity attacks such as location privacy and eavesdropping. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact The new technology has been adopted as part of new industry co-funded PhD projects (with collaborators at Edinburgh, Glasgow, Wolverhampton, Manchester and Nottingham Universities, and Phonak/Sonova) to extensively evaluate it and compare it with benchmark lightweight encryption algorithms. Collaborative work with the new technology has identified and addressed further challenges associated with the implementation of real-time privacy-preserving AV prototypes as part of a follow-on EPSRC funded programme grant (COG-MHEAR).
URL https://link.springer.com/article/10.1007/s12559-019-09653-z
 
Title World's first large-scale Audio-Visual Speech Enhancement Challenge (AVSEC): New baseline Deep Neural Network Model, Real-world Datasets and Audio-visual Intelligibility Testing Method 
Description We developed and made openly available a new benchmark pre-trained deep neural network model, real-world (TED video) datasets and a novel subjective audio-visual intelligibility evaluation method as part of the world's first large-scale Audio-Visual Speech Enhancement Challenge. Details of the benchmark model, datasets and intelligibility testing method were published in the peer-reviewed proceedings of the 2023 IEEE Spoken Language Technology (SLT) Workshop (https://ieeexplore.ieee.org/abstract/document/10023284).
Type Of Material Improvements to research infrastructure 
Year Produced 2022 
Provided To Others? Yes  
Impact The new benchmark pre-trained model code and training and evaluation datasets were made openly available as part of the world's first large-scale Audio-Visual Speech Enhancement (AVSE) Challenge organised by our COG-MHEAR teams as part of the 2023 IEEE Spoken Language Technology (SLT) Workshop, Qatar, 9-12 January 2023. The Challenge brought together wider computer vision, hearing and speech research communities from academia and industry to explore novel approaches to multimodal speech-in-noise processing. Our teams developed a new baseline pre-trained deep neural network model and made this openly available to participants, along with raw and pre-processed audio-visual datasets - derived from real-world TED talk videos - for training and development of new audio-visual models to perform speech enhancement and speaker separation at signal to noise (SNR) levels that were significantly more challenging than typically used in audio-only scenarios. The Challenge evaluation utilised established objective measures (such as STOI and PESQ, for which scripts were provided to participants) as well as a new audio-visual intelligibility testing method developed by the COG-MHEAR teams for subjective evaluation with human subjects. The new baseline model, real-world datasets and subjective audio-visual intelligibility testing method are continuing to be exploited by researchers in speech and natural language communication and hearing assistive technology applications. 
URL https://challenge.cogmhear.org/#/download
 
Title Audiovisual Dataset for audiovisual speech mapping using the Grid Corpus 
Description This new publicly available dataset is based on the benchmark audio-visual (AV) GRID corpus, which was originally developed by our project partners at Sheffield for speech perception and automatic speech recognition research. The new dataset contains a range of joint audiovisual vectors, in the form of 2D-DCT visual features paired with the equivalent audio log-filterbank vector. All visual vectors were extracted by tracking and cropping the lip region of a range of Grid videos (1000 videos from each of five speakers, giving a total of 5000 videos), and then transforming the cropped region with a 2D-DCT. The audio vectors were extracted by windowing the audio signal and transforming each frame into a log-filterbank vector. The visual signal was then interpolated to match the audio frame rate, and a number of large datasets were created, with frames shuffled randomly to prevent bias and with different pairings, ranging from one visual frame per audio frame up to 28 visual frames per audio frame. A minimal feature-extraction sketch in this spirit follows this record.
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact This publicly available dataset has served as a benchmark resource for the speech enhancement community. It has enabled researchers to evaluate how well audio speech can be estimated using visual information only. A follow-on PhD project (led by the AV-COGHEAR PI) utilised this benchmark dataset to show that novel speech enhancement algorithms (including those based on advanced machine learning) can demonstrate the effectiveness of exploiting visual cues for speech enhancement. The dataset is continuing to be used as a benchmark resource to evaluate the performance of real-time privacy-preserving AV speech enhancement models as part of a follow-on EPSRC funded programme grant (COG-MHEAR).
URL http://hdl.handle.net/11667/81
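The sketch below is a minimal, illustrative Python version of the feature pipeline this record describes: a 2D-DCT of a cropped lip image paired with a log-filterbank vector for the corresponding audio frame. The crop size, coefficient count, filterbank size and frame settings are illustrative assumptions, not the dataset's exact parameters.

```python
# Illustrative joint audiovisual feature extraction (2D-DCT lip features +
# log-filterbank audio); parameter choices are assumptions for illustration.
import numpy as np
import librosa
from scipy.fftpack import dct

def lip_dct_features(lip_gray, n_coeffs=50):
    """2D-DCT of a grayscale lip crop (assumed at least 8x8), keeping low-order coefficients."""
    c = dct(dct(lip_gray.astype(float), axis=0, norm='ortho'), axis=1, norm='ortho')
    k = int(np.ceil(np.sqrt(n_coeffs)))
    return c[:k, :k].flatten()[:n_coeffs]    # top-left (low-frequency) block, flattened

def log_filterbank(frame, sr=16000, n_fft=512, n_mels=23):
    """Log mel-filterbank energies for a single windowed audio frame."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n=n_fft)) ** 2
    fbank = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels) @ spec
    return np.log(np.maximum(fbank, 1e-10))
```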
 
Title Audiovisual SPeech in Real noisy Environments (ASPIRE): New benchmark corpus 
Description ASPIRE is a first-of-its-kind, publicly available audiovisual (AV) speech corpus recorded in real noisy environments (such as cafes and restaurants). The dataset follows the same sentence format as the current AV benchmark GRID corpus, and can be used to support reliable evaluation of next-generation, multi-modal speech filtering technologies.
Type Of Material Database/Collection of data 
Year Produced 2019 
Provided To Others? Yes  
Impact Our benchmark AV corpus stimulated further interdisciplinary research, in collaboration with academics, industry and end-user organisations. This has led to a new EPSRC funded programme grant (COG-MHEAR) to develop real-time AV hearing-aid prototypes, standards, and benchmark evaluation frameworks.
URL https://cogbid.github.io/ASPIRE/
 
Title New benchmark ChiME3 AV Corpus 
Description A new publicly available AV corpus has been developed by combining Grid corpus clean audio/video with CHiME3 noises over a wide range of SNRs, from -12 dB to 12 dB. The new AV ChiME3 corpus contains: (1) visual-to-audio features (lip images and clean log-filterbank audio); (2) audiovisual-to-audio features (lip images, noisy log-filterbank audio, and clean log-filterbank audio); (3) visual-to-audio features (visual DCT coefficients and clean log-filterbank audio); and (4) audiovisual-to-audio features (visual DCT coefficients, noisy log-filterbank audio, and clean log-filterbank audio), for SNRs ranging from -12 dB to 12 dB. All visual vectors were extracted by tracking and cropping the lip region of a range of Grid videos (1000 videos from each of five speakers, giving a total of 5000 videos), and then transforming the cropped region with a 2D-DCT. The noises from the CHiME3 corpus were first up-sampled to match the sampling frequency of the Grid corpus, and then mixed with clean Grid corpus files. The audio vectors were extracted by windowing the audio signal and transforming each frame into a log-filterbank vector. The visual signal was then interpolated to match the audio. A minimal noise-mixing sketch follows this record.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact This new publicly available dataset has been widely used and cited as a benchmark resource by the speech enhancement community. The AV corpus has also been used as part of a follow-on PhD project (led by the AV-COGHEAR PI) to validate our context-aware, deep learning driven lip-reading model for speech enhancement, which contextually learns and switches between AV cues under different operating conditions without requiring any SNR estimation. The publicly available dataset is continuing to be used as a benchmark by the speech research community, including as part of a follow-on EPSRC programme grant (COG-MHEAR), to further investigate and improve contextual lip-reading performance for both hearing-aid speech enhancement and recognition applications.
URL https://cogbid.github.io/chime3av/
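The following is a minimal, illustrative sketch of mixing a clean Grid utterance with a CHiME3 noise segment at a target SNR; the sampling rates, RMS-based scaling rule and noise-looping behaviour are standard assumptions for illustration, not the corpus's exact recipe.

```python
# Illustrative clean-speech + noise mixing at a target SNR (assumed sample rates).
import numpy as np
import librosa

def mix_at_snr(clean, noise, snr_db, sr_clean=25000, sr_noise=16000):
    """Up-sample the noise to the clean sampling rate and mix at the requested SNR."""
    noise = librosa.resample(noise, orig_sr=sr_noise, target_sr=sr_clean)
    if len(noise) < len(clean):                       # loop the noise if it is too short
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]

    def rms(x):
        return np.sqrt(np.mean(x ** 2) + 1e-12)

    noise = noise * (rms(clean) / rms(noise)) / (10 ** (snr_db / 20.0))
    return clean + noise

# e.g. build mixtures across the corpus's SNR range:
# for snr in range(-12, 13, 3): noisy = mix_at_snr(clean, noise, snr)
```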
 
Description EPSRC AV-CogHear (Stirling) 
Organisation Sonova Holding AG
Country Switzerland 
Sector Private 
PI Contribution Supporting project partner on an EPSRC funded grant, providing consultation on methods, advising design, evaluating progress, and assisting with experiments with time and facilities.
Collaborator Contribution Developing an audio-visual speech enhancement algorithm that would be beneficial for hearing-impaired persons.
Impact Published proceedings/book chapter (https://link.springer.com/chapter/10.1007%2F978-3-319-49685-6_30). Published dataset (https://datastorre.stir.ac.uk/handle/11667/81). One-day Challenges in Hearing Assistive Technology conference 19 August 2017 in Stockholm, Sweden (http://spandh.dcs.shef.ac.uk/chat2017/).
Start Year 2015
 
Description EPSRC AV-CogHear (Stirling) 
Organisation University of Sheffield
Country United Kingdom 
Sector Academic/University 
PI Contribution Supporting project partner on an EPSRC funded grant, providing consultation on methods, advising design, evaluating progress, and assisting with experiments with time and facilities.
Collaborator Contribution Developing an audio-visual speech enhancement algorithm that would be beneficial for hearing-impaired persons.
Impact Published proceedings/book chapter (https://link.springer.com/chapter/10.1007%2F978-3-319-49685-6_30). Published dataset (https://datastorre.stir.ac.uk/handle/11667/81). One-day Challenges in Hearing Assistive Technology conference 19 August 2017 in Stockholm, Sweden (http://spandh.dcs.shef.ac.uk/chat2017/).
Start Year 2015
 
Description EPSRC AV-CogHear (Stirling) 
Organisation University of Stirling
Country United Kingdom 
Sector Academic/University 
PI Contribution Supporting project partner on an EPSRC funded grant, providing consultation on methods, advising design, evaluating progress, and assisting with experiments with time and facilities.
Collaborator Contribution Developing an audio-visual speech enhancement algorithm that would be beneficial for hearing-impaired persons.
Impact Published proceedings/book chapter (https://link.springer.com/chapter/10.1007%2F978-3-319-49685-6_30). Published dataset (https://datastorre.stir.ac.uk/handle/11667/81). One-day Challenges in Hearing Assistive Technology conference 19 August 2017 in Stockholm, Sweden (http://spandh.dcs.shef.ac.uk/chat2017/).
Start Year 2015
 
Description MRC Network (Cardiff) 
Organisation Cardiff University
Country United Kingdom 
Sector Academic/University 
PI Contribution We are collaborators on an MRC Network grant for a Hearing Aid Research Network, contributing presentations and discussions on the topic of disruptive technologies for hearing aids.
Collaborator Contribution Attendance, presentations at meetings, etc.
Impact No outputs yet.
Start Year 2015
 
Description 1st International CHAT Workshop 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The first international workshop on 'Challenges for Hearing Assistive Technology, CHAT' was successfully organised as part of Interspeech and served as a focal point for the hearing-aid industry and researchers working in speech technology. The interactions between these communities stimulated fresh ideas for new directions in hearing assistive technology. The workshop provided a forum to showcase the work conducted in our EPSRC AV-COGHEAR project. The interactive presentations by our project researchers, including a position paper (by the PI) on required standards for the future development and evaluation of multi-modal hearing-aid technology, stimulated lively discussions afterwards, with some participants requesting more information. Others expressed an interest in exploiting our innovative, contextual multi-modal processing approach in their respective academic, clinical and industry-led projects and activities. Plans for new collaborations were also discussed with some participants.
Year(s) Of Engagement Activity 2017
URL http://spandh.dcs.shef.ac.uk/chat2017/
 
Description 2017 IEEE 4th Intl. Conference on Soft Computing & Machine Intelligence (ICSMI) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The main objective of ISCMI 2017 was to present the latest research and results of scientists related to Soft Computing & Machine Intelligence topics. This conference provided opportunities for the delegates to exchange new ideas face-to-face, to establish business or research relations as well as to find global partners for future collaborations. The PI presented ongoing AV-COGHEAR work as part of his invited keynote talk titled: "Towards Cognitive Big Data Informatics - selected real world case studies and emerging research challenges". The talk stimulated lively discussions and participants expressed an interest in exploiting our innovative context-aware and multi-modal learning and decision making ideas in their respective academic research and industry-led projects and activities. There were requests for further information, and follow-on collaboration possibilities are continuing to be discussed.
Year(s) Of Engagement Activity 2017
URL http://www.iscmi.us/ISCMI2017.html
 
Description British Society of Audiology Conference 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Industry/Business
Results and Impact We will be attending the British Society of Audiology Conference, 25-27 April 2016, which is a mixture of researchers, clinical practitioners, and industry presenters, to present an initial poster, titled "Audiovisual Speech Processing: Exploiting visual features for joint-vector modelling". The aim is to present the new project to a wider audience.
Year(s) Of Engagement Activity 2016
URL https://www.eventsforce.net/fitwise/frontend/reg/thome.csp?pageID=127483&eventID=323&eventID=323
 
Description CHAT-2017 Workshop, Stockholm, August 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact We have organised a one-day international workshop entitled 'Challenges for Hearing Assistive Technology, CHAT-2017', which will run as a satellite event of the week-long Interspeech Conference in Stockholm in August. The workshop has been granted recognition and financial support from the International Speech Communication Association (ISCA). We have recruited a Scientific Committee of 25 leading international researchers representing both academia and the hearing aid industry. The purpose of the workshop is to serve as a meeting place for the hearing aid industry and researchers working in speech technology. We hope that interaction between these communities can stimulate fresh ideas for new directions in hearing assistive technology. The workshop will also provide an opportunity to promote the work being conducted under our EPSRC-funded AV-COGHEAR project.
Year(s) Of Engagement Activity 2017
URL http://spandh.dcs.shef.ac.uk/chat2017/
 
Description Computing Science and Maths Seminars, University of Stirling 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact The aim was to highlight and discuss the existing AI hype and expectations, and how we can meet these expectations (pressing issues). More than 30 students, faculty members and professionals attended the talk. The talk stimulated lively discussions afterwards, and at the end we all agreed on the need to increase the talent pool and to develop AI models which are closer to biophysical reality. Plans for new collaborations were also discussed with some participants.
Year(s) Of Engagement Activity 2018
 
Description Faculty research Afternoon, University of Stirling, 18th April 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Research week was an exciting, multidisciplinary showcase of innovative research at the University of Stirling. We presented our AV-COGHEAR project and gained appreciation from visitors, including industry representatives, University faculty, and internal and external speakers. The presentation stimulated lively discussions afterwards, with some participants requesting more information, and others expressing an interest in exploiting our innovative, contextual multi-modal processing approach in their respective academic research and industry-led projects and activities. Plans for new collaborations were also discussed with some participants.
Year(s) Of Engagement Activity 2017
 
Description Impact for Access - Stirling University 
Form Of Engagement Activity Participation in an open day or visit at my research institution
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Schools
Results and Impact Approximately 70 school pupils (aged 14 to 16) visited the research organisation (University of Stirling) to learn more about studying Computing Science. As part of this, an image-processing tutorial and interactive demo were given to small groups, which prompted questions and discussion about both the direct research topic (signal processing) and studying Computing Science more generally.
Year(s) Of Engagement Activity 2016
 
Description International Conference on Life System Modeling and Simulation (LSMS 2017) and International Conference on Intelligent Computing for Sustainable Energy and Environment (ICSEE 2017) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The 2017 International Conference on Life System Modeling and Simulation (LSMS 2017) and the 2017 International Conference on Intelligent Computing for Sustainable Energy and Environment (ICSEE 2017) were held on September 22-24 in Nanjing, China, with the aim of bringing together researchers and practitioners from across the world in the fields of life system modeling and simulation, and intelligent computing theory and methodology with applications to sustainable energy and the environment. The PI presented our ongoing AV-COGHEAR project work as part of his invited keynote titled "Towards cognitively-inspired Data Science based on context-aware, multi-modal Big Data Analytics: Selected real-world Case Studies". The talk stimulated lively discussions afterwards, with some participants requesting more information and others expressing an interest in exploiting our innovative, contextual multi-modal processing approach in their respective academic research and industry-led projects and activities. Plans for new collaborations were also discussed with some participants.
Year(s) Of Engagement Activity 2017
 
Description International Conference on Life System Modeling and Simulation (LSMS'2017) and International Conference on Intelligent Computing for Sustainable Energy and Environment (ICSEE'2017) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The PI presented collaborative work on automated detection of targets via a focus of attention for SAR images. High-resolution Synthetic Aperture Radar (SAR) images are known to provide rich features for target detection; however, the absence of efficient feature-extraction and merging strategies limits their practicality. Inspired by the Focus of Attention (FOA) mechanism in biological vision systems, the paper proposed a two-stage method for detecting SAR targets: a global detection (GD) stage that rapidly predicts the location of suspicious areas or targets, and a local detection (LD) stage that locates and segments the salient objects as a whole (an illustrative sketch of this two-stage pipeline is given after this entry). The proposed method achieves a detection rate equal to or better than that of the conventional Constant False Alarm Rate (CFAR) approach, demonstrating more effective feature extraction and processing. The talk stimulated lively discussions afterwards, with plans for new collaborations discussed with some participants.
Year(s) Of Engagement Activity 2017
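The sketch below is only a rough illustration of the coarse-to-fine (GD then LD) idea described above; it is not the authors' implementation, and the block size, threshold factor k and segmentation cut-off are illustrative assumptions.

# Hypothetical sketch of a two-stage focus-of-attention detector: a cheap
# global pass proposes suspicious regions, a local pass segments their bright
# cores. Not the authors' code; all parameters are illustrative assumptions.
import numpy as np
from scipy import ndimage

def global_detection(img, block=32, k=2.5):
    """Coarse pass: flag image blocks whose mean intensity is unusually high."""
    mask = np.zeros(img.shape, dtype=bool)
    mu, sigma = img.mean(), img.std()
    h, w = img.shape
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            if img[y:y + block, x:x + block].mean() > mu + k * sigma:
                mask[y:y + block, x:x + block] = True   # suspicious region
    return mask

def local_detection(img, roi_mask, cut=0.7):
    """Fine pass: segment the bright core of each proposed region."""
    out = np.zeros(img.shape, dtype=bool)
    labels, n = ndimage.label(roi_mask)                 # connected ROIs
    for i in range(1, n + 1):
        region = labels == i
        vals = img[region]
        thresh = vals.min() + cut * (vals.max() - vals.min())
        out |= region & (img > thresh)
    return out

# Usage (hypothetical): targets = local_detection(sar_img, global_detection(sar_img))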
 
Description MRC Hearing Aid Network (Cardiff) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentations and discussions of the Stirling and Strathclyde EPSRC projects, together with provision of expertise on directional microphones.
Year(s) Of Engagement Activity 2015,2016,2017,2018
URL https://www.mrc.ac.uk/documents/pdf/hearing-aid-research-networks/
 
Description MRC Microphone Network: Novel Applications of Microphone Technologies to Hearing Aids 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Discussion of the outcomes of CHAT 2017 and planning of the next ISH CHAT satellite workshop (presentation by Jon Barker), with further discussion on organising CHAT 2018.
Year(s) Of Engagement Activity 2018
 
Description MRC Network OverHear (London) 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentations, demonstrations and discussions on the large-scale behavioural studies made possible by a large, multi-modal measurement facility such as PAMELA in London, including insights from the EPSRC projects at Stirling and Strathclyde universities.
Year(s) Of Engagement Activity 2015,2016,2017,2018
URL http://h2020evotion.eu/wp-content/uploads/2017/10/Over-Hear-Programme-Final.pdf
 
Description MRC-EPSRC Network workshop meeting on Microphone Technologies for Hearing Aids, Cardiff, September 21, 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Along with other speakers (Daniel Robert, Allan Belcher and John Culling), Ahsan Adeel presented the EPSRC AV-COGHEAR vision, objectives, ongoing and future work, and immediate challenges, together with a brief review of video/audio-visual information in hearing aids and some future directions. The meeting was attended by around 30-40 people, comprising academics from other funded projects and staff from the EPSRC and MRC. The meeting aimed to review grant impact mechanisms and discuss impact emphasis and requirements.
Year(s) Of Engagement Activity 2017
 
Description MRC-EPSRC Network workshop meeting on Microphone Technologies for Hearing Aids, organized at Stirling, 2 Feb 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The AV-COGHEAR PI, Prof. A. Hussain (Stirling), jointly with the MRC Network PI, Prof. J. Culling (Cardiff), organised the MRC-EPSRC Network meeting at Stirling as a one-day interactive workshop attended by approximately 40 participants from multi-disciplinary backgrounds. All AV-COGHEAR project partners attended and actively participated in discussions to explore and develop synergies between AV-COGHEAR and other related MRC Network and EPSRC project partners: PI Prof. A. Hussain, CI Dr J. Barker (Sheffield), the project postdocs Dr A. Adeel, Dr R. Marxer and Dr A. Abel, the project's clinical partner Dr W. Whitmer (MRC IHR Glasgow), and the industry partner Dr P. Derleth (Phonak AG). In particular, plans for developing and sharing audio-visual speech data were discussed and agreed, along with a detailed proposal for jointly organising a first-of-its-kind international workshop (chaired by A. Hussain, J. Barker, J. Culling and J. Hansen) on "Challenges for Hearing Assistive Technology (CHAT-2017)", to run as part of INTERSPEECH 2017 in Stockholm, Sweden, on 19th August 2017.
Year(s) Of Engagement Activity 2017
 
Description MRC/EPSRC workshop on hearing aid technology research, June 2016 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Dr Jon Barker presented the aims and initial progress of the project at a joint EPSRC/MRC workshop on hearing aid technology research. The meeting was attended by around 60-70 people consisting of academics from other funded projects and staff from the EPSRC and MRC.

The meeting was useful in that it allowed us to identify links with ongoing projects working with audio-visual speech data.
Year(s) Of Engagement Activity 2016
 
Description Poster Session, CHAT 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The aim of this ISCA-supported workshop was to present the challenges of hearing aid signal processing to the wider speech science and speech technology communities. We presented two posters: (1) "Towards Multi-modal Hearing Aid Design and Evaluation in Realistic Audio-Visual Settings: Challenges and Opportunities" and (2) "Towards Next-Generation Lip-Reading Driven Hearing-Aids: A preliminary Prototype Demo". The posters captured the attention of a very large audience and opened up exciting new collaboration opportunities.
Year(s) Of Engagement Activity 2017
 
Description SICSA DemoFest 2017 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact SICSA DemoFest is the largest event of its kind in Scotland, showcasing the very best of Informatics and Computing Science research from across Scotland's universities. We presented our lip-reading-driven deep-learning approach for speech enhancement, which attracted attention from a large audience. The demo stimulated lively discussions afterwards, with some participants requesting more information and others expressing an interest in exploiting our innovative, contextual multi-modal processing approach in their respective academic research and industry-led projects and activities. Plans for new collaborations were also discussed and agreed with some participants, including an H2020 project team based at Glasgow University.
Year(s) Of Engagement Activity 2017
 
Description SIPRA Workshop, Harbin, China, 14-17 July 2017. 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact A three-day workshop was held at the Harbin Institute of Technology, Harbin, China, aimed at early-career researchers from the UK and China, with the objectives of increasing research capacity, transferring knowledge from cognate areas, exploring and identifying research opportunities, and building international teams for future collaboration in an area of international relevance: disaster recovery and mitigation. The EPSRC AV-COGHEAR team presented its novel cognitively-inspired, multimodal approach to speech enhancement and its potential uses in next-generation multi-modal applications. These include currently proposed assistive technologies such as hearing aids, cochlear implants and listening devices, as well as innovative applications in extremely noisy environments where ear defenders are worn, such as emergency and disaster response and battlefield settings. The talk stimulated lively discussions afterwards, with plans for new collaborations discussed and agreed with some participants, specifically to explore application of our innovative technology in extreme environments.
Year(s) Of Engagement Activity 2017
 
Description School visit (University of Manchester) 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact We discussed crucial unsolved neuroscience questions, such as: (1) how does a biological neuron integrate incoming multisensory signals in different situations? (2) how are the roles of incoming multisensory signals (selective amplification or attenuation) determined so that neurons generate precise firing that complies with the anticipated behavioural constraints of the environment? and (3) how are the external environment and anticipated behaviour integrated? We also discussed how answers to these questions could help us better understand the amplification or suppression of incoming multisensory signals under different environmental conditions, and how these processes alter neuronal activity. The interactive presentation stimulated lively discussions afterwards, with some participants requesting more information. Plans for new collaborations were also discussed with some participants, and the School reported increased interest in related subject areas.
Year(s) Of Engagement Activity 2019
 
Description The First International Conference On Intelligent Computing in Data Sciences (ICDS2017) December 18-19, 2017 Meknes-Morocco 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact The main aim of the First International Conference on Intelligent Computing in Data Sciences (ICDS2017) was to promote new research on computational intelligence for analysing, exploring and learning from large amounts of data; original, high-quality contributions related to this theme, including theories, methodologies and applications, were solicited. The PI presented ongoing AV-COGHEAR work and its potential applications in real-world noisy environments as part of his invited keynote talk. The talk stimulated lively discussions, and participants expressed an interest in exploiting our innovative context-aware, multi-modal learning and decision-making ideas in their respective academic research and industry-led projects and activities.
Year(s) Of Engagement Activity 2017
URL http://www.researchnetwork.ma/icds2017/
 
Description Tinnitus and Hearing Show 2018 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Public/other audiences
Results and Impact Presented an overview of research at the Institute, including work with the Universities of Strathclyde and Stirling, and also presented as part of a panel.
Year(s) Of Engagement Activity 2018
URL https://invizear.com/tinnitus-and-hearing-this-scotland-2018/
 
Description Workshop on Geological Disaster Monitoring based on Sensor Networks 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Policymakers/politicians
Results and Impact We presented our ongoing AV speech processing work and its application in emergency and disaster response, explored possible collaboration opportunities, and had useful discussions with fellow British and Chinese delegates and keynote speakers.
Year(s) Of Engagement Activity 2017