New pathways to hearing: A multisensory noise reducing and palate based sensory substitution device for speech perception

Lead Research Organisation: University College London
Department Name: Experimental Psychology


Disabling hearing loss is a global problem that affects nearly half a billion people. Furthermore, it is a problem that is growing with an aging population and has clear negative functional, social, emotional, and economic impacts. In the United Kingdom, adult onset hearing loss is predicted to be one of the top ten disease burdens by 2030 (WHO). Commercially available correction for hearing loss is mostly limited to hearing aids and cochlear implants. These devices suffer from signal processing and sensory transduction limitations. On the signal processing side, they struggle with the separation of speech from noise, often from other voices in social situations - the cocktail party phenomenon. On the transduction side, devices continue to rely on the damaged cochlea as the channel of communication. The aim of this proposal is to address these limitations through multisensory remapping at both the signal processing and transduction stages.
We will address signal processing limitations by introducing a new multisensory algorithm that will aim to recover the auditory signal from talking faces. The moving face can provide a source of information that is independent of environmental noise. Facial movement can also be used to enhance the signal to noise ratio of audio-only based speech. The new method can also recover facial movement from auditory signals alone so that speech perception might continually benefit from known improvements associated with being able to see the face, even when it is not present, as when the recovered face is presented on a device carried by the listener (e.g. a smart phone or Google Glass).
We will address signal transduction limitations by building on recent successes in supplementing vision through high-density tactile stimulation of the tongue and previous work demonstrating promise for supplementing word learning through tactile stimulation. In particular, we will build a novel non-invasive conformable high-density electrode array that provides electrotactile stimulation of the hard palate. This is the first device with high enough channel density to realistically provide, in tactile form, the spatial information about sound frequency available along the healthy cochlea. By putting it on the hard palate, this device will be the first sensory supplementation device to have direct access to sensorimotor brain circuitry important for speech learning and perception through the trigeminal nerve.
Finally, we will use behavioural and brain imaging methods to experimentally test the combined use of these signal processing and transduction innovations for hearing supplementation. From past experience with more primitive devices (e.g. The Tickle Talker), we expect people will be able to rapidly learn words and transfer training to novel words in new contexts. We expect training to be enhanced by combined presentation of audio to the hard palate and the face to a portable display device, so learning can occur in natural contexts. We will test the ability of participants to use the device for speech perception behaviourally and use functional imaging to look for indications of activation or modification of speech circuits in the brain after training. This work will contribute to our understanding of multisensory signal processing algorithms for hearing devices, auditory-to-tactile hearing supplementation, and multisensory brain plasticity. Success with this experimental device would warrant clinical trials to supplement hearing in individuals with hearing loss and bring these innovations to market as a new device to help with the social and economic challenges posed by disabling hearing loss.

Planned Impact

Disabling hearing loss is a global problem that affects nearly half a billion people. More than 16% of the UK population overall and 70% of those over 70 have some form of hearing loss (Action on Hearing Loss Statistics, 2014). The problem is particularly acute given the ageing population and the ubiquity of age-related hearing loss. Age-related hearing loss is one of the most significant contributors to social isolation and lack of wellbeing in the elderly. Associated problems include increased social and cognitive difficulties, social isolation, depression, dementia, and mortality when uncorrected (Arlinger, S., Negative consequences of uncorrected hearing loss-a review. International Journal of Audiology, 2003). Commercially available correction for hearing loss is mostly limited to hearing aids that only amplify sounds and cochlear implants that are invasive, expensive, and limited in the number of channels carrying acoustic information. Health related quality of life and cost effectiveness results associated with these devices are quite variable with only small to medium typical effect sizes (e.g., Chisolm et al., 2007). The technologies on which these devices are built are already mature. Our strategy is to take a different approach that exploits recent advances in computation power, particularly in handheld devices, and new developments in the fabrication of conformable electrode arrays. Our project makes two primary advances that will have an impact on those with hearing loss. First, we propose a new multisensory signal processing algorithm for increasing the signal to noise ratio of speech through facial speech reading that can be adopted by both existing hearing devices and a new generation of devices. Second, we propose a new signal transduction method for supplementing speech hearing through a high-density conformable electrode array worn on the hard palate. We have identified a project partner, Wicab Inc., who already market an electrotactile stimulator for the blind which is placed on the tongue. In their letter of support they have expressed a strong interest in developing new products for the deaf and deaf-blind, and they are keen to support the project with their existing expertise and to develop any new devices to a point at which they could be brought to market.


Description There is considerable evidence that speech perception uses representations that also serve speech production. Information about the configuration of articulators in the vocal tract of a speaker could support speech perception; however, this requires that the visual system gain information about the vocal tract configuration from a speaker's face. We used a non-invasive, dense, markerless technique to encode configural change in both the face and magnetic resonance (MR) images of the vocal tract. We combined this information using a novel statistical approach to extract the joint variation in the two modalities. This allowed us to recover information in one modality from the other and, together with a "bubbles" technique, to identify which areas of the face are best at recovering the configuration of the vocal tract and vice versa. We show that there is sufficient information in the configuration of the facial surface to recover the configuration of the vocal tract, that implicit coding of configural change improves recovery, and that the key areas for driving the correspondence vary in accordance with the articulation required to form the acoustic signal at the relevant point in the utterance. This provides evidence that the facial configuration is driven by the time-varying position of articulators in the vocal tract that are specific to the generation of the acoustic signal.
Exploitation Route We believe that our approaches for deriving speech, the face, and/or the vocal tract from any one of these modalities in a computationally tractable manner will be of considerable interest to others trying, e.g., to use facial information to remove noise from the input to hearing aids. We also believe that our results on thresholds, adaptation rates, and two-point discrimination on the palate, and our specification of optimal electrode spacing, will be of considerable interest to anyone trying to present any information to the palate (e.g., those trying to present speech to the deaf through a means other than hearing aids). The palate is an ideal location for any form of sensory substitution given that it is out of sight and given the known density of receptors there. However, very little data or research is available on doing so. Thus, our work could be of help to others seeking to present information to the palate.
Sectors Digital/Communication/Information Technologies (including Software), Pharmaceuticals and Medical Biotechnology, Other

Description NIHR BRC
Amount £23,600,000 (GBP)
Organisation University College London 
Department NIHR Biomedical Research Centre
Sector Academic/University
Country United Kingdom
Start 10/2017 
End 03/2022
Title Electrode array to test discrimination/adaptation 
Description We have developed a head-mounted electrode array (and button box) to test two-point discrimination, motion discrimination, and adaptation on the palate at varying distances ranging from about 200 micrometres to a few millimetres. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? No  
Impact These discrimination/adaptation tests are crucial for determining how many electrodes can be packed into our sensory substitution device and how to present auditory information to the palate. 
Title Linking Face and Vocal Tract Action 
Description We have been linking facial action to MRI vocal tract movement when participants are saying the same sentences. This will allow prediction of the vocal tract from facial movement allowing us to use the vocal tract shape to predict speech. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? No  
Impact This method is at an early stage but linking facial action to vocal tract configuration is the first step in the pipeline needed to link visible facial action to speech. 
Title Multichannel device for palate stimulation 
Description
- Jonathan Viventi: the existing electrodes they have are too short for our application (1 inch). They cannot yet supply new electrodes (5 inch) and are not sure when (16/07). They are developing a 13-14 cm, 144-channel array that should be ready around July 2017 (17/03).
  - We thought of designing the electrodes ourselves and using their headstage and program, but their headstage can only deliver 1 mA, 15 V (16/08); it was later confirmed that it can only deliver 5 V (16/09). Not suitable for our application: based on the impedance map of the electrodes we received, a typical impedance is in the range of 20 kohm, which with 5 V gives us a maximum output of only about 0.2 mA.
  - Their headstage uses a 0.2" pitch ZIF.
  - We analysed their headstage for the possibility of driving it with different hardware.
  - We agreed to buy a headstage from them (16/08 and 16/09).
  - We received electrode samples on 16/10 but no headstage. We tested these on tongue and palate with the Digitimer DS7A we acquired on loan on 16/11: at 0.4 mm electrode distance the threshold was about 13 mA/cm2 (too high).
- Neuronexus can supply us with electrodes. Sample requested on 17/01. Materials were confirmed: electrode Pt and substrate polyimide (17/03).
- Ripple (US) can build custom electrodes for us (0.5 mm pitch, multilayer). Sample requested on 17/01 and received 17/01. The inter-electrode distance of the received electrode is 1 cm instead of the 1 mm in the photo we were sent; we got in touch. Four electrodes will cost $25,000 (17/03). A 64-channel stimulation system costs about £50,000. Too much.
- Smartelectronics (UK based) can print on gold. We prepared an electrode drawing for them (64 pins, 10 cm).
- Digitimer UK can supply us with the DS7A, a single-channel current stimulator (£3240). We could get one on loan; it arrived 16/11. Tested maximum trigger frequency: 1 ms (17/08).
- In the worst case, Grapevine (US) can supply us with a driving circuit (+stim FE; output 1.5 mA, 32 channels, 128 current step control). Not ideal (16/08).
- Instead, we drew a rough sketch of a stimulator circuit with 2AFC to build (17/09). If made, it could be used for conducting 3 types of experiments: 1. DT 2. two-point discrimination 3. adaptation rate.
  - We started to select components for the design (MUX, MC, current source...) (17/09).
  - Meanwhile, we reviewed 3 models of Digitimer current source products to see whether they could be used; none is suitable for the range of current and accuracy we need (17/09).
  - Called WPI to ask about the load dependency of the current of the A365.
  - Various designs considered.
  - Designed a PCB board (17/10) and ordered it.
  - Started programming the device while awaiting the PCB board to be printed.
  - PCB board arrived (17/11).
- The device, which we developed in just 3 months:
  - can deliver current up to 3 mA
  - has a compliance voltage of up to 30 V
  - has an error detection circuit to detect faulty connection of electrodes
  - has a nominal current step control of 4096 (for comparison, Grapevine +stim FE: output 1.5 mA, 32 channels, 128 current step control)
  - has all its variables, such as pulse width, waveform, pulse duration and number, controllable from a laptop connected through USB
- Designing an SD connection for the system so it can work standalone without the need for a PC (17/11).
- Started mapping the electrodes with software numbers (17/11).
- USB communication with PC for control completed (17/12).
- Programming for conducting psychophysical tests completed (17/12).
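The current limits and step sizes quoted in this log follow from simple arithmetic, sketched below. The function names are hypothetical, the load is treated as a pure resistance, and the current steps are assumed to be spaced linearly over the full output range; neither assumption is stated in the log itself.

```python
def max_current_ma(compliance_v, impedance_ohm):
    """Largest current (mA) a source can drive into a resistive load (Ohm's law)."""
    return compliance_v / impedance_ohm * 1000.0


def step_size_ua(full_scale_ma, steps):
    """Current resolution (microamps), assuming linearly spaced steps."""
    return full_scale_ma * 1000.0 / steps


# 5 V headstage into a typical 20 kohm electrode (values from the log)
print(max_current_ma(5, 20_000))    # 0.25 mA, in line with the "0.2 mA" quoted
# 30 V compliance of the custom stimulator into the same load
print(max_current_ma(30, 20_000))   # 1.5 mA
# Resolution: custom stimulator (3 mA, 4096 steps) vs +stim FE (1.5 mA, 128 steps)
print(round(step_size_ua(3.0, 4096), 2), round(step_size_ua(1.5, 128), 2))
```

Under these assumptions the custom stimulator resolves current in steps of under 1 microamp, against roughly 12 microamps for the commercial front end. The full 3 mA would need a load of about 10 kohm or less at 30 V.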
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact We are using this tool to do the first ever mapping of the palate. This is also a necessary step in building a larger array of palate based electrodes. 
Title Testing sensory substitution during fMRI 
Description We will use a multi-channel air puffing device during fMRI as a stand in for an electrotactile sensory substitution device. 
Type Of Material Improvements to research infrastructure 
Year Produced 2019 
Provided To Others? No  
Impact The air puffer will be used to see how the palate representation looks and how it changes with learning to use sensory substitution devices to present auditory information. This serves as a testing bed for protocols for a later electrotactile (higher channel count) device. See collaboration with Dr Fred Dick at BUCNI. 
Title We developed a 1024-point FFT analyser for converting the Wicab Brainport vision device to a speech processing device. 
Description
- screen size 2.4 cm, monochrome
- 12-bit ADC, 5.5 kHz bandwidth
- has an on-screen VU meter to show the intensity of the input signal
- battery powered, can run for over 24 h on a single charge
- Bluetooth connectivity
- enclosed within a 3D printed frame that can be mounted directly on the Wicab Brainport device (16/06)
- applies a logarithmic hearing scaling factor of k=0.3 to the displayed spectrum (17/01)
- applies 25% overlapping Hanning windowing for smoother FFT (17/02)
- applies equal loudness contours at 60 dB to the spectrum, to display the spectrum in phon (or loudness) (17/03)
- full mounted prototype tested (17/03)
- on-screen display of the screen refresh time (17/05)
- a single push button that switches the device and the Bluetooth on and off
- fully assembled (17/05)
- Wicab device delay measured to be about 140 ms; Wicab confirmed it is 20 ms (17/05)
- equipped with external LED screen illumination to improve the quality of the images captured by the Wicab camera (17/06)
- function to apply tongue shape to the presented spectrum (17/07)
- IRB approved 16/01 and 17/03
- testing started 17/06
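The core of the processing chain listed above (1024-point FFT, Hanning window, 25% overlap, compression with k=0.3) can be sketched roughly as follows. This is an illustration, not the device firmware. The 11 kHz sample rate is an assumption (twice the stated 5.5 kHz bandwidth), the k=0.3 scaling is assumed to be a power law applied to the magnitudes, and the equal loudness weighting and tongue-shape mapping are left out. All names are hypothetical.

```python
import cmath
import math

FFT_SIZE = 1024          # from the description
SAMPLE_RATE_HZ = 11_000  # assumption: twice the stated 5.5 kHz bandwidth
OVERLAP = 0.25           # from the description
COMPRESSION_K = 0.3      # from the description; power-law form is an assumption


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def hann(n):
    """Hann (Hanning) window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def compressed_spectra(samples):
    """Split samples into 25%-overlapping frames and return compressed magnitudes."""
    window = hann(FFT_SIZE)
    hop = int(FFT_SIZE * (1 - OVERLAP))  # 768 samples between frame starts
    frames = []
    for start in range(0, len(samples) - FFT_SIZE + 1, hop):
        frame = [s * w for s, w in zip(samples[start:start + FFT_SIZE], window)]
        mags = [abs(c) for c in fft(frame)[:FFT_SIZE // 2]]  # positive frequencies
        frames.append([m ** COMPRESSION_K for m in mags])
    return frames


def bin_to_hz(k):
    """Centre frequency of FFT bin k under the assumed sample rate."""
    return k * SAMPLE_RATE_HZ / FFT_SIZE
```

Each returned frame holds 512 compressed magnitudes, one per frequency bin. With the assumed sample rate, the bins are about 10.7 Hz apart.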
Type Of Material Improvements to research infrastructure 
Year Produced 2017 
Provided To Others? Yes  
Impact We've used this to conduct research on perceiving speech through the tongue with a group of N=33 participants. Data are being analysed. 
Title Linked facial action and MR vocal tract sagittal sections 
Description We have collected data from people repeating a number of standard sentences to a single full-face camera, to a multi-camera array containing 5 separate views of the face around the horizontal plane, and in an MR scanner. In the scanner we recorded the vocal tract movement in sagittal section while the same subject repeated the same sentences. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? No  
Impact The data for the database were collected in the summer and autumn of 2017 for explicit use in the project. We have begun the process of registering these data, which were collected in different places, at different times, and through different media. The postdoctoral researcher tasked with this left the project in March 2018. Once the researcher is replaced we will continue with the aim of the project to link face and speech. Our final aim is to publish the data set by the end of the project. 
Description Richard Hogle - vice president of Wicab Inc. 
Organisation Wicab
Country United States 
Sector Private 
PI Contribution We are using Wicab's tongue based sensory substitution device to present speech to the tongue in experiments.
Collaborator Contribution The Brainport arrived (03-2016). We agreed not to reverse engineer the device; all we know is that the device encodes into 4 parallel rows of electrodes and that signals are amplitude modulated.
Impact We will eventually have a publication from experiments from this collaboration but none are ready yet.
Start Year 2016
Title 3D micro wire printing system 
Description Development of the 3D micro wire printing system for future development of custom made palate electrodes/devices. 
IP Reference  
Protection Protection not required
Year Protection Granted
Licensed No
Impact None yet