Text Entry by Inference: Eye Typing, Stenography, and Understanding Context of Use

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

My research is based on the observation that our daily interaction with computers is highly redundant. Some of these redundancies can be modelled and exploited by intelligent user interfaces. Intelligent text entry methods use AI techniques, such as machine learning, to exploit redundancies in our languages. They enable users to write quickly and accurately, without the need for a key press for every single intended letter.

In this programme I propose to develop two new intelligent text entry methods. The first is a system that enables disabled users to communicate efficiently using an eye-tracker. The second is a novel intelligent text entry method that is inspired by stenography.

In addition, I propose to explore the broader context of text entry methods. The research literature has concentrated on inventing text entry methods that promise high entry rates and low error rates. Now that we have text entry methods with reasonably high entry rates, it is time to complement this objective function by discovering other aspects of text entry. I propose to use social-science techniques, such as diary and field studies, to understand how users would prefer to use text entry methods in the wild.

System 1: Eye-typing by inference

This system will potentially increase the entry rate of eye-typing systems. Current eye-typing systems are inherently slow (due to dwell timeouts), and users perceive them as frustrating. I propose to build a system that enables users to eye-type without the need for a dwell timeout at all. Potentially, my method will be faster than any other eye-tracker based method in the world.

With my proposed system, users write words by directing their gaze at the intended letter keys in sequence. Users' intended words are transcribed when they look at a result area positioned above the keyboard. Users can write more than one word at a time. They can also write sequences of words, or even stop short within a word. They may visit the spacebar key between words, but this is not strictly necessary for the system to correctly infer users' intended words. (A toy sketch of this inference step appears below, after the System 2 description.)

System 2: Stenography by inference

This system will be a stenography system for pen or single-finger input. The primary application is mobile text entry. However, I strive to create a system that can, to some extent, replace the desktop keyboard, should users so desire. Potentially, it will be faster than any other pen-based text entry method.

The idea behind this method is to enable users to write words quickly by gesturing patterns they have previously learned. Such open-loop recall from muscle memory is much faster than the closed-loop, visually guided motions users are required to perform when they tap on, for example, an on-screen keyboard. My proposed system will enable users to quickly and accurately articulate gestures for individual words. These gestures will be fixed for a particular word: each word is associated with a single, unique (prototypical) gestural pattern. A user's input gesture is recognised by a pattern recogniser, and the word whose prototypical pattern best matches the input gesture is output as the user's intended word. (A companion sketch of this matching step also appears below.)
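To make the System 1 inference step concrete, here is a minimal sketch in Python, purely an illustration rather than the proposed system: it assumes a toy key layout (the hypothetical KEY_CENTRES table) and a tiny lexicon, and scores each candidate word by the dynamic-time-warping distance between the user's fixation sequence and the ideal gaze path through that word's key centres.

```python
# Minimal sketch: infer an intended word from a dwell-free gaze trace by
# matching it against each word's ideal path over the key centres.

KEY_CENTRES = {          # toy, partial QWERTY coordinates; a real system
    'h': (5.5, 1.0),     # would use the actual on-screen key geometry
    'e': (2.5, 0.0),
    'l': (8.5, 1.0),
    'o': (8.5, 0.0),
    'p': (9.5, 0.0),
}

def dtw(a, b):
    """Plain O(len(a)*len(b)) dynamic time warping over 2-D points."""
    INF = float('inf')
    cost = [[INF] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i, (ax, ay) in enumerate(a, 1):
        for j, (bx, by) in enumerate(b, 1):
            d = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[len(a)][len(b)]

def infer_word(fixations, lexicon):
    """Return the lexicon word whose ideal key path best matches the trace."""
    def ideal_path(word):
        return [KEY_CENTRES[c] for c in word]
    return min(lexicon, key=lambda w: dtw(fixations, ideal_path(w)))

# Noisy fixations drifting over h-e-l-l-o; no dwell timeout is needed.
trace = [(5.4, 1.1), (2.6, 0.1), (8.4, 1.0), (8.6, 0.9), (8.5, 0.1)]
print(infer_word(trace, ['hello', 'help']))   # -> 'hello'
```

A real recogniser could additionally weight each candidate by its language-model probability, so that likely words win close matches.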
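In the same spirit, a minimal sketch of the template-matching idea behind System 2, again assuming hypothetical data (two made-up word prototypes) rather than the actual recogniser: the input stroke is resampled to a fixed number of points, translation- and scale-normalised, and matched to the nearest word prototype.

```python
# Minimal sketch: one prototypical stroke per word; the nearest prototype
# (by mean point-wise distance after normalisation) wins.
import math

def resample(pts, n=32):
    """Resample a stroke to n points evenly spaced along its arc length."""
    cum = [0.0]
    for p, q in zip(pts, pts[1:]):
        cum.append(cum[-1] + math.dist(p, q))
    total = cum[-1] or 1.0
    out, j = [], 0
    for k in range(n):
        target = total * k / (n - 1)
        while j < len(cum) - 2 and cum[j + 1] < target:
            j += 1
        span = (cum[j + 1] - cum[j]) or 1.0
        t = (target - cum[j]) / span
        out.append((pts[j][0] + t * (pts[j + 1][0] - pts[j][0]),
                    pts[j][1] + t * (pts[j + 1][1] - pts[j][1])))
    return out

def normalise(pts):
    """Translate the centroid to the origin and scale to unit max radius."""
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    pts = [(x - cx, y - cy) for x, y in pts]
    r = max(math.hypot(x, y) for x, y in pts) or 1.0
    return [(x / r, y / r) for x, y in pts]

def recognise(stroke, templates):
    """Return the word whose prototype stroke is nearest the input stroke."""
    probe = normalise(resample(stroke))
    def dist(word):
        proto = normalise(resample(templates[word]))
        return sum(math.dist(p, q) for p, q in zip(probe, proto)) / len(probe)
    return min(templates, key=dist)

# Hypothetical prototypes: 'up' is a rising stroke, 'down' a falling one.
templates = {'up': [(0, 0), (1, 1)], 'down': [(0, 1), (1, 0)]}
print(recognise([(0.1, 0.0), (0.5, 0.55), (0.9, 1.0)], templates))  # -> 'up'
```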
Understanding the broader context of text entry

The last component of my proposed programme serves to contribute new perspectives to the text entry research field. As discussed above, context of use is largely unexplored in text entry. I intend to explore this topic using a range of qualitative methods: interviews, field studies (e.g. studying participants trying a prototype mobile speech recognizer at a café), and diary studies. The latter will be conducted with a system that provides users with a choice of a few text entry methods that I hypothesize will be useful in different situations. I also intend to read the literature on design and architecture to further my understanding of the complete design space of text entry.

Planned Impact

Research dissemination

The research results will be submitted for publication in the best relevant fora. I will first aim for top-tier multi-disciplinary journals, because both of the proposed systems have a high chance of breaking new ground and interesting the general research community. System 1 (eye-typing by inference) has a high probability of breaking the record speed for writing with the eyes only (currently held by Dasher), and System 2 (stenography by inference) has a high probability of breaking the record speeds for pen-based and single-finger input (currently held by ShapeWriter). For the justification behind these estimates, see the system descriptions outlined earlier. I also aim to continually publish intermediate research results in the human-computer interaction literature: CHI, IUI and UIST, and/or the HCI journals Human-Computer Interaction, ACM Transactions on Computer-Human Interaction, and others.

Transfer of knowledge to the general public

My experience with previous press coverage of my research is that, despite all the previous press articles, what contributed the most to reaching the general public was the release of ShapeWriter on Apple's App Store for the iPhone. Learning from this experience, I propose to make software available for download as soon as it is practical to do so. I will start a website with a blog that explains how the systems work and provides users with the possibility to download the software.

Technology transfer

I will investigate the opportunity to patent parts, or the entirety, of Systems 1 and 2. I am an inventor or co-inventor of six patent applications and have so far been granted two. I have also co-founded a technology start-up based on my previous research results (ShapeWriter, Inc.), and I have experience in attracting capital, talking to enterprise customers, and all the other (extreme) challenges in actually making a technical start-up company work. I also have some experience of the technology-transfer services available within the University of Cambridge: I was an Executive Committee Member of Cambridge University Entrepreneurs 2008-2009, and I have attended seminars organised by Cambridge Enterprise (the technology transfer body at Cambridge).

It should be remembered that there are many ways to make technology transfer work. I want to have an academic career, and I am not interested in starting another company unless an extraordinary opportunity arises. I do think there is a huge opportunity to improve society by providing users with more efficient text entry methods, particularly users who prefer to avoid the desktop computer. However, impact on society (number of customers) and business opportunity (market size) are not the same thing. I know from my work on ShapeWriter, Inc. that it is extremely hard to make money from licensing to mobile phone manufacturers. At the same time, while enthusiasts are willing to pay for better text entry, the majority will not pay retail for a new text entry method. A possible exception is the market for accessibility software. However, I think it is morally objectionable to sell tax-funded research outcomes at high prices to disabled users. Therefore, I am probably going to release the software under an open source licence, if this ends up being feasible. I predict two problems with the open source release scheme. First, there needs to be an active developer community that maintains the software.
I will try to solve this either by attracting third-party developers or by merging my project into a larger project that has an established developer community. Second, users need to be aware that my software exists. I will tackle this by putting up a website, uploading demonstration videos, participating in university outreach efforts, and demonstrating my systems at scientific conferences.
 
Description In this project we have investigated how to use machine learning and other AI approaches to improve text entry and the control of computer systems. We have investigated dwell-free eye-typing and found that it has the potential to be twice as fast as regular dwell-based eye-typing (Proc. ETRA 2012; Best Paper Honourable Mention).

We have also investigated how to create error correction interfaces for speech recognition. We have devised a new way to perform voice-only correction that we call 'one-step' correction. Using our method, one can correct speech recognition errors by merely speaking the correction; there is no need for the user to first 'select' the erroneous text. The system automatically infers the location of the incorrect text and replaces it with the spoken correction (Proc. SLT 2010). We have used the same algorithm to create a system that fuses gesture and speech interaction. It enables users to enter text by speaking, gesturing, or a combination of both. Our system automatically fuses both input modalities and generates the most likely result (Proc. Interspeech 2011). (Toy sketches of the span-inference and fusion ideas appear at the end of this section.)

Another issue in intelligent text entry is the underlying language model. A language model assigns probabilities to word sequences. For a language model to be useful it needs to be trained on appropriate text data, and a predictive text entry system is limited by the predictive power of its underlying language model. The problem of creating appropriate language models is particularly acute for predictive Augmentative and Alternative Communication (AAC) devices, which predict text that motor-disabled, non-speaking individuals want to communicate. This lack of efficient language models, caused by a lack of representative data, has been a long-standing problem in the AAC field for over 25 years. We invented a new method for creating efficient language models for AAC using a combination of crowdsourcing and intelligent mining of social media (Twitter and blog data). Using our new method we could create efficient language models for AAC that outperform existing models (Proc. EMNLP 2011; article in New Scientist in February 2012). (A toy sketch of interpolating models trained on different sources also appears at the end of this section.) We later used our language model to enable an illiterate AAC user to communicate on her own for the first time (Proc. SLPAT 2012; Proc. ASSETS 2012; ACM SIGACCESS Best Student Paper Award). Our new method and models are now actively used in the AAC field. The language models were also used (in combination with an error correction model) in a paper that investigated how to support efficient text entry on touchscreen tablets (Proc. CHI 2013).

Two surveys on text entry and intelligent interaction have also been produced (one 'Research Highlight' in Communications of the ACM and one survey article in Foundations and Trends in Human-Computer Interaction).

Understanding gesture interaction is often crucial for efficient intelligent interaction. We have investigated the memorability of gestures and found, in a series of three experiments, that self-defined gestures were significantly easier to remember than pre-designed gestures, even when one controls for training time (Proc. CHI 2013).

Finally, we have explored how to leverage proxemics to design new intelligent interactive systems (Proc. IUI 2013; Pervasive and Mobile Computing 2013; Ext. Abstracts CHI 2013).

To help the text entry field progress we have also acted as the lead organiser of the text entry workshops at CHI 2012 and CHI 2013.
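As a toy illustration of the span-inference idea behind one-step correction (the published system is probabilistic; here a character-level similarity measure from Python's standard library stands in), the sketch below matches the spoken correction against every word span of the recognised text and replaces the best-matching span.

```python
# Minimal sketch: find the word span most similar to the spoken correction
# and splice the correction in; the user never selects the error manually.
from difflib import SequenceMatcher

def one_step_correct(recognised, correction):
    """Replace the word span of `recognised` most similar to `correction`."""
    words = recognised.split()
    best, best_score = (0, 1), -1.0
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            span = ' '.join(words[i:j])
            score = SequenceMatcher(None, span.lower(),
                                    correction.lower()).ratio()
            if score > best_score:
                best, best_score = (i, j), score
    i, j = best
    return ' '.join(words[:i] + [correction] + words[j:])

print(one_step_correct('please call the doctor at nine', 'at ten'))
# -> 'please call the doctor at ten'
```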
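A similarly simplified sketch of the fusion idea, assuming each recogniser returns a small hypothesis-to-probability table (the real system operates on richer structures): the fused output is the hypothesis with the highest weighted combined probability.

```python
# Minimal sketch: linear fusion of speech and gesture hypothesis scores.
def fuse(speech_hyps, gesture_hyps, alpha=0.5):
    """Combine two hypothesis->probability dicts; return the best hypothesis."""
    words = set(speech_hyps) | set(gesture_hyps)
    score = {w: alpha * speech_hyps.get(w, 0.0)
                + (1 - alpha) * gesture_hyps.get(w, 0.0) for w in words}
    return max(score, key=score.get)

# Each modality alone is ambiguous; together they agree on 'great'.
print(fuse({'great': 0.40, 'grate': 0.45, 'crate': 0.15},
           {'great': 0.50, 'greet': 0.50}))   # -> 'great'
```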
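Finally, a toy sketch of why mixing data sources helps, not the published method itself: two bigram models trained on hypothetical stand-in corpora (one playing the role of crowdsourced AAC-like text, the other of social media text) are linearly interpolated, so phrases unseen in one source can still be predicted.

```python
# Minimal sketch: linear interpolation of two bigram language models.
from collections import Counter

def bigram_model(corpus):
    """Maximum-likelihood bigram probabilities from a list of sentences."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        toks = ['<s>'] + sent.split()
        uni.update(toks[:-1])
        bi.update(zip(toks[:-1], toks[1:]))
    return lambda w, h: bi[(h, w)] / uni[h] if uni[h] else 0.0

# Hypothetical corpora standing in for crowdsourced AAC text and Twitter.
crowd = ['i need help', 'i need water']
social = ['i need coffee now', 'so tired now']

p_crowd, p_social = bigram_model(crowd), bigram_model(social)
lam = 0.7   # interpolation weight; tuned on held-out data in practice
p = lambda w, h: lam * p_crowd(w, h) + (1 - lam) * p_social(w, h)

print(p('help', 'need'))    # 0.35: covered by the crowd model only
print(p('coffee', 'need'))  # 0.30: covered by the social-media model only
```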
Exploitation Route The empirical and technical work has led to implications for design that can be used as solution principles for developing similar user interfaces in industry. Researchers can build on these design implications to study further potential design principles.
Sectors Digital/Communication/Information Technologies (including Software)