LUCID: Clearer Software by Integrating Natural Language Analysis into Software Engineering

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

Abstracts are not currently available in GtR for all funded research. This is normally because the abstract was not required at the time of proposal submission, but may be because it included sensitive information such as personal details.

Description Source code combines two channels: a formal channel that specifies an algorithm for a computer and a natural language channel that explains that algorithm to developers. Both channels have been extensively studied in isolation. Lucid established a new line of research explicitly focused on how they interact. Some of these interactions form dual-channel constraints; Lucid showed how to exploit these constraints to solve software engineering problems.

Among its achievements, Lucid tackled the problem of overloading built-in types rather than defining problem-specific types (Refinym, FSE'18), advanced the state of the art in comment placement and quality (Detecting Redundant Comments, arXiv 2019, and Where Should I Comment my Code?, ICSE NIER'20), and showed the utility of probabilistic type inference.

Good comments speed up understanding and maintaining code. One class of bad comments consists of those that merely repeat the code. Lucid built a technique and a tool to detect such comments. Lucid also surfaced and investigated a problem that precedes writing any specific comment: where to add one. Explicitly handling this problem promises to ease the more important and much harder problem of generating comments. Lucid built an effective tool to solve the comment-location problem and contributed a dataset for researchers to use.

Dynamic languages, like JavaScript and Python, are well suited to writing prototypes, but programs written in them are expensive to maintain, partly because they lack type annotations. Many companies, including Google, Facebook, and Microsoft, have therefore invested in adding static typing to these languages. Static typing, however, requires developers to add type annotations, which can be a monumental effort for a large codebase. Type inference, the traditional solution to this problem, cannot soundly deduce the precise type of many expressions in dynamic languages. Lucid's DeepTyper project showed that probabilistic type inference can usefully ease this annotation burden by inferring types from local lexical context alone. Types follow a fat-tailed Zipfian distribution, and DeepTyper does not handle infrequent types well. Our Typilus work, published at PLDI, uses meta-learning to infer infrequent types.
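To make the idea of inferring types from lexical context concrete, here is a minimal sketch. It is not DeepTyper or Typilus, which are learned neural models; it is a toy frequency model that assumes identifier names alone carry a type signal. The training pairs and the function names `subtokens`, `train`, and `predict` are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter, defaultdict


def subtokens(identifier):
    """Split a camelCase or snake_case identifier into lowercase subtokens."""
    return [p.lower() for p in re.findall(r"[A-Za-z][a-z]*|\d+", identifier)]


def train(annotated):
    """Count how often each subtoken co-occurs with each annotated type.

    `annotated` is an iterable of (identifier, type) pairs, standing in for
    the annotations already present in a codebase.
    """
    counts = defaultdict(Counter)
    for name, typ in annotated:
        for tok in subtokens(name):
            counts[tok][typ] += 1
    return counts


def predict(counts, identifier):
    """Return a probability distribution over types for an unannotated name."""
    votes = Counter()
    for tok in subtokens(identifier):
        votes.update(counts.get(tok, {}))  # unseen subtokens contribute nothing
    total = sum(votes.values())
    return {t: c / total for t, c in votes.items()} if total else {}


# Hypothetical annotated identifiers (illustrative only, not project data).
TRAINING = [
    ("userCount", "number"),
    ("itemCount", "number"),
    ("userName", "string"),
    ("fileName", "string"),
    ("isValid", "boolean"),
    ("isReady", "boolean"),
]

model = train(TRAINING)
print(predict(model, "rowCount"))  # all evidence points to "number"
print(predict(model, "userId"))    # evidence is split between two types
print(predict(model, "xyz"))       # no evidence, so no prediction
```

Even this toy shows the weakness noted above: a type that appears rarely or never in the training data receives little or no probability mass, which is why infrequent types in a Zipfian tail need separate treatment.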

Lucid has been a productive project, leading to eight papers published at top-tier venues in software engineering and programming languages: FSE, PLDI, ICSE, TSE, and ICSE's NIER.
Exploitation Route Titans of the software industry, namely Google, Microsoft, Facebook, and Amazon, have all made substantial investments in applying machine learning and natural language processing techniques to improve developer productivity. Their teams are currently working on commercialising tools and techniques, some inspired by approaches and perspectives pioneered by Lucid, that exploit dual-channel constraints. These approaches tackle a range of software engineering problems, including autocompletion, commit untangling, code search, and probabilistic type inference for dynamically typed languages.
Sectors Digital/Communication/Information Technologies (including Software)

 
Title Comment locator dataset 
Description This dataset contains a curated set of open source C projects, from which we have extracted the inline comments and added their locations as test targets. This is to facilitate research on predicting comment locations from code. 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This is fairly recent work at the time of writing (March 2021), so there has not been enough time to observe notable impacts from this dataset 
URL http://groups.inf.ed.ac.uk/cup/comment-locator/
 
Description Huawei Shenzhen Dec 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Gave a tech talk about our research to software engineers at Huawei, one of the largest companies in China
Year(s) Of Engagement Activity 2017
 
Description sourced Tech Talk 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Hour-long tech talk for a workshop aimed at software engineers in Eastern Europe
Year(s) Of Engagement Activity 2017
URL https://blog.sourced.tech/post/ml_talks_moscow/