Inferring test cases from user bug reports

Lead Research Organisation: University College London
Department Name: Computer Science

Abstract

As part of recommended practice, projects employ issue trackers as a platform for communication between users and developers or contributors. This includes requests for new features and reports of issues and bugs. The information on these trackers is written in natural language and is sometimes structured according to a recommended format or template. However, it has not yet been tapped to guide the generation or repair of test cases. This project, in the first instance, aims to use this information to deduce the action sequences that caused a bug to surface, so that test cases can be generated to confirm the issue and, later, to validate the fix. To do so, techniques from natural language processing (NLP) should be used for the following tasks:
increase the quality of the collected data by recreating missing links between corpora, such as code commits and user bug reports, a task usually performed by a domain expert;
classify requests as features or bugs;
use actor models to deduce action sequences when they are presented in a free-form narrative;
use action sequences to generate or enhance test suites for the code under analysis.
While NLP techniques have been used for the construction of project ontologies, and these in turn employed to create links between software artefacts, the techniques have not previously been applied directly to the tasks above.
Ideally, this project should automate a typical task that occurs as part of the software life-cycle and thus reduce the cost of software maintenance. It should also prompt other researchers to follow suit in using NLP techniques to tap into the unstructured information that exists in most projects, and should provide corpora of software artefacts in a linked data format for researchers to work with when tackling questions other than the ones considered here.
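The first task listed above, recreating missing links between code commits and bug reports, can be illustrated with a simple baseline. The sketch below is not the project's method: it is a keyword heuristic that only recovers links a developer wrote explicitly in a commit message. The `Commit` and `Issue` records and the `recover_links` function are hypothetical names introduced for illustration.

```python
import re
from dataclasses import dataclass


# Hypothetical record types, introduced for illustration only.
@dataclass(frozen=True)
class Commit:
    sha: str
    message: str


@dataclass(frozen=True)
class Issue:
    number: int
    title: str


# Matches explicit references such as "fixes #12", "Closes #7" or "resolved #3".
ISSUE_REF = re.compile(
    r"\b(?:fix(?:es|ed)?|close[sd]?|resolve[sd]?)\s+#(\d+)",
    re.IGNORECASE,
)


def recover_links(commits, issues):
    """Return (commit sha, issue number) pairs found in commit messages.

    Only references to issue numbers that exist in `issues` are kept.
    """
    known = {issue.number for issue in issues}
    links = []
    for commit in commits:
        for match in ISSUE_REF.finditer(commit.message):
            number = int(match.group(1))
            if number in known:
                links.append((commit.sha, number))
    return links
```

A pattern like this cannot see links that were never written down, which is the gap the abstract proposes to close with NLP techniques rather than manual work by a domain expert.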


Studentship Projects

Project Reference | Relationship | Related To | Start | End | Student Name
EP/N509577/1 | | | 01/10/2016 | 24/03/2022 |
1817557 | Studentship | EP/N509577/1 | 26/09/2016 | 25/06/2021 | Profir-Petru Partachi
 
Description Data as generated by open-source developers, and as often consumed by researchers, is frequently of low quality: issues and their attendant fixes are not linked, breaking the trace from problem to solution as manifested in source code. Technical debt, the result of choices taken early to let a project move fast, accumulates and causes issues later in the project's life-span while biasing the results observed by researchers. The project therefore deviated from its original goal, as the data needed to attempt it was lacking. Instead, we focused on tooling to mitigate the issues encountered so that others may follow in our footsteps and attempt the idea.
Exploitation Route Both software products can be used directly by other researchers, either as preprocessing steps or as part of their own data processing pipelines. POSIT can facilitate the processing of mixed text and open research paths for docstrings, code comments or code fora, where text mixing formal and natural languages exists. Flexeme can facilitate data cleaning for any project making use of commit data. Aide-memoire can serve as a traceability tool or preprocessing step, and as a bootstrap for other approaches. This last is, however, still under review with a journal.
Sectors Digital/Communication/Information Technologies (including Software)

 
Title Aide-memoire 
Description The back-end and predictive model for the Aide-memoire paper (DOI: https://doi.org/10.1145/3542937) to facilitate online traceability and/or paper replication. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact The code for the paper and model are provided as OSS to facilitate replication and the adoption of the traceability model. 
 
Title Flexeme 
Description This project provides several implementations for commit untangling and proposes a new representation of git patches by projecting the patch onto a program dependence graph (PDG). 
Type Of Technology Software 
Year Produced 2021 
Open Source License? Yes  
Impact This software is the basis of the "Flexeme: Untangling Commits Using Lexical Flows" paper presented at FSE 2020. 
URL https://github.com/PPPI/Flexeme
 
Title POSIT: Tool to segment and tag mixed Natural and Formal Language text 
Description POSIT is a tool that simultaneously segments and provides AST parent node or part-of-speech tags for text that freely mixes natural and formal languages. It is realised as a biLSTM model with a CRF layer before the prediction output, and is built on top of the TensorFlow framework. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact POSIT, the tool, is the main technical contribution behind "POSIT: Simultaneously Tagging Natural and Programming Languages", which was accepted to ICSE 2020.