Inferring test cases from user bug reports

Lead Research Organisation: University College London

Department Name: Computer Science

Abstract

As part of recommended practice, projects employ issue trackers as a platform for communication between users and developers or contributors, including requests for new features and reports of issues and bugs. The information on these trackers is written in natural language and is sometimes structured according to a recommended format or template. However, it has yet to be tapped to guide the generation or repair of test cases. This project first aims to use this information to deduce the action sequences that caused a bug to surface, so that test cases can be generated to confirm the issue and, later, to validate the fix. To do so, natural language processing (NLP) techniques should be used for the following tasks:
increasing the quality of the collected data by recovering missing links between corpora, such as code commits and user bug reports, a task usually performed by a domain expert;
classifying requests as features or bugs;
using actor models to deduce action sequences when they are presented as a free-form narrative;
using action sequences to generate or enhance test suites for the code under analysis.
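The first task above, recovering missing links between commits and bug reports, is often approached with simple heuristics before any learned model is applied. The following is a minimal sketch under stated assumptions, not this project's method: it trusts explicit `#N` issue references in a commit message and otherwise falls back to bag-of-words cosine similarity with a hypothetical `threshold` of 0.3. The names `recover_links`, `tokens` and `cosine`, and the sample data, are illustrative only.

```python
from __future__ import annotations

import re
from collections import Counter
from math import sqrt

# Explicit issue references such as "#123" or "fixes #45" in a commit message.
ISSUE_REF = re.compile(r"#(\d+)\b")
# Lower-case alphanumeric tokens used for the lexical fallback.
WORD = re.compile(r"[a-z0-9_]+")


def tokens(text: str) -> Counter:
    """Bag of lower-cased words for a commit message or issue text."""
    return Counter(WORD.findall(text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bags of words (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recover_links(commits: dict[str, str], issues: dict[int, str],
                  threshold: float = 0.3) -> set[tuple[str, int]]:
    """Return hypothetical (commit_id, issue_id) links.

    An explicit '#N' reference to a known issue is trusted outright; otherwise
    the commit is linked to every issue whose text is similar enough.
    """
    links: set[tuple[str, int]] = set()
    for sha, message in commits.items():
        explicit = {int(n) for n in ISSUE_REF.findall(message) if int(n) in issues}
        if explicit:
            links |= {(sha, n) for n in explicit}
            continue
        bag = tokens(message)
        for issue_id, text in issues.items():
            if cosine(bag, tokens(text)) >= threshold:
                links.add((sha, issue_id))
    return links


if __name__ == "__main__":
    commits = {
        "a1": "Fix crash when saving empty file, fixes #12",
        "b2": "Avoid crash on empty file save",
        "c3": "Bump version",
    }
    issues = {12: "Crash when saving an empty file", 13: "Add dark mode"}
    print(sorted(recover_links(commits, issues)))  # [('a1', 12), ('b2', 12)]
```

A purely lexical fallback like this misses links whose wording differs and can over-link on common words, which is one reason the task is usually left to a domain expert.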
While NLP techniques have been used to construct project ontologies, which in turn have been employed to create links between software artefacts, these techniques have not previously been used directly for these tasks.
Ideally, this project should automate a typical task in the software life-cycle and thus reduce the cost of software maintenance. It should also prompt other researchers to follow suit in using NLP techniques to tap the unstructured information that exists in most projects, and it should provide corpora of software artefacts in a linked-data format that researchers can use to tackle questions other than those considered here.

Student:

Profir-Petru Partachi

Period of Study:

Sep 16 - Jun 21

Funder:

EPSRC

Project Status:

Closed

Project Category:

Studentship

Project Reference:

1817557

Research Topic:

Unclassified

Organisations

University College London (Lead Research Organisation)

People	ORCID iD
Mark Harman (Primary Supervisor)
Profir-Petru Partachi (Student)

Publications

Studentship Projects

Project Reference	Relationship	Related To	Start	End	Student Name
EP/N509577/1			01/10/2016	24/03/2022
1817557	Studentship	EP/N509577/1	26/09/2016	25/06/2021	Profir-Petru Partachi

Key Findings

Description	Data as generated by open-source developers, and as commonly consumed by researchers, is often of low quality: issues are often not linked to their attendant fixes, breaking the trace from a problem to its solution as manifested in source code. Technical debt (choices made early on to let a project move fast) accumulates and causes issues later in the project's life-span, while also biasing the results observed by researchers. The project therefore deviated from its original goal, as the data needed to attempt it were lacking. Instead, we focused on tooling to mitigate the issues encountered, so that others may follow in our footsteps and attempt the original idea.
Exploitation Route	The software products can be used directly by other researchers, either as preprocessing steps or as part of their own data-processing pipelines. POSIT can facilitate processing mixed text and open research paths for docstrings, code comments, or code fora, where text mixing formal and natural languages exists. Flexeme can facilitate data cleaning for any project that makes use of commit data. Aide-memoire can serve as a traceability tool or preprocessing step, and as a bootstrap for other approaches; it is, however, still under review with a journal.
Sectors	Digital/Communication/Information Technologies (including Software)

Software and Technical Products

Title	Aide-memoire
Description	The back-end and predictive model for the Aide-memoire paper (DOI: https://doi.org/10.1145/3542937) to facilitate online traceability and/or paper replication.
Type Of Technology	Software
Year Produced	2022
Open Source License?	Yes
Impact	The code for the paper and the model are provided as open-source software to facilitate replication and adoption of the traceability model.


Title	Flexeme
Description	This project provides several implementations for commit untangling and proposes a new representation of Git patches that projects the patch onto a program dependence graph (PDG).
Type Of Technology	Software
Year Produced	2021
Open Source License?	Yes
Impact	This software is the basis of the paper "Flexeme: Untangling Commits Using Lexical Flows", presented at FSE 2020.
URL	https://github.com/PPPI/Flexeme
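Flexeme's own untangling algorithm is described in the FSE 2020 paper and is not reproduced here. As a hedged illustration of the general idea of untangling a patch once it has been projected onto a PDG, the sketch below uses a much simpler baseline: changed statements connected to each other by dependency edges are grouped into one candidate concern. The function name `untangle`, the node identifiers, and the edge-list input format are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def untangle(changed: set[str], edges: list[tuple[str, str]]) -> list[set[str]]:
    """Group changed PDG nodes into candidate concerns (hypothetical baseline).

    Returns the connected components of the subgraph induced by the changed
    nodes: two changed statements share a group only if a path of dependency
    edges links them without passing through an unchanged statement.
    """
    adjacency: dict[str, set[str]] = defaultdict(set)
    for u, v in edges:
        # Keep only edges whose endpoints were both touched by the patch.
        if u in changed and v in changed:
            adjacency[u].add(v)
            adjacency[v].add(u)

    seen: set[str] = set()
    groups: list[set[str]] = []
    for start in sorted(changed):  # sorted for a deterministic group order
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(adjacency[node] - seen)
        groups.append(group)
    return groups


if __name__ == "__main__":
    changed = {"s1", "s2", "s5", "s6"}
    edges = [("s1", "s2"), ("s2", "s3"), ("s3", "s5"), ("s5", "s6")]
    print([sorted(g) for g in untangle(changed, edges)])  # [['s1', 's2'], ['s5', 's6']]
```

Here `s1`/`s2` and `s5`/`s6` form two separate concerns because the only edges joining them pass through the unchanged node `s3`. A more capable untangler would also weigh such indirect paths, which is where this baseline falls short.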


Title	POSIT: Tool to segment and tag mixed Natural and Formal Language text
Description	POSIT is a tool that simultaneously segments text freely mixing natural and formal languages and tags each token with either an AST parent-node tag or a part-of-speech tag. It is realised as a BiLSTM model with a CRF layer before the prediction output, and is built on top of the TensorFlow framework.
Type Of Technology	Software
Year Produced	2020
Open Source License?	Yes
Impact	POSIT, the tool, is the main technical contribution behind "POSIT: Simultaneously Tagging Natural and Programming Languages", which was accepted to ICSE 2020.
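At prediction time, a CRF layer such as the one described above selects the jointly best tag sequence from per-token emission scores (here assumed to come from the BiLSTM) and tag-to-tag transition scores, using Viterbi decoding. The plain-Python sketch below illustrates that decoding step only; it is not POSIT's TensorFlow code, and the two-tag scheme (0 = natural-language token, 1 = code token) and all scores are hypothetical.

```python
from __future__ import annotations


def viterbi(emissions: list[list[float]], transitions: list[list[float]]) -> list[int]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j for token t (assumed BiLSTM output).
    transitions[i][j]: score of tag i being followed by tag j.
    """
    n_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers: list[list[int]] = []
    for emit in emissions[1:]:
        prev_best, new_score = [], []
        for j in range(n_tags):
            # Best previous tag i to precede tag j at this position.
            i = max(range(n_tags), key=lambda k: score[k] + transitions[k][j])
            prev_best.append(i)
            new_score.append(score[i] + transitions[i][j] + emit[j])
        backpointers.append(prev_best)
        score = new_score
    # Follow the back-pointers from the best final tag.
    best = max(range(n_tags), key=lambda k: score[k])
    path = [best]
    for prev_best in reversed(backpointers):
        best = prev_best[best]
        path.append(best)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical tags: 0 = natural-language token, 1 = code token.
    # Tokens: "call", "foo()", "twice"
    emissions = [[2.0, 0.5], [0.0, 3.0], [1.5, 0.2]]
    transitions = [[0.5, -0.5], [-0.5, 0.5]]  # mild preference for staying put
    print(viterbi(emissions, transitions))  # [0, 1, 0]
```

The positive self-transition scores favour staying in the same language, but the strong code emission for `foo()` still makes the decoder switch to tag 1 for that token and back to tag 0 afterwards.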
