Stable Prediction of Defect-Inducing Software Changes (SPDISC)

Lead Research Organisation: University of Birmingham
Department Name: School of Computer Science

Abstract

Context: software systems have become ever larger and more complex. This inevitably leads to software defects, whose debugging is estimated to cost the global economy 312 billion USD annually. Reducing the number of software defects is a challenging problem, and is particularly important given the strong pressure towards rapid delivery. Such pressure prevents all parts of the source code from receiving an equally large amount of inspection and testing effort.

With that in mind, machine learning approaches have been proposed for predicting defect-inducing changes in the source code as soon as these changes have been implemented. Such approaches could enable software engineers to direct extra testing and inspection attention towards the parts of the source code most likely to induce defects, reducing the risk of committing defective changes.

Problem: the predictive performance of existing approaches is unstable, because the underlying defect-generating process being modelled may vary over time (i.e., there may be concept drift). This means that practitioners cannot be confident about the predictive ability of existing approaches: at any given point in time, predictive models may be performing very well or failing dramatically.

Aim and vision: SPDISC aims to create more stable models for predicting defect-inducing changes, through the development of a novel machine learning approach that automatically adapts to concept drift. When integrated with software versioning systems, the models will provide early, reliable and automated defect-inducing change alerts throughout the lifetime of software projects.

Impact: SPDISC will enable a transformation in the way software developers review and commit their changes. By creating stable models that make software developers aware of defect-inducing changes as soon as these are implemented, it will allow inspection and testing effort to be targeted at defect-inducing code throughout the lifetime of software projects. This will reduce debugging costs and ultimately lead to better software quality.

Proposed approach: an online learning algorithm will be developed to process incoming data as they become available, enabling fast reaction to concept drift. Concept drift will be detected using methods designed to cope with class imbalance, which typically occurs in the prediction of defect-inducing software changes. Class imbalance refers to there being far fewer defect-inducing changes than safe changes. The proposed approach will also make use of data from different projects (i.e., transfer learning between domains) to speed up adaptation to concept drift.
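To make the first two ingredients concrete, the following is a minimal sketch of online (test-then-train) learning combined with a class-imbalance-aware drift check. It assumes binary labels (1 = defect-inducing, 0 = safe) and numeric change features. The names `OnlineLogReg`, `ClassWiseDriftDetector` and `run_stream`, the decay factors, the threshold `delta`, and the reset-on-drift reaction are all hypothetical choices for illustration, not SPDISC's algorithm, and transfer learning between projects is not shown. The detector tracks time-decayed recall for each class separately, because overall accuracy is dominated by the many safe changes and can hide a collapse on the few defect-inducing ones.

```python
import math


class OnlineLogReg:
    """Minimal online logistic regression updated one example at a time (SGD)."""

    def __init__(self, n_features, lr=0.05):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def prob(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in math.exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return int(self.prob(x) >= 0.5)

    def learn(self, x, y, weight=1.0):
        # One SGD step on the (optionally weighted) logistic loss.
        step = self.lr * weight * (self.prob(x) - y)
        self.w = [wi - step * xi for wi, xi in zip(self.w, x)]
        self.b -= step


class ClassWiseDriftDetector:
    """Hypothetical detector: keeps time-decayed class proportions and per-class
    recall, and signals drift when the recall of either class falls more than
    `delta` below the best value it reached since the last reset."""

    def __init__(self, theta=0.99, eta=0.99, delta=0.3, warmup=50):
        self.theta, self.eta, self.delta, self.warmup = theta, eta, delta, warmup
        self.reset()

    def reset(self):
        self.prop = [0.5, 0.5]    # time-decayed class proportions
        self.recall = [0.0, 0.0]  # time-decayed recall of each class
        self.best = [0.0, 0.0]    # best recall per class since last reset
        self.seen = [0, 0]        # examples of each class since last reset

    def update(self, y, y_hat):
        """Record one test-then-train outcome; return True if drift is signalled."""
        for k in (0, 1):
            self.prop[k] = self.theta * self.prop[k] + (1 - self.theta) * (y == k)
        self.recall[y] = self.eta * self.recall[y] + (1 - self.eta) * (y_hat == y)
        self.seen[y] += 1
        if self.seen[y] <= self.warmup:
            return False  # too few examples of this class to judge yet
        self.best[y] = max(self.best[y], self.recall[y])
        return self.recall[y] < self.best[y] - self.delta

    def minority(self):
        return 0 if self.prop[0] < self.prop[1] else 1


def run_stream(stream, n_features):
    """Prequential loop: predict each change, check for drift, then learn from it."""
    model, detector, drifts = OnlineLogReg(n_features), ClassWiseDriftDetector(), []
    for t, (x, y) in enumerate(stream):
        if detector.update(y, model.predict(x)):
            drifts.append(t)
            model = OnlineLogReg(n_features)  # crudest reaction: start a fresh model
            detector.reset()
        m = detector.minority()
        # Up-weight the currently rarer class by the ratio of class proportions.
        weight = detector.prop[1 - m] / max(detector.prop[m], 1e-6) if y == m else 1.0
        model.learn(x, y, weight)
    return drifts


# Usage: every 10th change is defect-inducing. Up to t = 3000 every prediction
# is correct; afterwards every defect-inducing change is missed, so overall
# accuracy only drops to 90% while recall on defect-inducing changes collapses.
detector = ClassWiseDriftDetector()
first_alarm = None
for t in range(4000):
    y = int(t % 10 == 0)
    y_hat = y if t < 3000 else 0
    if detector.update(y, y_hat) and first_alarm is None:
        first_alarm = t
print(first_alarm)  # → 3370
```

An accuracy-based check on the same sequence would see only a 10% drop, whereas the per-class recall check fires after 38 consecutive missed defect-inducing changes. Replacing the model from scratch on drift is the simplest possible reaction; an approach with transfer learning could instead reuse knowledge from other projects to recover faster.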

Novelty: SPDISC is the first proposal to look into the stability of predictive performance over time in the context of defect-inducing software changes. Most previous work has ignored the fact that predictions are required over time, and has therefore overlooked the instability of predictive performance in this problem. To deal with this instability, SPDISC will develop the first online transfer learning approach for predicting defect-inducing software changes.

Ambitiousness: online transfer learning between domains with concept drift is a very new area of research not only in software engineering, but also in machine learning. Very few approaches exist for it, and none of them can deal with class-imbalanced problems. Therefore, SPDISC will not only advance software engineering by enabling a transformation in the way software developers review and commit their changes, but also advance the area of machine learning itself.

Timeliness: given the current size and complexity of software systems, the increased number of life-critical applications, and the high competitiveness of the software industry, approaches for improving software quality and reducing the cost of producing and maintaining software are currently of utmost importance.

Planned Impact

SPDISC's beneficiaries are the software industry, software users and related scientific communities.

1) Software Industry
The software industry is SPDISC's main beneficiary. The UK software industry is estimated to be worth more than 9bn GBP, and is the second largest market by value in the EU. Globally, the software industry's estimated value is over 407bn USD. And yet, the global cost of debugging software is estimated to be 312 billion USD annually, representing an enormous loss of revenue. SPDISC will have an economic impact by reducing debugging costs and increasing software quality.

In particular, SPDISC will empower software developers with early, reliable and automated alerts of defect-inducing software changes throughout the lifetime of software projects. It will enable a transformation in the way software changes are reviewed and committed in software development companies that use software versioning and bug-tracking systems. Defect-inducing changes will be automatically pinpointed for attention right after their implementation, allowing the limited testing and inspection resources to be allocated easily and wisely. This is especially desirable in companies leaning towards a more agile software development process.

As the software changes will be fresh in the developers' minds when defect alerts are triggered, inspecting them will be much cheaper than debugging them later. In addition, changes typically have few lines of code, further facilitating inspection. Therefore, SPDISC's approach will reduce the risk of committing changes that lead to defects, reducing debugging costs and increasing software quality. The lower debugging costs will translate into cheaper software, as finding and fixing defects typically takes 50% of a software developer's time.

From a project management perspective, as each software change is inherently associated with a single developer, the assignment of developers to inspect defect-inducing changes will be straightforward. With SPDISC, the task of deciding which parts of the source code should receive increased attention, and by whom, can be delegated to the software developers themselves, freeing project managers for other tasks.

Both large enterprises and SMEs can benefit from SPDISC, as its approach automatically adapts to different environments. I anticipate that software development tools based on SPDISC will be commercialised in the future. One of SPDISC's industrial partners has already expressed interest in doing so. This will help SMEs benefit from SPDISC, increasing their competitiveness and driving faster and more balanced economic growth. This will in turn have a societal impact by increasing wealth and employment.

2) Software Users
The more cost-effective software development enabled by SPDISC will in turn benefit software users, who can be private users, users of public services, or other enterprises. Lower costs will make software more accessible to private users and public services. Higher quality will improve quality of life through a better and safer software experience. This is key to a world of smart cities, which rely heavily on software. It is also important for life-critical software applications, which could pose serious threats if defective. Cheaper and higher-quality software will increase the competitiveness of other enterprises that depend on software, driving faster economic growth. Extensions of SPDISC's approach could also potentially help solve data analytics problems other than defect prediction.

3) Scientific Communities
SPDISC will create a tighter bond between software engineering and machine learning through its new machine learning approach for software engineering. These two areas will benefit from this research. There will also be some impact on mathematical sciences, as part of SPDISC's foundation lies in this area. More details are in the academic beneficiaries summary.


Related Projects

Project Reference  Relationship  Related To    Start       End         Award Value
EP/R006660/1                                   03/01/2018  03/09/2018  £100,542
EP/R006660/2       Transfer      EP/R006660/1  04/09/2018  01/11/2019  £47,775