USING FAULT CHARACTERISTICS TO IMPROVE SOFTWARE FAULT PREDICTION

Lead Research Organisation: Brunel University London

Department Name: Computer Science

Abstract

SIGNIFICANCE: Faults in software code are a significant cost to companies, as well as a risk to human safety and business success. Finding and fixing faults in code costs the UK software industry billions of pounds every year. Significant cost savings are available with even small improvements in our capability to find faults before systems are delivered to users.

BACKGROUND: Our previous work shows that during the last 10 years, 208 studies have published hundreds of different fault prediction models. These studies are usually typified by researchers applying one or more of the many modeling techniques to one or more of the many available data sets, then applying performance measures to report how well that model predicts faults.

PROBLEM: Models do not perform consistently above the current predictive performance ceiling of about 80% recall. We propose that an important contributor to this underperformance is that models treat all faults as homogeneous. No previous attempt has been made to understand what characteristics make a fault predictable or what features a model needs in order to predict faults with particular characteristics.
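For reference, recall is the proportion of truly faulty code units that a model flags as faulty. The following is a minimal sketch of that calculation, not the project's evaluation code: the function name `recall` and the label data are hypothetical, chosen only to illustrate what an 80% recall figure means.

```python
def recall(actual, predicted):
    """Fraction of truly faulty units (actual == 1) that were predicted faulty."""
    true_pos = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    if true_pos + false_neg == 0:
        raise ValueError("recall is undefined when there are no faulty units")
    return true_pos / (true_pos + false_neg)


# Hypothetical data: 1 = faulty, 0 = not faulty, one entry per code unit.
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 1, 0, 0, 1, 0, 0, 0]

# 4 of the 5 faulty units are found, so recall is 4 / 5.
print(recall(actual, predicted))
```

Note that recall ignores false alarms (the sixth-to-tenth entries above contain one), so it is normally reported alongside other performance measures.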

AIM: To build a fault prediction model ensemble which is focused on the characteristics of faults and which consistently performs above the current performance ceiling.

METHOD: This 36-month project is based on analysing the code and fault data from six commercial systems and six open source systems. We will conduct detailed quantitative and qualitative analysis of the characteristics of the faults in these systems, identifying, for example, whether faults stem from problems in code interfaces, algorithmic problems, structural problems, typographic problems, etc. We will construct a set of prediction models with a large variety of features (e.g. different modeling techniques, different independent variables, etc.). We will use these models to empirically identify relationships between fault characteristics and the features of individual models; that is, we will identify which features of prediction models predict faults with particular characteristics. We will build ensembles of models with features that cover the widest range of fault characteristics. We will evaluate those models on industrial systems in collaboration with a company.
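One way to picture the step of building an ensemble that covers the widest range of fault characteristics is as a coverage problem. The sketch below uses a greedy set-cover heuristic. This is an assumption made for illustration, not the selection method the project describes. The function `select_ensemble`, the model names, and the mapping from models to fault characteristics are all hypothetical.

```python
def select_ensemble(coverage, max_models=None):
    """Greedy set cover: repeatedly add the model that covers the most
    fault characteristics not yet covered by the ensemble.

    coverage maps a model name to the set of fault characteristics
    that the model is assumed to predict well.
    """
    remaining = set().union(*coverage.values())
    chosen = []
    while remaining and (max_models is None or len(chosen) < max_models):
        # sorted() makes tie-breaking deterministic (alphabetical by name).
        best = max(sorted(coverage), key=lambda m: len(coverage[m] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break  # no model adds further coverage
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


# Hypothetical mapping; in practice it would come from empirical analysis.
coverage = {
    "naive_bayes": {"interface", "typographic"},
    "random_forest": {"algorithmic", "structural", "interface"},
    "svm": {"structural"},
}

chosen, uncovered = select_ensemble(coverage)
print(chosen)     # models selected for the ensemble, in order of selection
print(uncovered)  # fault characteristics no selected model covers
```

Greedy selection does not guarantee the smallest possible ensemble, but it is simple and makes the trade-off between ensemble size and coverage explicit through `max_models`.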

Planned Impact

Faults are hugely costly to the UK software industry. Resourcing developers to find and fix faults is very expensive, and the opportunity cost of developers doing this is significant. The costs of failing to find and fix faults can be catastrophic to both human life and business success. Finding faults early in the lifecycle reduces the cost of fixing them and mitigates the risk to humans and businesses. Research that improves our capability to find faults offers companies huge potential benefits. Indeed, because the cost of finding and fixing faults is so significant, even a small improvement in fault-finding capability will save a large amount of money. Consequently, our proposal is very important to the UK software industry.

Despite the potential importance of fault prediction to the software industry, companies have been slow to adopt it. This is predominantly because, as described earlier, the predictive performance of models typically does not go beyond 80% recall. Companies want better predictive performance to justify an investment in fault prediction models. In addition, building in-house fault prediction expertise is not straightforward. The field is complex and few companies have such expertise available. As a result, industry does not currently have much appetite for fault prediction.

Our work will make a significant impact on increasing industrial take-up of fault prediction models for two reasons. First, our model ensemble will offer improved predictive performance. This in itself will make it more attractive to companies. Second, our impact and dissemination strategy is designed to explicitly target industrial take-up of our model ensemble. The strategy is based on producing genuinely useful and usable tools for industry, as well as effectively drawing these tools to the attention of industry.

Our impact strategy has the following elements: delivering our model ensemble in the form of a highly usable IDE/ANT plug-in tool; developing a fault analysis web site to which companies can submit code for automatic fault analysis; an industry workshop on fault prediction; and both industry-oriented and academic publications.

Funded Value:

£394,315

Funded Period:

Apr 14 - Apr 17

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/L011751/1

Principal Investigator:

Tracy Hall

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Software Engineering (100%)

Organisations

People
Tracy Hall (Principal Investigator)
David Bowes (Co-Investigator)
Steve Counsell (Co-Investigator)

Publications

Bowes D (2016) Mutation-aware fault prediction

Bowes D (2015) Different Classifiers Find Different Defects Although With Different Level of Consistency

Bowes D (2017) Getting Defect Prediction Into Industrial Practice: the ELFF Tool

Bowes D (2017) Software defect prediction: do different classifiers find the same defects? in Software Quality Journal

Child M (2019) A comparison and evaluation of variants in the coupling between objects metric in Journal of Systems and Software

Kirbas S (2017) Evolutionary coupling measurement: Making sense of the current chaos in Science of Computer Programming

Kirbas S (2017) The relationship between evolutionary coupling and defects in large industrial software in Journal of Software: Evolution and Process

Mahmood Z (2015) What is the Impact of Imbalance on Software Defect Prediction Performance?

Petric J (2016) Building an Ensemble for Software Defect Prediction Based on Diversity Selection

Petric J (2016) The jinx on the NASA software defect data sets

Shippey T (2019) Automatically identifying code features for software defect prediction: Using AST N-grams in Information and Software Technology

Shippey T (2016) So You Need More Method Level Datasets for Your Software Defect Prediction?

Key Findings
Impact Summary
Research Tools and Methods
Intellectual Property
Engagement Activities


Description	We have extensively analysed faults in commercial and open source systems. Our findings are that: - commercial systems seem to contain fewer faults than open source systems; - ensemble approaches seem to predict faults better than single model approaches; - code cleaning could improve the prediction of faults.
Exploitation Route	Our findings and tools will take the fault prediction community forward and will hopefully be used by companies. Development of our tools is ongoing.
Sectors	Digital/Communication/Information Technologies (including Software)


Description	Our company partners have used the tools we developed in their software engineering process. Our tools are also available as open source tools for anyone to use.
First Year Of Impact	2015
Sector	Digital/Communication/Information Technologies (including Software)
Impact Types	Economic


Title	ELFF Defect prediction tool
Description	A software defect prediction tool for use by researchers and developers.
Type Of Material	Improvements to research infrastructure
Provided To Others?	No
Impact	The tool is currently in evaluation with our industrial partner (Sky Plc). We are also talking to other companies about transferring the tool into industrial practice.


Title	ELFF software defect prediction tool
Description	Our ELFF defect prediction tool is about to go under licence to a commercial company. The licence negotiations are currently at an advanced stage.
IP Reference
Protection	Copyrighted (e.g. software)
Year Protection Granted
Licensed	No
Impact	This is all pending and depends on the outcome of licensing negotiations.


Description	Fixie YouTube Channel
Form Of Engagement Activity	Engagement focused website, blog or social media channel
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	We developed a YouTube channel with a series of online training videos explaining a variety of aspects of Automatic Program Repair. These videos explain our tools and techniques, as well as their use cases, primarily for software practitioners, but are available for anyone to view via YouTube. We publicised our YouTube channel via our Twitter pages (for the Fixie Project and for our SE research groups).
Year(s) Of Engagement Activity	2022
URL	https://www.youtube.com/watch?v=zJpap6U2yv4
