Stable Prediction of Defect-Inducing Software Changes (SPDISC)

Lead Research Organisation: University of Leicester

Department Name: Computer Science

Abstract

Context: software systems have become ever larger and more complex. This inevitably leads to software defects, whose debugging is estimated to cost the global economy 312 billion USD annually. Reducing the number of software defects is a challenging problem, and is particularly important given the strong pressure towards rapid delivery. Such pressure prevents all parts of the source code from receiving an equally large amount of inspection and testing effort.

With that in mind, machine learning approaches have been proposed for predicting defect-inducing changes in the source code as soon as these changes have been implemented. Such approaches could enable software engineers to focus testing and inspection effort on the parts of the source code most likely to induce defects, reducing the risk of committing defective changes.

Problem: the predictive performance of existing approaches is unstable, because the underlying defect generating process being modelled may vary over time (i.e., there may be concept drift). This means that practitioners cannot be confident about the predictive ability of existing approaches: at any given point in time, predictive models may be performing very well or failing dramatically.

Aim and vision: SPDISC aims at creating more stable models for predicting defect-inducing changes, through the development of a novel machine learning approach for automatically adapting to concept drift. When integrated with software versioning systems, the models will provide early, reliable and automated defect-inducing change alerts throughout the lifetime of software projects.

Impact: SPDISC will enable a transformation in the way software developers review and commit their changes. By creating stable models that make software developers aware of defect-inducing changes as soon as these are implemented, it will allow inspection and testing effort to be targeted at defect-inducing code throughout the lifetime of software projects. This will reduce the debugging cost and ultimately lead to better software quality.

Proposed approach: an online learning algorithm will be developed to process incoming data as they become available, enabling fast reaction to concept drift. Concept drift will be detected using methods designed to cope with class imbalance, which typically occurs in the prediction of defect-inducing software changes. Class imbalance refers to having far fewer defect-inducing changes than safe changes. The proposed approach will also make use of data from different projects (i.e., transfer learning between domains) to speed up adaptation to concept drift.
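To illustrate how an online learner can keep track of class imbalance as changes arrive, the sketch below maintains a time-decayed estimate of each class's share of the stream and turns it into an oversampling rate for the rarer class, with each example presented a Poisson-distributed number of times. This is a common pattern in online class imbalance learning, shown here under stated assumptions; it is not SPDISC's algorithm. The class name `OnlineImbalanceTracker`, the decay factor `theta` and the 0/1 label encoding are hypothetical.

```python
import math
import random


class OnlineImbalanceTracker:
    """Hypothetical helper: time-decayed class proportions for a binary stream.

    Assumed encoding: 1 = defect-inducing change, 0 = clean change.
    """

    def __init__(self, theta: float = 0.99) -> None:
        # theta close to 1 gives a long memory; smaller values react faster to drift
        self.theta = theta
        self.w = [0.5, 0.5]  # uninformative starting estimate for classes 0 and 1

    def update(self, label: int) -> None:
        # w_k(t) = theta * w_k(t-1) + (1 - theta) * [label == k]; the two values keep summing to 1
        for k in (0, 1):
            indicator = 1.0 if label == k else 0.0
            self.w[k] = self.theta * self.w[k] + (1 - self.theta) * indicator

    def oversampling_rate(self, label: int) -> float:
        # The currently rarer class gets a rate above 1:
        # rate = (estimated share of the larger class) / (estimated share of this class)
        return max(self.w) / self.w[label]


def poisson(lam: float, rng: random.Random) -> int:
    """Knuth's Poisson sampler; adequate for the small rates used in this sketch."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


if __name__ == "__main__":
    rng = random.Random(0)
    tracker = OnlineImbalanceTracker(theta=0.9)
    for label in [0] * 9 + [1]:  # a short stream in which defects are rare
        tracker.update(label)
        # an online bagging-style learner would train each base model on this
        # example poisson(tracker.oversampling_rate(label), rng) times
        print(label, poisson(tracker.oversampling_rate(label), rng))
```

Because the estimate decays, a sudden change in how often defect-inducing changes occur is reflected in the oversampling rate within a few dozen updates for `theta = 0.9`, rather than being averaged over the whole project history.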

Novelty: SPDISC is the first proposal to look into the stability of predictive performance over time in the context of defect-inducing software changes. Most previous work ignored the fact that predictions are required over time, and was therefore unaware of the instability of predictive performance in this problem. To deal with instability, SPDISC will develop the first online transfer learning approach for predicting defect-inducing software changes.

Ambitiousness: online transfer learning between domains with concept drift is not only a very new area of research in software engineering, but also in machine learning. Very few approaches exist for that, and none of them can deal with class-imbalanced problems. Therefore, SPDISC will not only advance software engineering by enabling a transformation in the way software developers review and commit their changes, but also advance the area of machine learning itself.

Timeliness: given the current size and complexity of software systems, the increased number of life-critical applications, and the high competitiveness of the software industry, approaches for improving software quality and reducing the cost of producing and maintaining software are currently of utmost importance.

Planned Impact

SPDISC's beneficiaries are the software industry, software users and related scientific communities.

1) Software Industry
The software industry is SPDISC's main beneficiary. The UK software industry is estimated to be worth more than 9bn GBP, and is the second largest market by value in the EU. Globally, the software industry's estimated value is over 407bn USD. And yet, the global cost of debugging software is estimated to be 312 billion USD annually, representing an enormous loss of revenue. SPDISC will have an economic impact by reducing debugging costs and increasing software quality.

In particular, SPDISC will empower software developers with early, reliable and automated alerts of defect-inducing software changes throughout the lifetime of software projects. It will enable a transformation in the way software changes are reviewed and committed in software development companies that use software versioning and bug-tracking systems. Defect-inducing changes will be automatically pinpointed for attention right after their implementation, allowing the limited testing and inspection resources to be allocated easily and wisely. This is especially desirable in companies leaning towards a more agile software development process.

As the software changes will be fresh in the developers' minds when defect alerts are triggered, inspecting them will be much cheaper than debugging them later. In addition, changes typically have few lines of code, further facilitating inspection. Therefore, SPDISC's approach will reduce the risk of committing changes that lead to defects, reducing debugging cost and increasing software quality. The lower debugging cost will translate into lower software costs, as finding and fixing defects typically takes 50% of a software developer's time.

From a project management perspective, as each software change is inherently associated with a single developer, assigning developers to inspect defect-inducing changes will be straightforward. With SPDISC, the task of deciding which parts of the source code should receive increased attention, and by whom, can be delegated to the software developers themselves, freeing project managers for other tasks.

Both large enterprises and SMEs can benefit from SPDISC, as its approach automatically adapts to different environments. I anticipate that software development tools based on SPDISC will be commercialised in the future. One of SPDISC's industrial partners has already expressed interest in doing so. This will help SMEs benefit from SPDISC, increasing their competitiveness and driving faster and more balanced economic growth. This will in turn lead to an impact on society by increasing wealth and employment.

2) Software Users
The more cost-effective software development enabled by SPDISC will in turn benefit software users, who can be private users, users of public services, or other enterprises. Lower costs will make software more accessible to private users and public services. Higher quality will improve quality of life through better and safer software experiences. This is key to a world of smart cities, which are largely controlled by software. It is also important for life-critical software applications, which could pose serious threats if defective. Cheaper and higher quality software will increase the competitiveness of other enterprises that depend on software, driving faster economic growth. Extensions of SPDISC's approach could also potentially help solve data analytics problems other than defect prediction.

3) Scientific Communities
SPDISC will create a tighter bond between software engineering and machine learning through its new machine learning approach for software engineering, and both areas will benefit from this research. There will also be some impact on mathematical sciences, as part of SPDISC's foundation lies in this area. More details are in the academic beneficiaries summary.

Funded Value:

£100,541

Funded Period:

Jan 18 - Sep 18

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/R006660/1

Principal Investigator:

Leandro Minku

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Artificial Intelligence (30%)

Software Engineering (70%)

People
Leandro Minku (Principal Investigator)

Publications


Agrawal A (2020) Better software analytics via "DUO": Data mining algorithms using/used-by optimizers in Empirical Software Engineering

Brzezinski D (2021) The impact of data difficulty factors on classification of imbalanced and concept drifting data streams in Knowledge and Information Systems

Cabral G (2019) Class Imbalance Evolution and Verification Latency in Just-in-Time Software Defect Prediction

Du H (2019) Multi-Source Transfer Learning for Non-Stationary Environments

Minku L (2019) A novel online supervised hyperparameter tuning procedure applied to cross-company software effort estimation in Empirical Software Engineering

Minku L (2018) Learning from Data Streams in Evolving Environments

Nair V (2018) Data-driven search-based software engineering

Oliveira G (2019) GMM-VRD: A Gaussian Mixture Model for Dealing With Virtual and Real Concept Drifts

Song L (2018) A novel automated approach for software effort estimation based on data augmentation

Song L (2019) Software Effort Interval Prediction via Bayesian Inference and Synthetic Bootstrap Resampling in ACM Transactions on Software Engineering and Methodology



Description	Key findings: we performed the first detailed investigation of concept drifts that affect the proportion of defect-inducing software changes in software projects. The study reveals both gradual and sudden changes in this proportion over time, and shows that they degrade the predictive performance of existing approaches, preventing their use in realistic scenarios. It also reveals that simplistic approaches, such as building predictive models using only the most recent software changes, are not enough to tackle this issue. Based on these findings, we proposed a novel approach that performs more consistently well in the presence of this type of concept drift.
The approach implements an online learning algorithm that processes incoming data as they become available, enabling fast reaction to the type of concept drift investigated in the study. When applied to software defect prediction, the approach obtained up to 45.38% higher predictive performance than the best existing approach, bringing it much closer to potential adoption in practice. Other key findings include advances in the machine learning literature on achieving more consistently high predictive performance over time by using data from other domains. Future work: other types of concept drift, as well as an approach that uses data from different projects to further improve predictive performance on a given project, are investigated in the continuation of this grant (EP/R006660/2) at the University of Birmingham. During the initial phase of this grant, the proposed approach was evaluated on GitHub open source projects. A case study with proprietary data is also performed in the continuation of this grant (EP/R006660/2) at the University of Birmingham.
Exploitation Route	Software could be developed to allow practitioners to adopt our proposed approach within their software development environments. This could potentially be done through one of the project's industrial partners or through the creation of a spin-out company.
Sectors	Digital/Communication/Information Technologies (including Software)
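The difference between sudden and gradual changes in the proportion of defect-inducing changes can be made concrete with a synthetic label stream. The hypothetical generator below uses an assumed probability schedule (a jump from 0.10 to 0.30 at change 1000, then a linear decline to 0.05 between changes 2000 and 3000); the function names and numbers are illustrative only and are not taken from the study's data.

```python
import random
from typing import List


def defect_probability(t: int) -> float:
    """Hypothetical schedule for P(change number t is defect-inducing)."""
    if t < 1000:
        return 0.10                                 # stable period
    if t < 2000:
        return 0.30                                 # sudden drift at t = 1000
    if t < 3000:
        return 0.30 - 0.25 * (t - 2000) / 1000      # gradual drift from 0.30 down to 0.05
    return 0.05                                     # new stable period


def generate_labels(n: int, seed: int = 0) -> List[int]:
    """Draw one label per change: 1 = defect-inducing, 0 = clean."""
    rng = random.Random(seed)
    return [1 if rng.random() < defect_probability(t) else 0 for t in range(n)]


def window_proportion(labels: List[int], start: int, end: int) -> float:
    """Observed proportion of defect-inducing changes in labels[start:end]."""
    window = labels[start:end]
    return sum(window) / len(window)


if __name__ == "__main__":
    labels = generate_labels(4000)
    for start in range(0, 4000, 500):
        print(start, round(window_proportion(labels, start, start + 500), 3))
```

Printing the proportion in consecutive windows of 500 changes shows the step and the ramp; an estimate based on the cumulative proportion since the start of the project would lag behind both.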


Description	We have shown that our approaches for predicting defect-inducing software changes can provide more consistently high predictive performance in open source projects. A case study of these approaches has been performed with a company using its proprietary data, showing that they can also improve predictive performance in such scenarios. This shows that such approaches could potentially be adopted by practitioners to help prevent defects in both open source and proprietary software.
Sector	Digital/Communication/Information Technologies (including Software)


Description	Citation in systematic literature review - ICSE2019
Geographic Reach	Multiple continents/international
Policy Influence Type	Citation in systematic reviews
URL	https://dl.acm.org/doi/pdf/10.1145/3567550


Description	Citation in systematic literature review - IJCNN2019
Geographic Reach	Multiple continents/international
Policy Influence Type	Citation in systematic reviews
URL	https://dl.acm.org/doi/pdf/10.1145/3567550


Description	IASESE School
Geographic Reach	Multiple continents/international
Policy Influence Type	Influenced training of practitioners or researchers
Impact	I gave a tutorial entitled "Data Science for Software Engineering: Important Considerations and Typical Setbacks" at the 15th International Advanced School on Empirical Software Engineering (IASESE 2018). The tutorial discussed how to apply data science for software engineering, including problems such as software defect prediction investigated in this grant. The tutorial raised the audience's awareness of important considerations to make when applying data science for software engineering, and typical setbacks resulting from ignoring such considerations. It provided attendees with knowledge on how to make more informed decisions when applying data science to software engineering, increasing their skill level in this area. The tutorial was attended by around 35 researchers, students and practitioners.


Description	IJCNN 2018
Geographic Reach	Multiple continents/international
Policy Influence Type	Influenced training of practitioners or researchers
Impact	I gave a tutorial entitled "Learning Class Imbalanced Data Streams" at the 2018 IEEE World Congress on Computational Intelligence. The tutorial explained the topic of class imbalance in data streams, which is one of the main characteristics of the problem being investigated in this grant. It then explained approaches to tackle class imbalance in data streams, increasing the attendees' skill level in this area. The tutorial had 52 attendees, according to the Whova app used at the congress.


Description	Transfer Learning for Software Engineering
Geographic Reach	South America
Policy Influence Type	Influenced training of practitioners or researchers


Description	Stable Prediction of Defect-Inducing Software Changes (SPDISC)
Amount	£47,775 (GBP)
Funding ID	EP/R006660/2
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	09/2018
End	05/2019


Title	ICSE 2019
Description	A novel method to take chronology into account when creating and evaluating models for predicting defect-inducing software changes has been proposed. The method was published in: CABRAL, G.; MINKU, L.; SHIHAB, E.; MUJAHID, S. "Class Imbalance Evolution and Verification Latency in Just-in-Time Software Defect Prediction", Proceedings of the International Conference on Software Engineering (ICSE), pp. 666-676, May 2019. The source code of the implementation is publicly available at Zenodo: https://zenodo.org/record/2555695
Type Of Material	Improvements to research infrastructure
Year Produced	2019
Provided To Others?	Yes
Impact	Other researchers and practitioners will be able to adopt the same methodology in their software defect prediction studies, being able to perform more realistic studies of prediction of defect-inducing software changes. So far, 173 downloads of the tool have been made via Zenodo.
URL	https://zenodo.org/record/2594681
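Taking chronology into account means that a change's label is not available as soon as the change is committed. One way to sketch this verification latency: a change becomes a defect-inducing training example when a defect fix is traced back to it, and, if no such fix appears within a waiting period, it is provisionally used as a clean example. The 90-day `WAITING_DAYS`, the `Commit` fields and `label_events` below are hypothetical assumptions; the published method may treat these cases differently.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

WAITING_DAYS = 90  # hypothetical waiting period before an unlinked change is treated as clean


@dataclass
class Commit:
    commit_id: str
    day: int                  # day on which the change was committed
    fix_day: Optional[int]    # day a defect fix was traced back to this change, if ever


def label_events(commits: List[Commit], today: int) -> List[Tuple[int, str, int]]:
    """Training examples (day_label_available, commit_id, label) known by `today`, oldest first.

    label 1 = defect-inducing, 0 = clean.
    """
    events: List[Tuple[int, str, int]] = []
    for c in commits:
        if c.fix_day is not None and c.fix_day < c.day + WAITING_DAYS:
            # defect surfaced within the waiting period: the label arrives with the fix
            events.append((c.fix_day, c.commit_id, 1))
        else:
            # no fix within the waiting period: provisionally treated as clean
            events.append((c.day + WAITING_DAYS, c.commit_id, 0))
            if c.fix_day is not None:
                # a later fix produces a corrected, defect-inducing example
                events.append((c.fix_day, c.commit_id, 1))
    # keep only labels that already exist at time `today`, in chronological order
    return sorted(e for e in events if e[0] <= today)
```

At any training time `today`, a model can only learn from the events returned here, which is why evaluating on randomly shuffled data can give an unrealistic picture of how such models behave over time.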


Title	IJCNN 2019 GMM-VRD: A Gaussian Mixture Model for Dealing With Virtual and Real Concept Drifts
Description	A novel method to deal with real and virtual concept drifts based on Gaussian mixture models. The method was published in: OLIVEIRA, G. H. F. M.; MINKU, L. L.; OLIVEIRA, A. L. I. "GMM-VRD: A Gaussian Mixture Model for Dealing With Virtual and Real Concept Drifts", Proceedings of the International Joint Conference on Neural Networks (IJCNN), 8 pages, July 2019. The source code of the implementation is publicly available at GitHub: https://github.com/GustavoHFMO/GMM-VRD
Type Of Material	Improvements to research infrastructure
Year Produced	2020
Provided To Others?	Yes
Impact	Other researchers and practitioners will be able to adopt the same methodology in their data stream learning studies.
URL	https://github.com/GustavoHFMO/GMM-VRD


Title	IJCNN 2019 Melanie: Multi-Source Transfer Learning for Non-Stationary Environments
Description	A novel method to transfer knowledge between domains in data stream learning. The method was published in: DU, H.; MINKU, L. L.; ZHOU, H. "Multi-Source Transfer Learning for Non-Stationary Environments", Proceedings of the International Joint Conference on Neural Networks (IJCNN), July 2019. The source code of the implementation is publicly available at GitHub: https://github.com/nino2222/Melanie
Type Of Material	Improvements to research infrastructure
Year Produced	2019
Provided To Others?	Yes
Impact	Other researchers and practitioners will be able to adopt the same methodology in their transfer learning studies in data stream mining.
URL	https://github.com/nino2222/Melanie
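Using data from other domains in a stream is often sketched as a weighted ensemble: models trained on source projects and a model trained on the target project vote, and each weight tracks how well that model has recently predicted target examples. The code below shows this generic pattern with an exponentially decayed accuracy weight. It is an illustrative assumption, not the Melanie algorithm; the `WeightedMultiSourceEnsemble` class, the `decay` parameter and the model interface are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical model interface: feature vector -> predicted class (0 or 1)
Model = Callable[[List[float]], int]


class WeightedMultiSourceEnsemble:
    def __init__(self, models: Dict[str, Model], decay: float = 0.9) -> None:
        self.models = models
        self.decay = decay
        # every source and target model starts with the same weight
        self.weights = {name: 1.0 for name in models}

    def predict(self, x: List[float]) -> int:
        # weighted vote: total weight behind class 1 versus class 0 (ties go to class 0)
        votes = [0.0, 0.0]
        for name, model in self.models.items():
            votes[model(x)] += self.weights[name]
        return 1 if votes[1] > votes[0] else 0

    def update(self, x: List[float], y: int) -> None:
        # decayed accuracy on the target stream: recent correctness dominates each weight
        for name, model in self.models.items():
            correct = 1.0 if model(x) == y else 0.0
            self.weights[name] = self.decay * self.weights[name] + (1 - self.decay) * correct
```

A model trained on the target project's own recent changes can be passed in as one more entry of `models`, so the ensemble can shift weight between source and target knowledge as the target stream drifts.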


Title	KAIS 2021 The impact of data difficulty factors on classification of imbalanced and concept drifting data streams
Description	Software implementation of a synthetic data stream generator, which was proposed in the following paper: BRZEZINSKI, D.; MINKU, L.; PEWINSKI, T.; STEFANOWSKI, J.; SZUMACZUK, A. "The Impact of Data Difficulty Factors on Classification of Imbalanced and Concept Drifting Data Streams", Knowledge and Information Systems (KAIS), 2021.
Type Of Material	Improvements to research infrastructure
Year Produced	2021
Provided To Others?	Yes
Impact	Other researchers will be able to use this tool to generate data streams for their experiments.
URL	https://github.com/dabrze/imbalanced-stream-generator


Title	ICSE 2019 algorithm
Description	A novel algorithm to tackle class imbalance in the prediction of defect-inducing software changes has been proposed and implemented. The algorithm was published in: CABRAL, G.; MINKU, L.; SHIHAB, E.; MUJAHID, S. "Class Imbalance Evolution and Verification Latency in Just-in-Time Software Defect Prediction", Proceedings of the International Conference on Software Engineering (ICSE), pp. 666-676, May 2019. The algorithm is able to operate in realistic scenarios that take the chronology of the data into account, and achieves better predictive performance than other algorithms proposed in the literature. The source code of the implementation is publicly available at Zenodo: https://zenodo.org/record/2555695
Type Of Material	Computer model/algorithm
Year Produced	2019
Provided To Others?	Yes
Impact	Prediction of defect-inducing software changes has become more suitable for use in practice, due to improved predictive performance under realistic scenarios. So far, 173 downloads of our repository have been made at Zenodo.
URL	https://zenodo.org/record/2555695


Title	ICSE 2019 data
Description	We have collected and pre-processed data from ten GitHub open source projects. The purpose of the data is to train and evaluate machine learning models for prediction of defect-inducing software changes under realistic scenarios that take chronology into account. The data are available at Zenodo: https://zenodo.org/record/2555695
Type Of Material	Database/Collection of data
Year Produced	2019
Provided To Others?	Yes
Impact	Other researchers and practitioners will be able to use the data for building and evaluating machine learning models for prediction of defect-inducing software changes in realistic scenarios that take chronology into account. So far, 173 downloads of our repository at Zenodo have been made.
URL	https://zenodo.org/record/2555695


Title	IJCNN 2019 GMM-VRD algorithm
Description	A novel method to deal with real and virtual concept drifts based on Gaussian mixture models. The method was published in: OLIVEIRA, G. H. F. M.; MINKU, L. L.; OLIVEIRA, A. L. I. "GMM-VRD: A Gaussian Mixture Model for Dealing With Virtual and Real Concept Drifts", Proceedings of the International Joint Conference on Neural Networks (IJCNN), 8 pages, July 2019. The source code of the implementation is publicly available at GitHub: https://github.com/GustavoHFMO/GMM-VRD
Type Of Material	Computer model/algorithm
Year Produced	2019
Provided To Others?	Yes
Impact	Other researchers and practitioners will be able to adopt the same methodology in their data stream learning studies.
URL	https://github.com/GustavoHFMO/GMM-VRD


Title	IJCNN 2019 Melanie algorithm
Description	A novel algorithm to transfer knowledge between domains in data stream learning. The method was published in: DU, H.; MINKU, L. L.; ZHOU, H. "Multi-Source Transfer Learning for Non-Stationary Environments", Proceedings of the International Joint Conference on Neural Networks (IJCNN), July 2019. The source code of the implementation is publicly available at GitHub: https://github.com/nino2222/Melanie
Type Of Material	Computer model/algorithm
Year Produced	2019
Provided To Others?	Yes
Impact	Other researchers and practitioners will be able to adopt the same methodology in their transfer learning studies in data stream mining.
URL	https://github.com/nino2222/Melanie


Title	KAIS 2021 Imbalanced Data Streams and Data Stream Generator
Description	Software code to produce synthetic data streams, including data streams with different imbalanced distributions. The specific data streams created for the following study are also available: BRZEZINSKI, D.; MINKU, L.; PEWINSKI, T.; STEFANOWSKI, J.; SZUMACZUK, A. "The Impact of Data Difficulty Factors on Classification of Imbalanced and Concept Drifting Data Streams", Knowledge and Information Systems (KAIS), 2021.
Type Of Material	Database/Collection of data
Year Produced	2021
Provided To Others?	Yes
Impact	Other researchers will be able to use the data streams and data stream generator for their studies on data stream learning.
URL	https://github.com/dabrze/imbalanced-stream-generator


Description	Are 20% of Files Responsible for 80% of Defects?
Organisation	University of Sheffield
Department	Department of Computer Science
Country	United Kingdom
Sector	Academic/University
PI Contribution	I contributed knowledge on software defect prediction in discussions about the research topic, and helped with the formulation of the research questions, the analysis of the results and their potential impact on software defect prediction studies, writing parts of the paper, and discussing the presentation prepared for delivery at the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) 2018.
Collaborator Contribution	My partner developed the approach to investigate whether 20% of files are responsible for 80% of defects, discussed the research topic, formulated research questions, designed and ran experiments, analysed the results, wrote a large portion of the paper, and prepared and delivered a presentation at the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) 2018.
Impact	WALKINSHAW, N.; MINKU, L. "Are 20% of Files Responsible for 80% of Defects?", Proceedings of the 9th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), p. 2.1:2.10, October 2018. Collaboration involving the disciplines of data analytics and software engineering.
Start Year	2018


Description	Dealing with Real and Virtual Concept Drifts
Organisation	Federal University of Pernambuco
Country	Brazil
Sector	Academic/University
PI Contribution	Proposing the research problem, guiding the proposal of the machine learning approach to solve the problem, guiding the design of experiments to evaluate the approach, guiding the evaluation of the approach, guiding and revising the writing of the paper.
Collaborator Contribution	Discussing the proposed approach and experimental design, implementing the approach, running experiments, analysing the results, and writing the first draft of the paper.
Impact	OLIVEIRA, G. H. F. M.; MINKU, L. L.; OLIVEIRA, A. L. I. "GMM-VRD: A Gaussian Mixture Model for Dealing With Virtual and Real Concept Drifts", Proceedings of the International Joint Conference on Neural Networks (IJCNN), July 2019. OLIVEIRA, G. H. F. M.; MINKU, L. L.; OLIVEIRA, A. "Tackling Virtual and Real Concept Drifts: An Adaptive Gaussian Mixture Model Approach", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021, doi: 10.1109/TKDE.2021.3099690. This collaboration is not multi-disciplinary.
Start Year	2018


Description	Github software changes dataset collection
Organisation	Concordia University
Country	Canada
Sector	Academic/University
PI Contribution	My team and I contributed the proposal of the research topic, the formulation of research questions, the development of a new approach to predict defects in software changes, the design of experiments, experimental runs, analysis of results, paper writing, preparation of the response to reviewers and paper revision.
Collaborator Contribution	My partner contributed by collecting GitHub data to evaluate the proposed approach.
Impact	The following paper was accepted for publication: Cabral, G.; Minku, L.; Shihab, E.; Mujahid, S. Class Imbalance Evolution and Verification Latency in Just-in-Time Software Defect Prediction. International Conference on Software Engineering (ICSE 2019).
Start Year	2018


Description	Transfer learning in non-stationary environments
Organisation	University of Leicester
Country	United Kingdom
Sector	Academic/University
PI Contribution	Proposing the research problem, guiding the proposal of the machine learning approach to solve the problem, guiding the design of experiments to evaluate the approach, guiding the evaluation of the approach, guiding and revising the writing of the paper.
Collaborator Contribution	Discussing the proposed approach and experimental design, implementing the approach, running experiments, analysing the results, and writing the first draft of the paper.
Impact	DU, H.; MINKU, L.; ZHOU, H. "MARLINE: Multi-Source Mapping Transfer Learning for Non-Stationary Environments", 20th IEEE International Conference on Data Mining (ICDM), 10 pages, November 2020. DU, H.; MINKU, L. L.; ZHOU, H. "Multi-Source Transfer Learning for Non-Stationary Environments", Proceedings of the International Joint Conference on Neural Networks (IJCNN), 8 pages, July 2019. This collaboration is not multi-disciplinary.
Start Year	2018


Title	IJCNN 2019 GMM-VRD: A Gaussian Mixture Model for Dealing With Virtual and Real Concept Drifts
Description	A novel method to deal with real and virtual concept drifts based on Gaussian mixture models. The method was published in: OLIVEIRA, G. H. F. M.; MINKU, L. L.; OLIVEIRA, A. L. I. "GMM-VRD: A Gaussian Mixture Model for Dealing With Virtual and Real Concept Drifts", Proceedings of the International Joint Conference on Neural Networks (IJCNN), 8 pages, July 2019. The source code of the implementation is publicly available at GitHub: https://github.com/GustavoHFMO/GMM-VRD
Type Of Technology	Software
Year Produced	2020
Open Source License?	Yes
Impact	Other researchers and practitioners will be able to adopt the same methodology in their data stream learning studies.


Title	IJCNN 2019 Melanie: Multi-Source Transfer Learning for Non-Stationary Environments
Description	A novel method to transfer knowledge between domains in data stream learning. The method was published in: DU, H.; MINKU, L. L.; ZHOU, H. "Multi-Source Transfer Learning for Non-Stationary Environments", Proceedings of the International Joint Conference on Neural Networks (IJCNN), July 2019. The source code of the implementation is publicly available at GitHub: https://github.com/nino2222/Melanie
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	Other researchers and practitioners will be able to adopt the same methodology in their transfer learning studies in data stream mining.


Title	KAIS 2021 Data Stream Generator
Description	Software implementation of a synthetic data stream generator, which was proposed in the following paper: BRZEZINSKI, D.; MINKU, L.; PEWINSKI, T.; STEFANOWSKI, J.; SZUMACZUK, A. "The Impact of Data Difficulty Factors on Classification of Imbalanced and Concept Drifting Data Streams", Knowledge and Information Systems (KAIS), 2021.
Type Of Technology	Software
Year Produced	2021
Impact	Other researchers will be able to use this tool to generate data streams for their experiments.
URL	https://link.springer.com/article/10.1007/s10115-021-01560-w


Title	ORB code
Description	The software extends the Massive Online Analysis (MOA) framework for online learning to include our proposed approach, Oversampling Rate Boosting (ORB). The approach is able to perform online class imbalance learning in non-stationary environments with verification latency. The approach was published in: CABRAL, G.; MINKU, L.; SHIHAB, E.; MUJAHID, S. "Class Imbalance Evolution and Verification Latency in Just-in-Time Software Defect Prediction", Proceedings of the International Conference on Software Engineering (ICSE), pp. 666-676, May 2019. The license is Creative Commons Attribution 3.0 License (https://creativecommons.org/licenses/by-sa/3.0/legalcode).
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	The software has recently been released, so it is too early to describe its impact. However, due to its improved predictive performance, the proposed approach is more suitable for potential adoption in practice than existing approaches. Its main potential impact is the production of higher quality software at a lower cost, through the automatic identification of software changes likely to contain defects at an early stage of software development. So far, 173 downloads of our repository have been made at Zenodo.


Description	Article for the general public
Form Of Engagement Activity	A magazine, newsletter or online publication
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	An article on the impact of artificial intelligence on skilled jobs was written and published in The Conversation (https://theconversation.com/ai-doctors-and-engineers-are-coming-but-they-wont-be-stealing-high-skill-jobs-101701). The article was also republished by Yahoo News (https://www.yahoo.com/news/ai-doctors-engineers-coming-won-103101847.html). Its purpose was to increase the general public's awareness of the potential impact that artificial intelligence technologies, such as those investigated in this project, could have on skilled jobs. The article published in The Conversation was read by more than 9,000 readers from the USA, France, the UK, Australia and other countries.
Year(s) Of Engagement Activity	2018
URL	https://theconversation.com/ai-doctors-and-engineers-are-coming-but-they-wont-be-stealing-high-skill...


Description	Artificial Intelligence: What Is It And How Can It Help Us?
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Schools
Results and Impact	This talk was part of a science festival in the North and North-East regions of Brazil. It discussed what artificial intelligence is and how it can help us with various tasks, including tasks investigated in this grant. The talk was broadcast live and is available on YouTube, where it currently has 407 views, 75 likes and 0 dislikes. The talk was intended to increase awareness of artificial intelligence and to encourage pupils to join this field once they reach university.
Year(s) Of Engagement Activity	2020
URL	https://youtu.be/VUiySDwKha4


Description	How will machine learning / AI change the way IT professionals work?
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Postgraduate students
Results and Impact	This was a panel on "How will machine learning / AI change the way IT professionals work" at the 2020 International Conference on the Quality of Information and Communications Technology (QUATIC). It aimed to spark discussion and raise awareness of how machine learning and artificial intelligence can benefit software practitioners. The discussion led participants to develop or revise their views on the future of IT in light of machine learning and artificial intelligence.
Year(s) Of Engagement Activity	2020
URL	https://2020.quatic.org/


Description	IEEE Software Column for Practitioners -- Empirical Software Engineering, Predictive Models, and Product Lines
Form Of Engagement Activity	A magazine, newsletter or online publication
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Industry/Business
Results and Impact	This is a column for practitioners published in IEEE Software. The intention of my contribution was to increase practitioners' awareness of machine learning-based approaches that can help with software engineering tasks.
Year(s) Of Engagement Activity	2018
URL	https://ieeexplore.ieee.org/document/8354421


Description	The Whole is Greater than The Sum of the Parts: On the Value of Machine Learning Ensemble Methods
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Postgraduate students
Results and Impact	This was a keynote at the BEAR PGR Conference in Birmingham. It aimed to introduce machine learning approaches and their uses to postgraduate students who may not come from computer science. Some of the machine learning approaches discussed in the talk have been influenced by this grant. The talk sparked questions and discussions, and led to a student requesting detailed information on how to apply machine learning in her work.
Year(s) Of Engagement Activity	2020


Description	Transfer Learning for Software Engineering
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Postgraduate students
Results and Impact	This talk explained transfer learning algorithms and how they can be applied to solve software engineering problems. The purpose was to raise awareness of the potential benefits of this kind of approach for software engineering and to teach students how such algorithms work. Following the talk, we held discussions with all those present. Some members of the audience showed interest in applying these techniques and in future collaborations on this topic.
Year(s) Of Engagement Activity	2021


Publications