Stable Prediction of Defect-Inducing Software Changes (SPDISC)
Lead Research Organisation: University of Birmingham
Department Name: School of Computer Science
Abstract
Context: software systems have become ever larger and more complex. This inevitably leads to software defects, whose debugging is estimated to cost the global economy 312 billion USD annually. Reducing the number of software defects is a challenging problem, and is particularly important considering the strong pressure towards rapid delivery. Such pressure makes it impossible for all parts of the software source code to receive an equally large amount of inspection and testing effort.
With that in mind, machine learning approaches have been proposed for predicting defect-inducing changes in the source code as soon as these changes finish being implemented. Such approaches could enable software engineers to target special testing and inspection attention towards parts of the source code most likely to induce defects, reducing the risk of committing defective changes.
Problem: the predictive performance of existing approaches is unstable, because the underlying defect generating process being modelled may vary over time (i.e., there may be concept drift). This means that practitioners cannot be confident about the prediction ability of existing approaches -- at any given point in time, predictive models may be performing very well or failing dramatically.
Aim and vision: SPDISC aims at creating more stable models for predicting defect-inducing changes, through the development of a novel machine learning approach for automatically adapting to concept drift. When integrated with software versioning systems, the models will provide early, reliable and automated defect-inducing change alerts throughout the lifetime of software projects.
Impact: SPDISC will enable a transformation in the way software developers review and commit their changes. By creating stable models to make software developers aware of defect-inducing changes as soon as these are implemented, it will allow targeted inspection and testing attention towards defect-inducing code throughout the lifetime of software projects. This will reduce the debugging cost and ultimately lead to better software quality.
Proposed approach: an online learning algorithm will be developed to process incoming data as they become available, enabling fast reaction to concept drift. Concept drift will be detected using methods designed to cope with class imbalance, which typically occurs in prediction of defect-inducing software changes. Class imbalance refers to the issue of having a much smaller number of defect-inducing changes than the number of safe changes. The proposed approach will also make use of data from different projects (i.e., transfer learning between domains) to speed up adaptation to concept drift.
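The combination of online learning and class imbalance handling described above can be illustrated with a minimal sketch. It is not SPDISC's algorithm: it assumes a plain online logistic regression, and the class name `OnlineImbalanceLearner`, the `decay` and `lr` parameters, and the weight cap are all hypothetical choices made for illustration. Class proportions are tracked with a time-decayed estimate so that they can follow a drifting defect rate, and examples of the currently rarer class receive a larger update.

```python
import math


class OnlineImbalanceLearner:
    """Online logistic regression with time-decayed class-size tracking.

    Illustrative only: names and default values are assumptions, not taken
    from the SPDISC project.
    """

    def __init__(self, n_features, lr=0.1, decay=0.99):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr
        self.decay = decay
        # Decayed estimate of each class's share of the stream (sums to 1).
        self.class_size = {0: 0.5, 1: 0.5}

    def predict_proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        # 1 = defect-inducing change, 0 = safe change
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def learn_one(self, x, y):
        # Update the decayed class proportions with the new label.
        for c in (0, 1):
            seen = 1.0 if y == c else 0.0
            self.class_size[c] = (
                self.decay * self.class_size[c] + (1.0 - self.decay) * seen
            )
        # Upweight the class that is currently rarer; cap the weight.
        ratio = self.class_size[1 - y] / max(self.class_size[y], 1e-9)
        weight = min(max(1.0, ratio), 50.0)
        # One weighted gradient step on the logistic loss.
        step = self.lr * weight * (self.predict_proba(x) - y)
        self.w = [wi - step * xi for wi, xi in zip(self.w, x)]
        self.b -= step
```

In use, each incoming change would first be scored with `predict` and later passed to `learn_one` once its label is known, so the model is updated one example at a time rather than retrained in batches.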
Novelty: SPDISC is the first proposal to look into the stability of predictive performance over time in the context of defect-inducing software changes. Most previous work ignored the fact that predictions are required over time, and so overlooked the instability of predictive performance in this problem. To deal with instability, SPDISC will develop the first online transfer learning approach for predicting defect-inducing software changes.
Ambitiousness: online transfer learning between domains with concept drift is a very new area of research, not only in software engineering but also in machine learning. Very few approaches exist for it, and none of them can deal with class-imbalanced problems. Therefore, SPDISC will not only advance software engineering by enabling a transformation in the way software developers review and commit their changes, but also advance the area of machine learning itself.
Timeliness: given the current size and complexity of software systems, the increased number of life-critical applications, and the high competitiveness of the software industry, approaches for improving software quality and reducing the cost of producing and maintaining software are currently of utmost importance.
Planned Impact
SPDISC's beneficiaries are the software industry, software users and related scientific communities.
1) Software Industry
The software industry is SPDISC's main beneficiary. The UK software industry is estimated to be worth more than 9bn GBP, and is the second largest market by value in the EU. Globally, the software industry's estimated value is over 407bn USD. And yet, the global cost of debugging software is estimated to be 312 billion USD annually, representing an enormous loss of revenue. SPDISC will lead to an impact on the economy by reducing debugging cost and increasing software quality.
In particular, SPDISC will empower software developers with early, reliable and automated alerts of defect-inducing software changes throughout the lifetime of software projects. It will enable a transformation in the way software changes are reviewed and committed in software development companies that use software versioning and bug-tracking systems. Defect-inducing changes will be automatically pinpointed for attention right after their implementation, allowing easy and wise allocation of the limited testing and inspection resources. This is especially desirable in companies leaning towards a more agile software development process.
As the software changes will be fresh in the developers' minds when defect alerts are triggered, inspecting them will be much cheaper than debugging later. In addition, changes typically have few lines of code, further facilitating inspection. Therefore, SPDISC's approach will reduce the risk of committing changes that will lead to defects, reducing debugging cost and increasing software quality. The lower debugging cost will translate into cheaper software, as finding and fixing defects typically takes 50% of a software developer's time.
From a project management perspective, as each software change is inherently associated with a single developer, the assignment of developers to inspect defect-inducing changes will be straightforward. With SPDISC, the task of deciding which parts of the source code should receive increased attention and by whom can be delegated to the software developers themselves, freeing project managers for other tasks.
Both large enterprises and SMEs can benefit from SPDISC, as its approach automatically adapts to different environments. I anticipate that software development tools based on SPDISC will be commercialised in the future. One of SPDISC's industrial partners has already expressed interest in doing that. This will assist SMEs in benefitting from SPDISC, increasing their competitiveness and driving faster and more balanced economic growth. This will in turn lead to an impact on society by increasing wealth and employment.
2) Software Users
The more cost-effective software development enabled by SPDISC will consequently bring benefits to software users, who can be private users, users of public services, or other enterprises. Lower cost will make software more accessible to private users and public services. Higher quality will improve quality of life through a better and safer software experience. This is key to a world of smart cities, which are largely controlled by software. It is also important to life-critical software applications, which could pose serious threats if defective. Cheaper and higher quality software will increase the competitiveness of other enterprises that depend on software, driving faster economic growth. Extensions of SPDISC's approach can also potentially help to solve data analytics problems other than defect prediction.
3) Scientific Communities
SPDISC will create a tighter bond between software engineering and machine learning through its new machine learning approach for software engineering. These two areas will benefit from this research. There will also be some impact on mathematical sciences, as part of SPDISC's foundation lies in this area. More details are in the academic beneficiaries summary.
Organisations
- University of Birmingham (Lead Research Organisation)
- Federal Rural University of Pernambuco (Collaboration)
- Concordia University (Collaboration, Project Partner)
- University of Leicester (Collaboration)
- Federal University of Pernambuco (Collaboration)
- University of Sheffield (Collaboration)
- XiLiu Technology Ltd (Project Partner)
- Microsoft Research (Project Partner)
People
- Leandro Minku (Principal Investigator)
Publications
- Agrawal A (2020) Better software analytics via "DUO": Data mining algorithms using/used-by optimizers, in Empirical Software Engineering
- Brzezinski D (2021) The impact of data difficulty factors on classification of imbalanced and concept drifting data streams, in Knowledge and Information Systems
- Cabral G (2023) An investigation of online and offline learning models for online Just-in-Time Software Defect Prediction, in Empirical Software Engineering
- Cabral G (2023) Towards Reliable Online Just-in-Time Software Defect Prediction, in IEEE Transactions on Software Engineering
- Chiu C (2023) SMOClust: synthetic minority oversampling based on stream clustering for evolving data streams, in Machine Learning
- Chiu CW (2022) A Diversity Framework for Dealing With Multiple Types of Concept Drift Based on Clustering in the Model Space, in IEEE Transactions on Neural Networks and Learning Systems
Related Projects
Project Reference | Relationship | Related To | Start | End | Award Value |
---|---|---|---|---|---|
EP/R006660/1 | | | 03/01/2018 | 02/09/2018 | £100,542 |
EP/R006660/2 | Transfer | EP/R006660/1 | 03/09/2018 | 01/11/2019 | £47,775 |
Description | Context: software systems have become ever larger and more complex. This inevitably leads to software defects, whose debugging is estimated to cost the global economy 312 billion USD annually. Reducing the number of software defects is a challenging problem, and is particularly important considering the strong pressure towards rapid delivery. Such pressure makes it impossible for all parts of the software source code to receive an equally large amount of inspection and testing effort. With that in mind, machine learning approaches have been proposed for predicting defect-inducing changes in the source code as soon as these changes finish being implemented. Such approaches could enable software engineers to target special testing and inspection attention towards parts of the source code most likely to induce defects, reducing the risk of committing defective changes. Problem: the predictive performance of existing approaches is unstable, because the underlying defect generating process being modelled may vary over time (i.e., there may be concept drift). This means that practitioners cannot be confident about the prediction ability of existing approaches -- at any given point in time, predictive models may be performing very well or failing dramatically. Key findings: we provided a detailed understanding of the characteristics of concept drift in prediction of defect-inducing software changes, enabling new approaches to be proposed to overcome the problem posed above. We also performed the first detailed investigation of the benefit of using data from different projects to improve predictive performance in realistic online learning scenarios faced by prediction of defect-inducing software changes. Three different approaches to make use of such data have been proposed to solve the problem posed above. Two of these approaches perform particularly well, helping the predictive performance of prediction models to be more consistently high and thereby addressing the key problem this project set out to tackle. Improvements in predictive performance are up to 40% during periods of likely concept drift, and up to 53.9% during the initial stage of the projects. A case study with industry has also been performed, showing that these approaches are helpful not only for open source projects but also for proprietary ones. Such approaches can be adopted by practitioners much more reliably than previous approaches. When adopted in practice, they have the potential to help significantly reduce the number of software defects. Future work: this project has opened up the path for research in a number of different areas, including further research on how to deal with different types of concept drift, how to automatically tune hyperparameters that control machine learning approaches in realistic scenarios, and how to better use data from different domains to improve predictions in a given domain. Some papers have already been submitted on these topics, and a further case study with industry is being performed. Tools could potentially be developed to make the approaches proposed in this project available to practitioners. |
Exploitation Route | A software tool can be developed so that practitioners can adopt our proposed approach within their software development environments. This could potentially be done through one of the industrial partners of the project, or through the creation of a spin-out. Related approaches developed by this grant for data stream learning under multiple types of concept drift and with transfer learning could also potentially be applied to different real-world problems. |
Sectors | Digital/Communication/Information Technologies (including Software) |
Description | We have shown that our approaches for predicting defect-inducing software changes can provide more consistently high predictive performance in open source projects. A case study of these approaches has been performed with a company using their proprietary data, showing that the approaches can also improve predictive performance in such scenarios. This shows that such approaches can potentially be adopted by practitioners to help prevent defects in software code in both open source and proprietary projects. A software application for online prediction of defect-inducing software changes is being developed in collaboration with a company for adoption in practice. We are also continuing to engage with another company that has shown a similar interest in producing a commercial application. Academic impact is being observed, with the online learning approaches for predicting defect-inducing software changes inspiring further work in this area. Our studies featured several times in a recent survey on prediction of defect-inducing software changes (https://dl.acm.org/doi/full/10.1145/3567550), which considers challenges highlighted in our studies (e.g., streaming data) as significant challenges in the field. Among others, our work has influenced studies on new lifelong learning approaches for defect-inducing software changes (https://arxiv.org/pdf/2305.09824.pdf), continuous software bug prediction (https://dl.acm.org/doi/pdf/10.1145/3475716.3475790), and new incremental learning approaches for software defect prediction (https://arxiv.org/pdf/2310.12289.pdf). |
Sector | Digital/Communication/Information Technologies (including Software) |
Description | Citation in systematic literature review - ICSE2020 |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Citation in systematic reviews |
URL | https://dl.acm.org/doi/pdf/10.1145/3567550 |
Description | Citation in systematic literature review - IJCNN2019 |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Citation in systematic reviews |
URL | https://dl.acm.org/doi/pdf/10.1145/3567550 |
Description | IASESE School |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Influenced training of practitioners or researchers |
Impact | I gave a tutorial entitled "Data Science for Software Engineering: Important Considerations and Typical Setbacks" at the 15th International Advanced School on Empirical Software Engineering (IASESE 2018). The tutorial discussed how to apply data science for software engineering, including problems such as software defect prediction investigated in this grant. The tutorial raised the audience's awareness of important considerations to make when applying data science for software engineering, and typical setbacks resulting from ignoring such considerations. It provided attendees with knowledge on how to make more informed decisions when applying data science to software engineering, increasing their skill level in this area. The tutorial was attended by around 35 researchers, students and practitioners. |
Description | Transfer Learning for Software Engineering |
Geographic Reach | South America |
Policy Influence Type | Influenced training of practitioners or researchers |
Description | Online Semisupervised Learning for Predicting Defect-Inducing Software Changes (Aprendizado Semissupervisionado Online Para Predição de Mudanças Críticas em Software) |
Amount | R$ 10,000 (BRL) |
Organisation | National Council for Scientific and Technological Development (CNPq) |
Sector | Public |
Country | Brazil |
Start | 02/2023 |
End | 01/2025 |
Title | A12 Effect Size |
Description | Implementation of A12 effect size, facilitating other researchers' use of this measure of effect size in their experimental analyses. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to adopt this measure of effect size in their experimental studies. So far, 3 downloads of the tool have been performed. |
URL | https://zenodo.org/record/3353573 |
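For readers unfamiliar with the measure, the following is a minimal sketch of the A12 statistic, assuming it refers to the standard Vargha-Delaney effect size: the probability that a value drawn from the first sample exceeds one drawn from the second, with ties counted as half. It is not the implementation published at the Zenodo record above, and the function name `a12` is chosen here for illustration.

```python
def a12(xs, ys):
    """Vargha-Delaney A12 effect size for two samples of numbers.

    Returns a value in [0, 1]; 0.5 means neither sample tends to be larger.
    """
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    # Count pairs where the first sample wins, and pairs that tie.
    greater = sum(1 for x in xs for y in ys if x > y)
    ties = sum(1 for x in xs for y in ys if x == y)
    return (greater + 0.5 * ties) / (len(xs) * len(ys))
```

For example, `a12([4, 5, 6], [1, 2, 3])` gives 1.0 because every value in the first sample exceeds every value in the second, while two identical samples give 0.5.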
Title | CommitGuru - Chinese |
Description | This is an extension of the Commit Guru tool to enable collecting just-in-time software defect prediction data from Chinese git repositories. It was used to collect proprietary data in the following paper: TABASSUM, S.; MINKU, L.L.; FENG, D.; CABRAL, G.; SONG, L. "An Investigation of Cross-Project Learning in Online Just-In-Time Software Defect Prediction", 2020 International Conference on Software Engineering (ICSE), 2020. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Other researchers will be able to use this tool to collect just-in-time software defect prediction data from Chinese git repositories. This tool has been used to perform a just-in-time software defect prediction case study with a Chinese company. |
URL | https://zenodo.org/record/3684635 |
Title | EMSE 2023 - An Investigation of Online and Offline Learning Models for Online Just-in-Time Software Defect Prediction |
Description | Software code for the paper: CABRAL, G.G.; MINKU, L.L.; OLIVEIRA, A.L.I.; PESSOA, D.A.; TABASSUM, S. "An Investigation of Online and Offline Learning Models for Online Just-in-Time Software Defect Prediction", Empirical Software Engineering Journal (EMSE), vol. 28, article no. 121, September 2023. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to use the approaches investigated in the paper. |
URL | https://github.com/dinaldoap/jit-sdp-nn |
Title | ICDM 2020 MARLINE: Multi-source Mapping Transfer Learning for Non-Stationary Environments. |
Description | Software code of the MARLINE approach for multi-source mapping transfer learning for non-stationary environments: DU, H.; MINKU, L.; ZHOU, H. . "MARLINE: Multi-Source Mapping Transfer Learning for Non-Stationary Environments", 20th IEEE International Conference on Data Mining (ICDM), 10 pages, November 2020. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to adopt the same methodology in their transfer learning studies in data stream mining. |
URL | https://github.com/nino2222/MARLINE |
Title | ICSE 2020 |
Description | Novel methods to make use of cross-project data for creating models for predicting defect-inducing software changes in realistic environments have been proposed. The methods were published at: TABASSUM, S.; MINKU, L.L.; FENG, D.; CABRAL, G.; SONG, L. "An Investigation of Cross-Project Learning in Online Just-In-Time Software Defect Prediction", 2020 International Conference on Software Engineering (ICSE), 2020 (accepted). |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to adopt the same methodology in their software defect prediction studies, allowing them to perform more realistic studies of prediction of defect-inducing software changes. However, the methodology was only recently accepted for publication, so it is too early to quantify its impact in practice. |
Title | IJCNN 2019 Melanie: Multi-Source Transfer Learning for Non-Stationary Environments |
Description | A novel method to transfer knowledge between domains in data stream learning. The method was published at: DU, H.; MINKU, L. L.; ZHOU, H. . "Multi-Source Transfer Learning for Non-Stationary Environments", Proceedings of the International Joint Conference on Neural Networks (IJCNN), July 2019. The source code of the implementation is available publicly at Github: https://github.com/nino2222/Melanie |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to adopt the same methodology in their transfer learning studies in data stream mining. |
URL | https://github.com/nino2222/Melanie |
Title | KAIS 2021 The impact of data difficulty factors on classification of imbalanced and concept drifting data streams |
Description | Software implementation of a synthetic data stream generator, which was proposed in the following paper: BRZEZINSKI, D.; MINKU, L.; PEWINSKI, T.; STEFANOWSKI, J.; SZUMACZUK, A. . "The Impact of Data Difficulty Factors on Classification of Imbalanced and Concept Drifting Data Streams", Knowledge and Information Systems (KAIS), 2021. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Other researchers will be able to use this tool to generate data streams for their experiments. |
URL | https://github.com/dabrze/imbalanced-stream-generator |
Title | Machine Learning 2023 - SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams |
Description | Software code for the paper CHIU, C.W.; MINKU, L.L. . "SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams", Machine Learning, 2023. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to use the proposed approach for data stream learning. |
URL | https://github.com/michaelchiucw/SMOClust |
Title | PROMISE 2021 OATES Multi-stream online transfer learning for software effort estimation: is it necessary? |
Description | Software code for the OATES approach published in: MINKU, L.L. . "Multi-Stream Online Transfer Learning For Software Effort Estimation - Is It Necessary?", 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE), August 2021. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to adopt the tool for estimating software development effort. |
URL | http://doi.org/10.5281/zenodo.5068001 |
Title | TKDE 2021 OGMMF-VRD Tackling Virtual and Real Concept Drifts: An Adaptive Gaussian Mixture Model Approach |
Description | Software code for the OGMMF-VRD approach published in the paper: OLIVEIRA, G.H.F.M.; MINKU, L.L.; OLIVEIRA, A. . "Tackling Virtual and Real Concept Drifts: An Adaptive Gaussian Mixture Model Approach", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021, doi: 10.1109/TKDE.2021.3099690 |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to use the proposed approach for data stream learning to solve their data stream learning problems. |
URL | https://github.com/GustavoHFMO/OGMMF-VRD |
Title | TNNLS 2020 A Diversity Framework for Dealing with Multiple Types of Concept Drift Based on Clustering in the Model Space |
Description | A novel method to deal with multiple types of concept drift in data stream learning, based on diversity and clustering in the model space. The method was published at: CHIU, C. W.; MINKU, L. L. "A Diversity Framework for Dealing with Multiple Types of Concept Drift Based on Clustering in the Model Space", IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020. The source code of the implementation is available publicly at GitHub (https://github.com/michaelchiucw/CDCMS) and Zenodo (https://zenodo.org/record/4294789#.YDFup8-mOgU). |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to adopt the same methodology in their data stream learning studies. |
URL | https://zenodo.org/record/4294789#.YDFup8-mOgU |
Title | TNNLS 2023 - OSNN: An Online Semisupervised Neural Network for Nonstationary Data Streams |
Description | Software code used for the paper "SOARES, R.; MINKU, L. . "OSNN: An Online Semisupervised Neural Network for Nonstationary Data Streams", IEEE Transactions on Neural Networks and Learning Systems, vol. 34, n. 9, September 2023" |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to use the approaches developed in the paper. |
URL | https://bitbucket.org/rodrigogfs/ossn/src/master/ |
Title | TSE 2022 - A Procedure to Continuously Evaluate Predictive Performance of Just-In-Time Software Defect Prediction Models During Software Development |
Description | Code used to continuously evaluate predictive performance of just-in-time software defect prediction models over time, used in the paper: SONG, L.; MINKU, L.L. . "A Procedure to Continuously Evaluate Predictive Performance of Just-In-Time Software Defect Prediction Models During Software Development", IEEE Transactions on Software Engineering (TSE), 2022, doi: 10.1109/TSE.2022.3158831. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to use the code to evaluate just-in-time software defect prediction models over time during the software development process. |
URL | https://github.com/sunnysong14/ContinualPerformanceValidityTSE2022 |
Title | TSE 2022 - Towards Reliable Online Just-in-time Software Defect Prediction |
Description | Software code for the Prediction-Based Sampling Adjustment (PBSA) approach published in the paper CABRAL, G.; MINKU, L.L. . "Towards Reliable Online Just-in-time Software Defect Prediction", IEEE Transactions on Software Engineering (TSE), 2022. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to use the proposed approach for just-in-time software defect prediction. |
URL | http://doi.org/10.5281/zenodo.6548768 |
Title | TSE 2022 Cross-Project Online Just-In-Time Software Defect Prediction |
Description | Code, datasets and result tables for the online cross-project just-in-time software defect prediction approach published in: Tabassum S, Minku L, Feng D. Cross-Project Online Just-In-Time Software Defect Prediction. IEEE Transactions on Software Engineering, 2022. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Other researchers will be able to adopt this tool and datasets in their studies for predicting defects in software changes. Practitioners will also be able to adopt them in their software development companies. |
URL | https://zenodo.org/badge/latestdoi/455513474 |
Title | EMSE 2023 - Algorithm |
Description | Algorithm proposed in the paper "CABRAL, G.G.; MINKU, L.L.; OLIVEIRA, A.L.I.; PESSOA, D.A.; TABASSUM, S. . "An Investigation of Online and Offline Learning Models for Online Just-in-Time Software Defect Prediction", Empirical Software Engineering Journal (EMSE), vol. 28, article no. 121, September 2023" |
Type Of Material | Computer model/algorithm |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to adopt the algorithm proposed in the paper. |
URL | https://github.com/dinaldoap/jit-sdp-nn |
Title | EMSE 2023 - Data |
Description | Data used in the paper "CABRAL, G.G.; MINKU, L.L.; OLIVEIRA, A.L.I.; PESSOA, D.A.; TABASSUM, S. . "An Investigation of Online and Offline Learning Models for Online Just-in-Time Software Defect Prediction", Empirical Software Engineering Journal (EMSE), vol. 28, article no. 121, September 2023" |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Other researchers will be able to use the data for their studies and for replication purposes. |
URL | https://github.com/dinaldoap/jit-sdp-data |
Title | ICDM 2020 MARLINE Algorithm |
Description | MARLINE algorithm for multi-source mapping transfer learning for non-stationary environments. |
Type Of Material | Data analysis technique |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to use this algorithm for classification problems involving data streams from multiple sources. |
URL | https://github.com/nino2222/MARLINE |
Title | ICSE 2020 algorithm |
Description | A novel algorithm for prediction of defect-inducing software changes using cross-project data has been proposed and implemented. The algorithm was published at: TABASSUM, S.; MINKU, L.L.; FENG, D.; CABRAL, G.; SONG, L. . "An Investigation of Cross-Project Learning in Online Just-In-Time Software Defect Prediction", 2020 International Conference on Software Engineering (ICSE), 2020. The algorithm is able to operate in realistic scenarios that take the chronology of the data into account, and achieves better predictive performance than other algorithms proposed in the literature. |
Type Of Material | Computer model/algorithm |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | A case study with a company has been performed, and the results were positive. The company has now provided more data for an additional case study with this algorithm. |
Title | ICSE 2020 data collection tool |
Description | This is an extension of the Commit Guru tool to enable collecting just-in-time software defect prediction data from Chinese git repositories. It was used to collect proprietary data in the following paper: TABASSUM, S.; MINKU, L.L.; FENG, D.; CABRAL, G.; SONG, L. . "An Investigation of Cross-Project Learning in Online Just-In-Time Software Defect Prediction", 2020 International Conference on Software Engineering (ICSE), 2020. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Other researchers will be able to use this tool to collect just-in-time software defect prediction data from Chinese git repositories. This tool has been used to perform a just-in-time software defect prediction case study with a Chinese company. |
URL | https://zenodo.org/record/3684635 |
Title | IJCNN 2019 Melanie algorithm |
Description | A novel algorithm to transfer knowledge between domains in data stream learning. The method was published at: DU, H.; MINKU, L. L.; ZHOU, H. . "Multi-Source Transfer Learning for Non-Stationary Environments", Proceedings of the International Joint Conference on Neural Networks (IJCNN), July 2019. The source code of the implementation is available publicly at GitHub: https://github.com/nino2222/Melanie |
Type Of Material | Computer model/algorithm |
Year Produced | 2019 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to adopt the same methodology in their transfer learning studies in data stream mining. |
URL | https://github.com/nino2222/Melanie |
Title | KAIS 2021 Data Streams |
Description | Several data streams containing multiple data distributions for evaluating data stream learning algorithms. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Other researchers will be able to use these data streams for their experiments to evaluate data stream learning algorithms. |
URL | https://github.com/dabrze/imbalanced-stream-generator |
Title | KAIS 2021 Imbalanced Data Streams and Data Stream Generator |
Description | Software code to produce synthetic data streams, including data streams with different imbalanced distributions. The specific data streams created for our study below are also available: BRZEZINSKI, D.; MINKU, L.; PEWINSKI, T.; STEFANOWSKI, J.; SZUMACZUK, A. . "The Impact of Data Difficulty Factors on Classification of Imbalanced and Concept Drifting Data Streams", Knowledge and Information Systems (KAIS), 2021. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Other researchers will be able to use the data streams and data stream generator for their studies on data stream learning. |
URL | https://github.com/dabrze/imbalanced-stream-generator |
Title | MARLINE: Multi-Source Mapping Transfer Learning for Non-Stationary Environments |
Description | This release includes the grid search results and Supplementary Material for MARLINE: Multi-Source Mapping Transfer Learning for Non-Stationary Environments. |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Other researchers will be able to replicate the results in our paper: DU, H.; MINKU, L.; ZHOU, H. . "MARLINE: Multi-Source Mapping Transfer Learning for Non-Stationary Environments", 20th IEEE International Conference on Data Mining (ICDM), 10 pages, November 2020. |
URL | https://zenodo.org/record/4040990 |
Title | Machine Learning 2023 - Data |
Description | Data used in the paper "CHIU, C.W.; MINKU, L.L. . "SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams", Machine Learning, 2023". |
Type Of Material | Database/Collection of data |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Other researchers will be able to use the data adopted in the paper. |
URL | https://github.com/michaelchiucw/SMOClust |
Title | Machine Learning 2023 - SMOClust Algorithm |
Description | Algorithm proposed in the paper "CHIU, C.W.; MINKU, L.L. . "SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams", Machine Learning, 2023" |
Type Of Material | Computer model/algorithm |
Year Produced | 2023 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to adopt the algorithm proposed in the paper. |
URL | https://github.com/michaelchiucw/SMOClust |
Title | PROMISE 2021 OATES Algorithm |
Description | This algorithm can be used for learning predictive models able to estimate software development effort. |
Type Of Material | Data analysis technique |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to adopt the algorithm for learning models able to estimate software development effort. |
URL | http://doi.org/10.5281/zenodo.5068001 |
Title | Prediction Based Sampling Adjustment (PBSA) |
Description | Initial version of PBSA. Be aware that, when executed, this version triggers 30 MOA threads at the same time. Therefore, running an experiment may take from several minutes to hours. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Other researchers will be able to use the same preprocessed data as in the paper CABRAL, G.; MINKU, L.L. . "Towards Reliable Online Just-in-time Software Defect Prediction", IEEE Transactions on Software Engineering (TSE), 2022. |
URL | https://zenodo.org/record/6548768 |
Title | TKDE 2021 Data |
Description | Data streams used in the following paper: OLIVEIRA, G.H.F.M.; MINKU, L.L.; OLIVEIRA, A. . "Tackling Virtual and Real Concept Drifts: An Adaptive Gaussian Mixture Model Approach", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021, doi: 10.1109/TKDE.2021.3099690. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | These data enable the evaluation of data stream learning algorithms under various concept drift conditions. |
URL | https://github.com/GustavoHFMO/OGMMF-VRD |
Title | TKDE 2021 OGMMF-VRD Algorithm |
Description | Algorithm for mining data streams with virtual and real concept drifts. |
Type Of Material | Data analysis technique |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Researchers and practitioners will be able to use the algorithm for solving various classification problems where data become available in the form of streams. |
URL | https://github.com/GustavoHFMO/OGMMF-VRD |
Title | TNNLS 2020 algorithm |
Description | A novel algorithm to deal with multiple types of concept drift in data stream learning, based on diversity and clustering in the model space mechanisms. The method was published at: CHIU, C. W.; MINKU, L. L. . "A Diversity Framework for Dealing with Multiple Types of Concept Drift Based on Clustering in the Model Space", IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020. The source code of the implementation is available publicly at GitHub (https://github.com/michaelchiucw/CDCMS) and Zenodo (https://zenodo.org/record/4294789#.YDFup8-mOgU). |
Type Of Material | Computer model/algorithm |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to adopt the same methodology in their data stream learning studies. |
URL | https://zenodo.org/record/4294789#.YDFxfs-mOgV |
Title | TNNLS 2023 - Data |
Description | Data used in the paper "SOARES, R.; MINKU, L. . "OSNN: An Online Semisupervised Neural Network for Nonstationary Data Streams", IEEE Transactions on Neural Networks and Learning Systems, vol. 34, n. 9, September 2023" |
Type Of Material | Database/Collection of data |
Year Produced | 2024 |
Provided To Others? | Yes |
Impact | Other researchers will be able to use the data in their studies and for replication purposes. |
URL | https://bitbucket.org/rodrigogfs/ossn |
Title | TNNLS 2023 - OSNN Algorithm |
Description | Algorithm proposed in the paper "SOARES, R.; MINKU, L. . "OSNN: An Online Semisupervised Neural Network for Nonstationary Data Streams", IEEE Transactions on Neural Networks and Learning Systems, vol. 34, n. 9, September 2023". |
Type Of Material | Computer model/algorithm |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to use the algorithm proposed in the paper. |
URL | https://bitbucket.org/rodrigogfs/ossn |
Title | TSE 2022 Algorithm |
Description | Algorithm for cross-project online just-in-time software defect prediction proposed in the following paper: TABASSUM, S.; MINKU, L.L.; FENG, D. . "Cross-Project Online Just-In-Time Software Defect Prediction", IEEE Transactions on Software Engineering (TSE), 2022. |
Type Of Material | Data analysis technique |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to use this algorithm for creating models able to predict defects in software changes. |
URL | https://zenodo.org/badge/latestdoi/455513474 |
Title | TSE 2022 Continuous Evaluation |
Description | Preprocessed data used in the paper SONG, L.; MINKU, L.L. . "A Procedure to Continuously Evaluate Predictive Performance of Just-In-Time Software Defect Prediction Models During Software Development", IEEE Transactions on Software Engineering (TSE), 2022, doi: 10.1109/TSE.2022.3158831. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Other researchers will be able to use the same preprocessed data as in the paper SONG, L.; MINKU, L.L. . "A Procedure to Continuously Evaluate Predictive Performance of Just-In-Time Software Defect Prediction Models During Software Development", IEEE Transactions on Software Engineering (TSE), 2022, doi: 10.1109/TSE.2022.3158831. |
Title | TSE 2022 PBSA Algorithm |
Description | Algorithm to create machine learning models for just-in-time software defect prediction published in the paper: CABRAL, G.; MINKU, L.L. . "Towards Reliable Online Just-in-time Software Defect Prediction", IEEE Transactions on Software Engineering (TSE), 2022. |
Type Of Material | Computer model/algorithm |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to use the proposed algorithm to create just-in-time software defect prediction models. |
URL | http://doi.org/10.5281/zenodo.6548768 |
Title | TSE 2022 Preprocessed Data |
Description | Preprocessed just-in-time software defect prediction data streams adopted in the following paper: TABASSUM, S.; MINKU, L.L.; FENG, D. . "Cross-Project Online Just-In-Time Software Defect Prediction", IEEE Transactions on Software Engineering (TSE), 2022. |
Type Of Material | Database/Collection of data |
Year Produced | 2022 |
Provided To Others? | Yes |
Impact | Other researchers and practitioners will be able to use the preprocessed datasets adopted in this study for learning software defect prediction models. |
URL | https://zenodo.org/badge/latestdoi/455513474 |
Title | michaelchiucw/CDCMS: CDCMS-TNNLS2020 |
Description | This version of CDCMS is used in [1]. [1] CHIU, C. W.; MINKU, L. L. . "A Diversity Framework for Dealing with Multiple Types of Concept Drift Based on Clustering in the Model Space", IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020 (accepted). |
Type Of Material | Database/Collection of data |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Datasets generated for the study conducted in the paper CHIU, C. W.; MINKU, L. L. . "A Diversity Framework for Dealing with Multiple Types of Concept Drift Based on Clustering in the Model Space", IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020 |
URL | https://zenodo.org/record/4294789 |
Description | Are 20% of Files Responsible for 80% of Defects? |
Organisation | University of Sheffield |
Department | Department of Computer Science |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | I contributed knowledge on software defect prediction to discussions about the research topic, and helped with: formulating the research questions, analysing the results and their potential impact on software defect prediction studies, writing parts of the paper, and discussing the presentation prepared for delivery at the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) 2018. |
Collaborator Contribution | My partner developed the approach to investigate whether 20% of files are responsible for 80% of defects, discussed the research topic, formulated research questions, designed and ran experiments, analysed the results, wrote a large portion of the paper, and prepared and delivered a presentation at the ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) 2018. |
Impact | WALKINSHAW, N.; MINKU, L. . "Are 20% of Files Responsible for 80% of Defects?", Proceedings of the 9th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), p. 2.1:2.10, October 2018. Collaboration involving the disciplines of data analytics and software engineering. |
Start Year | 2018 |
Description | Dealing with Real and Virtual Concept Drifts |
Organisation | Federal University of Pernambuco |
Country | Brazil |
Sector | Academic/University |
PI Contribution | Proposing the research problem, guiding the proposal of the machine learning approach to solve the problem, guiding the design of experiments to evaluate the approach, guiding the evaluation of the approach, guiding and revising the writing of the paper. |
Collaborator Contribution | Discussing the proposed approach and experimental design, implementing the approach, running experiments, analysing the results, and writing the first draft of the paper. |
Impact | OLIVEIRA, G. H. F. M.; MINKU, L. L.; OLIVEIRA, A. L. I. . "GMM-VRD: A Gaussian Mixture Model for Dealing With Virtual and Real Concept Drifts", Proceedings of the International Joint Conference on Neural Networks (IJCNN), July 2019. OLIVEIRA, G.H.F.M.; MINKU, L.L.; OLIVEIRA, A. . "Tackling Virtual and Real Concept Drifts: An Adaptive Gaussian Mixture Model Approach", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021, doi: 10.1109/TKDE.2021.3099690. This collaboration is not multi-disciplinary. |
Start Year | 2018 |
Description | GitHub software changes dataset collection |
Organisation | Concordia University |
Country | Canada |
Sector | Academic/University |
PI Contribution | My team and I contributed the proposal of the research topic, the formulation of research questions, the development of a new approach to predict defects in software changes, the design of experiments, experimental runs, analysis of results, paper writing, paper response preparation and paper revision. |
Collaborator Contribution | My partner contributed the collection of GitHub data to evaluate the proposed approach. |
Impact | The following paper was accepted for publication: Cabral, G.; Minku, L.; Shihab, E.; Mujahid, S. Class Imbalance Evolution and Verification Latency in Just-in-Time Software Defect Prediction. International Conference on Software Engineering (ICSE 2019). |
Start Year | 2018 |
Description | Semi-Supervised Data Stream Learning |
Organisation | Federal Rural University of Pernambuco |
Country | Brazil |
Sector | Academic/University |
PI Contribution | The initial work on semi-supervised data stream learning done based on this grant has led to several directions of future research. I continue to work on this topic in collaborations resulting from this grant. |
Collaborator Contribution | Discussing the proposed approach and experimental design, implementing the approach, running experiments, analysing the results, and writing the first drafts of papers. |
Impact | We are currently writing papers with the outcomes of the research. |
Start Year | 2023 |
Description | Transfer learning in non-stationary environments |
Organisation | University of Leicester |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Proposing the research problem, guiding the proposal of the machine learning approach to solve the problem, guiding the design of experiments to evaluate the approach, guiding the evaluation of the approach, guiding and revising the writing of the paper. |
Collaborator Contribution | Discussing the proposed approach and experimental design, implementing the approach, running experiments, analysing the results, and writing the first draft of the paper. |
Impact | DU, H.; MINKU, L.; ZHOU, H. . "MARLINE: Multi-Source Mapping Transfer Learning for Non-Stationary Environments", 20th IEEE International Conference on Data Mining (ICDM), 10 pages, November 2020. DU, H.; MINKU, L. L.; ZHOU, H. . "Multi-Source Transfer Learning for Non-Stationary Environments", Proceedings of the International Joint Conference on Neural Networks (IJCNN), 8 pages, July 2019. This collaboration is not multi-disciplinary. |
Start Year | 2018 |
Title | EMSE 2023 - An Investigation of Online and Offline Learning Models for Online Just-in-Time Software Defect Prediction (2023) |
Description | Software code implementing the offline just-in-time software defect prediction methods from the paper "CABRAL, G.G.; MINKU, L.L.; OLIVEIRA, A.L.I.; PESSOA, D.A.; TABASSUM, S. . "An Investigation of Online and Offline Learning Models for Online Just-in-Time Software Defect Prediction", Empirical Software Engineering Journal (EMSE), vol. 28, article no. 121, September 2023" |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | Other researchers and practitioners will be able to adopt the method proposed in the paper. |
Title | ICDM 2020 MARLINE |
Description | Software code of the MARLINE approach for multi-source mapping transfer learning for non-stationary environments, published at: DU, H.; MINKU, L.; ZHOU, H. . "MARLINE: Multi-Source Mapping Transfer Learning for Non-Stationary Environments", 20th IEEE International Conference on Data Mining (ICDM), 10 pages, November 2020. |
Type Of Technology | Software |
Year Produced | 2020 |
Impact | Other researchers and practitioners will be able to adopt the same methodology in their transfer learning studies in data stream mining. |
URL | https://ieeexplore.ieee.org/document/9338433 |
Title | IJCNN 2019 Melanie: Multi-Source Transfer Learning for Non-Stationary Environments |
Description | A novel method to transfer knowledge between domains in data stream learning. The method was published at: DU, H.; MINKU, L. L.; ZHOU, H. . "Multi-Source Transfer Learning for Non-Stationary Environments", Proceedings of the International Joint Conference on Neural Networks (IJCNN), July 2019. The source code of the implementation is available publicly at GitHub: https://github.com/nino2222/Melanie |
Type Of Technology | Software |
Year Produced | 2019 |
Open Source License? | Yes |
Impact | Other researchers and practitioners will be able to adopt the same methodology in their transfer learning studies in data stream mining. |
Title | KAIS 2021 Data Stream Generator |
Description | Software implementation of a synthetic data stream generator, which was proposed in the following paper: BRZEZINSKI, D.; MINKU, L.; PEWINSKI, T.; STEFANOWSKI, J.; SZUMACZUK, A. . "The Impact of Data Difficulty Factors on Classification of Imbalanced and Concept Drifting Data Streams", Knowledge and Information Systems (KAIS), 2021. |
Type Of Technology | Software |
Year Produced | 2021 |
Impact | Other researchers will be able to use this tool to generate data streams for their experiments. |
URL | https://link.springer.com/article/10.1007/s10115-021-01560-w |
Title | Machine Learning 2023 - SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams (2023) |
Description | Software code implementing the SMOClust method for data stream learning proposed in the paper "CHIU, C.W.; MINKU, L.L. . SMOClust: Synthetic Minority Oversampling based on Stream Clustering for Evolving Data Streams, Machine Learning, 2023" |
Type Of Technology | Software |
Year Produced | 2023 |
Open Source License? | Yes |
Impact | Other researchers and practitioners will be able to adopt the method proposed in the paper. |
Title | PROMISE 2021 OATES |
Description | Software code for the OATES approach published in: MINKU, L.L. . "Multi-Stream Online Transfer Learning For Software Effort Estimation - Is It Necessary?", 17th International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE), August 2021. |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | Other researchers and practitioners will be able to adopt the tool for estimating software development effort. |
URL | https://dl.acm.org/doi/10.1145/3475960.3475988 |
Title | Software - ICSE 2020 data collection |
Description | This is an extension of the Commit Guru tool to enable collecting just-in-time software defect prediction data from Chinese git repositories. It was used to collect proprietary data in the following paper: TABASSUM, S.; MINKU, L.L.; FENG, D.; CABRAL, G.; SONG, L. . "An Investigation of Cross-Project Learning in Online Just-In-Time Software Defect Prediction", 2020 International Conference on Software Engineering (ICSE), 2020. |
Type Of Technology | Software |
Year Produced | 2020 |
Open Source License? | Yes |
Impact | Other researchers will be able to use this tool to collect just-in-time software defect prediction data from Chinese git repositories. This tool has been used to perform a just-in-time software defect prediction case study with a Chinese company. |
Title | TKDE 2021 OGMMF-VRD |
Description | Software code for the OGMMF-VRD approach published in the paper: OLIVEIRA, G.H.F.M.; MINKU, L.L.; OLIVEIRA, A. . "Tackling Virtual and Real Concept Drifts: An Adaptive Gaussian Mixture Model Approach", IEEE Transactions on Knowledge and Data Engineering (TKDE), 2021, doi: 10.1109/TKDE.2021.3099690 |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
Impact | Other researchers and practitioners will be able to use the proposed approach for data stream learning to solve their data stream learning problems. |
URL | https://ieeexplore.ieee.org/document/9501986 |
Title | TNNLS 2023 - OSNN: An Online Semisupervised Neural Network for Nonstationary Data Streams (2022) |
Description | Software code implementing the OSNN algorithm from the paper "SOARES, R.; MINKU, L. . "OSNN: An Online Semisupervised Neural Network for Nonstationary Data Streams", IEEE Transactions on Neural Networks and Learning Systems, vol. 34, n. 9, September 2023" |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Other researchers and practitioners will be able to adopt the proposed method. |
Title | TSE 2022 |
Description | Software code for the online cross-project just-in-time software defect prediction approach published in: Tabassum S, Minku L, Feng D. Cross-Project Online Just-In-Time Software Defect Prediction. IEEE Transactions on Software Engineering, 2022. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Other researchers will be able to adopt this tool in their studies for predicting defects in software changes. Practitioners will also be able to adopt it in their software development companies. |
URL | https://ieeexplore.ieee.org/document/9709674 |
Title | TSE 2022 - A Procedure to Continuously Evaluate Predictive Performance of Just-In-Time Software Defect Prediction Models During Software Development (2022) |
Description | Code used to continuously evaluate predictive performance of just-in-time software defect prediction models over time, used in the paper: SONG, L.; MINKU, L.L. . "A Procedure to Continuously Evaluate Predictive Performance of Just-In-Time Software Defect Prediction Models During Software Development", IEEE Transactions on Software Engineering (TSE), 2022, doi: 10.1109/TSE.2022.3158831. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Other researchers and practitioners will be able to use the code to evaluate just-in-time software defect prediction models over time during the software development process. |
URL | https://github.com/sunnysong14/ContinualPerformanceValidityTSE2022 |
Title | TSE 2022 - Towards Reliable Online Just-in-time Software Defect Prediction (2022) |
Description | Software code for the Prediction-Based Sampling Adjustment (PBSA) approach published in the paper CABRAL, G.; MINKU, L.L. . "Towards Reliable Online Just-in-time Software Defect Prediction", IEEE Transactions on Software Engineering (TSE), 2022. |
Type Of Technology | Software |
Year Produced | 2022 |
Open Source License? | Yes |
Impact | Other researchers and practitioners will be able to use the proposed approach for just-in-time software defect prediction. |
URL | http://doi.org/10.5281/zenodo.6548768 |
Title | michaelchiucw/CDCMS: CDCMS-TNNLS2020 |
Description | This version of CDCMS is used in [1]. [1] CHIU, C. W.; MINKU, L. L. . "A Diversity Framework for Dealing with Multiple Types of Concept Drift Based on Clustering in the Model Space", IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020 (accepted). |
Type Of Technology | Software |
Year Produced | 2020 |
Impact | Other researchers and practitioners will be able to adopt the same methodology in their data stream learning studies. |
URL | https://zenodo.org/record/4294789 |
Title | michaelchiucw/CDCMS: CDCMS-TNNLS2020 |
Description | [Please use this version] Adding back the required interface class for QStatistics.java. It will cause "java.lang.ExceptionInInitializerError" when it is missing. This is not an update to the implementation of CDCMS. This file had been used in the experiments of the published paper. I just forgot to upload it. My apologies. This version of CDCMS is used in [1]. [1] CHIU, C. W.; MINKU, L. L. . "A Diversity Framework for Dealing with Multiple Types of Concept Drift Based on Clustering in the Model Space", IEEE Transactions on Neural Networks and Learning Systems (IEEE TNNLS), 2020 (accepted). |
Type Of Technology | Software |
Year Produced | 2021 |
URL | https://zenodo.org/record/4294776 |
Title | michaelchiucw/DiversityPool: DP-IJCNN2018 |
Description | This version of Diversity Pool is used in [1]. [1] CHIU, C.W.; MINKU, L.L. "Diversity-Based Pool of Models for Dealing with Recurring Concepts", IEEE International Joint Conference on Neural Networks, p. 2759-2766, July 2018. |
Type Of Technology | Software |
Year Produced | 2020 |
URL | https://zenodo.org/record/4119216 |
Title | minkull/OATES: |
Description | This release contains OATES code used in the following paper, after renaming some of the variables and class names: MINKU, L.L. Multi-Stream Online Transfer Learning For Software Effort Estimation -- Is It Necessary? International Conference on Predictive Models and Data Analytics in Software Engineering (PROMISE) 2021. |
Type Of Technology | Software |
Year Produced | 2021 |
Open Source License? | Yes |
URL | https://zenodo.org/record/5068000 |
Description | Artificial Intelligence: What Is It And How Can It Help Us? |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Schools |
Results and Impact | This talk was part of a science festival in the North and North-East regions of Brazil. It discussed what artificial intelligence is and how it can help us with various tasks, including tasks investigated in this grant. The talk was broadcast live and is available on YouTube. It currently has 407 views, 75 likes and 0 dislikes. The talk was intended to increase awareness of artificial intelligence and to encourage pupils to join this field once they reach university level. |
Year(s) Of Engagement Activity | 2020 |
URL | https://youtu.be/VUiySDwKha4 |
Description | How will machine learning / AI change the way IT professionals work? |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | This was a panel on "How will machine learning / AI change the way IT professionals work" at the 2020 International Conference on the Quality of Information and Communications Technology. It aimed to spark discussion and raise awareness of how machine learning and artificial intelligence can benefit software practitioners. The discussion led participants to form or change their views on the future of IT in light of machine learning and artificial intelligence. |
Year(s) Of Engagement Activity | 2020 |
URL | https://2020.quatic.org/ |
Description | IEEE Software Column for Practitioners -- Highlights from ICSE 2019 |
Form Of Engagement Activity | A magazine, newsletter or online publication |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | This is a column for practitioners published in IEEE Software. The intention of my contribution was to increase practitioners' awareness of the existence of intelligent automated approaches for software testing. |
Year(s) Of Engagement Activity | 2019 |
URL | https://ieeexplore.ieee.org/document/8802626 |
Description | Keynote at the International Joint Conference on Neural Networks |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Postgraduate students |
Results and Impact | This keynote mainly featured the following paper: SOARES, R.; MINKU, L. "OSNN: An Online Semisupervised Neural Network for Nonstationary Data Streams", IEEE Transactions on Neural Networks and Learning Systems, 2021 (in press), doi: 10.1109/TNNLS.2021.3132584. It was intended to disseminate research results to an audience of academics, but also some professional practitioners. |
Year(s) Of Engagement Activity | 2022 |
Description | Overcoming the Challenge of Limited Labeled Data in Data Stream Learning |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | This talk explained semi-supervised learning and transfer learning strategies for dealing with a lack of labelled data in data stream learning. Following the talk, we held discussions with all those present. Some members of the audience showed interest in building on the work. |
Year(s) Of Engagement Activity | 2023 |
Description | The Whole is Greater than The Sum of the Parts: On the Value of Machine Learning Ensemble Methods |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Postgraduate students |
Results and Impact | This was a keynote at the BEAR PGR Conference in Birmingham. It aimed to discuss machine learning approaches and their uses with postgraduate students who may not be from the area of computer science. Some of the machine learning approaches discussed in the talk have been influenced by this grant. The talk sparked questions and discussions, and led to a student requesting detailed information on how to apply machine learning to her work. |
Year(s) Of Engagement Activity | 2020 |
Description | Transfer Learning for Software Engineering |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | This talk explained transfer learning algorithms and how they can be applied to solve software engineering problems. The purpose was to raise awareness of the potential benefits of this kind of approach to software engineering and to teach students how these algorithms work. Following the talk, we held discussions with all those present. Some members of the audience showed interest in applying this kind of technique and in future collaborations on this topic. |
Year(s) Of Engagement Activity | 2021 |