Automatic repair of natural source code

Lead Research Organisation: University of York

Department Name: Computer Science

Abstract

Fixing software defects is a time-consuming and costly activity. One of the main reasons why software debugging is so expensive is that it still remains mainly a manual activity. Fixing a bug is a complex process, which consists of many different steps including finding and understanding the underlying cause of the bug, identifying a set of changes that address the bug correctly, and finally verifying that those changes are correct. Automating this process (or parts of it) can potentially reduce the time, cost and effort it takes to fix bugs, and therefore the quality of the produced software.

Automatic Program repair (APR) is defined as the activity of removing a software defect during software maintenance without human intervention. The most popular approaches to automatic program repair rely on test suites as proxies of the behavioural specification of the system under repair. Although these approaches have demonstrated promising results, such as the synthesis of a fix for the famous Hearbleed bug, they face serious challenges. The main one is the correctness of the generated patches. Test suites are an imperfect metric of program correctness. Therefore techniques, which rely on test suites to evaluate the correctness of the generated patches, tend to overfit the test suite. Additionally, test-driven APR techniques can have scalability limitations in terms of execution time. The space of possible patches is vast and correct patches occur sparsely. In test-driven APR approaches the search time is mainly dominated by the validation of the patches, i.e. by the execution of the test suite. Existing approaches try to address this issue by employing various strategies such as parallel execution of test suites or the use of just a subset of the test suite (sample of positive test cases and all the negative ones). Although, such strategies improve the execution time, scalability of APR approaches remains a challenge.

The motivation of the project is to address overfitting and scalability limitations of APR techniques. The main hypothesis is the following: Statistical properties of source code can improve the state-of-the-art generate-and-validate techniques for APR in terms of patch correctness and execution time.

Statistical properties of source code (i.e. naturalness of source code) can help address the challenges of APR techniques in the following ways:

1) Heuristics based on the statistical properties of source code can be used to rank the results of fault localisation algorithms, and therefore augment their ability to correctly identify faulty lines of code.
2) Execution time of test-driven APR techniques can be reduced by discarding unnatural patches without executing the test suite.
3) Correctness of candidate patches can be improved by using the naturalness of code as part of the fitness function.

To measure the naturalness of source code, this project will mine existing software repositories such as GitHub and it will develop language models, similar to those used in the Natural Language Processing (NLP) domain. These models will be used to assess the naturalness of proposed patches.

Planned Impact

The potential impact of automatic program repair is substantial. Automatic repair of bugs has the potential to lower the cost of software development and improve the quality of the produced software. In the long run the automatic repair of software will likely be a necessity. History suggests that the systems built are becoming more and more complex and interconnected, and as a result finding and correcting defects in them becomes more and more challenging.
The impact of the project can be decomposed into four different areas:

- Economy: Engineering high quality software at low cost can be a potential competitive advantage for the UK software engineering sector. The results of the project will contribute towards this goal, and therefore they will be disseminated to companies developing software in the UK. The industrial collaborator of the project, is expected to use the developed tool internally in projects. Moreover, the results of the project will be disseminated to other industrial collaborators of the DEIS research group. Finally, we plan to expand the industrial user-base of the approach, by publishing the results in journals, which target an industrial audience. To strengthen the industrial impact of the project, we plan to make a number of visits to interested companies in order to promote the project and train engineers on the developed tools and techniques.

- Society: Software is transforming various domains of everyday life. For example, autonomous vehicles can provide mobility for individuals, who cannot normally drive; home automation enables ageing population to live independently for longer; and smart medical devices can improve the life of patients. Approaches which automate the removal of software defects can potentially bring value to society by reducing the overall cost of software development and by increasing the reliability of software. To increase awareness of software quality and debugging, we plan to reach the general public by publishing the results of the project in appropriate outlets, such as The Conversation website. Moreover, the results of the project will be used to establish future research in the important area of reliable software through funding bids, and research projects.

- Knowledge: We expect that the results of the project can enhance the research capacity, knowledge, and skills of businesses and organisations in software debugging. In order to ensure the impact of the proposed project several steps will be taken. First, close collaboration with engineers from the industrial partner (IBM) will enable us the dissemination of the produced knowledge to them. Moreover, to achieve wider dissemination of the results, the prototypical implementation, relevant datasets, and a set of demonstrators of the techniques will be made available via a public website. Moreover, a workshop on software debugging, which will bring together interested academics and commercial organisations, will be held at the end of the project. In addition to the workshop, the results of the project will be published in high-impact venues such as the ASE conference, and tutorial sessions will be organised at these events.

- People: One of the main aspirations of the project is to further develop the skills of engineers industry-wide by making them familiar with novel software engineering techniques and tools. To this end, the investigators will propose talks and presentations at local and national engineering user-groups such as SoCraTes UK.

Funded Value:

£131,677

Funded Period:

Dec 18 - Oct 20

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/S000143/1

Principal Investigator:

Nicholas Matragkas

Research Subject:

Info. & commun. Technol. (80%)

Linguistics (20%)

Research Topic:

Artificial Intelligence (20%)

Computational Linguistics (20%)

Software Engineering (60%)

Organisations

People	ORCID iD
Nicholas Matragkas (Principal Investigator)

Publications

Author Name

Title Publication Date Published

10 25 50

Barmpis K (2020) Polyglot and Distributed Software Repository Mining with Crossflow

Dziurzanski P (2020) Empirical Analysis of 1-edit Degree Patches in Syntax-Based Automatic Program Repair

Kolovos D (2019) Crossflow: A Framework for Distributed Mining of Software Repositories

Key Findings
Further Funding
Research Databases and Models
Research Tools and Methods
Collaboration
Software and Technical Products
Engagement Activities


Description	The stochastic nature of the state-of-the-art syntax-based automatic program repair (APR) techniques limits the repetitiveness of the patching process. In the experiments conducted in the course of the project, a number of bugs prior reported to be fixed by the considered tools has not been successfully patched using the same tools despite repeating the APR process hundreds of times. Hence, we concluded that the stochastic nature of the known techniques questions their applicability by practitioners. We have analysed the patches automatically generated by the most popular APR tools and found that, in reality, they do not alter more than two items (modification points, MPs) in software code, even if they have been reported to modify more MPs. We concluded that using techniques capable of fixing many MPs that need to browse an enormous search space of patch candidates is hence unjustified. We have estimated the search space for patches altering just a single MP at a time and found that for the vast majority of benchmark bugs it can be browsed exhaustively in a reasonable time (shorter than one day on a typical PC computer). We have implemented exhaustive search of the patch candidates and integrated it with a popular APR framework. Using this approach, we managed to plausibly patch the largest number of bugs in the popular Defects4J bug benchmark to date, outperforming the second-best solution by 15%. In contrast to that technique, the proposed one is fully deterministic and is suitable for parallel processing at various granularity levels. As the patches found are minimal (just one modification of the original software code), they are less likely to be harmful to the software functionality.
Exploitation Route	Fixing software bugs automatically is a silver bullet for software engineering and it can potentially have huge economic and societal impact. This research area investigated the limitations of the state-of-the art techniques to program repair and we propose novel techniques for improving the performance and accuracy of the repair process. Our techniques are generic and can be adopted and incorporated by other search-based program repair techniques, in order to shrink the search space in order to make the search process more efficient.
Sectors	Digital/Communication/Information Technologies (including Software)
URL	https://github.com/Manatee-EPSRC-Project


Description	Assuring Dependability In Collective Autonomous Systems (ATLAS)
Amount	£88,600 (GBP)
Funding ID	ACC2006195
Organisation	Defence Science & Technology Laboratory (DSTL)
Sector	Public
Country	United Kingdom
Start	08/2019
End	06/2020


Description	Automated Robustness Testing Of Safety-Critical Multi-Robot Inspection Applications (SAFEMUV)
Amount	£207,000 (GBP)
Organisation	Lloyd's Register Foundation
Sector	Charity/Non Profit
Country	United Kingdom
Start	03/2020
End	08/2021


Description	Secure and Safe Multi-Robot Systems
Amount	€ 6,999,786 (EUR)
Funding ID	101017258
Organisation	European Commission
Sector	Public
Country	Belgium
Start	01/2021
End	12/2023


Title	One- and two-degree edit patch finder
Description	Adding a new mode to the state-of-the-art open-source Automatic Program Repair (APR) tool named ARJA for searching one- and two-degree edit patches. Depending on the search-space assessment, the tool performs either an exhaustive or random search. The tool is capable of benefiting from data independence and facilities parallel processing. The number of analysed modification point is automatically adapted based on the predicted computational cost. For two-degree edit patches, there is also dynamic filtering of the modification point pairs to be analysed based on the search space size and the predicted computational cost.
Type Of Material	Improvements to research infrastructure
Year Produced	2020
Provided To Others?	No
Impact	The tool has been used to generate the largest to date set of plausible patches for the open-source Java projects from the popular Defects4J benchmark. A few bugs from the benchmark have been plausibly patched for the first time, whereas for several other bugs smaller patches (in terms of the number of edited modification points) have been found that the ones published to date.


Title	n-gram model guidance for genetic-algorithm-based patch searching
Description	The majority of the state-of-the-art syntax-based Automatic Program Repair (APR) tools employ genetic algorithms (GAs) as a metaheuristic to find plausible patches. GA generates subsequent populations of patch candidates, assigns the fitness value to each of the candidates, selects the candidates with the highest fitness values and performs their mutations in order to improve the patch candidates' quality. This approach has been reported to lead to obtaining plausible patches in many cases. Traditionally, the fitness value depends solely on the number of failed tests from the associated test suite. Due to some adverse properties of such traditional fitness function, mainly the lack of smooth gradient in the search space landscape in the majority of bug cases, new techniques for the fitness function improvement are sought after. The developed method applies some principles from the Natural Language Processing (NLP) domain. According to the literature, a buggy extract of code is rated as more improbable by language models. In both the contexts of natural and programming language processing, the most popular language models are based on n-grams, i.e., contiguous sequences of n items (e.g. words or tokens) and a text probability can be expressed using cross-entropy. In this method, cross-entropy has been used to support the traditional fitness function in a syntax-based APR process. Such a compound fitness function assigns a fitness value to each of the patch candidates. The traditional genetic-algorithm operators, such as selection, mutation or elitism, are still used to generate subsequent populations.
Type Of Material	Improvements to research infrastructure
Year Produced	2019
Provided To Others?	No
Impact	After applying the new method it appeared that, contrary to the prior results reported in the literature, the correlation between code perplexity and its correctness is statistically insignificant. Hence, applying the perplexity measures is rather unlikely to improve the patch search process in a GA, which has been experimentally confimed. This result, despite being negative, is likely to have a notable impact on the research domain as it demonstrates the unsuitability of simple language models for enhancing fitness functions used in APR.


Title	Database of single-degree edit patches
Description	The database includes a set of 7689 plausible patches for 66 software bugs from projects JFreeChart, Apache commons-lang, Apache commons-math and Joda-Time from the most popular Java benchmark set in the automatic program repair (APR) research field. The database, both in terms of the number of fixed bugs and the number of generated patches, is the largest Defects4J patch database created to date.
Type Of Material	Database/Collection of data
Year Produced	2020
Provided To Others?	No
Impact	Defects4J is the primary benchmark for assessing the quality of automatic program repair (APR) tools. The database, being the largest plausible patch databases for this benchmark created to date, can be used as a reference for other APR tools in terms of the efficiency in plausible patch searching. Similarly, as the patches in the database are minimal in terms of the number of edited modification points in the software, they can be used as the reference set for minimising the patch sizes generated by APR tools (patch size minimisation is an important part of the state-of-the-art syntax-based APR tools such as GenProg or ARJA). As the single edit patches have been generated in an exhaustive manner, the database includes all modification points whose alteration can lead to passing the entire unit test set. Software functionalities containing bugs that can be patched with a large number of single-edit plausible patches can be viewed as being insufficiently tested and hence the database can be employed to identify software elements that need additional unit tests written by developers.


Title	n-gram Java language models
Description	Five n-gram Java programming language models (for n=2,3,...,6) constructed based on almost 10,000 popular Java repositories including more than 2,200,000 Java source codes, mined from GitHub. The Java tokens such as keywords, keyword literals, separators and operators have been treated as separate items for the n-gram models, whereas all the identifiers, integer literals, floating-point literals and string literals have been abstracted and treated as the same item for each of these token types.
Type Of Material	Computer model/algorithm
Year Produced	2019
Provided To Others?	No
Impact	The database can be used to determine the so-called normalness of a Java code. Since a buggy code has been shown in the literature to be more improbable by a language model. Using these language models, potential bugs can be identified by computing the perplexity of subsequent tokens, for example using the cross-entropy or log-likelihood measures.


Description	Collaboration with CROSSMINER
Organisation	University of York
Country	United Kingdom
Sector	Academic/University
PI Contribution	We have collaborated with colleagues from the University of York, who are working on the H2020 European Union project called CROSSMINER. They are developing a platform for mining software engineering data from software repositories. Our team extended their tool with specific functionality, which enables the application of specific filters during the extraction of the online data. Such an extension was needed in our project, in order to mine data from software projects, which meet specific criteria. The results of this work are integrated in their tool and is made available as open source. This collaboration resulted in two publications in the Mining Software Repositories conference - one in 2019 and one to appear in the 2020 version of the conference.
Collaborator Contribution	Our partners have provided the knowledge and the software tool, which enabled us to extract quality data from online software repositories.
Impact	Two publications have resulted from this collaboration so far: * Kolovos, D., Neubauer, P., Barmpis, K., Matragkas, N. and Paige, R., 2019, May. Crossflow: a framework for distributed mining of software repositories. In 2019 IEEE/ACM 16th International Conference on Mining Software Repositories (MSR) (pp. 155-159). IEEE. *Barmpis,K., Neubauer, P., Matragkas, N., Paige, R. and Kolovos, D., 2020, (To appear). Polyglot and Distributed Software Repository Mining with Crossflow. In 20120 IEEE/ACM 17th International Conference on Mining Software Repositories (MSR). IEEE.
Start Year	2019


Title	CROSSFLOW
Description	Mining software repositories at a large scale typically requires substantial computational and storage resources. This creates an increasing need for repository mining programs to be executed in a distributed manner, such that remote collaborators can contribute local computational and storage resources. Crossflow is a novel framework for building polyglot distributed repository mining programs. It offers delegation of mining jobs to remote workers and can cache their results. Workers are able to implement advanced behavior like load balancing and rejecting jobs they either cannot perform or would execute sub-optimally. Workers can be written in different programming languages like Java and Python.
Type Of Technology	Software
Year Produced	2019
Open Source License?	Yes
Impact	The final version of Crossflow has only been released in end of December. IBM has chosen to adapt and extend Crossflow for one of the KTP projects it runs on data analytics (KTP010852).
URL	https://github.com/crossminer/scava/tree/master/crossflow


Description	Participation in the 2019 Mining Software Repositories Conference
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Researchers and practitioners attended event with a focus on presenting and discussing about the recent advances on mining software repositories.
Year(s) Of Engagement Activity	2019
URL	https://conf.researchr.org/home/msr-2019


Description	Participation in the 2020 IEEE Congress on Evolutionary Computation (CEC)
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Researchers and practitioners attended event with focus on recent advances on evolutionary computation.
Year(s) Of Engagement Activity	2020
URL	https://wcci2020.org/cec-sessions/


Description	Participation in the 2020 Mining Software Repositories Conference
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Other audiences
Results and Impact	Researchers and practitioners attended event with a focus on presenting and discussing about the recent advances on mining software repositories.
Year(s) Of Engagement Activity	2020
URL	https://conf.researchr.org/home/msr-2020


Description	Participation in the 62nd CREST Open Workshop related to Automated Program Repair and Genetic Improvement
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Professional Practitioners
Results and Impact	Two members of the research team participated in the 62nd CREST Open Workshop related to Automated Program Repair and Genetic Improvement on 20th and 21st January 2020 in London, organised by UCL. The event has attracted about 50 participants, mainly top researchers in the domain of Automated Program Repair and Genetic Improvement. The members of the research team conducted a series of informal talk with the participants, advertised and discussed the research related to the award.
Year(s) Of Engagement Activity	2020
URL	http://crest.cs.ucl.ac.uk/cow/62/

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications