GReaTest: Growing Readable Software Tests

Lead Research Organisation: University of Passau
Department Name: Mathematics and Computer Science

Abstract

Testing is a crucial part of any software development process. Testing is also very expensive: common estimates put the cost of software testing at around 50% of the average project budget. Recent studies suggest that 77% of the time software developers spend on testing is spent reading tests. Tests are read when they are generated; when they are updated, fixed, or refactored; when they serve as API usage examples and specification; or during debugging. Reading and understanding tests can be challenging, and evidence suggests that, despite the popularity of unit testing frameworks and test-driven development, the majority of software developers do not practice testing actively. Automatically generated tests tend to be particularly unreadable, severely inhibiting the widespread use of automated test generation in practice. The effects of insufficient testing can be dramatic, causing large economic damage and potentially harming people who rely on software in safety-critical applications.

Our proposed solution to this problem is to improve the effectiveness and efficiency of testing by improving the readability of tests. We will investigate which syntactic and semantic aspects make tests readable, so that readability can be made measurable by modelling it. This, in turn, will allow us to provide techniques that guide the manual or automatic improvement of the readability of software tests. This is made possible by a unique combination of machine learning, crowdsourcing, and search-based testing techniques. The GReaTest project will provide tools that help developers to identify readability problems, to automatically improve readability, and to automatically generate readability-optimised test suites. The importance of readability and the usefulness of readability improvement will be evaluated in a range of empirical studies together with our industrial collaborators Microsoft, Google, and Barclays, investigating the relation of test readability to fault-finding effectiveness, developer productivity, and software quality.

Automated analysis and optimisation of test readability is novel: traditional analyses have focused only on easily measurable program aspects, such as code coverage. Improving the readability of software tests has a direct impact on industry, where testing is a major economic and technical factor: more readable tests will reduce the costs of testing and increase its effectiveness, thus improving software quality. Readability optimisation will be a key enabler for automated test generation in practice. Once the readability of software tests is understood, this opens the door to a new research direction on the analysis and improvement of other software artefacts based on human understanding and performance.

Planned Impact

The main beneficiaries of the project outcomes will be all stakeholders involved in IT projects:

-- Software developers and testers: Improved test readability will lead to higher programmer and tester productivity, as the time needed to understand tests and perform maintenance actions on them will be reduced. Furthermore, readability optimisation will help to overcome one of the main show-stoppers preventing widespread adoption of automated test generation techniques. Thus, the ability to automatically generate readable tests will support software engineers in achieving sufficient degrees of testing, and will allow them to maintain more tests.

-- Organisations that develop IT systems: Software testing is one of the major cost factors in software engineering, commonly estimated at around 50% of the average budget. However, missing a software bug can have an even higher economic impact, as regularly demonstrated by bugs resulting in product recalls (e.g. Toyota), system downtimes (e.g. NatWest), or even accidents (e.g. Therac-25, Ariane 5). Improving test readability will reduce the costs of testing while at the same time improving its efficiency and increasing software quality. This will allow IT companies to deliver more value to their clients at lower cost.

-- Clients, users, and other stakeholders of IT projects will benefit from the improved software quality resulting from more efficient testing. This is particularly important as our society increasingly depends on a working information infrastructure for more and more aspects of civic, commercial, and social life, while software at the same time becomes ever more complex.
 
Description The project drives the development of the open-source unit test generation tool "EvoSuite" (http://www.evosuite.org), which has users in academia and industry. In particular, EvoSuite has been used for experimentation by other researchers, and the published papers have led to follow-up work by other groups. The prototypes have also been tried out by users in industry, who provided useful feedback that shaped the further course of the project.
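
To illustrate the readability problem the project targets, the following is a minimal sketch of the style of JUnit 4 test that automated generators such as EvoSuite typically produce. The class under test here is java.util.Stack; the test name, values, and structure are illustrative only and not actual EvoSuite output:

    import java.util.Stack;
    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    // Illustrative only: mimics the style of generated tests (generic names,
    // machine-chosen values); it is not actual EvoSuite output.
    public class Stack_ESTest {

        @Test
        public void test0() throws Throwable {
            Stack<Integer> stack0 = new Stack<>();
            stack0.push(-1);
            Integer integer0 = stack0.pop();
            assertEquals(-1, (int) integer0);
        }
    }

A readability-optimised version of the same test might instead carry a descriptive name such as pushThenPopReturnsPushedElement and use meaningful input values; this is the kind of transformation the project aims to support.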

The project has further resulted in the Code Defenders web-based game (http://www.code-defenders.org), which has been used as a "game with a purpose" to produce strong software tests, and which has also been applied in educational settings.
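
In Code Defenders, attackers introduce small code changes (mutants) and defenders write tests that detect ("kill") them. The following sketch illustrates that interplay with a made-up method and mutant; it is not taken from the game itself:

    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    // Hypothetical example of the attacker/defender interplay; the method and
    // the mutant are invented for illustration.
    public class CodeDefendersExample {

        // Original method: checks whether a value lies in the interval [low, high].
        static boolean inRange(int value, int low, int high) {
            return value >= low && value <= high;
        }

        // An attacker's mutant might change ">=" to ">", excluding the lower bound.
        static boolean inRangeMutant(int value, int low, int high) {
            return value > low && value <= high;
        }

        // A defender kills the mutant with a test that exercises the boundary:
        // this assertion passes on the original but would fail on the mutant.
        @Test
        public void lowerBoundIsInclusive() {
            assertTrue(inRange(5, 5, 10));
        }
    }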

As a further application domain for generating readable test cases, the analysis of block-based programs written in the Scratch programming language has been explored, resulting in Whisker, an automated test generation tool for Scratch.

The concept of readable tests has also been extended to complex tests for self-driving cars, where a test consists of one or more roads on a fixed-size map (whose size can be specified), and the goal of the self-driving AI agent is to follow a predetermined path from an initial position to a goal position. A good test is one that creates multiple situations in which the self-driving car drives off its lane. Tests are executed virtually in a simulator.
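
As a rough illustration of what such a test looks like as data, the following Java sketch represents a test as roads given by poly-lines on a square map, together with a driving task; all names are hypothetical and do not reflect the tool's actual data model:

    import java.util.List;

    // Hypothetical representation of a self-driving-car test case: roads as
    // poly-lines on a square map, plus a start, a goal, and the path to follow.
    public final class RoadTestSketch {

        public record Point(double x, double y) {}

        public record RoadTest(
                int mapSize,              // side length of the square map (metres)
                List<List<Point>> roads,  // each road is a poly-line of centre points
                Point start,              // initial position of the AI agent
                Point goal,               // goal position it must reach
                List<Point> path) {}      // the predetermined path to follow

        public static void main(String[] args) {
            // One straight road across a 200 m x 200 m map, driven end to end.
            List<Point> road = List.of(new Point(0, 100), new Point(200, 100));
            RoadTest test = new RoadTest(200, List.of(road),
                    new Point(0, 100), new Point(200, 100), road);
            System.out.println("Roads in test: " + test.roads().size());
        }
    }
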
Exploitation Route The work on Code Defenders has triggered new collaborations and is being integrated into programming education in higher education and secondary schools. The test generators for the various application domains are available to researchers and practitioners.
Sectors Digital/Communication/Information Technologies (including Software), Education

 
Description The Code Defenders educational game, which also resulted from this project, is now widely used in higher education, teaching students how to write effective and readable tests.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software), Education
Impact Types Societal

 
Title AsFault 
Description AsFault is a tool to generate test cases for self-driving cars. The tests are executed in a simulator called BeamNG. Currently, the test cases target the "lane keeping" functionality of self-driving cars, that is, the ability of the self-driving car to stay in its own lane. Each test consists of one or more roads on a fixed-size map (whose size can be specified), and the goal of the self-driving AI agent is to follow a predetermined path from an initial position to a goal position. A good test is one that creates multiple situations in which the self-driving car drives off its lane. We call such situations out-of-bounds examples (OBEs). Tests are designed and executed in separate phases: tests are first designed as poly-lines; then AsFault generates the code needed for the simulator, BeamNG, to simulate the driving situations, executes the tests, and collects, reports, and visualises the results (that is, the number of times the car drove off its lane).
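
To make the test verdict concrete, the following is a small sketch of how out-of-bounds examples could be counted from a simulation trace; it mirrors the idea described above rather than AsFault's actual implementation, and all names are made up:

    import java.util.List;

    // Hypothetical sketch: count out-of-bounds episodes (OBEs), i.e. maximal
    // periods in which the car's lateral distance from the lane centre exceeds
    // half the lane width.
    public final class ObeCounter {

        /**
         * @param lateralOffsets per-timestep distance of the car from the lane centre (metres)
         * @param laneWidth      lane width in metres
         * @return number of distinct episodes in which the car left its lane
         */
        public static int countObes(List<Double> lateralOffsets, double laneWidth) {
            double bound = laneWidth / 2.0;
            int obes = 0;
            boolean outOfLane = false;
            for (double offset : lateralOffsets) {
                boolean out = Math.abs(offset) > bound;
                if (out && !outOfLane) {
                    obes++;          // a new out-of-bounds episode starts
                }
                outOfLane = out;
            }
            return obes;
        }

        public static void main(String[] args) {
            // Two separate excursions out of a 4 m wide lane -> 2 OBEs.
            List<Double> trace = List.of(0.0, 1.0, 2.5, 2.8, 1.0, 0.5, 3.1, 0.2);
            System.out.println(countObes(trace, 4.0)); // prints 2
        }
    }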
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact AsFault has successfully extended the ideas of evolutionary test generation, on which this grant is based, to the domain of self-driving cars. This adds a new angle on the problem of readable test cases, and AsFault is now used by other researchers to further research in test generation. 
URL https://github.com/alessiogambi/AsFault
 
Title Whisker 
Description Block-based programming environments like Scratch foster engagement with computer programming and are used by millions of young learners. Scratch allows learners to quickly create entertaining programs and games, while eliminating syntactic program errors that could interfere with progress. However, functional programming errors may still lead to incorrect programs, and learners and their teachers need to identify and understand these errors. This is currently an entirely manual process. In our paper "Testing Scratch Programs Automatically", we introduced a formal testing framework that describes the problem of Scratch testing in detail. We instantiated this formal framework with the Whisker tool, which provides automated and property-based testing functionality for Scratch programs. The implementation of Whisker can be found in the repository linked below.
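
Whisker tests themselves are JavaScript programs written against Whisker's own test API; purely to illustrate the idea of a property-based check, the following Java sketch (with entirely hypothetical names) shows an invariant that must hold in every observed state of a running program, such as a sprite never leaving the visible Scratch stage:

    import java.util.List;

    // Conceptual illustration of a property-based check over observed program
    // states; this is not Whisker's API, and all names are hypothetical.
    public final class PropertyCheckSketch {

        /** A simplified snapshot of a sprite's position on the Scratch stage. */
        public record SpriteState(double x, double y) {}

        /** Property: the sprite stays on the visible 480x360 stage, centred at (0, 0). */
        public static boolean staysOnStage(List<SpriteState> trace) {
            return trace.stream().allMatch(s ->
                    Math.abs(s.x()) <= 240 && Math.abs(s.y()) <= 180);
        }

        public static void main(String[] args) {
            List<SpriteState> observed = List.of(
                    new SpriteState(0, 0), new SpriteState(120, -50), new SpriteState(230, 170));
            System.out.println(staysOnStage(observed)); // prints true
        }
    }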
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Whisker is the first automated tool to test and dynamically analyse Scratch programs. It lays the foundation for researchers to apply testing and analysis to Scratch programs; the release of the tool is very recent, but multiple interested research groups have started experimenting with it, which will likely lead to new collaborations. 
URL https://github.com/se2p/whisker-main
 
Description Invited tutorial at the International Conference on Search-Based Software Engineering 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact 60 international researchers attended the SSBSE conference and the invited tutorial, in which participants learned how to use the EvoSuite test generation tool.
Year(s) Of Engagement Activity 2017
URL http://ssbse17.github.io/tutorials/
 
Description Keynote at CBSOFT/SAST 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Invited keynote speaker at the IV Brazilian Symposium on Systematic and Automated Software Testing (SAST)
Year(s) Of Engagement Activity 2019
URL https://cbsoft2019.ufba.br/#/sastkeynotes