GReaTest: Growing Readable Software Tests

Lead Research Organisation: University of Passau
Department Name: Mathematics and Computer Science


Testing is a crucial part of any software development process. Testing is also very expensive: common estimates put the effort of software testing at 50% of the average budget. Recent studies suggest that 77% of the time that software developers spend on testing is used for reading tests. Tests are read when they are generated, when they are updated, fixed, or refactored, when they serve as API usage examples and specification, or during debugging. Reading and understanding tests can be challenging, and evidence suggests that, despite the popularity of unit testing frameworks and test-driven development, the majority of software developers do not practice testing actively. Automatically generated tests tend to be particularly unreadable, severely inhibiting the widespread use of automated test generation in practice. The effects of insufficient testing can be dramatic, with large economic damage, and the potential to harm people relying on software in safety-critical applications.

Our proposed solution to this problem is to improve the effectiveness and efficiency of testing by improving the readability of tests. We will investigate which syntactic and semantic aspects make tests readable, such that we can make readability measurable by modelling it. This, in turn, will allow us to provide techniques that guide manual or automatic improvement of the readability of software tests. This is made possible by a unique combination of machine learning, crowdsourcing, and search-based testing techniques. The GReaTest project will provide tools to developers that help them to identify readability problems, to automatically improve readability, and to automatically generate readability-optimised test suites. The importance of readability and the usefulness of readability improvement will be evaluated with a range of empirical studies in conjunction with our industrial collaborators Microsoft, Google, and Barclays, investigating the relation of test readability to fault-finding effectiveness, developer productivity, and software quality.

Automated analysis and optimisation of test readability is novel, and traditional analyses only focused on easily measurable program aspects, such as code coverage. Improving readability of software tests has a direct impact on industry, where testing is a major economic and technical factor: More readable tests will reduce the costs of testing and increase effectiveness, thus improving software quality. Readability optimisation will be a key enabler for automated test generation in practice. Once readability of software tests is understood, this opens the doors to a new research direction on analysis and improvement of other software artefacts based on human understanding and performance.

Planned Impact

The main beneficiaries of the project outcomes will be all stakeholders involved in IT projects:

-- Software developers and testers: Improved test readability will lead to higher programmer and tester productivity, as the time necessary to understand and maintain tests will be reduced. Furthermore, readability optimisation will help to overcome one of the main show-stoppers preventing widespread application of automated test generation techniques. Thus, the possibility to automatically generate readable tests will support software engineers in achieving sufficient degrees of testing, and will allow them to maintain more tests.

-- Organisations that develop IT systems: Software testing is one of the major cost factors in software engineering, commonly estimated at around 50% of the average budget. However, missing a software bug can have an even higher economic impact, as regularly demonstrated by bugs resulting in product recalls (e.g. Toyota), system downtimes (e.g. NatWest), or even accidents (e.g. Therac 25, Ariane 5). Improving test readability will reduce the costs of testing, while at the same time improving its efficiency and increasing software quality. This will allow IT companies to deliver more value to their clients at lower costs.

-- Clients, users, and other stakeholders of IT projects will benefit from the improved software quality resulting from more efficient testing. This is particularly important as our society increasingly depends on a working information infrastructure for more and more aspects of civic, commercial, and social life, while software at the same time becomes ever more complex.

Related Projects

Project Reference   Relationship   Related To      Start        End          Award Value
EP/N023978/1        -              -               01/03/2016   01/10/2017   £516,859
EP/N023978/2        Transfer       EP/N023978/1    01/01/2018   31/05/2021   £200,009
Description The project drives the development of the open source unit test generation tool "EvoSuite", which has users in academia and industry. In particular, EvoSuite has been used for experimentation by other researchers, and the published papers have led to follow-up work by other researchers. The prototypes have also been tested by users in industry, who provided useful feedback for the further course of the project.

The project has further resulted in the Code Defenders web-based game, which has been used as a "game with a purpose" to produce strong software tests, and which has also been applied in educational settings.

As a further application domain for generating readable test cases, the analysis of block-based programs written in the Scratch programming language has been explored, resulting in the Whisker automated test generation tool for Scratch.

The concept of readable tests has also been extended to complex tests for self-driving cars, where tests consist of one or more roads on a fixed-size map (whose size can be specified), and the goal of the self-driving AI agent is to follow a predetermined path from an initial position to a goal position. A good test is one that creates multiple situations where the self-driving car drives off the lane. Tests are executed virtually in a simulator.
Exploitation Route The work on Code Defenders has triggered new collaborations and is being integrated into programming education in higher education and secondary schools. The test generators for multiple application domains are available for researchers and practitioners.
Sectors Digital/Communication/Information Technologies (including Software),Education

Description The Code Defenders testing game has been used in education at several universities (e.g. Sheffield, Leicester, Delft, Madrid, Porto, Berlin, Passau, GMU, Valencia), the Halmstad Summer School on Testing, and the HEADSTART summer school for Y12 students at the University of Sheffield.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software),Education
Impact Types Societal

Title AsFault 
Description AsFault is a tool to generate test cases for self-driving cars. The tests are executed in a simulator called BeamNG. Currently, the test cases aim at testing the "lane keeping" functionality of self-driving cars, that is, the ability of the car to stay in its own lane. Each test consists of one or more roads on a fixed-size map (whose size can be specified), and the goal of the self-driving AI agent is to follow a predetermined path from an initial position to a goal position. A good test is one that creates multiple situations where the self-driving car drives off the lane. We call such situations out-of-bounds examples (OBE). The tests are designed and executed in different phases. The tests are first designed using poly-lines. Then AsFault creates the code needed for the simulator, BeamNG, to simulate the driving situations, execute the tests, and collect, report and visualise the results (that is, the number of times the car drove off the lane).
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact AsFault has successfully extended the ideas of evolutionary test generation, on which this grant is based, to the domain of self-driving cars. This adds a new angle on the problem of readable test cases, and AsFault is now used by other researchers to further research in test generation. 
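
The out-of-bounds idea described for AsFault can be illustrated with a minimal sketch. This is not AsFault's code and does not use the BeamNG API; it assumes, purely for illustration, that a road is a poly-line with a lane of fixed half-width around it and that the simulator yields the car's positions as a list of 2D points. The names point_segment_distance, distance_to_polyline, and count_obes are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polyline(p, polyline):
    """Smallest distance from p to any segment of the poly-line."""
    return min(
        point_segment_distance(p, a, b)
        for a, b in zip(polyline, polyline[1:])
    )


def count_obes(trajectory, road, half_width):
    """Count out-of-bounds episodes (assumed definition: each transition
    from inside the lane to outside it counts as one episode)."""
    obes = 0
    was_inside = True
    for p in trajectory:
        inside = distance_to_polyline(p, road) <= half_width
        if was_inside and not inside:
            obes += 1
        was_inside = inside
    return obes


if __name__ == "__main__":
    # Hypothetical road with one right-angle turn and a recorded trajectory.
    road = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
    trajectory = [(0, 0), (5, 1), (5, 3), (8, 1), (13, 5), (10, 9)]
    print(count_obes(trajectory, road, half_width=2.0))
```

Counting transitions rather than individual off-lane samples keeps one long excursion from being reported as many separate failures; a real tool may define episodes differently.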
Title Whisker 
Description Block-based programming environments like Scratch foster engagement with computer programming and are used by millions of young learners. Scratch allows learners to quickly create entertaining programs and games, while eliminating syntactical program errors that could interfere with progress. However, functional programming errors may still lead to incorrect programs, and learners and their teachers need to identify and understand these errors. This is currently an entirely manual process. In our paper on Testing Scratch Programs Automatically, we introduced a formal testing framework that describes the problem of Scratch testing in detail. We instantiate this formal framework with the Whisker tool, which provides automated and property-based testing functionality for Scratch programs. The implementation of Whisker can be found in this repository. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact Whisker is the first automated tool to test and dynamically analyse Scratch programs. It is foundational in enabling researchers to apply testing and analysis to Scratch programs. The tool was released very recently, but multiple interested research groups have started experimenting with it, which will likely lead to new collaborations.
Description Invited tutorial at the International Conference on Search-Based Software Engineering 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other audiences
Results and Impact 60 international researchers attended the SSBSE conference and the invited tutorial, in which participants learned how to use the EvoSuite test generation tool.
Year(s) Of Engagement Activity 2017
Description Keynote at CBSOFT/SAST 2019 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact Invited keynote speaker at the IV Brazilian Symposium on Systematic and Automated Software Testing (SAST)
Year(s) Of Engagement Activity 2019