RE-COST: REducing the Cost of Oracles for Software Testing

Lead Research Organisation: University College London
Department Name: Computer Science


Testing involves examining the behaviour of a system in order to discover potential faults. The problem of determining the desired correct behaviour for a given input is called the Oracle Problem. Since manual testing is expensive and time-consuming, there has been a great deal of work on the full and partial automation of Software Testing. Unfortunately, it is often impossible to fully automate the process of determining whether the system behaves correctly. This must be performed by a human, and the cost of the effort expended is referred to as the Human Oracle Cost.

RE-COST will develop Search-Based Optimisation techniques to attack the Human Oracle Cost problem both quantitatively and qualitatively. The quantitative approach will develop methods and algorithms to reduce both the number of test cases and the evaluation effort per test case. The qualitative approach will develop methods and algorithms that reduce test case cognition time. The RE-COST project seeks to transform the way that researchers and practitioners think about the problem of Software Test Data Generation. This has the potential to provide a breakthrough in Software Testing, dramatically increasing real-world industrial uptake of automated techniques for Software Test Data Generation.

Planned Impact

The RECOST approach to reducing human effort targets the oracle effort. In current testing practice, the oracle is typically a team of engineers who have the burdensome task of checking copious software output for errors. This is tedious, costly and error-prone. Reducing the oracle cost will not only make software testing more reliable but also alleviate some of this burden, facilitating a migration to the more rewarding and intellectually stimulating activities associated with software testing. RECOST will, therefore, directly impact that part of society employed in software testing careers (often estimated to be about half of the software industry workforce). It will also have a longer-term effect on society as a whole. Society depends critically on well-tested software, not merely for the familiar desktop applications and systems of which the public is well aware, but also for the less visible software running in embedded controllers and devices. The 2009 EU ARTEMIS programme estimated that by 2016 there would be 40 billion such systems worldwide. These systems are used in the transport, healthcare and military sectors and in the Critical National Infrastructure. They pervade both the home and the workplace, so their successful operation is a critical concern.

The RECOST project partners have an ambitious but realistic plan for ensuring that the project achieves the maximum possible impact. To address the immediate academic and research communities, the project team will publish in the world's leading journals and conferences in Software Testing and Software Engineering. As the Case For Support demonstrates, the team has a strong track record of regular publication in all of these outlets, so this goal is demanding but realistic.
In order to reach out to potential industrial users of the technologies that the project will develop, the team will partner both with the project's direct industrial collaborators and with the wider set of existing industrial collaborators of the CREST centre and the UCL and Sheffield departments of Computer Science. Both PIs, Harman and McMinn, have many existing collaborations with (and funding from) industry, on which they will build during the course of the RECOST project. To widen this set of collaborators and initial technology adopters further, the project team will seek to publish in trade press journals and to present results at software developer events. Once again, the proposers have a strong track record of doing this. For example, Prof. Harman has given several invited keynote talks at industrial Software Testing conferences, while Dr. McMinn is a key player in the development of Sheffield's Genesys initiative. Genesys is a stepping stone into industry for final-year undergraduates and masters students, who will be among the first to use the techniques developed on the project. As the Case For Support demonstrates in more detail, the RECOST project team complement their outstanding record of publishing in the top international journals and conferences with an equally strong track record of industrial involvement in their research. For example, Prof. Harman's work on industrial impact was recognised by the EPSRC when it invited him to give the keynote talk on working with industry at the EPSRC Early Career Workshop in Information Communication Technology in November 2008. Finally, to reach out to the public at large, the RECOST proposers will seek to publish their work in popular science journals and to participate in activities that offer opportunities to explain software testing issues to the general public.
For example, the team will follow up on news stories relating to software problems, with short press releases that briefly explain the issues and point the interested reader to the RECOST website and relevant literature.


Barr E (2015) The Oracle Problem in Software Testing: A Survey in IEEE Transactions on Software Engineering

Description We provided a comprehensive review of the oracle problem in software testing, which has been accepted for publication in the leading journal on software engineering. We back this up with a repository of all publications on this topic, which contains over 500 entries.

We also have results on fault localisation, which are described under the "narrative impact" section.

We developed several prototype tools for extracting realistic test cases, to reduce the human oracle cost. This work has been taken up and developed further by other researchers.
Exploitation Route The anticipated results on fault localisation will have a profound impact on this field.

We believe that the approach we introduced to realistic testing, based on extraction of real test cases from web-based services, will receive significant follow-on attention.

Finally, we also anticipate that the survey paper, which is about to appear, will form a foundation for work on developing approaches to tackle the oracle problem.

The project was only recently completed; as further evidence of impact emerges, this entry will be updated and reported upon here.

Due to the foundational nature of this work, it could impact any and all of the sectors listed below, so I have simply ticked "other".
Sectors Other

Description update There has been significant recent industrial impact from the research underpinning this grant and others for which Prof. Harman was PI. Prof. Harman became engineering manager at Facebook London, where he leads the Sapienz team, working on Search Based Software Engineering (SBSE) for automated test case design and fault fixing. The development of SBSE was a key research direction for this grant and underpinned that work. Sapienz has been deployed to continuously test Facebook's apps, leading to thousands of bugs being found automatically and then fixed (mostly by developers, although more recently some of these faults have also been fixed automatically by Sapienz). The software tackled by Sapienz consists of tens of millions of lines of code; the apps are among the largest and most complex in the app store and are used by over a billion people worldwide every day for communication, social networking and community building.

This project was concerned with the Oracle problem: when testing a program, how do we know what its correct output should be? We demonstrated that, for spectrum-based fault localisation techniques, there can be no supreme localiser: no single approach to fault localisation is guaranteed to outperform all others. We anticipate that this finding will have important ramifications for the field, since many researchers seek to create just such a one-size-fits-all supreme fault localisation equation. This result first emerged in its full form in 2013 and has its origins in work undertaken during the project.
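Spectrum-based fault localisation ranks program statements by a suspiciousness score computed from how often each statement is executed by passing and by failing tests. As a minimal sketch of why no single ranking formula can dominate, the Python below compares two well-known formulas, Tarantula and Ochiai (neither is named in this report; they are chosen here purely for illustration), on hypothetical coverage counts for two statements, A and B. It is not the project's result, which is a general theoretical proof; it only shows two formulas disagreeing on the same data.

```python
import math
from dataclasses import dataclass


@dataclass
class Spectrum:
    """Coverage counts for one program statement (hypothetical data)."""
    e_f: int  # failing tests that execute the statement
    e_p: int  # passing tests that execute the statement
    n_f: int  # failing tests that do not execute the statement
    n_p: int  # passing tests that do not execute the statement


def tarantula(s: Spectrum) -> float:
    # Fraction of failing tests covering s, divided by the sum of that
    # fraction and the fraction of passing tests covering s.
    fail_ratio = s.e_f / (s.e_f + s.n_f) if s.e_f + s.n_f else 0.0
    pass_ratio = s.e_p / (s.e_p + s.n_p) if s.e_p + s.n_p else 0.0
    total = fail_ratio + pass_ratio
    return fail_ratio / total if total else 0.0


def ochiai(s: Spectrum) -> float:
    # e_f / sqrt(total failing tests * total tests that execute s)
    denom = math.sqrt((s.e_f + s.n_f) * (s.e_f + s.e_p))
    return s.e_f / denom if denom else 0.0


def rank(spectra: dict[str, Spectrum], formula) -> list[str]:
    # Order statements from most to least suspicious under `formula`.
    return sorted(spectra, key=lambda name: formula(spectra[name]), reverse=True)


# Hypothetical test suite: 2 failing and 10 passing tests in total.
spectra = {
    "A": Spectrum(e_f=1, e_p=0, n_f=1, n_p=10),
    "B": Spectrum(e_f=2, e_p=1, n_f=0, n_p=9),
}

print(rank(spectra, tarantula))  # ['A', 'B']
print(rank(spectra, ochiai))     # ['B', 'A']
```

Tarantula scores A at 1.0 and B at about 0.91, so it places A first; Ochiai scores A at about 0.71 and B at about 0.82, so it places B first. If the fault is actually in A, Tarantula localises it sooner; if it is in B, Ochiai does, so neither formula wins on every program.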
First Year Of Impact 2018
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

Description 2nd Workshop on Genetic Improvement at GECCO 2016 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact Co-Chaired by Dr David R White and Dr Justyna Petke

The growth in Genetic Improvement (GI) echoes a wider trend in research on the use of evolutionary and genetic search to optimise aspects of software engineering. For example, since 2002 there has been a track on Search Based Software Engineering at GECCO. There is also the dedicated SSBSE conference, and we now see the inauguration of regional conferences and workshops featuring, or even dedicated to, SBSE (in Brazil, China and, recently, the USA). In 2015 the inaugural Genetic Improvement Workshop was held in conjunction with GECCO, and it was a tremendous success.

Genetic Improvement is one of the most exciting and fastest-growing applications of evolutionary search. Since 2000, more than 70 papers (including those still to appear) have been written in this area, and interest is growing. GI research has won three GECCO Human Competitive Awards (Gold, Silver and Bronze) and two best paper awards, at venues including the International Conference on Software Engineering and GECCO. Furthermore, a special issue on Genetic Improvement in the Genetic Programming and Evolvable Machines journal is due to appear in the coming months.

Whilst SBSE has traditionally been applied to software engineering problems, there has been great interest in applying it, particularly genetic programming (GP), to software itself.

Genetic Improvement (GI) uses computational search to improve software while retaining its functionality, at least in part. The technique was first applied to parallelise programs and to optimise, and find trade-offs between, non-functional properties of software such as execution time and power consumption. This work led on to automated bug fixing in commercial software. More recently, it has been shown that GP can use human-written software as feedstock to evolve mutant software dedicated to solving particular problems. Another interesting area is grow-and-graft GP, in which code is incubated outside its target human-written program and subsequently grafted into it via GP.
Year(s) Of Engagement Activity 2016