Reverse Engineering State Machine Hierarchies by Grammar Inference (REGI)

Lead Research Organisation: University of Sheffield
Department Name: Computer Science


Software systems pervade modern life; they control everything from fly-by-wire aircraft and financial transfer systems to ABS breaking systems in cars and cooking modes in microwaves. It is an accepted fact that, as software systems evolve and their requirements change, they become increasingly complex and difficult to maintain. Different developers carry out (sometimes conflicting) changes to address different bugs or add different features, and before long the original design of the system is neglected and it becomes impossible to understand exactly how the system operates.One way around this problem is to keep a high-level design document that specifies exactly how the system is supposed to behave. A major benefit of developing such specifications along with the actual software itself is the fact that powerful testing and verification techniques exist that can be used to determine whether the actual system conforms to the specification. However, one major problem hampers the routine use of such specifications: as the software system evolves, they often tedious and time-consuming to generate and maintain separately from the code. With this project we will produce a technique that addresses the above problem by automating the process of generating a specification. Given a system that has no specification (or only a partial one), our technique will analyse and probe the system by both running it and looking at its static structure. It will produce a complete document specifying the behaviour of the system as a collection of state machines. Often, depending on the facet of behaviour that is of interest, it is useful to display certain aspects at a greater level of detail than others (for a financial system for example, the developer might be interested in details sub-prime loan management and not the transaction processing system). For this reason, state machines that are generated will be presented as a hierarchy. Devising a practical technique to automatically reverse engineer such specifications is challenging; the system in question needs to be suitably sampled to identify relevant behaviour, and the final specification will have to be a valid generalisation of these samples. How do we identify the set(s) of samples that can be used as a basis for generating the specification? How do we go about building an accurate specification of the system from this set of samples? One key realisation that will be investigated in this work is the fact that identical challenges to these arise in the field of grammar inference, where the challenge is to build a grammar (which can be represented by a state machine) from a sample of sentences in the target language. Grammar inference is a very mature field, with many very powerful techniques, but has never been linked to the similar problem of reverse-engineering specifications from software systems.This work will explore the relationship between the two fields of reverse-engineering and grammar inference, and will produce a set of approaches based on and extending existing techniques that, when combined, will produce a practical technique that generates complete and accurate hierarchy of state machines of a software system.
Description The project led to development of improved techniques permitting one to learn models of software using both powerful algorithms and specific knowledge available to practitioners. With such models, one might be able to use powerful testing/analysis methods leading to a higher-quality software. The effectiveness was demonstrated using inference of TCP/IP, the protocol on which Internet is built. Additional outcomes were model comparison and guided abstraction to facilitate model understanding.
Exploitation Route The inference technique developed in this work may be useful to both software developers and software development tool builders. The developed model inference technique has been implemented in the open-source Statechum tool. In the long term, one may envisage offering a paid consultancy based on it.
Sectors Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software)

Description An algorithm developed as part of this grant was taken up by Quviq, a Swedish SME offering tools and consultancy. This contributed to Quviq doubling in size and gave it a market advantage with an improved version of their flagship product QuickCheck, aimed at efficient automated testing of Erlang software. Erlang is the language used in many telecommunications switches. Since then, the work contributed to the outcomes of the EU FP7 grant "Prowess"
First Year Of Impact 2015
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

Title Statechum 
Description This tool implements the inference method developed as part of AutoAbstract and REGI research grants. 
Type Of Technology Software 
Year Produced 2007 
Open Source License? Yes  
Impact Was reimplemented by Quviq (Swedish SME) and contributed to their significant growth.