Trustworthy refactoring

Lead Research Organisation: University of Kent
Department Name: Sch of Computing

Abstract

When we write programs, we don't always get the code right first time. A function may start off with one purpose that changes to another: the function is now mis-named; we may have chosen a particular way of representing some real-world object as a data structure, but we find that we need to change the representation to make it more general, or adapt to a change in other parts of the organisation. In both cases we need to transform our program, without changing what it already does, even though we change how it does it. This process is called refactoring.

In a program of any size, making the simplest refactoring happen can be complicated. For example, if we rename a commonly used library function this may result in changes in hundreds of files, and making the change can be too costly and error prone to do by hand. For this reason, engineers have built refactoring tools that automate the process of making common refactoring transformations, and these tools can be found for many languages, integrated into programmers' standard workflow (e.g. into the Eclipse IDE for Java). Even so, refactoring tools are used less than might be anticipated: a reason for this is a lack of trust in the tools (see e.g. E Murphy-Hill et al, How We Refactor, and How We Know It, 2012). When we apply a tool, how do we know that it hasn't changed what our program does?

In this project our aim is to make a step change in the way that refactoring tools are built, so that we demonstrably improve the trustworthiness of the way that the tools make the refactorings. For example, in the case of our industrial partners, code changes need to be reviewed by up to three reviewers, and even a straightforward refactoring can give a substantial review overhead. This project will allow reviewers to assure themselves of the safety of changes with substantially less effort.

How will we achieve this? The UK is a world leader in showing how computers can support logical reasoning, either by answering logic questions automatically (SAT/SMT solvers) or by supporting "by hand" proof (proof assistants such as Isabelle and HOL). In this project we will build on our work in the HOL proof assistant to formally verify a compiler for the CakeML language. This infrastructure will help us to build a refactoring tool for CakeML in which we are able to say, in advance, that a particular refactoring - such as renaming - is guaranteed never to break a program.

While this work advances the state of the art for academic computer science, we also want to support the practising engineer, and so in this project we will work closely with Jane Street Capital, a leading user of the OCaml language, to develop a trustworthy refactoring tool for this industrially-relevant language. This tool will provide different levels of assurance from the (strongest) case of a fully formal proof that a refactoring can be trusted to work on all programs, given some pre-conditions, to other, more generally applicable guarantees, that a refactoring applied to a particular program does not change the behaviour of that program.

We will ensure the practicality of the tool by working with industrial partners, Jane Street, and embed one of the researchers in industry for part of the project. The refactorings implemented will be chosen to support development not only in our partners, but also in the wider OCaml developer community, and the tool will be freely available under an open source licence, supporting its sustainability beyond the life of the project.

In providing verified refactorings for CakeML we will go beyond the state of the art for program verification, and in the OCaml development, we will set a new benchmark for building refactoring tools for programming languages in general.

Planned Impact

The project brings together CS theory and the practice of programming, as supported by software tools. We see the project as having an impact beyond the purely academic in two complementary directions.

- It will deliver state of the art software development tools, which will deliver a benefit to the software developers in the OCaml programming community.

- The advancements in assurance will have an impact beyond the particular OCaml language on the developers of software tools for other languages and paradigms (and in particular developers of refactoring tools).

Industrial impact

The project will deliver a tool which is intended to become the default refactoring tool for the OCaml developer community. Jane Street is a leading member of that community, with a history of supporting it through open source release of software, such as its versions of the base libraries. They have supported the proposal by promising to contribute 600 hours of effort to the project should it be funded.

In order to consolidate the work done during the project, we have planned a final six month period during which the strands of the project will be integrated together, documented and demonstrated by means of video and other tutorials. We have therefore requested funding for 42 months, allowing 36 months for the theoretical and practical development and the final six for integration activities. This final consolidation period substantially increases the potential range and depth of the industrial impact of the project, for the benefit of the developer community and the wider industrial sector using OCaml.


Mid-term workshop

This approach will be supplemented by a mid-term workshop to be held early in 2018. The workshop will invite leaders in the fields of mechanised (meta)theory and refactoring, allowing us to showcase the achievements of the project thus far and plans for the remainder of the project. The workshop will be supplemented by a tutorial day to give attendees hands-on experience of using the interim results of the project.

User community

For a tool to be of practical value, it needs to be designed with the needs of the user community in mind. We will ensure the relevance of the software to the OCaml developer community by embedding the software development RA with the industrial partner for a total period of six months over the life of the project. This will ensure that the RA will not only be receptive to the requirements of the partner but will also be able to observe for him/herself how such tools are used in practice and suggest functionality for themselves.

Open source development

The software will be documented through a comprehensive website that includes usage information and examples. Take-up of the tool will be promoted through the production of a number of demonstration videos. The software developed will be freely available under open source licence and will be accessible on Github in order to make it as easy as possible for users of the system to themselves become contributors to the software, thus ensuring the longer-term sustainability of the project outcomes.

Video

The project will use online video as a dissemination mechanism to all four of the interested parties: theoreticians, OCaml developers, refactoring tool developers and the wider public. It is planned for the final year of the project to record high production-quality videos, to a total of one hour each, for the four interest groups, and a budget of 10,000 GBP is requested for this.

Researcher training

While a number of skills, such as presentation of work to peers, have been are addressed through the PhD process, the wider task of explaining research to computer scientists and professionals and the general public tend to be more peripheral to the PG researcher. We will address this by requesting £2,000 to support media training for the RAs employed by the grant, in order to enhance their public engagement skills.

Publications

10 25 50
 
Description Software refactoring is the process of changing software code without changing what it does: for example, it's often the case that the purpose and function of a software component has evolved, and so the component should be renamed. But because a name might be used across a whole project, and the result of trying to change a name "by hand" could be both time consuming and error prone, it is typical to build tools that support this kind of process.

The ROTOR tool provides this for OCaml users, and this project has enabled it to be built on a sound theoretical foundation, too. In our work we extended the usual notion of name binding to encompass the complicated associations of names in the OCaml module system, and we were able to prove the correctness of this approach using automated theorem proving technology. The accepted version of the paper describing this is linked below.
Exploitation Route The Rotor tool has been integrated with the standard OCaml software "build" tool Dune, so that it can be used by the widest possible class of users.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://pure.royalholloway.ac.uk/portal/files/34708448/paper_camera_ready.pdf
 
Description Contribution to the Dune composable build system for OCaml.
First Year Of Impact 2020
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Economic

 
Description Eötvös Loránd University (ELTE) 
Organisation Eotvos Lorand University
Country Hungary 
Sector Academic/University 
PI Contribution Supervision and mentoring of students and researchers in the Programming Languages group, with Dr Zoltan Horvath and Dr Daniel Horpacsi. In particular supervision of students on relevant masters projects and a PhD student, Peter Bereczky.
Collaborator Contribution Formalisation of a subset of Erlang in the Coq theorem prover. Formalisation of matching logic in the Coq theorem prover, in collaboration with staff from UIUC and RV Inc. Design of language-independent refactoring schemes.
Impact A Comparison of Big-step Semantics Definition Styles P Bereczky, D Horpácsi, S Thompson arXiv preprint arXiv:2011.10373, 2020 Machine-checked natural semantics for Core Erlang: exceptions and side effects P Bereczky, D Horpácsi, S Thompson Erlang 2020: Proceedings of the 19th ACM SIGPLAN International Workshop on Erlang, 2020 A Proof Assistant Based Formalisation of a Subset of Sequential Core Erlang P Bereczky, D Horpácsi, S Thompson International Symposium on Trends in Functional Programming, 2020.
Start Year 2019
 
Description refactoring with Jane Street Capital 
Organisation Jane Street Capital
Country United States 
Sector Private 
PI Contribution Our project is to build refactoring tools for the programming languages OCaml and CakeML in such a way that they provide higher levels of assurance than the current state of the art.
Collaborator Contribution Our partners have contributed effort to the equivalent of 10%FTE staff over the 42 month period of the grant. They are providing expertise on the OCaml language infrastructure, including the compiler and other tools, and providing (confidential) case study information, and are hosting at their London offices one of the project researchers for a number of months in our to "embed" him in their development practice and mentor his refactoring tool development.
Impact Enhanced skills of the embedded RA (we are still at an early stage of collaboration).
Start Year 2016
 
Title Proof repository for Rotor 
Description This repository contains the machine checked proofs for the Coq proof assistant that underpin the correctness of the Rotor software refactoring tool. 
Type Of Technology Software 
Year Produced 2019 
Impact This repository contains the machine checked proofs for the Coq proof assistant that underpin the correctness of the Rotor software refactoring tool, which is listed in researchfish as a separate artefact. 
URL https://gitlab.com/trustworthy-refactoring/coq-smlms
 
Title ROTOR 
Description ROTOR is a tool for refactoring OCaml code, written in OCaml. The correctness of the tool is underpinned by a formally-checked theory delivered in the proof assistant Coq, listed in researchfish as a separate artefact. 
Type Of Technology Software 
Year Produced 2019 
Open Source License? Yes  
Impact The open source Rotor tool is freely available as a part of the OCaml developer ecosystem, and works with commonly-used tools such as the Dune build system. The tool that supports safe renaming across complex, multi-module, projects gives software developers the freedom to use renaming with more assurance, and therefore more frequently, and thus maintain a higher quality of software in general. The Rotor tool and its associated artifacts delivers the following impacts. - The Rotor tool, which implements renaming within multi-module OCaml projects. - A novel theory of theory of name binding for languages with complex module systems such as OCaml, reported at the ACM PLDI conference 2019. - A formalisation of this theory, and proofs of key properties of the theory fully formalised in the Coq proof assistant. results about its meta-theory, and establishment of key results. - The infrastructure underlying the refactoring implementations, embodying a 'strategic programming' approach, and with the capability to be extended to other refactorings and other language constructs. - Integration of Rotor tool with Dune build system, allowing Rotor to automate building information about the artifacts that comprise a multi-module codebase over which to run a refactoring. - Integration required modifications to the Dune system, available in released versions of Dune. - Building on the approach to implementing refactorings, the PI and a team at Eötvos Loránd University, Budapest have developed an approach to language-independent refactorings. 
URL https://gitlab.com/trustworthy-refactoring/refactorer