CamFort: Automated evolution and verification of computational science models

Lead Research Organisation: University of Cambridge
Department Name: Computer Science and Technology

Abstract

Scientific models play a vital role in science and policy making. Many models are now expressed as complex computer programs which are often the result of decades of research and development, possibly involving multiple researchers or teams. This has lead to significant investment in maintaining these models and evolving them to use modern programming approaches or to work efficiently on new hardware platforms (such as cloud computing resources). However, the complexity of these models makes maintenance and evolution difficult. In particular, changing a complex model's code whilst ensuring it produces the same results is difficult; maintenance/evolution of complex models is often error prone.

The complexity of a piece of software can be classified as either intrinsic or accidental. Intrinsic complexity is an essential reflection of the complexity inherent in the problem and solution at hand. Alternatively, accidental complexity arises from the particular programming language, design or tools used to implement the solution. Many of the research contributions of programming language design and software engineering have been aimed at reducing the accidental complexity of software. However, many of these approaches have not been targetted at scientific computing.

There is now a need to develop these contributions so that they meet the needs of scientists. Addressing these needs will provide huge benefits to science and policy through increased productivity and trust in models. Our collaborations with leading research groups in science have highlighted the huge existing investments in established models. We are therefore aiming to support the evolution, rather than replacement, of existing code and working practices. Our goal is to apply cutting edge programming language and software engineering research to help develop "sustainable" software, which maintains its value over generations of researcher. Our focus is on models developed in the Fortran language, as this remains a dominant programming language in scientific computing, owing in part to its longevity.

We will provide practical tools which scientists can use to reduce the accidental complexity of models through evolving a code base, as well as tools for automatically verifying that any maintenance/evolution activities preserve the models behaviour. We will develop new mechanisms for program comprehension and transformation in order to bring effective techniques from programming language design and software engineering across the chasm to scientific computing. Ultimately, reducing the effort to maintain and evolve code will free-up scientists to focus on the core aspects of the science, and will lead to models that are more easily communicated, disseminated, and reused between researchers, supporting core ideals of science.

Planned Impact

Computer modelling is now a large part of science and engineering research. However, there are many challenges, e.g., verification, parallelisation, and maintenance. Despite significant programming language and software engineering research to address these concerns, very few results are applied to science; there is little interaction between natural science and programming research communities. Given the importance, the benefits, and the challenges of computational science, it is critical that foundational programming language and software engineering research be targeted at computational science. There is a great opportunity for progress and impact if these communities interact. Our project does exactly this, addressing problems of verified model evolution. However, many more interdisciplinary projects are needed. Part of our project's impact is therefore in promoting a cultural shift to computer science engagement with the sciences, which is critical to the health of future scientific research.

In programming language and software engineering research, our project will provide contributions to various areas such as (1) parsing - accommodating languages with embedded fragments of an unrecognised language, e.g. pre-processor code; (2) program analysis - for advanced refactorings and new verification techniques, e.g. advanced resource analysis and high-level programming pattern detection; (3) verification - techniques for differential testing of program fragments (e.g., before and after refactoring). More broadly, the interdisciplinary nature of this project will provide knowledge and data on the programming needs of computational scientists, which will aid future language research into new programming approaches for science.

As noted by the 'Programming languages and compilers' research area of the EPSRC ICT theme, a lack of implementation effort in this area has held-back impact. Our project includes significant implementation and engagement effort to ensure adoption, impact, and appropriate evaluation of the new programming tools.

Our work benefits scientists and engineers in academia and industry who are involved in computer modelling using Fortran. Our tools will aid the evolution of existing code base which may use custom pre-processors. Furthermore, our tools for automated testing will improve verification of evolution activities, reducing bugs and the effort required to find and correct them. In the short term, our tools will have immediate impact on our collaborators as our research is evaluated in the context of active scientific research. Our collaborators include the Cambridge Centre for Climate Change Mitigation Research (4CMR), the Met Office (the Unified Model), the Centre for Climate Science (Department of Geography), and the Department of Chemistry. The scope of our impact is broadened by dissemination activities (workshops, seminars) and providing the tools free online (and open source).

On medium to long-term scales, our work will indirectly benefit other aspects of computer modelling by reducing maintenance and verification efforts and thus allowing scientists to focus on other aspects, e.g. underlying science and validation of models.

Scientific research benefits society mainly through technology (when science is applied via engineering) and policy (informed by predictions). Since our research aims to improve computer modelling in science and engineering, we expect more long-term, indirect impact, although this is difficult to quantify given the technical nature of our research. Nonetheless, many of our collaborators are climate scientists and thus there is route to social, environmental, and policy-based impact. For example, the 4CMR has a number of ongoing projects with policy makers (including OFWAT, the Environment Agency, Department of Energy and Climate Change, Highways Agency, and British Land); our collaboration with 4CMR provides an opportunity for indirect policy/social impact.

Publications

10 25 50

publication icon
Orchard D (2017) Verifying spatial properties of array computations in Proceedings of the ACM on Programming Languages

publication icon
Orchard D (2015) Evolving Fortran types with inferred units-of-measure in Journal of Computational Science

publication icon
Orchard D (2019) Quantitative program reasoning with graded modal types in Proceedings of the ACM on Programming Languages

publication icon
Orchard D (2018) Complexity bounds for container functors and comonads in Information and Computation

 
Description We have a developed an free software tool called CamFort which can be used by scientists to enhance the software used in scientific modelling. Our research has developed two specific enhancements. The first is unit checking: this ensures that a programmer does not accidentally add a quantity in e.g. metres to a quantity in e.g. seconds, and correctly notices that 5 metres divided by 2 seconds is 2.5 metres per second. The second is stencil specifications: this helps a programmer writing complicated loops over arrays (tables) of data to make sure that they are reading from the correct position: many numerical algorithms are described in terms such as "centred in space" or "backwards in time" - these map onto simple stencils which our tool can check are followed.
Exploitation Route Scientists may make free use of our tool and we are working to expand the range of Fortran code which we can deal with.
Sectors Environment,Manufacturing, including Industrial Biotechology

 
Description The Met Office are adopting CamFort in their software pipeline with the aiming of further improving their code quality. We have been awarded a IAA Postdoctoral placement award and Knowledge Transfer Placement to work with them on this. Bloomberg (USA) are using CamFort in their internal tools for managing their Fortran code base.
First Year Of Impact 2018
Sector Environment,Financial Services, and Management Consultancy
Impact Types Economic

 
Description EPSRC IAA Knowledge Transfer Fellowships
Amount £43,430 (GBP)
Organisation University of Cambridge 
Sector Academic/University
Country United Kingdom
Start 12/2018 
End 09/2019
 
Description EPSRC IAA Postdoctoral Placements
Amount £12,465 (GBP)
Organisation University of Cambridge 
Sector Academic/University
Country United Kingdom
Start 09/2018 
End 12/2018
 
Description Software Sustainability Institute Fellowship
Amount £3,000 (GBP)
Organisation University of Edinburgh 
Department UK Software Sustainability Institute
Sector Academic/University
Country United Kingdom
Start 01/2015 
End 12/2015
 
Description Supporting the development of CamFort (donation)
Amount £99,000 (GBP)
Organisation Bloomberg 
Sector Private
Country United States
Start 02/2020 
 
Description Undergraduate Intern Scholarships from the Science Faculty Impact Development Fund, University of Kent
Amount £4,750 (GBP)
Organisation University of Kent 
Sector Academic/University
Country United Kingdom
Start 06/2017 
End 08/2017
 
Title Research data supporting Verifying Spatial Properties of Array Computations 
Description This dataset reproduces findings from our paper "Verifying Spatial Properties of Array Computations". It contains the code necessary to reproduce our study of array programming idioms in scientific/numerical Fortran code (Section 2 of the paper), to demonstrate the inference, checking, and specification synthesis features of our verification tool via examples (Section 3 of the paper), and to produce statistics describing the various kinds of specifications that can be inferred over our test corpus (Section 7 of the paper). 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
 
Title Software and data supporting "Lightning Talk: Supporting Software Sustainability with Lightweight Specifications" 
Description Provides a code snapshot of CamFort, evaluation scripts, and sample code to reproduce the results shown in the associated paper. 
Type Of Material Computer model/algorithm 
Year Produced 2016 
Provided To Others? Yes  
Impact The CamFort tool and infrastructure has formed the basis of various papers following this initial paper. CamFort and its supporting libraries are now being employed in Bloomberg and the Met Office. 
URL https://www.repository.cam.ac.uk/handle/1810/257261
 
Description Bloomberg CamFort development 
Organisation Bloomberg
Country United States 
Sector Private 
PI Contribution Bloomberg have adopted our tools for their internal projects managing their codebase.
Collaborator Contribution Direct contributions to the CamFort software.
Impact Software updates and contributions to the CamFort tools
Start Year 2017
 
Description Met Office 
Organisation Meteorological Office UK
Country United Kingdom 
Sector Academic/University 
PI Contribution Adaption of the CamFort tool to work with the Unified Model. Application of units of measure checking to the Unified Model.
Collaborator Contribution New directions for CamFort research, in particular how to support the new lFric model under development.
Impact EPSRC IAA Knowledge Transfer Fellowship EPSRC IAA Postdoctoral Placement
Start Year 2018
 
Title CamFort 
Description CamFort is a refactoring and verification tool for scientific Fortran programs. It currently supports Fortran 66, 77, and 90 with various legacy exntesions. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact 14 people watching project for updates, 67 people have starred our repository, 8 people have forked the project to work on their own. We've given tutorials on Camfort at numerous venues - see engagement activities. 
URL https://camfort.github.io
 
Title fortran-src: Fortran parsing and analysis infrastructure 
Description fortran-src provides a Haskell library for lexing, parsing, and analysing Fortran code covering standards: FORTRAN 66, FORTRAN 77, Fortran 90, and Fortran 95. 
Type Of Technology Software 
Year Produced 2015 
Open Source License? Yes  
Impact fortran-src has now been adopted as a key piece of infrastructure for a team at Bloomberg. fortran-src is the backbone of the rest of the CamFort system. As of March 2019: 11 people are watching the repository, 23 people have 'starred', and 13 people have forked the repository. 
URL https://github.com/camfort/fortran-src
 
Description Cambridge Econometrics Parallelisation Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact Cambridge Econometrics were looking for ways to incorporate the CamFort tool in the development of their E3MG model. In particular they are looking for ways to improve the performance of their code through parallelisation. We identified some interesting new research ideas to try and they are keen to collaborate on a future grant.
Year(s) Of Engagement Activity 2018
 
Description Departmental seminar at University of Warwick 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact Departmental seminar at University of Warwick "Save our Science: Experiences in lightweight verification for computational science and what you can do to help"
Year(s) Of Engagement Activity 2018
 
Description Invited lecture at Southampton CDT 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact Invited lecture at Southampton CDT "Next Generation Computational Modelling", title "Lightweight Verification for Computational Science Models". The intention was to inform new computational scientists about the need for verification and testing in scientific code and to present ways of doing this.
Year(s) Of Engagement Activity 2016
 
Description Invited trip to Nasa JPL, USA 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Industry/Business
Results and Impact We were invited to present and discuss our work on units of measure. The organisers of the meeting wrote the original Fortran units of measure proposal. On the upshot of this we have been invited to attend the Fortran standards committee meetings.
Year(s) Of Engagement Activity 2018
 
Description Organised 2nd meeting on Testing and Verification for Computational Science (TVCS) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact The need for more rigorous software verification in computational science is well known. Often the responsibility is placed on the scientist, but increased care and attention is not enough. There is a wealth of research in computer science aimed at automating testing and verification, yet little of this has crossed over into practice in the sciences.

This meeting provides a forum to discuss recent work and new ideas, and to foster links between researchers interested in the intersection of verification, programming languages, and computational science. The meeting is aimed at both computer scientists and natural/physical scientists employing computational techniques.

By popular demand we ran this 2nd installation of the meeting after the first. We made contact with users to trial our units of measure software and got feedback on our ideas for stencil computations. We directly inspired at least one group to try using our software as part of their project. One attendee travelled internationally to attend.
Year(s) Of Engagement Activity 2017
URL https://camfort.github.io/tvcs2017/
 
Description Organised meeting on Testing and Verification for Computational Science (TVCS) 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Professional Practitioners
Results and Impact The need for more rigorous software verification in computational science is well known. Often the responsibility is placed on the scientist, but increased care and attention is not enough. There is a wealth of research in computer science aimed at automating testing and verification, yet little of this has crossed over into practice in the sciences.

This meeting will provide a forum to discuss recent work and new ideas, and to foster links between researchers interested in the intersection of verification, programming languages, and computational science. The meeting is aimed at both computer scientists and natural scientists employing computational techniques.

The meeting comprised of 10 talks given by a variety of computer scientists and computational scientists. Including a keynote from Sylvie Boldo from Inria in Orsay, France.

The objective was to start a dialogue between our fields and gain better understanding of problems. We made contact with users to trial our units of measure software and got feedback on our ideas for stencil computations.
Year(s) Of Engagement Activity 2016
URL http://www.cl.cam.ac.uk/~dao29/meeting-tvcs/
 
Description Presentation at Software Sustainability Institute's Collaborations Workshop 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Presentation on units of meaure and dimension checking for Fortran code.

We aimed to increase awareness and get feedback on the idea. One concrete outcome is that we were invited to present at the next Fortran Modernisation Workshop.
Year(s) Of Engagement Activity 2016
URL https://www.software.ac.uk/cw16
 
Description Presentations at Fortran Modernisation Workshops 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact We have given a number of presentations on the Camfort tool focussing on units of measure to computational scientists attending the Fortran Modernisation Workshop:

13-14 July 2016 - Oxford
28-29 July 2016 - University of Southampton
24-25 August 2016 - Culham Centre for Fusion Energy
01-02 September 2016 - Queen Mary University of London
28 September 2016 - Daresbury
03 Febuary 2017 - Manchester
28 April 2017 - Warwick
25 July 2017 - Barcelona

The most significant outcome of this has been that we were asked to provide Power8 binaries of CamFort so that our tool can be incorporated into practical tutorials. But we have also informed lots of scientists about the potential for lightweight verification.

Additional URLs
https://rsdn.oerc.ox.ac.uk/node/102
http://ngcm.soton.ac.uk/blog/2016-07-28/fortran-modernization-workshop.html
Year(s) Of Engagement Activity 2016
URL http://www.nag.co.uk/content/fortran-modernization-workshop
 
Description Radio show about computer modelling 
Form Of Engagement Activity A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Public/other audiences
Results and Impact Worked with the Naked Scientists team to produce a radio show about computer modelling and the research in the CamFort project. This was broadcast in the UK and South Africa and also made available online as part of their highly popular podcast series. There were more than 50,000 downloads.
Year(s) Of Engagement Activity 2019
URL https://www.thenakedscientists.com/podcasts/naked-scientists-podcast/computer-models-welcome-catwalk