CamFort: Automated evolution and verification of computational science models

Lead Research Organisation: University of Cambridge

Department Name: Computer Science and Technology

Abstract

Scientific models play a vital role in science and policy making. Many models are now expressed as complex computer programs which are often the result of decades of research and development, possibly involving multiple researchers or teams. This has lead to significant investment in maintaining these models and evolving them to use modern programming approaches or to work efficiently on new hardware platforms (such as cloud computing resources). However, the complexity of these models makes maintenance and evolution difficult. In particular, changing a complex model's code whilst ensuring it produces the same results is difficult; maintenance/evolution of complex models is often error prone.

The complexity of a piece of software can be classified as either intrinsic or accidental. Intrinsic complexity is an essential reflection of the complexity inherent in the problem and solution at hand. Alternatively, accidental complexity arises from the particular programming language, design or tools used to implement the solution. Many of the research contributions of programming language design and software engineering have been aimed at reducing the accidental complexity of software. However, many of these approaches have not been targetted at scientific computing.

There is now a need to develop these contributions so that they meet the needs of scientists. Addressing these needs will provide huge benefits to science and policy through increased productivity and trust in models. Our collaborations with leading research groups in science have highlighted the huge existing investments in established models. We are therefore aiming to support the evolution, rather than replacement, of existing code and working practices. Our goal is to apply cutting edge programming language and software engineering research to help develop "sustainable" software, which maintains its value over generations of researcher. Our focus is on models developed in the Fortran language, as this remains a dominant programming language in scientific computing, owing in part to its longevity.

We will provide practical tools which scientists can use to reduce the accidental complexity of models through evolving a code base, as well as tools for automatically verifying that any maintenance/evolution activities preserve the models behaviour. We will develop new mechanisms for program comprehension and transformation in order to bring effective techniques from programming language design and software engineering across the chasm to scientific computing. Ultimately, reducing the effort to maintain and evolve code will free-up scientists to focus on the core aspects of the science, and will lead to models that are more easily communicated, disseminated, and reused between researchers, supporting core ideals of science.

Planned Impact

Computer modelling is now a large part of science and engineering research. However, there are many challenges, e.g., verification, parallelisation, and maintenance. Despite significant programming language and software engineering research to address these concerns, very few results are applied to science; there is little interaction between natural science and programming research communities. Given the importance, the benefits, and the challenges of computational science, it is critical that foundational programming language and software engineering research be targeted at computational science. There is a great opportunity for progress and impact if these communities interact. Our project does exactly this, addressing problems of verified model evolution. However, many more interdisciplinary projects are needed. Part of our project's impact is therefore in promoting a cultural shift to computer science engagement with the sciences, which is critical to the health of future scientific research.

In programming language and software engineering research, our project will provide contributions to various areas such as (1) parsing - accommodating languages with embedded fragments of an unrecognised language, e.g. pre-processor code; (2) program analysis - for advanced refactorings and new verification techniques, e.g. advanced resource analysis and high-level programming pattern detection; (3) verification - techniques for differential testing of program fragments (e.g., before and after refactoring). More broadly, the interdisciplinary nature of this project will provide knowledge and data on the programming needs of computational scientists, which will aid future language research into new programming approaches for science.

As noted by the 'Programming languages and compilers' research area of the EPSRC ICT theme, a lack of implementation effort in this area has held-back impact. Our project includes significant implementation and engagement effort to ensure adoption, impact, and appropriate evaluation of the new programming tools.

Our work benefits scientists and engineers in academia and industry who are involved in computer modelling using Fortran. Our tools will aid the evolution of existing code base which may use custom pre-processors. Furthermore, our tools for automated testing will improve verification of evolution activities, reducing bugs and the effort required to find and correct them. In the short term, our tools will have immediate impact on our collaborators as our research is evaluated in the context of active scientific research. Our collaborators include the Cambridge Centre for Climate Change Mitigation Research (4CMR), the Met Office (the Unified Model), the Centre for Climate Science (Department of Geography), and the Department of Chemistry. The scope of our impact is broadened by dissemination activities (workshops, seminars) and providing the tools free online (and open source).

On medium to long-term scales, our work will indirectly benefit other aspects of computer modelling by reducing maintenance and verification efforts and thus allowing scientists to focus on other aspects, e.g. underlying science and validation of models.

Scientific research benefits society mainly through technology (when science is applied via engineering) and policy (informed by predictions). Since our research aims to improve computer modelling in science and engineering, we expect more long-term, indirect impact, although this is difficult to quantify given the technical nature of our research. Nonetheless, many of our collaborators are climate scientists and thus there is route to social, environmental, and policy-based impact. For example, the 4CMR has a number of ongoing projects with policy makers (including OFWAT, the Environment Agency, Department of Energy and Climate Change, Highways Agency, and British Land); our collaboration with 4CMR provides an opportunity for indirect policy/social impact.

Funded Value:

£542,082

Funded Period:

Jun 15 - Aug 18

Funder:

EPSRC

Project Status:

Closed

Project Category:

Research Grant

Project Reference:

EP/M026124/1

Principal Investigator:

Andrew Rice

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Fundamentals of Computing (50%)

Software Engineering (50%)

Organisations

People	ORCID iD
Andrew Rice (Principal Investigator)
Dominic Orchard (Co-Investigator)
Alan Mycroft (Co-Investigator)

Publications

Author Name

Title Publication Date Published

|< < 1 2 > >|

10 25 50

Clark H (2017) Scrap Your Reprinter: A Datatype Generic Algorithm for Layout-Preserving Refactoring

Contrastin M (2018) Automatic Reordering for Dataflow Safety of Datalog

Contrastin M (2016) Units-of-Measure Correctness in Fortran Programs in Computing in Science & Engineering

Contrastin Mistral (2016) Units-of-Measure Correctness in Fortran Programs in COMPUTING IN SCIENCE & ENGINEERING

Contrastin MOJP (2016) Lightning Talk: Supporting Software Sustainability with Lightweight Specifications

Danish M (2019) Learning Units-of-Measure from Scientific Code

Dominic Orchard (2017) Behavioural Types: from Theory to Tools

Gaboardi M (2016) Combining effects and coeffects via grading

Imtiaz Q (2022) CuO-based materials for thermochemical redox cycles: the influence of the formation of a CuO percolation network on oxygen release and oxidation kinetics. in Discover chemical engineering

Orchard D (2017) Verifying spatial properties of array computations in Proceedings of the ACM on Programming Languages

Key Findings
Impact Summary
Further Funding
Research Databases and Models
Collaboration
Software and Technical Products
Engagement Activities


Description	We have a developed an free software tool called CamFort which can be used by scientists to enhance the software used in scientific modelling. Our research has developed two specific enhancements. The first is unit checking: this ensures that a programmer does not accidentally add a quantity in e.g. metres to a quantity in e.g. seconds, and correctly notices that 5 metres divided by 2 seconds is 2.5 metres per second. The second is stencil specifications: this helps a programmer writing complicated loops over arrays (tables) of data to make sure that they are reading from the correct position: many numerical algorithms are described in terms such as "centred in space" or "backwards in time" - these map onto simple stencils which our tool can check are followed.
Exploitation Route	Scientists may make free use of our tool and we are working to expand the range of Fortran code which we can deal with.
Sectors	Environment Manufacturing including Industrial Biotechology


Description	The Met Office are adopting CamFort in their software pipeline with the aiming of further improving their code quality. We have been awarded a IAA Postdoctoral placement award and Knowledge Transfer Placement to work with them on this. Bloomberg (USA) are using CamFort in their internal tools for managing their Fortran code base.
First Year Of Impact	2018
Sector	Environment,Financial Services, and Management Consultancy
Impact Types	Economic


Description	EPSRC IAA Knowledge Transfer Fellowships
Amount	£43,430 (GBP)
Organisation	University of Cambridge
Sector	Academic/University
Country	United Kingdom
Start	12/2018
End	09/2019


Description	EPSRC IAA Postdoctoral Placements
Amount	£12,465 (GBP)
Organisation	University of Cambridge
Sector	Academic/University
Country	United Kingdom
Start	08/2018
End	12/2018


Description	Software Sustainability Institute Fellowship
Amount	£3,000 (GBP)
Organisation	University of Edinburgh
Department	UK Software Sustainability Institute
Sector	Academic/University
Country	United Kingdom
Start	01/2015
End	12/2015


Description	Supporting the development of CamFort (donation)
Amount	£99,000 (GBP)
Organisation	Bloomberg
Sector	Private
Country	United States
Start	02/2020


Description	Undergraduate Intern Scholarships from the Science Faculty Impact Development Fund, University of Kent
Amount	£4,750 (GBP)
Organisation	University of Kent
Sector	Academic/University
Country	United Kingdom
Start	05/2017
End	08/2017


Title	Research data supporting Verifying Spatial Properties of Array Computations
Description	This dataset reproduces findings from our paper "Verifying Spatial Properties of Array Computations". It contains the code necessary to reproduce our study of array programming idioms in scientific/numerical Fortran code (Section 2 of the paper), to demonstrate the inference, checking, and specification synthesis features of our verification tool via examples (Section 3 of the paper), and to produce statistics describing the various kinds of specifications that can be inferred over our test corpus (Section 7 of the paper).
Type Of Material	Database/Collection of data
Year Produced	2017
Provided To Others?	Yes


Title	Software and data supporting "Lightning Talk: Supporting Software Sustainability with Lightweight Specifications"
Description	Provides a code snapshot of CamFort, evaluation scripts, and sample code to reproduce the results shown in the associated paper.
Type Of Material	Computer model/algorithm
Year Produced	2016
Provided To Others?	Yes
Impact	The CamFort tool and infrastructure has formed the basis of various papers following this initial paper. CamFort and its supporting libraries are now being employed in Bloomberg and the Met Office.
URL	https://www.repository.cam.ac.uk/handle/1810/257261


Description	Bloomberg CamFort development
Organisation	Bloomberg
Country	United States
Sector	Private
PI Contribution	Bloomberg have adopted our tools for their internal projects managing their codebase.
Collaborator Contribution	Direct contributions to the CamFort software.
Impact	Software updates and contributions to the CamFort tools
Start Year	2017


Description	Met Office
Organisation	Meteorological Office UK
Country	United Kingdom
Sector	Academic/University
PI Contribution	Adaption of the CamFort tool to work with the Unified Model. Application of units of measure checking to the Unified Model.
Collaborator Contribution	New directions for CamFort research, in particular how to support the new lFric model under development.
Impact	EPSRC IAA Knowledge Transfer Fellowship EPSRC IAA Postdoctoral Placement
Start Year	2018


Title	CamFort
Description	CamFort is a refactoring and verification tool for scientific Fortran programs. It currently supports Fortran 66, 77, and 90 with various legacy exntesions.
Type Of Technology	Software
Year Produced	2016
Open Source License?	Yes
Impact	14 people watching project for updates, 67 people have starred our repository, 8 people have forked the project to work on their own. We've given tutorials on Camfort at numerous venues - see engagement activities.
URL	https://camfort.github.io


Title	fortran-src: Fortran parsing and analysis infrastructure
Description	fortran-src provides a Haskell library for lexing, parsing, and analysing Fortran code covering standards: FORTRAN 66, FORTRAN 77, Fortran 90, and Fortran 95.
Type Of Technology	Software
Year Produced	2015
Open Source License?	Yes
Impact	fortran-src has now been adopted as a key piece of infrastructure for a team at Bloomberg. fortran-src is the backbone of the rest of the CamFort system. As of March 2019: 11 people are watching the repository, 23 people have 'starred', and 13 people have forked the repository.
URL	https://github.com/camfort/fortran-src


Description	Cambridge Econometrics Parallelisation Meeting
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Industry/Business
Results and Impact	Cambridge Econometrics were looking for ways to incorporate the CamFort tool in the development of their E3MG model. In particular they are looking for ways to improve the performance of their code through parallelisation. We identified some interesting new research ideas to try and they are keen to collaborate on a future grant.
Year(s) Of Engagement Activity	2018


Description	Departmental seminar at University of Warwick
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	Departmental seminar at University of Warwick "Save our Science: Experiences in lightweight verification for computational science and what you can do to help"
Year(s) Of Engagement Activity	2018


Description	Invited lecture at Southampton CDT
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Postgraduate students
Results and Impact	Invited lecture at Southampton CDT "Next Generation Computational Modelling", title "Lightweight Verification for Computational Science Models". The intention was to inform new computational scientists about the need for verification and testing in scientific code and to present ways of doing this.
Year(s) Of Engagement Activity	2016


Description	Invited trip to Nasa JPL, USA
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	Local
Primary Audience	Industry/Business
Results and Impact	We were invited to present and discuss our work on units of measure. The organisers of the meeting wrote the original Fortran units of measure proposal. On the upshot of this we have been invited to attend the Fortran standards committee meetings.
Year(s) Of Engagement Activity	2018


Description	Organised 2nd meeting on Testing and Verification for Computational Science (TVCS)
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	The need for more rigorous software verification in computational science is well known. Often the responsibility is placed on the scientist, but increased care and attention is not enough. There is a wealth of research in computer science aimed at automating testing and verification, yet little of this has crossed over into practice in the sciences. This meeting provides a forum to discuss recent work and new ideas, and to foster links between researchers interested in the intersection of verification, programming languages, and computational science. The meeting is aimed at both computer scientists and natural/physical scientists employing computational techniques. By popular demand we ran this 2nd installation of the meeting after the first. We made contact with users to trial our units of measure software and got feedback on our ideas for stencil computations. We directly inspired at least one group to try using our software as part of their project. One attendee travelled internationally to attend.
Year(s) Of Engagement Activity	2017
URL	https://camfort.github.io/tvcs2017/


Description	Organised meeting on Testing and Verification for Computational Science (TVCS)
Form Of Engagement Activity	Participation in an activity, workshop or similar
Part Of Official Scheme?	No
Geographic Reach	Regional
Primary Audience	Professional Practitioners
Results and Impact	The need for more rigorous software verification in computational science is well known. Often the responsibility is placed on the scientist, but increased care and attention is not enough. There is a wealth of research in computer science aimed at automating testing and verification, yet little of this has crossed over into practice in the sciences. This meeting will provide a forum to discuss recent work and new ideas, and to foster links between researchers interested in the intersection of verification, programming languages, and computational science. The meeting is aimed at both computer scientists and natural scientists employing computational techniques. The meeting comprised of 10 talks given by a variety of computer scientists and computational scientists. Including a keynote from Sylvie Boldo from Inria in Orsay, France. The objective was to start a dialogue between our fields and gain better understanding of problems. We made contact with users to trial our units of measure software and got feedback on our ideas for stencil computations.
Year(s) Of Engagement Activity	2016
URL	http://www.cl.cam.ac.uk/~dao29/meeting-tvcs/


Description	Presentation at Software Sustainability Institute's Collaborations Workshop
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	Presentation on units of meaure and dimension checking for Fortran code. We aimed to increase awareness and get feedback on the idea. One concrete outcome is that we were invited to present at the next Fortran Modernisation Workshop.
Year(s) Of Engagement Activity	2016
URL	https://www.software.ac.uk/cw16


Description	Presentations at Fortran Modernisation Workshops
Form Of Engagement Activity	A talk or presentation
Part Of Official Scheme?	No
Geographic Reach	National
Primary Audience	Professional Practitioners
Results and Impact	We have given a number of presentations on the Camfort tool focussing on units of measure to computational scientists attending the Fortran Modernisation Workshop: 13-14 July 2016 - Oxford 28-29 July 2016 - University of Southampton 24-25 August 2016 - Culham Centre for Fusion Energy 01-02 September 2016 - Queen Mary University of London 28 September 2016 - Daresbury 03 Febuary 2017 - Manchester 28 April 2017 - Warwick 25 July 2017 - Barcelona The most significant outcome of this has been that we were asked to provide Power8 binaries of CamFort so that our tool can be incorporated into practical tutorials. But we have also informed lots of scientists about the potential for lightweight verification. Additional URLs https://rsdn.oerc.ox.ac.uk/node/102 http://ngcm.soton.ac.uk/blog/2016-07-28/fortran-modernization-workshop.html
Year(s) Of Engagement Activity	2016
URL	http://www.nag.co.uk/content/fortran-modernization-workshop


Description	Radio show about computer modelling
Form Of Engagement Activity	A press release, press conference or response to a media enquiry/interview
Part Of Official Scheme?	No
Geographic Reach	International
Primary Audience	Public/other audiences
Results and Impact	Worked with the Naked Scientists team to produce a radio show about computer modelling and the research in the CamFort project. This was broadcast in the UK and South Africa and also made available online as part of their highly popular podcast series. There were more than 50,000 downloads.
Year(s) Of Engagement Activity	2019
URL	https://www.thenakedscientists.com/podcasts/naked-scientists-podcast/computer-models-welcome-catwalk

Abstract

Planned Impact

Organisations

People

ORCID iD

Publications