CamFort: Automated evolution and verification of computational science models
Lead Research Organisation:
University of Cambridge
Department Name: Computer Science and Technology
Abstract
Scientific models play a vital role in science and policy making. Many models are now expressed as complex computer programs which are often the result of decades of research and development, possibly involving multiple researchers or teams. This has lead to significant investment in maintaining these models and evolving them to use modern programming approaches or to work efficiently on new hardware platforms (such as cloud computing resources). However, the complexity of these models makes maintenance and evolution difficult. In particular, changing a complex model's code whilst ensuring it produces the same results is difficult; maintenance/evolution of complex models is often error prone.
The complexity of a piece of software can be classified as either intrinsic or accidental. Intrinsic complexity is an essential reflection of the complexity inherent in the problem and solution at hand. Alternatively, accidental complexity arises from the particular programming language, design or tools used to implement the solution. Many of the research contributions of programming language design and software engineering have been aimed at reducing the accidental complexity of software. However, many of these approaches have not been targetted at scientific computing.
There is now a need to develop these contributions so that they meet the needs of scientists. Addressing these needs will provide huge benefits to science and policy through increased productivity and trust in models. Our collaborations with leading research groups in science have highlighted the huge existing investments in established models. We are therefore aiming to support the evolution, rather than replacement, of existing code and working practices. Our goal is to apply cutting edge programming language and software engineering research to help develop "sustainable" software, which maintains its value over generations of researcher. Our focus is on models developed in the Fortran language, as this remains a dominant programming language in scientific computing, owing in part to its longevity.
We will provide practical tools which scientists can use to reduce the accidental complexity of models through evolving a code base, as well as tools for automatically verifying that any maintenance/evolution activities preserve the models behaviour. We will develop new mechanisms for program comprehension and transformation in order to bring effective techniques from programming language design and software engineering across the chasm to scientific computing. Ultimately, reducing the effort to maintain and evolve code will free-up scientists to focus on the core aspects of the science, and will lead to models that are more easily communicated, disseminated, and reused between researchers, supporting core ideals of science.
The complexity of a piece of software can be classified as either intrinsic or accidental. Intrinsic complexity is an essential reflection of the complexity inherent in the problem and solution at hand. Alternatively, accidental complexity arises from the particular programming language, design or tools used to implement the solution. Many of the research contributions of programming language design and software engineering have been aimed at reducing the accidental complexity of software. However, many of these approaches have not been targetted at scientific computing.
There is now a need to develop these contributions so that they meet the needs of scientists. Addressing these needs will provide huge benefits to science and policy through increased productivity and trust in models. Our collaborations with leading research groups in science have highlighted the huge existing investments in established models. We are therefore aiming to support the evolution, rather than replacement, of existing code and working practices. Our goal is to apply cutting edge programming language and software engineering research to help develop "sustainable" software, which maintains its value over generations of researcher. Our focus is on models developed in the Fortran language, as this remains a dominant programming language in scientific computing, owing in part to its longevity.
We will provide practical tools which scientists can use to reduce the accidental complexity of models through evolving a code base, as well as tools for automatically verifying that any maintenance/evolution activities preserve the models behaviour. We will develop new mechanisms for program comprehension and transformation in order to bring effective techniques from programming language design and software engineering across the chasm to scientific computing. Ultimately, reducing the effort to maintain and evolve code will free-up scientists to focus on the core aspects of the science, and will lead to models that are more easily communicated, disseminated, and reused between researchers, supporting core ideals of science.
Planned Impact
Computer modelling is now a large part of science and engineering research. However, there are many challenges, e.g., verification, parallelisation, and maintenance. Despite significant programming language and software engineering research to address these concerns, very few results are applied to science; there is little interaction between natural science and programming research communities. Given the importance, the benefits, and the challenges of computational science, it is critical that foundational programming language and software engineering research be targeted at computational science. There is a great opportunity for progress and impact if these communities interact. Our project does exactly this, addressing problems of verified model evolution. However, many more interdisciplinary projects are needed. Part of our project's impact is therefore in promoting a cultural shift to computer science engagement with the sciences, which is critical to the health of future scientific research.
In programming language and software engineering research, our project will provide contributions to various areas such as (1) parsing - accommodating languages with embedded fragments of an unrecognised language, e.g. pre-processor code; (2) program analysis - for advanced refactorings and new verification techniques, e.g. advanced resource analysis and high-level programming pattern detection; (3) verification - techniques for differential testing of program fragments (e.g., before and after refactoring). More broadly, the interdisciplinary nature of this project will provide knowledge and data on the programming needs of computational scientists, which will aid future language research into new programming approaches for science.
As noted by the 'Programming languages and compilers' research area of the EPSRC ICT theme, a lack of implementation effort in this area has held-back impact. Our project includes significant implementation and engagement effort to ensure adoption, impact, and appropriate evaluation of the new programming tools.
Our work benefits scientists and engineers in academia and industry who are involved in computer modelling using Fortran. Our tools will aid the evolution of existing code base which may use custom pre-processors. Furthermore, our tools for automated testing will improve verification of evolution activities, reducing bugs and the effort required to find and correct them. In the short term, our tools will have immediate impact on our collaborators as our research is evaluated in the context of active scientific research. Our collaborators include the Cambridge Centre for Climate Change Mitigation Research (4CMR), the Met Office (the Unified Model), the Centre for Climate Science (Department of Geography), and the Department of Chemistry. The scope of our impact is broadened by dissemination activities (workshops, seminars) and providing the tools free online (and open source).
On medium to long-term scales, our work will indirectly benefit other aspects of computer modelling by reducing maintenance and verification efforts and thus allowing scientists to focus on other aspects, e.g. underlying science and validation of models.
Scientific research benefits society mainly through technology (when science is applied via engineering) and policy (informed by predictions). Since our research aims to improve computer modelling in science and engineering, we expect more long-term, indirect impact, although this is difficult to quantify given the technical nature of our research. Nonetheless, many of our collaborators are climate scientists and thus there is route to social, environmental, and policy-based impact. For example, the 4CMR has a number of ongoing projects with policy makers (including OFWAT, the Environment Agency, Department of Energy and Climate Change, Highways Agency, and British Land); our collaboration with 4CMR provides an opportunity for indirect policy/social impact.
In programming language and software engineering research, our project will provide contributions to various areas such as (1) parsing - accommodating languages with embedded fragments of an unrecognised language, e.g. pre-processor code; (2) program analysis - for advanced refactorings and new verification techniques, e.g. advanced resource analysis and high-level programming pattern detection; (3) verification - techniques for differential testing of program fragments (e.g., before and after refactoring). More broadly, the interdisciplinary nature of this project will provide knowledge and data on the programming needs of computational scientists, which will aid future language research into new programming approaches for science.
As noted by the 'Programming languages and compilers' research area of the EPSRC ICT theme, a lack of implementation effort in this area has held-back impact. Our project includes significant implementation and engagement effort to ensure adoption, impact, and appropriate evaluation of the new programming tools.
Our work benefits scientists and engineers in academia and industry who are involved in computer modelling using Fortran. Our tools will aid the evolution of existing code base which may use custom pre-processors. Furthermore, our tools for automated testing will improve verification of evolution activities, reducing bugs and the effort required to find and correct them. In the short term, our tools will have immediate impact on our collaborators as our research is evaluated in the context of active scientific research. Our collaborators include the Cambridge Centre for Climate Change Mitigation Research (4CMR), the Met Office (the Unified Model), the Centre for Climate Science (Department of Geography), and the Department of Chemistry. The scope of our impact is broadened by dissemination activities (workshops, seminars) and providing the tools free online (and open source).
On medium to long-term scales, our work will indirectly benefit other aspects of computer modelling by reducing maintenance and verification efforts and thus allowing scientists to focus on other aspects, e.g. underlying science and validation of models.
Scientific research benefits society mainly through technology (when science is applied via engineering) and policy (informed by predictions). Since our research aims to improve computer modelling in science and engineering, we expect more long-term, indirect impact, although this is difficult to quantify given the technical nature of our research. Nonetheless, many of our collaborators are climate scientists and thus there is route to social, environmental, and policy-based impact. For example, the 4CMR has a number of ongoing projects with policy makers (including OFWAT, the Environment Agency, Department of Energy and Climate Change, Highways Agency, and British Land); our collaboration with 4CMR provides an opportunity for indirect policy/social impact.
Organisations
- University of Cambridge (Lead Research Organisation, Project Partner)
- Meteorological Office UK (Collaboration)
- Bloomberg (Collaboration)
- Polyhedron Software ltd (Project Partner)
- Galois (United States) (Project Partner)
- Cambridge Econometrics (United Kingdom) (Project Partner)
- Naked Science Limited (Project Partner)
- Baincore Limited (Project Partner)
- National Centre for Atmospheric Science (Project Partner)
Publications
Contrastin M
(2018)
Automatic Reordering for Dataflow Safety of Datalog
Contrastin M
(2016)
Units-of-Measure Correctness in Fortran Programs
in Computing in Science & Engineering
Contrastin Mistral
(2016)
Units-of-Measure Correctness in Fortran Programs
in COMPUTING IN SCIENCE & ENGINEERING
Contrastin MOJP
(2016)
Lightning Talk: Supporting Software Sustainability with Lightweight Specifications
Danish M
(2019)
Learning Units-of-Measure from Scientific Code
Dominic Orchard
(2017)
Behavioural Types: from Theory to Tools
Gaboardi M
(2016)
Combining effects and coeffects via grading
Imtiaz Q
(2022)
CuO-based materials for thermochemical redox cycles: the influence of the formation of a CuO percolation network on oxygen release and oxidation kinetics.
in Discover chemical engineering
Orchard D
(2017)
Verifying spatial properties of array computations
in Proceedings of the ACM on Programming Languages
Description | We have a developed an free software tool called CamFort which can be used by scientists to enhance the software used in scientific modelling. Our research has developed two specific enhancements. The first is unit checking: this ensures that a programmer does not accidentally add a quantity in e.g. metres to a quantity in e.g. seconds, and correctly notices that 5 metres divided by 2 seconds is 2.5 metres per second. The second is stencil specifications: this helps a programmer writing complicated loops over arrays (tables) of data to make sure that they are reading from the correct position: many numerical algorithms are described in terms such as "centred in space" or "backwards in time" - these map onto simple stencils which our tool can check are followed. |
Exploitation Route | Scientists may make free use of our tool and we are working to expand the range of Fortran code which we can deal with. |
Sectors | Environment Manufacturing including Industrial Biotechology |
Description | The Met Office are adopting CamFort in their software pipeline with the aiming of further improving their code quality. We have been awarded a IAA Postdoctoral placement award and Knowledge Transfer Placement to work with them on this. Bloomberg (USA) are using CamFort in their internal tools for managing their Fortran code base. |
First Year Of Impact | 2018 |
Sector | Environment,Financial Services, and Management Consultancy |
Impact Types | Economic |
Description | EPSRC IAA Knowledge Transfer Fellowships |
Amount | £43,430 (GBP) |
Organisation | University of Cambridge |
Sector | Academic/University |
Country | United Kingdom |
Start | 12/2018 |
End | 09/2019 |
Description | EPSRC IAA Postdoctoral Placements |
Amount | £12,465 (GBP) |
Organisation | University of Cambridge |
Sector | Academic/University |
Country | United Kingdom |
Start | 08/2018 |
End | 12/2018 |
Description | Software Sustainability Institute Fellowship |
Amount | £3,000 (GBP) |
Organisation | University of Edinburgh |
Department | UK Software Sustainability Institute |
Sector | Academic/University |
Country | United Kingdom |
Start | 01/2015 |
End | 12/2015 |
Description | Supporting the development of CamFort (donation) |
Amount | £99,000 (GBP) |
Organisation | Bloomberg |
Sector | Private |
Country | United States |
Start | 02/2020 |
Description | Undergraduate Intern Scholarships from the Science Faculty Impact Development Fund, University of Kent |
Amount | £4,750 (GBP) |
Organisation | University of Kent |
Sector | Academic/University |
Country | United Kingdom |
Start | 05/2017 |
End | 08/2017 |
Title | Research data supporting Verifying Spatial Properties of Array Computations |
Description | This dataset reproduces findings from our paper "Verifying Spatial Properties of Array Computations". It contains the code necessary to reproduce our study of array programming idioms in scientific/numerical Fortran code (Section 2 of the paper), to demonstrate the inference, checking, and specification synthesis features of our verification tool via examples (Section 3 of the paper), and to produce statistics describing the various kinds of specifications that can be inferred over our test corpus (Section 7 of the paper). |
Type Of Material | Database/Collection of data |
Year Produced | 2017 |
Provided To Others? | Yes |
Title | Software and data supporting "Lightning Talk: Supporting Software Sustainability with Lightweight Specifications" |
Description | Provides a code snapshot of CamFort, evaluation scripts, and sample code to reproduce the results shown in the associated paper. |
Type Of Material | Computer model/algorithm |
Year Produced | 2016 |
Provided To Others? | Yes |
Impact | The CamFort tool and infrastructure has formed the basis of various papers following this initial paper. CamFort and its supporting libraries are now being employed in Bloomberg and the Met Office. |
URL | https://www.repository.cam.ac.uk/handle/1810/257261 |
Description | Bloomberg CamFort development |
Organisation | Bloomberg |
Country | United States |
Sector | Private |
PI Contribution | Bloomberg have adopted our tools for their internal projects managing their codebase. |
Collaborator Contribution | Direct contributions to the CamFort software. |
Impact | Software updates and contributions to the CamFort tools |
Start Year | 2017 |
Description | Met Office |
Organisation | Meteorological Office UK |
Country | United Kingdom |
Sector | Academic/University |
PI Contribution | Adaption of the CamFort tool to work with the Unified Model. Application of units of measure checking to the Unified Model. |
Collaborator Contribution | New directions for CamFort research, in particular how to support the new lFric model under development. |
Impact | EPSRC IAA Knowledge Transfer Fellowship EPSRC IAA Postdoctoral Placement |
Start Year | 2018 |
Title | CamFort |
Description | CamFort is a refactoring and verification tool for scientific Fortran programs. It currently supports Fortran 66, 77, and 90 with various legacy exntesions. |
Type Of Technology | Software |
Year Produced | 2016 |
Open Source License? | Yes |
Impact | 14 people watching project for updates, 67 people have starred our repository, 8 people have forked the project to work on their own. We've given tutorials on Camfort at numerous venues - see engagement activities. |
URL | https://camfort.github.io |
Title | fortran-src: Fortran parsing and analysis infrastructure |
Description | fortran-src provides a Haskell library for lexing, parsing, and analysing Fortran code covering standards: FORTRAN 66, FORTRAN 77, Fortran 90, and Fortran 95. |
Type Of Technology | Software |
Year Produced | 2015 |
Open Source License? | Yes |
Impact | fortran-src has now been adopted as a key piece of infrastructure for a team at Bloomberg. fortran-src is the backbone of the rest of the CamFort system. As of March 2019: 11 people are watching the repository, 23 people have 'starred', and 13 people have forked the repository. |
URL | https://github.com/camfort/fortran-src |
Description | Cambridge Econometrics Parallelisation Meeting |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Industry/Business |
Results and Impact | Cambridge Econometrics were looking for ways to incorporate the CamFort tool in the development of their E3MG model. In particular they are looking for ways to improve the performance of their code through parallelisation. We identified some interesting new research ideas to try and they are keen to collaborate on a future grant. |
Year(s) Of Engagement Activity | 2018 |
Description | Departmental seminar at University of Warwick |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Postgraduate students |
Results and Impact | Departmental seminar at University of Warwick "Save our Science: Experiences in lightweight verification for computational science and what you can do to help" |
Year(s) Of Engagement Activity | 2018 |
Description | Invited lecture at Southampton CDT |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Postgraduate students |
Results and Impact | Invited lecture at Southampton CDT "Next Generation Computational Modelling", title "Lightweight Verification for Computational Science Models". The intention was to inform new computational scientists about the need for verification and testing in scientific code and to present ways of doing this. |
Year(s) Of Engagement Activity | 2016 |
Description | Invited trip to Nasa JPL, USA |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | Local |
Primary Audience | Industry/Business |
Results and Impact | We were invited to present and discuss our work on units of measure. The organisers of the meeting wrote the original Fortran units of measure proposal. On the upshot of this we have been invited to attend the Fortran standards committee meetings. |
Year(s) Of Engagement Activity | 2018 |
Description | Organised 2nd meeting on Testing and Verification for Computational Science (TVCS) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | The need for more rigorous software verification in computational science is well known. Often the responsibility is placed on the scientist, but increased care and attention is not enough. There is a wealth of research in computer science aimed at automating testing and verification, yet little of this has crossed over into practice in the sciences. This meeting provides a forum to discuss recent work and new ideas, and to foster links between researchers interested in the intersection of verification, programming languages, and computational science. The meeting is aimed at both computer scientists and natural/physical scientists employing computational techniques. By popular demand we ran this 2nd installation of the meeting after the first. We made contact with users to trial our units of measure software and got feedback on our ideas for stencil computations. We directly inspired at least one group to try using our software as part of their project. One attendee travelled internationally to attend. |
Year(s) Of Engagement Activity | 2017 |
URL | https://camfort.github.io/tvcs2017/ |
Description | Organised meeting on Testing and Verification for Computational Science (TVCS) |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Professional Practitioners |
Results and Impact | The need for more rigorous software verification in computational science is well known. Often the responsibility is placed on the scientist, but increased care and attention is not enough. There is a wealth of research in computer science aimed at automating testing and verification, yet little of this has crossed over into practice in the sciences. This meeting will provide a forum to discuss recent work and new ideas, and to foster links between researchers interested in the intersection of verification, programming languages, and computational science. The meeting is aimed at both computer scientists and natural scientists employing computational techniques. The meeting comprised of 10 talks given by a variety of computer scientists and computational scientists. Including a keynote from Sylvie Boldo from Inria in Orsay, France. The objective was to start a dialogue between our fields and gain better understanding of problems. We made contact with users to trial our units of measure software and got feedback on our ideas for stencil computations. |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.cl.cam.ac.uk/~dao29/meeting-tvcs/ |
Description | Presentation at Software Sustainability Institute's Collaborations Workshop |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Presentation on units of meaure and dimension checking for Fortran code. We aimed to increase awareness and get feedback on the idea. One concrete outcome is that we were invited to present at the next Fortran Modernisation Workshop. |
Year(s) Of Engagement Activity | 2016 |
URL | https://www.software.ac.uk/cw16 |
Description | Presentations at Fortran Modernisation Workshops |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | We have given a number of presentations on the Camfort tool focussing on units of measure to computational scientists attending the Fortran Modernisation Workshop: 13-14 July 2016 - Oxford 28-29 July 2016 - University of Southampton 24-25 August 2016 - Culham Centre for Fusion Energy 01-02 September 2016 - Queen Mary University of London 28 September 2016 - Daresbury 03 Febuary 2017 - Manchester 28 April 2017 - Warwick 25 July 2017 - Barcelona The most significant outcome of this has been that we were asked to provide Power8 binaries of CamFort so that our tool can be incorporated into practical tutorials. But we have also informed lots of scientists about the potential for lightweight verification. Additional URLs https://rsdn.oerc.ox.ac.uk/node/102 http://ngcm.soton.ac.uk/blog/2016-07-28/fortran-modernization-workshop.html |
Year(s) Of Engagement Activity | 2016 |
URL | http://www.nag.co.uk/content/fortran-modernization-workshop |
Description | Radio show about computer modelling |
Form Of Engagement Activity | A press release, press conference or response to a media enquiry/interview |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Public/other audiences |
Results and Impact | Worked with the Naked Scientists team to produce a radio show about computer modelling and the research in the CamFort project. This was broadcast in the UK and South Africa and also made available online as part of their highly popular podcast series. There were more than 50,000 downloads. |
Year(s) Of Engagement Activity | 2019 |
URL | https://www.thenakedscientists.com/podcasts/naked-scientists-podcast/computer-models-welcome-catwalk |