Verifying Resource-like Data Use in Programs via Types

Lead Research Organisation: University of Kent
Department Name: Sch of Computing

Abstract

Software is integral to the fabric of our lives, controlling transport, the economy and
infrastructure, and providing the main tools of work and leisure. The cost of software failures is
therefore high, to productivity, stability, safety, and privacy. As an indication of the economic
impact, it is estimated that software errors cost the US economy tens of billions of dollars last
decade. Software errors can also impact human safety as software is used to control transport,
infrastructure, and medical equipment. The research agenda of program verification aims to mitigate
these risks by putting software engineering on a rigorous foundation through techniques to guarantee
program correctness. As software becomes more complex and pervasive, program verification is ever
more important. This work aims to advance the state-of-the-art in verification at the point of
program development through advancements in programming language theory and practice.

Programming languages are the core tool in which software is constructed; they are the means of
communicating our intentions to computer hardware. There are various design trade-offs when creating
a programming language, which has led to the variety of programming languages in use today. Some
languages support verification by including a "type system" which categorises data and operations,
ensuring that operations are only applied to data of the correct type. This provides some guarantees
about the correctness of the program by ruling out various kinds of error before the program is ever
executed. A subfield of programming language research aims to make type systems more expressive so
that they can describe and enforce more properties of programs and thus raise the level of
verification possible within the language. The current proposal aims to do just this, focussing on
the notion of data as a "resource" subject to constraints which should be enforced by a language's
type system. This project develops a particular technique for building such type systems in a way
that can capture various kinds of property in one system.

The manipulation of data is central to the task of programming. Current programming languages
essentially view data as an infinitely-replicable resource that can be stored, manipulated, and
communicated without restriction. However, this perspective is naive. Some data is private and thus
should not be arbitrarily copied, stored, or communicated. Some data is large and thus should not be
replicated too frequently. Some data acts as a way of interfacing with other parts of a system,
e.g., files or communication channels, which are then subject to restrictions on how they can be
used, such as agreed protocols of interaction. Most programming languages are agnostic to these
constraints however, treating data as totally unconstrained. This can lead to various software
errors, including privacy breaches, performance problems, and crashes due to incorrect interactions
between parts of a system. The goal of this research is to embed the constraints associated with
data into the type system of a language so that these additional constraints can be automatically
enforced. The key technology for doing this is a novel notion of types called "graded modal types"
which can capture and track different kinds of information about the structure of programs and the
flow of data through them.

This work will develop the theory and practice of graded modal types, producing a prototype language
that demonstrates their use for verifying program properties. Three case studies will be carried out
to demonstrate their power: (1) ensuring privacy and confidentiality, (2) capturing and reasoning
about performance e.g., how fast a program will execute relative to the size of its inputs and how
much memory it will consume (3) enforcing fine-grained protocols of interaction.

This work is a step towards a new generation of trustworthy software.

Planned Impact

The pervasive nature of software in our society means that the tools used for creating software have
wide-reaching impact. Programming languages in particular impact the ability of software engineers
(developers) to read, write, and reason about their programs, as well as the ability of computers to
effectively execute programs. This project is primarily concerned with making it easier for
developers to reason about the use of data in their programs, and to ensure that extra-functional
constraints associated to data (e.g., confidentiality) are respected. This kind of reasoning is
particularly important for developers of software used in high-risk contexts, e.g., travel, medical
applications, infrastructure, and for developers of software that handles sensitive data, e.g.,
mobile phone apps, websites, financial services. The techniques to provide this enhanced ability
for program reasoning and verification involve scientific advances in the area of programming
language theory. The practical application of this theory will deliver a novel programming approach,
exemplified in a new language which can assist program verification. Thus this project will deliver
complementary impact across both academia and industry, and further to society and the economy via
its impact in the software development industry.

In terms of industrial impact, and beyond, there are two many routes in which the main practical
deliverable, the language Granule, can have impact: (1) by becoming a mature programming language
that can be used by developers; (2) by influencing the design of existing and future programming
languages. Within the lifetime of this project, the latter mode of impact is the most likely, though
the impact activities here support both routes to delivering impact.

The Granule implementation will be open source, allowing ideas to be easily adopted by other
language implementers and encouraging a community of developers to grow, contributing to Granule
directly. The language will be promoted online, including via a series of online tutorial videos,
written tutorials, and online documentation. In the second year of the project, an online hackathon
competition will be run to promote Granule and to encourage early contributions and feedback from
practitioners. Industry talks will also be given and an academic workshop at the end of year 1 will
also be open to industrials. All these impact activities will not only help to promote the work to
industry and other language implementors, but will also assist the academic impact as well.

The impact and popularity of a language often relates to the amount and quality of library
support. Therefore, there will be a focus in the second year of the project on improving the
libraries and encouraging development of these by others outside the project team (see above).

Running throughout the project are three key practical application case studies: privacy, cost, and
protocols. The case studies will help to ensure that the research links with practical
applications, and will then serve as a way to demonstrate the verification technique developed
here. The privacy case study is in collaboration with industrial partner Galois Inc., and is
particularly relevant in the context of modern software development, where privacy is increasingly a
concern and there are increasing legal requirements to meet data protection standards.

Following from the industrial impact, the benefits to society and the economy are then yielded in
terms of correct and trustworthy software, with fewer bugs, which in turn reduces the cost of
software and reduces risk to stability, safety, and privacy. Take together, the long-term goals of
the project help to support a more rigorous engineering foundation for the UK's considerable
software industry, and further afield.

Publications

10 25 50
publication icon
Bocchi L (2022) A Theory of Composing Protocols in The Art, Science, and Engineering of Programming

publication icon
Ivaškovic, A. (2020) Data-flow analyses as effects and graded monads in Leibniz International Proceedings in Informatics, LIPIcs

publication icon
Koparkar C (2021) Efficient tree-traversals: reconciling parallelism and dense data representations in Proceedings of the ACM on Programming Languages

publication icon
Marshall D (2022) Replicate, Reuse, Repeat: Capturing Non-Linear Communication via Session Types and Graded Modal Types in Electronic Proceedings in Theoretical Computer Science

 
Description Given an adequate specification of a computer program, a computer can automatically generate a program to satisfy the specification; this is a technique known as program synthesis which depends on having enough information in the specification, and a sophisticated enough synthesis technique, in order to generate suitable programs. In the first six months of this award we have devised a new technique for synthesising certain kinds of programs based on specifications about how they use their input data, specified in an abstract form which is very different to the program itself, and requires little effort from the human user. This leverages two key theoretical ideas at heart of this project which allow a programming language to describe how data is used in a general way which can be used to talk about privacy and confidentiality of data. This work constitutes a leap forward in leveraging this information for automatic program synthesis.

The project has also developed a techniques for understanding computer programs by generalising and extending various existing ideas in the literature. Furthermore, we have developed new models for existing programming languages. Taken together, these deepen our understanding of how to reason about programs which can be used, for example, to determine program correctness.
Exploitation Route The technique developed can be deployed in other languages which has some of the same key ideas as our research language: Haskell and Idris. We are working on a variant of our technique that can be deployed in these more main stream languages.
Sectors Digital/Communication/Information Technologies (including Software)

URL https://github.com/granule-project/granule-project.github.io/blob/master/synth-tutorial.md
 
Title Additional proofs and code for "Data-flow analyses as effects and graded monads" 
Description This deposit provides code and additional proofs associated to the paper "Data-flow analyses as effects and graded monads" appearing at FSCD 2020 (5th International Conference On Formal Structures for Computation and Deduction). extra-proofs.pdf provides additional proofs not included in the appendix of the published paper for space reasons. GradedMonad.agda provides further mechanised proofs, referred to from extra-proofs.pdf dataflow-effects-as-grades-fscd2020.zip provides the source code corresponding to Section 4.4 and Appendix B The code is hosted on GitHub as well: https://github.com/dorchard/dataflow-effects-as-grades This .zip corresponds to this release https://github.com/dorchard/dataflow-effects-as-grades/releases/tag/fscd2020 Unzip and see README.md for details on how to build and interact with this code 
Type Of Material Database/Collection of data 
Year Produced 2020 
Provided To Others? Yes  
Impact This deposit provides a library code and additional proofs associated to the paper "Data-flow analyses as effects and graded monads" appearing at FSCD 2020 (5th International Conference On Formal Structures for Computation and Deduction). This provided code can be reused for future works as a building block. 
URL https://zenodo.org/record/3784967
 
Title Granule programming language implementation 
Description In functional programming, regular types are a subset of algebraic data types formed from products and sums with their respective units. One can view regular types as forming a commutative semiring but where the usual axioms are isomorphisms rather than equalities. In the pearl, we show that regular types in a linear setting permit a useful notion of multiplicative inverse, allowing us to 'divide' one type by another. We begin with an exploration of the properties and applications of this construction, and this artifact includes examples in Haskell demonstrating its relevance to various topics from the literature including program calculation, Laurent polynomials, and derivatives of data types. Some examples from the paper require a richer setting than Haskell can offer; the artifact replays the first set of examples in Granule, while also presenting additional examples demonstrating further algebraic structure through linear functions that incorporate local side effects 
Type Of Material Computer model/algorithm 
Year Produced 2022 
Provided To Others? Yes  
Impact This was the artefact for a paper. The artefact won the Distinguished Artefact Award. The paper also won Distinguished Paper Award. 
URL https://drops.dagstuhl.de/opus/volltexte/2022/16199/
 
Description Academic collaboration on Graded Hoare Logic 
Organisation Boston University
Country United States 
Sector Academic/University 
PI Contribution Key academic contributions to a new paper, with roughly equal split between partners.
Collaborator Contribution Key academic contributions to a new paper, with roughly equal split between partners.
Impact Publication "Graded Hoare Logic and its Categorical Semantics" https://link.springer.com/chapter/10.1007/978-3-030-72019-3_9
Start Year 2020
 
Description Academic collaboration on Graded Hoare Logic 
Organisation National Institute of Informatics (NII)
Country Japan 
Sector Public 
PI Contribution Key academic contributions to a new paper, with roughly equal split between partners.
Collaborator Contribution Key academic contributions to a new paper, with roughly equal split between partners.
Impact Publication "Graded Hoare Logic and its Categorical Semantics" https://link.springer.com/chapter/10.1007/978-3-030-72019-3_9
Start Year 2020
 
Description Academic collaboration on Graded Hoare Logic 
Organisation Tokyo Institute of Technology
Country Japan 
Sector Academic/University 
PI Contribution Key academic contributions to a new paper, with roughly equal split between partners.
Collaborator Contribution Key academic contributions to a new paper, with roughly equal split between partners.
Impact Publication "Graded Hoare Logic and its Categorical Semantics" https://link.springer.com/chapter/10.1007/978-3-030-72019-3_9
Start Year 2020
 
Title Gerty 
Description A new research prototype language accompanying our ESOP 2021 on 'Graded Modal Dependent Type Theory' 
Type Of Technology Webtool/Application 
Year Produced 2021 
Open Source License? Yes  
Impact Demonstrates a novel type system and its practical implications which goes beyond the existing state-of-the-art. 
URL https://github.com/granule-project/gerty/
 
Title Granule - A modern functional programming language with graded linear and modal types 
Description Latest release of the Granule programming language which includes results from publications arising from this grant in the latter half of 2020, including our paper on Resource Program Synthesis for Graded Linear Types which is deployed here as a new program synthesis procedure. 
Type Of Technology Software 
Year Produced 2020 
Open Source License? Yes  
Impact This software release has been used by the team in various presentations, including in an invited talk to Augusta University, and on online end of year review video. 
URL https://github.com/granule-project/granule/releases/tag/v0.8.0.0
 
Description Bristol visit 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Invited research visit to Bristol to discuss research ideas of this project with the Bristol Programming languages group and deliver a research seminar.
Year(s) Of Engagement Activity 2021
 
Description Podcast 
Form Of Engagement Activity A broadcast e.g. TV/radio/film/podcast (other than news/press)
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Interviewed for a video podcast series hosted by the company 47 degrees; this gave a good opportunity to engage with industry and spread the ideas of this research. The video has been watched 319 times at the time of writing.
Year(s) Of Engagement Activity 2021
URL https://www.youtube.com/watch?v=6yswyRQGsRQ
 
Description Presentation at Huawei Programming Languages group 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Research talk to the local group but also streamed to the wider Huawei research teams and subsequently uploaded online:
Year(s) Of Engagement Activity 2022
URL https://www.bilibili.com/video/BV1DA4y1Z734/?zw&spm_id_from=888.80996.embed_old