Session Types for Reliable Distributed Systems (STARDUST)

Lead Research Organisation: University of Glasgow
Department Name: School of Computing Science

Abstract

VISION

Distributed software systems are an essential part of the infrastructure of modern society. Such systems typically comprise diverse software components deployed across networks of hosts. Ensuring their reliability is challenging as software components must correctly communicate and synchronise with each other, and any of the hardware or software components may fail. Software failure and service outage are estimated to cost the world economy more than a trillion dollars annually.

Failures can occur at all levels of the system stack: hardware, operating system, network, software, and user. Here we focus on using advanced programming language technologies to enable the software level to better handle failures using a combination of fault prevention and fault tolerance. Specifically, we will combine the communication-structuring mechanism of session types with the scalability and fault-tolerance of actor-based software architectures.

Actor languages are based on independent processes (actors) communicating by asynchronous messages. Reliability is facilitated by actors having isolated state, and hence an actor can fail independently. Two key techniques for achieving reliability in actor languages are timeouts and supervision, and these are the main focus of STARDUST. Timeouts allow failures to be identified during execution, and failures are handled by establishing triggers and alternative behaviours within the code. An actor may supervise other actors, detecting failures and taking remedial action like restarting a failed actor.

Session types provide a way to specify and constrain the communication behaviour (protocol) between nodes in a system. A session type system excludes any non-conforming behaviour, perhaps statically (for fault prevention), dynamically, or a mixture of both (for fault tolerance). Several languages now have session-type support via libraries and tools.

By combining the strengths of actor languages and session types, we will develop a well-founded theory of reliable actor programming with clearly defined communication structures. Key aims are to deliver tools that provide lightweight support for developers, e.g. warn of potential issues, and to allow developers to continue to use established idioms. By doing so we aim to deliver a step change in the engineering of reliable distributed software systems.

Planned Impact

The STARDUST project will deliver a step change in the reliability of distributed systems. We will build new foundations to describe how distributed systems behave, enhancing session types to take account of failure. On this basis we will extend/verify existing programming languages - Erlang, Scala/Akka and ESL. We will engineer extensions to their tools and libraries, and thus support software engineers in implementing provably reliable systems.

Our foundations and tools will have direct impact on our industrial partners: Actyx AG, Erlang Solutions Ltd, Lightbend, Quviq AB, and Tata Consultancy Services). During the project we will integrate our libraries and tools into the specific systems of each partner.

The applicability of our techniques to real-world engineering and development practices within organisations will be central. The integration of our techniques into widely used open source Integrated Development Environments (IDEs) will facilitate adoption by practitioners, and by organisations of any scale beyond the project's consortium.

STARDUST will provide in-depth training activities in language engineering for undergraduate and masters project students, as discussed in our Pathways to Impact. Dissemination mechanisms, e.g. tutorials and lectures, will provide less deep training to a far larger audiences of developers and practitioners.

STARDUST will directly impact (present and future) practitioners in the UK ICT people pipeline, by providing new abstractions, methods and tools to understand, plan, and engineer reliable systems. The research staff will gain valuable programming language engineering expertise that will equip them to make significant technical and economic impacts in their future careers. Specifically they will gain expertise in the design of behavioural types, and the engineering of libraries and tools to support them. By applying the new technologies to industrial case studies they will gain insight into the characteristics and requirements of the application domains, and the challenges of applying the techniques at scale.

While we will focus on actor languages, in the longer term STARDUST will influence the development of other communication-oriented languages, such as Go.

Two of the three site leads are female (Yoshida and Bocchi). Both are active in events for female computer science students, researchers and school children, including London Hopper, Summer Codeathon and events at London girls' schools. We will continue to engage in such outreach activities to promote participation of females and disseminate empowering paradigms and tools to inspire and support them in their career planning.
 
Description The main achievement so far is the development of a type system for programming with mailboxes, as used in actor languages such as Erlang. This type system will provide a foundation for a language extension or analysis tool for Erlang, during the remainder of the project.
Exploitation Route We aim to produce software tools that can be used by Erlang programmers.
Sectors Digital/Communication/Information Technologies (including Software)