SACRED-MA: Safe And seCure REmote Direct Memory Access

Lead Research Organisation: University of Surrey
Department Name: Computing Science

Abstract

Modern society depends on accessing and transferring vast quantities of data, at ever-increasing speeds. Technology giants such as Google invest billions of dollars every year into data centres across the world. Replicated data systems form the backbone of all cloud services; improving their reliability and performance impacts all cloud and big data services. The UK is currently the largest cloud market in Europe, with over £17 billion in cloud investment in 2020.

To meet our ever-growing need for rapid data transfer, RDMA (remote direct memory access) technologies enable next-generation infrastructures by allowing a machine to access (read/write) directly the memory of another machine across a network. Unlike traditional network protocol stacks such as TCP/IP, RDMA-enabled network interface cards (NICs) can bypass' an operating system kernel, ans are thus capable of wire-speed data transmission. RDMA technology has been available in supercomputing clusters since the mid 2000s, but had until recently remained an experimental feature in consumer and enterprise systems due to its cost. However, this changed recently with the availability of affordable NICs (e.g. those developed by our partner NVIDIA), and it is now possible to build distributed applications that challenge conventional design paradigms. For instance, one can leverage the additional throughput of RDMA to support concurrent front-end applications, surpassing sequential state-machine replication services used today. There is already a shift towards consumer-grade devices through the development of wireless RDMA and RDMA over 6G.

To unlock the potential of RDMA in enterprise and consumer systems, we must enable programmers to write safe and secure programs. However, the RDMA behaviour is currently documented through informal plain-text manuals [22] and examples, making its semantics vague and ambiguous. The programming models for RDMA are poorly understood, and there is no support for formal verification. This carries significant risks since RDMA enables writing directly into the memory of a remote machine. Moreover, RDMA programming is inherently challenging as it requires an understanding of the interaction between remote communication and local computation, i.e. how remote memory writes interact with the local memory writes of a given machine, potentially leading to data races and increasing the risk of safety and security bugs.