Predictable datacenter with low-latency

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Informatics

Abstract

Online services that power every aspect of today's life demand low and predictable latency. Performance predictability is a key requirement for high-performance applications in today's multi-tenant datacenters. Online services running in infrastructure datacenters need such predictability to satisfy application SLAs. Cloud datacenters require guaranteed performance to bound customer costs and spur adoption.
Practical constraints, however, require that these latency-sensitive services share datacenter networking infrastructure with other workloads.
The combined pressure of many concurrently running applications inside a datacenter leads to highly unpredictable network behavior, which has detrimental implications for network latency and, ultimately, end-to-end service latency.
To address this, this project will study new datacenter networking technologies that provide latency that is not only low in the average case but also predictable, enabling high quality of service.
The project will consider network topology, flow control, routing and protocol offload mechanisms to enable predictable network performance in tomorrow's datacenters.
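
The distinction drawn above between average-case latency and predictable latency can be illustrated with a minimal sketch. The latency samples below are hypothetical values chosen for illustration, not measurements from this project, and the nearest-rank percentile helper is an assumption of this sketch rather than part of the project's tooling.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in microseconds: 98 requests see an
# uncontended network, 2 are delayed by traffic from other workloads.
latencies_us = [10.0] * 98 + [500.0] * 2

mean_us = sum(latencies_us) / len(latencies_us)
p50_us = percentile(latencies_us, 50)
p99_us = percentile(latencies_us, 99)

print(f"mean = {mean_us:.1f} us, median = {p50_us:.1f} us, p99 = {p99_us:.1f} us")
```

In this made-up sample the mean and median stay low while the 99th percentile is far higher, which is why an average alone says little about predictability.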

Studentship Projects

Project Reference  Relationship  Related To    Start       End         Student Name
EP/N509644/1                                   01/10/2016  30/09/2021
1957062            Studentship   EP/N509644/1  01/11/2017  30/04/2021  Mohammadreza Katebzadeh
 
Description We identified shortcomings in existing performance measurement tools used for modern networking technologies and showed why they are unable to accurately assess the latency of a datacenter switch. We introduced a novel performance measurement tool that provides highly accurate latency measurements for modern datacenter switches. Using the precise measurements enabled by our tool, we analyzed the performance of a datacenter switch. We conclude that better mechanisms are needed to provide performance isolation in a datacenter.
Exploitation Route Today's cloud datacenters host many digital services. Any improvement in designing efficient (in terms of performance, power and cost) datacenters leads to more efficient digital services. Hence, the outcome of our work gives datacenter providers an opportunity to rethink their infrastructure and tune the underlying systems to improve their services.
Sectors Digital/Communication/Information Technologies (including Software)

URL http://homepages.inf.ed.ac.uk/bgrot/pubs/RPERF_ISPASS20.pdf
 
Description MSR 
Organisation Microsoft Research
Department Microsoft Research Cambridge
Country United Kingdom 
Sector Private 
PI Contribution We explored possible solutions to improve the predictability of datacenter networks. The findings of our research are useful for designing better datacenter networks.
Collaborator Contribution The collaboration gave us the opportunity to learn about the internals of a real datacenter and gain better insight for our research. We also benefited greatly from our collaborator's experience.
Impact We published a paper entitled "Evaluation of an InfiniBand switch: Choose Latency or Bandwidth, but Not Both".
Start Year 2017
 
Title RPerf 
Description We developed RPerf, a performance measurement tool for RDMA-based networks that can precisely measure InfiniBand (IB) switch performance without hardware support.
Type Of Technology Systems, Materials & Instrumental Engineering 
Year Produced 2019 
Impact Using RPerf, we benchmarked a rack-scale IB cluster in isolated and mixed-traffic scenarios. Our key finding is that the evaluated switch can provide either low latency or high bandwidth, but not both simultaneously in a mixed-traffic scenario. We evaluated several options to improve the latency-bandwidth trade-off and demonstrated that none is ideal.
URL http://homepages.inf.ed.ac.uk/bgrot/pubs/RPERF_ISPASS20.pdf