OptoCloud: Ultra-fast optically interconnected heterogeneous Data Centers

Lead Research Organisation: UNIVERSITY COLLEGE LONDON
Department Name: Electronic and Electrical Engineering

Abstract

The majority of human activities, including transport, Internet, banking, public health and entertainment, depend on Data Centers. Cloud traffic is forecast to grow exponentially and account for 95% of global traffic. In 2015, the total power consumption of data centers worldwide was higher than the national power consumption of the UK, and it is predicted to increase up to 15 times by 2030.

Currently, all data center networks are built as hierarchical electronic packet-switched networks; however, they cannot keep up with demand, creating an ever-increasing gap between data growth and Moore's Law. So, while compute node power, measured in flop/s, has increased 65 times in the last 18 years, node communication bandwidth has increased only 4.8 times and the bytes communicated per flop have decreased 8 times. This creates a computation-to-communication wall, forcing applications to minimize data movement and operate locally. In addition, these systems suffer from very high median latencies, on the order of 100 microseconds, and 99.9th-percentile tail latencies, on the order of 100 ms, to the detriment of system and application performance.

The OptoCloud fellowship aims to design and build an energy efficient, cost effective, scalable, single hop, and nanosecond speed optical circuit switched network. This will interconnect heterogeneous systems made of servers, CPUs, accelerators, neuromorphic processors, memory elements and storage to support different parts (rack, end-of-row) and sizes of data centers (small-medium size ~10-100,000 to ~1,000,000 server farm). Crucially, the network aims to offer zero data loss without in-network a) buffering, b) active switching and routing, and c) network header addressing and processing, in order to minimize complexity and consume very low power. Furthermore, the system will also inherently support 1-to-1, 1-to-N, N-to-N and N-to-1 connectivity in a synchronous manner without the need for data replication for multicasting or broadcasting, which is currently not possible. This is key to supporting diverse workloads such as storage caching, large-scale database lookups, training of distributed deep neural networks, and parallel computing that uses communication primitives such as allreduce, broadcast and reduce, gather and scatter, and all-to-all, among others.
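To make the communication primitives named above concrete, the following is a minimal sketch of a sum allreduce built from the textbook ring algorithm (a reduce-scatter phase followed by an all-gather phase). It simulates point-to-point message passing in plain Python and is not the OptoCloud mechanism; the function name `ring_allreduce` and the one-value-per-chunk layout are assumptions made for illustration.

```python
from typing import List


def ring_allreduce(node_data: List[List[float]]) -> List[List[float]]:
    """Simulate a sum allreduce over n nodes, each holding n one-value chunks.

    Illustrative textbook ring algorithm, not the method described in the record.
    """
    n = len(node_data)
    data = [list(row) for row in node_data]  # work on a copy

    # Phase 1: reduce-scatter. In step s, node i sends chunk (i - s) % n to
    # node (i + 1) % n, which adds it to its own copy of that chunk.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, data[i][(i - step) % n]) for i in range(n)]
        for src, chunk, value in sends:  # all sends in a step happen together
            data[(src + 1) % n][chunk] += value

    # Phase 2: all-gather. Node i now owns the full sum of chunk (i + 1) % n
    # and the completed chunks are passed around the ring.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, data[i][(i + 1 - step) % n]) for i in range(n)]
        for src, chunk, value in sends:
            data[(src + 1) % n][chunk] = value

    return data


if __name__ == "__main__":
    result = ring_allreduce([[1, 2, 3], [10, 20, 30], [100, 200, 300]])
    print(result)  # every node ends with the element-wise sum
```

The ring needs 2(n - 1) sequential communication steps for n nodes, which is one reason step count matters for collectives at data center scale.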

To achieve these, OptoCloud will explore the fundamental challenges of sub-nanosecond optical switching, near receiver-less low-power transceivers and nanosecond scheduling able to reconfigure circuits and shape IT and network topologies every 10s-100s of nanoseconds. It aims to offer orders of magnitude improvement in a) switching, b) scheduling and network topology re-configuration, c) power consumption, d) median and tail latency and finally e) throughput with zero data loss.
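As a rough worked example of what reconfiguring every 10s-100s of nanoseconds implies, the sketch below converts a reconfiguration period into a rate and estimates how much data fits in one circuit slot. The 100 Gb/s line rate is an assumed figure for illustration only and does not come from the project record; the function names are likewise hypothetical.

```python
def reconfigurations_per_second(period_ns: float) -> float:
    """Number of reconfigurations per second for a given period in nanoseconds."""
    return 1e9 / period_ns


def bytes_per_slot(line_rate_gbps: float, slot_ns: float) -> float:
    """Payload that fits in one slot: Gb/s times ns gives bits, divided by 8 for bytes."""
    return line_rate_gbps * slot_ns / 8


if __name__ == "__main__":
    for period in (10, 100):
        print(period, "ns ->", reconfigurations_per_second(period), "reconfigurations/s")
    # Assumed 100 Gb/s line rate (not from the source) and a 100 ns slot.
    print(bytes_per_slot(100, 100), "bytes per slot")
```

Under these assumptions a 100 ns slot carries about one Ethernet-frame-sized payload, which suggests why scheduling decisions would have to be made at roughly per-packet granularity.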

The PI will work with the PDRAs, PhD students, industrial partners (Microsoft, Finisar, Xilinx, Sumitomo Electric), as well as universities (Columbia and National Technical University of Athens) and form a unique compute and optical network ecosystem to methodologically answer fundamental questions while reflecting all necessary requirements on the proposed concepts, and rigorously evaluating developed technologies using industrial driven use case scenarios.

Planned Impact

The technologies proposed will provide the means for the design and implementation of a new form of computer and network architecture. This is the heterogeneous and disaggregated Data Center system where the network technologies proposed at its core will unlock its potential to deliver unparalleled modularity and performance. The proposed technologies can support the increasing data volume, diversity and unpredictability of connected computing systems while reducing power consumption and CO2 footprint. It will enable accelerated creation and innovation of new services and applications as well as solve scientific problems currently not possible due to the rigid data centre and high performance computer architecture. This will benefit everyone who uses and relies on networked technologies.

All major communication and computing stakeholders will benefit from the fellowship results.

*Technology manufacturers and vendors: The results of the fellowship will be invaluable in designing the equipment of the future to maximize performance, flexibility and programmability by benefiting from the fusion of photonic and electronic systems. The project partners Finisar, Xilinx, and Sumitomo Electric will be the most immediate beneficiaries but others will benefit.

*Data Center and High Performance Computing operators: Using the fellowship's results, heterogeneous data center architectures can be reconfigured millions of times per second to deliver maximum utilization, best serve diverse workloads and unlock the ability to perform distributed parallel computation and distributed deep neural network tasks at scale. Microsoft, a partner of this fellowship, will be a direct beneficiary, and others will follow.

*Creation of SMEs: The fellowship will stimulate the generation of future business opportunities by creating a new sector of disaggregated and heterogeneous computer and network architecture technologies. This will allow the creation of a range of SMEs that can create revenues either through the development and licensing of software/hardware function modules or delivering complete hardware solutions. The resulting business opportunities will contribute to job creation and economic prosperity.

*Impact beyond ICT: The principles, concepts and techniques developed are directly transferable to other sectors within and beyond ICT that use and benefit from highly modular computing systems. This includes, but is not limited to, 5G and beyond networks, Internet of Things, smart cities, satellite, High Performance Computing, consumer electronics, embedded and distributed systems, robotics, fundamental engineering, manufacturing, energy, automotive and health.

*Academic and research community: The fellowship can play a key role towards the realization of a new philosophy of using optical and scheduling technologies to pool together functional modules to form complete computing and network systems. This inherently creates a new research field that will stimulate fundamental rethinking on the design and operation of systems and networks as well as the creation of new programming models and application design. Columbia University and NTUA are partners, so involved directly, but the wider community will also benefit through widespread dissemination.
In collaboration with UCL's Business and industrial partners I will take advantage of existing expertise for the communication, protection and exploitation of the results.

Publications

Alkharsan H. (2022) Optimal and Low Complexity Control of SOA-Based Optical Switching with Particle Swarm Optimisation in 2022 European Conference on Optical Communication, ECOC 2022

Benjamin J.L. (2022) Traffic Tolerance of Nanosecond Scheduling on Optical Circuit Switched Data Center Network in Optics InfoBase Conference Papers

Benjamin J.L. (2021) Benchmarking packet-granular OCS network scheduling for data center traffic traces in Optics InfoBase Conference Papers

Benjamin J.L. (2022) Traffic Tolerance of Nanosecond Scheduling on Optical Circuit Switched Data Center Network in 2022 Optical Fiber Communications Conference and Exhibition, OFC 2022 - Proceedings

Luo R. (2022) Message Passing: Towards Low-Complexity, Global Optimal Routing and Wavelength Assignment Solutions for Optical Networks in 2022 Optical Fiber Communications Conference and Exhibition, OFC 2022 - Proceedings

Mishra V (2021) MONet: heterogeneous Memory over Optical Network for large-scale data center resource disaggregation in Journal of Optical Communications and Networking

 
Description We have developed a disruptive way to replace all electronic packet switches in Cloud Data Centers, High Performance Systems and Machine Learning systems with fast optical circuit switches. This increases network performance by 20 times and reduces power consumption by 40 times.
Exploitation Route We have filed a number of patents; one was licensed to H+S Polatis and a number were licensed to the UCL spinout Oriole Networks, which will commercialise the research developed.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description One patent has been licensed to H+S Polatis and is already used to significantly increase the performance and manufacturing efficiency of optical switches designed in the UK. UCL spinout Oriole Networks announces one of the UK's largest seed funding rounds of recent years, co-led by a strong investor syndicate. Oriole Networks raises a £10m seed round to build AI data centres out of light. Oriole uses light to directly connect thousands of AI GPUs to create huge super-brains, enabling Large Language Models to be trained up to a hundred times faster and AI inference significantly accelerated, for a fraction of the network power. Oriole brings together decades of world-leading research from UCL with a team of industry veterans experienced in university scale-ups. UCL spinout Oriole Networks was created in 2023 by four founders, Professor George Zervas, James Regan, Alessandro Ottino and Joshua Benjamin, with IP licensed through UCL's technology transfer company UCLB. UCL scientists George, Alessandro and Joshua had found a way to use light to connect thousands of AI GPUs directly to each other, resulting in much higher performance. James Regan already had a track record of building successful tech companies from university spinouts, having spun out EFFECT Photonics and built it to $0.5B.
First Year Of Impact 2023
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Societal, Economic

 
Description Distributed Quantum Computing and Applications
Amount £3,049,365 (GBP)
Funding ID EP/W032643/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 03/2022 
End 03/2026
 
Description Dynamos - DYNAMIC AND RECONFIGURABLE DATA CENTRE NETWORKS WITH MODULAR OPTICAL SUBSYSTEMS
Amount £5,400,000 (GBP)
Funding ID 10038802 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 07/2022 
End 07/2026
 
Description The quantum data centre of the future
Amount £8,918,816 (GBP)
Funding ID 10004793 
Organisation Innovate UK 
Sector Public
Country United Kingdom
Start 03/2022 
End 02/2025
 
Title TrafPy 
Description Tool to generate network traffic data for reproducibility purposes. 
Type Of Material Data analysis technique 
Year Produced 2021 
Provided To Others? Yes  
Impact Used to conduct research across a team of people. 
URL https://github.com/cwfparsonson/trafpy
 
Description Huber+Suhner Polatis 
Organisation Polatis
Country United Kingdom 
Sector Private 
PI Contribution Designed multi-core fibre switch in 2019-2020. Designed and demonstrated Data Center network using optical switches during the same period. Polatis is a project partner.
Collaborator Contribution Provided optical switches and some parameters of the switch constraints.
Impact Numerous papers in top conferences and journals.
Start Year 2007
 
Description Microsoft Collaboration on Distributed Deep Learning 
Organisation Microsoft Research
Department Microsoft Research Cambridge
Country United Kingdom 
Sector Private 
PI Contribution Work on developing optical switched interconnects and analytical models to design and operate AI-based computing systems.
Collaborator Contribution Information on the Cloud provider requirements and processor profiler.
Impact No outputs yet. We are working on a potential patent and research paper.
Start Year 2019
 
Description Optical switching and networking for Quantum and Classical Data Centres 
Organisation BT Group
Department BT Research
Country United Kingdom 
Sector Private 
PI Contribution This is an EPSRC iCASE-funded project in which we develop optical switching technologies for quantum computing.
Collaborator Contribution Input of industrial requirements and specifications.
Impact None yet.
Start Year 2021
 
Description Sumitomo Electric on Multi-Core Fibre networks 
Organisation Sumitomo Corporation
Country Japan 
Sector Private 
PI Contribution We extensively characterized a multi-core fibre and modelled its behaviour.
Collaborator Contribution They provided 4 spools of multi-core fibre.
Impact Published joint paper.
Start Year 2018
 
Description Xilinx 
Organisation Xilinx Research
Country United States 
Sector Private 
PI Contribution Provided insight on technologies pioneered by my researchers.
Collaborator Contribution Xilinx Labs hosted two of my researchers in San Jose for one week and provided in-depth training and knowledge on their latest solutions under NDA. They provided access to the latest software tools and also aim to donate up to 2 high performance development platforms.
Impact Expanded the collaboration to other Xilinx research labs that are closer to us and have strong interests in our research. The Xilinx labs in Dublin, Ireland were keen to support an H2020 ITN research proposal in which I participated.
Start Year 2014
 
Title MPI operations 
Description Collective operations (scatter-reduce, all-gather, all-reduce, broadcast, all-to-all, etc.) among computing nodes that minimize the number of communication steps to just four. Inventors: Georgios Zervas, Alessandro Ottino, Joshua Benjamin 
IP Reference 2217578.0 
Protection Patent / Patent application
Year Protection Granted 2022
Licensed Commercial In Confidence
Impact The collective operations can speed up parallel and distributed tasks by 10x. The network overhead can be reduced from 95% to less than 1%, increasing the computational efficiency from 10-20% up to 95%.
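The record states only that these collectives need four communication steps; it does not describe how that is achieved. For context, the sketch below computes the sequential step counts of two standard allreduce algorithms, ring and recursive doubling, so the scale of the claimed reduction can be seen. The function names and the 1,024-node example are assumptions for illustration, and nothing here models the patented method.

```python
def ring_allreduce_steps(n: int) -> int:
    """Ring allreduce: n - 1 reduce-scatter steps plus n - 1 all-gather steps."""
    if n < 1:
        raise ValueError("n must be positive")
    return 2 * (n - 1)


def recursive_doubling_steps(n: int) -> int:
    """Recursive-doubling allreduce on a power-of-two node count: log2(n) steps."""
    if n < 1 or n & (n - 1) != 0:
        raise ValueError("n must be a positive power of two")
    return n.bit_length() - 1


if __name__ == "__main__":
    nodes = 1024  # hypothetical system size
    print("ring:", ring_allreduce_steps(nodes))
    print("recursive doubling:", recursive_doubling_steps(nodes))
    print("step count stated in the record: 4")
```

Both baselines grow with the number of nodes, whereas the step count stated in the record is constant.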
 
Title Methods and apparatus for optical fibre design and production 
Description The present technique relates to the field of design and production of multi-core optical fibres. Multi-core fibres can provide significantly improved capacity relative to single-core fibres. However, the design parameters (for example the composition, number and geometry of the fibre cores) and corresponding transmission properties (for example signal-to-noise ratio and level of crosstalk between cores) relate to each other in many nonlinear ways, both directly and indirectly. The design of a multi-core fibre is thus complex. Some methods for fibre design utilise a "brute force" approach, for example by modelling a large number of combinations of design parameters. However, this is inefficient, and can lead to optical fibres with suboptimal transmission properties. There is thus a desire for improved methods and apparatus for designing and producing multi-core optical fibres. 
IP Reference  
Protection Patent application published
Year Protection Granted 2021
Licensed No
Impact Led to early commercialization funding and collaboration with fibre manufacturers on improving the design tool.
 
Title Network Architecture 
Description Modular optical network architectures for data center networks for higher performance and very low power networking. 
IP Reference GB2217579.8 
Protection Patent / Patent application
Year Protection Granted 2022
Licensed Commercial In Confidence
Impact It can eliminate the use of electronic switching. It leads to a 40x reduction in power consumption and a 20x improvement in network performance.
 
Title Network Scheduling Method and Apparatus 
Description Recent growth in the volume of on-demand data has been exponential. Much of the data is stored/processed in and accessed from data centers in which large numbers of servers are connected in a network. It is a continual challenge to scale these and other networks and still handle the network traffic, while managing cost, power consumption, latency and so forth. A fundamental challenge is how to schedule and re-configure network resources (switches, buffers, paths, etc.) in nanoseconds in order to transfer data from any network termination point to any other in a deterministic way. Processing the traffic demands in order to allocate network resources, such as path, timeslot and wavelength, is a computationally hard problem, and needs to be handled on nanosecond timescales. The present invention has been devised in view of the above problems. The network scheduling method and apparatus, with a novel scheduling architecture, outperforms previous network schedule processing units. Spatial parallelism is carefully increased in the architecture to increase decisions per cycle and reduce the required number of cycles per epoch, enabling a reduced execution time. An example of implementing the invention is able to achieve higher throughput, more deterministic latency and a smaller number of clock cycles per epoch. Further optional features of the invention are defined in the dependent claims. 
IP Reference 2318669.5 
Protection Patent / Patent application
Year Protection Granted 2023
Licensed Commercial In Confidence
Impact This IP has been licensed to Oriole Networks, a spin-out company of which George Zervas, the PI of the fellowship, is a co-founder and CTO. UCL spinout Oriole Networks is about to announce one of the UK's largest seed funding rounds of recent years, £10M, co-led by a strong investor syndicate. UCL spinout Oriole Networks was created in 2023 by four founders, Professor George Zervas, James Regan, Alessandro Ottino and Joshua Benjamin, with IP licensed through UCL's technology transfer company UCLB. The IP derives directly from this Fellowship. Oriole uses light to directly connect thousands of AI GPUs to create huge super-brains, enabling Large Language Models to be trained up to a hundred times faster and AI inference significantly accelerated, for a fraction of the network power. Oriole brings together decades of world-leading research from UCL with a team of industry veterans experienced in university scale-ups.
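To illustrate the kind of allocation problem this record describes (assigning a timeslot and wavelength to each traffic demand within an epoch), here is a deliberately simple greedy first-fit sketch. It is not the patented scheduler: the constraint model (each source sends once and each destination receives once per timeslot, and each wavelength is used once per timeslot) and all names are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

Request = Tuple[int, int]  # (source node, destination node)
Grant = Tuple[int, int]    # (timeslot, wavelength)


def greedy_schedule(
    requests: List[Request], timeslots: int, wavelengths: int
) -> Dict[int, Optional[Grant]]:
    """First-fit allocation of one (timeslot, wavelength) per request in an epoch.

    Hypothetical constraint model, not taken from the record.
    """
    busy_src = [set() for _ in range(timeslots)]   # sources already sending
    busy_dst = [set() for _ in range(timeslots)]   # destinations already receiving
    free_wl = [list(range(wavelengths)) for _ in range(timeslots)]
    grants: Dict[int, Optional[Grant]] = {}

    for idx, (src, dst) in enumerate(requests):
        grants[idx] = None  # stays None if the epoch has no room left
        for t in range(timeslots):
            if src in busy_src[t] or dst in busy_dst[t] or not free_wl[t]:
                continue
            wavelength = free_wl[t].pop(0)
            busy_src[t].add(src)
            busy_dst[t].add(dst)
            grants[idx] = (t, wavelength)
            break
    return grants


if __name__ == "__main__":
    demo = [(0, 1), (2, 1), (0, 3), (2, 3), (4, 5)]
    print(greedy_schedule(demo, timeslots=2, wavelengths=2))
```

This loop handles requests one at a time, whereas the record describes increasing spatial parallelism to make more decisions per clock cycle, so a hardware scheduler would be structured quite differently.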
 
Title PID tuning 
Description A one-shot, offline, reinforcement learning method to identify optimal PID parameters of N^2 piezo-electric actuators of a beam steering free space optical switch. The inventors are Georgios Zervas and Zacharaya Zhabka. 
IP Reference GB2210433.5 
Protection Patent / Patent application
Year Protection Granted 2022
Licensed Yes
Impact We have licensed the IP to Huber+Suhner Polatis. Polatis has already used it to improve the performance of their switches in terms of a) increased switching speed, b) increased resilience to thermal effects and c) lower insertion loss, as well as significantly increased fabrication yield and manufacturing efficiency.
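For readers unfamiliar with the PID parameters this record refers to, the following is a minimal sketch of a discrete PID controller driving a simple simulated plant to a setpoint. The gains are hand-picked, the first-order plant is a hypothetical stand-in rather than a piezo-electric actuator model, and the reinforcement learning tuning method itself is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete PID controller (illustrative, hand-picked gains)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        self.integral += error * dt
        # No derivative term on the first call, which avoids a start-up kick.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def simulate(pid: PID, setpoint: float = 1.0, tau: float = 0.1,
             dt: float = 0.01, steps: int = 2000) -> float:
    """Drive a hypothetical first-order plant dx/dt = (u - x) / tau; return final x."""
    position = 0.0
    for _ in range(steps):
        drive = pid.update(setpoint, position, dt)
        position += dt * (drive - position) / tau  # forward-Euler integration
    return position


if __name__ == "__main__":
    final = simulate(PID(kp=2.0, ki=5.0, kd=0.01))
    print(f"final position: {final:.4f}")
```

Tuning means choosing `kp`, `ki` and `kd` so that the response is fast and stable; with N^2 actuators, doing this by hand for each one would be impractical, which is the motivation the record gives for an automated method.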
 
Title Network Traffic Generation Tool 
Description Data related to communication networks is often sensitive and proprietary. Consequently, many networking academic papers are published without openly releasing the network traffic data that was used to obtain the results, and when datasets are released they are often too limited for data-hungry applications such as reinforcement learning. In an effort to aid reproducibility, some authors release characteristic distributions which broadly describe the underlying data. However, these distributions are often not analytically described and may not fall under the classic 'named' distributions (Gaussian, log-normal, Pareto etc.). As a result, other researchers find themselves using unrealistically simple uniform traffic distributions or their own distributions, which are difficult to benchmark universally. This project saw the development of an open-access network traffic generation tool for (1) standardising the traffic patterns used to benchmark networking systems, and (2) enabling rapid and easy replication of literature distributions even in the absence of raw open-access data. 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact This open-source software is now used by other scientists and researchers. The work was also published and received the 2022 Fabio Neri Best Journal Award. 
URL https://trafpy.readthedocs.io/en/latest/
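The idea of generating reproducible traffic from characteristic distributions can be sketched with the Python standard library alone. The example below does not use or imitate the TrafPy API; the function name, the choice of exponential inter-arrival times and log-normal flow sizes, and all parameter values are assumptions for illustration.

```python
import random
from typing import List, Tuple

Flow = Tuple[float, int, int, int]  # (arrival time in s, source, destination, size in bytes)


def generate_flows(num_flows: int, num_nodes: int, mean_interarrival_s: float,
                   size_mu: float, size_sigma: float, seed: int = 0) -> List[Flow]:
    """Sample a synthetic flow trace; a fixed seed makes the trace reproducible."""
    rng = random.Random(seed)
    now = 0.0
    flows: List[Flow] = []
    for _ in range(num_flows):
        # Exponential inter-arrival times, i.e. a Poisson arrival process.
        now += rng.expovariate(1.0 / mean_interarrival_s)
        # Two distinct endpoints chosen uniformly at random.
        src, dst = rng.sample(range(num_nodes), 2)
        # Log-normal flow size, rounded to whole bytes with a 1-byte minimum.
        size = max(1, round(rng.lognormvariate(size_mu, size_sigma)))
        flows.append((now, src, dst, size))
    return flows


if __name__ == "__main__":
    for flow in generate_flows(5, num_nodes=8, mean_interarrival_s=1e-6,
                               size_mu=7.0, size_sigma=2.0, seed=42):
        print(flow)
```

Sharing the distribution parameters and the seed is enough for another researcher to regenerate the same trace, which is the reproducibility property the description emphasises.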
 
Company Name Oriole Networks 
Description Oriole Networks develops low energy technologies to improve performance of machine learning and high performance computing systems. 
Year Established 2023 
Impact Oriole Networks, a new company that revolutionises the performance of AI systems and data centres, has successfully raised £10 million in seed funding. The round was co-led by UCL Technology Fund, Clean Growth Fund, XTX Ventures and Dorilton Ventures, with support from Innovate UK Investor Partnership. Since the initial funding round, Oriole Networks has raised a further £22 million in funding, with all existing investors re-investing. The total funding raised in its first year was £35 million. Data centres have played a critical role in the proliferation of SaaS companies and are also supporting the predicted platform shift towards AI. However, the increasing demands placed on data centres and the approach underlying their networking are leading to systemic problems and unsustainable power consumption. With Oriole's novel approach, Large Language Models can be trained up to a hundred times faster, whilst consuming only a tiny fraction of the power. As a result, machine learning algorithms can run with a thousandth of the latency, revolutionising time critical tasks such as algorithmic trading, and speeding up AI adoption and AI algorithmic progress. As the demand for compute continues to increase, it is critical to find new solutions that can address these challenges in a sustainable and carbon efficient manner. Our novel approach to harness the power of light has already demonstrated significant technical performance improvements, up to 100 times speed up in completion time and 40 times improvements in energy consumption.
Website http://oriolenetworks.com
 
Description Invited talk at STW2021 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Invited talk at STW2021, a Huawei-organized conference. I delivered a talk on published work on sub-nanosecond optical switching for data centers and high performance computing.
Year(s) Of Engagement Activity 2021
 
Description Invited talk at TOP Conference 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact I was invited to present the OptoCloud Fellowship program. Over 100 people from the UK and abroad attended, spanning the telecom and datacom industries as well as academic and research institutions. There was a lot of interest in my work, and numerous meetings were arranged to discuss collaboration and potential exploitation paths.
Year(s) Of Engagement Activity 2022
URL https://topconference.com/
 
Description Poster presentation on multi-core fiber design using artificial intelligence and machine learning 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact The presentation covered our work on AI/ML methods and the design of novel multi-core fibres that can increase the bandwidth density and capacity of optical fiber interconnects in cloud data center networks.
Year(s) Of Engagement Activity 2022
URL https://topconference.com/
 
Description Poster presentation on optical networks for distributed machine learning systems. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Poster presentation of work related to optical networking and collective operations for parallel and distributed computing including machine learning systems.
Year(s) Of Engagement Activity 2022
URL https://topconference.com/
 
Description Poster presentation on ultra-fast hardware based control of optical circuit switching for cloud data centers 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Poster presentation on TDM/WDM scheduling using hardware-based methods for large-scale cloud data center systems.
Year(s) Of Engagement Activity 2022
URL https://topconference.com/
 
Description Talk/seminar on fast optical switching for intra-satellite communications. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact I delivered a seminar on optical switching technologies we developed for cloud data centers and how these can be used to support the networking requirements of satellites for low earth orbit internet applications.
Year(s) Of Engagement Activity 2021