Interconnection Networks: Practice unites with Theory (INPUT)

Lead Research Organisation: Durham University
Department Name: Engineering and Computing Sciences

Abstract

An interconnection network is a mechanism by which different components of a (usually large) computer system communicate. The design of interconnection networks is not straightforward as there are many issues to take into account, such as: the topology (that is, the basic pattern of connectivity of the components); the routing algorithms (that are used in order to transfer messages around the network); the methods of flow-control (that are used in order to deal with congestion when different network packets, for example, request limited hardware resources); and the methods of switching (the way in which, once a route for a message has been selected, the message is physically transferred from component to component throughout the network). The whole area is an incredible mix of hardware, software and mathematics, and employs principles from both computer science and engineering.

The field of interconnection networks covers a wide variety of different communications subsystems, from relatively small, very local on-chip networks, through supercomputers and clusters, and on to vast, remote and evolving networks such as those implemented in grid and cloud computing (upon which so much of the ubiquitous computing in modern society depends). Although many interconnection network principles apply universally, the varying domain characteristics and intended applications lead to a number of differences. The full extent of these differences is impossible to cover here but one is the scale of the interconnection network. On-chip networks are relatively small, currently comprising tens of nodes (though there are efforts to scale up to a thousand nodes), whilst the number of nodes used in data centre networks or supercomputers can be hundreds of thousands. The research in this proposal aims to improve the design of interconnection networks for large-scale systems such as those employed in supercomputers, clusters and data centres by developing closer links between the mathematics behind interconnection networks and the practical construction of interconnection networks.

The practical construction of, for example, a supercomputer that might fill a large room is immensely complex, with a multitude of wires, cables, boards, chips, racks and cabinets all conjoined so that all of the computational power of such a system can be employed to yield efficient solutions to problems on massive data sets. Of course, such a supercomputer has to be programmed so that each of its computational elements knows exactly what to do and when to do it and so that the individual computational results can be rapidly compiled into a solution of the underlying problem. The design of such a hardware and software system is an incredible feat of engineering. Mathematicians abstract the essential interconnection network within such a supercomputer as a graph; that is, as a set of vertices, pairs of which are joined by edges. Whilst this may seem an imprecise abstraction, one can use graph-theoretic properties in order to design interconnection network topologies which possess many properties one would wish of an interconnection network. Graph properties relating to, for example, symmetry, shortest-paths, connectivity, Hamiltonicity, recursive decomposability and embeddings prove to be extremely important in securing good practical properties for interconnection networks. However, up until now there has been a considerable gap between the mathematical theory on the one hand and practical interconnection network performance on the other. Our research proposal aims to narrow this gap by providing a closer link between the theory and practice of interconnection networks, with the ultimate goal being techniques by which we can theoretically design an interconnection network and be sure of its resulting practical properties when built and used.
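To make the graph abstraction concrete, the following is a minimal sketch of how a topology can be modelled as a graph and how two of the properties mentioned above (connectivity and shortest-path lengths) can be checked. The hypercube is used purely as a familiar example topology; it is not one of the networks studied in this project, and the function names are illustrative assumptions.

```python
from collections import deque


def hypercube(n):
    """Adjacency lists of the n-dimensional hypercube (example topology only).

    Vertices are the integers 0 .. 2**n - 1; two vertices are joined by an
    edge when their binary representations differ in exactly one bit.
    """
    return {v: [v ^ (1 << i) for i in range(n)] for v in range(2 ** n)}


def bfs_distances(graph, source):
    """Shortest-path (hop) distances from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def is_connected(graph):
    """True if every vertex is reachable from an arbitrary start vertex."""
    start = next(iter(graph))
    return len(bfs_distances(graph, start)) == len(graph)


def diameter(graph):
    """Longest shortest path over all vertex pairs (assumes a connected graph)."""
    return max(max(bfs_distances(graph, s).values()) for s in graph)


q4 = hypercube(4)
print(len(q4), is_connected(q4), diameter(q4))  # 16 True 4
```

The diameter bounds the worst-case number of hops a message must travel, which is one way in which a purely graph-theoretic quantity relates to practical latency.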

Planned Impact

The objective of this proposal is the design of advanced interconnection networks that can be implemented in distributed-memory computing systems of varied nature (supercomputers, clusters and data centres). The proposed research addresses the dichotomy between theory and practice by employing a holistic approach to establish synergies between a highly theoretical research group and a practicality-focused research group. The knowledge which will be gained over the course of the project will have relevance in the wide range of markets which are sustained by large-scale computing systems (banking, research, infrastructure supporting smartphones, and cloud-based services such as eBay, Amazon and Facebook are just a few examples), and when implemented in commercial systems could have a major economic impact.

The potential economic impact of these advanced interconnection networks is clear given the high expenditures linked to these kinds of systems, both in terms of initial investment (tens of millions of pounds for a not-so-large system composed of 10K nodes; BlueGene/Q has an estimated purchase cost of more than $1.5 billion; Google is believed to invest several billion dollars in its computing infrastructure yearly) and in running costs (power consumption in the order of megawatts, which translates into several million pounds per year). If we manage to improve the performance and efficiency of these systems by providing an improved interconnection network by, say, a rather conservative 5%, we could be talking about savings in the order of one million pounds per 10K nodes.
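The order of magnitude quoted above can be reproduced with a short calculation. This is only a sketch: the purchase cost of £20 million is a hypothetical figure picked from within the "tens of millions" range for a 10K-node system, and it assumes that a 5% efficiency gain translates proportionally into a 5% cost saving.

```python
# Illustrative figures only: the purchase cost is an assumption chosen from
# the "tens of millions of pounds" range, not a measured or quoted value.
ASSUMED_PURCHASE_COST_GBP = 20_000_000  # hypothetical 10K-node system
IMPROVEMENT_PERCENT = 5                 # the "rather conservative" gain


def saving_gbp(cost_gbp, improvement_percent):
    """Saving if an efficiency gain reduces the cost proportionally."""
    return cost_gbp * improvement_percent / 100


print(saving_gbp(ASSUMED_PURCHASE_COST_GBP, IMPROVEMENT_PERCENT))  # 1000000.0
```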

Looking further into the future, the use of concepts developed in this project in the development of increasingly larger-scale computing systems (perhaps millions of nodes) could have a huge economic impact, but one which is currently very difficult to quantify.

Most industry and commerce throughout the UK relies on large computing systems (mostly, data centres) to sustain different parts of their activities (big-data analytics, databases, e-commerce, process analysis, etc.). An increase in achievable performance will reduce the size of their computing infrastructures and, hence, their costs. In addition they will benefit from reduced power bills as a result of the increase in terms of energy-efficiency. In general, these cost reductions will mean increased profits and/or competitiveness which in turn will boost the UK economy.

Similarly, a significant part of the scientific community relies on computation (supercomputers and more commonly clusters) to either obtain results from simulation-based experimentation or analyse data from real-world experiments. Although the cost reductions (as discussed above) can still be useful in this scenario, we feel that the improvement in terms of performance will be much more beneficial: higher performance will translate into the faster production of results which in turn will increase the pace at which science advances. The research power of the UK scientific community could well be affected positively by the outcomes of this project.

The development of improved energy-efficient computing systems can help reduce the total electricity usage of the UK, assisting the efforts of the Government to reach the legally binding target of a 34% reduction in terms of CO2 emissions by 2023 and reducing our dependence on non-UK generated electric power. In addition, these outcomes will help the UK to demonstrate its determination to deal with current concerns about climate change.

The wider public will benefit from the increase of competitiveness (as pointed out above) of UK industry which may be translated into better, cheaper services. The public would also benefit from improved scientific education as a result of our outreach activities.

Publications

Description The high-level objective of the research was to incorporate elements of theory and practice so as to further research on interconnection networks. This objective has been met and we have contributed significant new results on server-centric data centre networks, all of them supported by research papers published in top journals and/or conferences. Our primary contributions involve: optimal routing algorithms in the data centre network DPillar; significantly improved routing algorithms in the data centre networks HCN, BCN, DCell, FiConn, and recursively defined networks in general (in terms of hop-length, fault-tolerance, load-balancing, latency, and throughput); a new generic paradigm for the construction of dual-port data centre networks, namely stellar networks; and a flow-based simulator, INRFlow, that is geared specifically towards flow-based simulation in data centres and which has been essential for carrying out all of the evaluation work. There have also been contributions in general interconnection networks and switch-centric data centre networks: for the former, combinatorial properties relating to Hamiltonicity in multiswapped networks have been established; and for the latter, combinatorial design theory has been used to design improved switch-centric data centre networks. The interaction between two groups of researchers, one theoretical and one more applied, has been massively successful and a long-term collaboration has been established. The unique skill sets of both teams have been absolutely necessary to undertake the research.
Exploitation Route There is potential for our results to be implemented in server-centric data centre networks when this paradigm moves from the university research domain and becomes more established within the builders of data centres. However, at present the switch-centric paradigm still dominates. INRFlow is currently being used as the core experimental environment within other related research projects and is being extended accordingly.
Sectors Digital/Communication/Information Technologies (including Software)

 
Title INRFlow 
Description A novel, open-source flow-based simulator of interconnection networks. This 
Type Of Material Improvements to research infrastructure 
Year Produced 2015 
Provided To Others? Yes  
Impact This tool is being used as part of other network-related projects in which the group is involved (the EU-funded ExaNeSt and EuroEXA). It is also being used by some of our partners in these projects. We also plan to use it for new research projects.
URL https://gitlab.com/ExaNeSt/inrflow
 
Title INRFlow - update needed 
Description Interconnection network research flow-level evaluation framework 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact
 
Title INSEE 
Description Interconnection Networks Simulation and Evaluation Environment 
Type Of Technology Software 
Open Source License? Yes  
Impact INSEE is a lightweight tool that can be used for evaluation of interconnection networks. It has been used by several research groups worldwide. 
URL https://sourceforge.net/projects/insee/