The DiRAC 2.5x Facility

Lead Research Organisation: University of Edinburgh
Department Name: Sch of Physics and Astronomy

Abstract

Physicists across the astronomy, nuclear and particle physics communities are focussed
on understanding how the Universe works at a very fundamental level. The distance scales
with which they work vary by 50 orders of magnitude from the smallest distances probed
by experiments at the Large Hadron Collider, deep within the atomic
nucleus, to the largest-scale galaxy clusters discovered out in space. The science challenges,
however, are linked through questions such as "How did the Universe begin and how is it evolving?"
and "What are the fundamental constituents and fabric of the Universe and how do they interact?"

Progress requires new astronomical observations and experimental data but also
new theoretical insights. Theoretical understanding comes increasingly from large-scale
computations that allow us to confront the consequences of our theories very accurately
with the data or allow us to interrogate the data in detail to extract information that has
impact on our theories. These computations test the fastest computers that we have and
push the boundaries of technology in this sector. They also provide an excellent
environment for training students in state-of-the-art techniques for code optimisation and
data mining and visualisation.

The DiRAC2 HPC facility has been operating since 2012, providing computing resources for theoretical research
in all areas of particle physics, astronomy, cosmology and nuclear physics supported by STFC. It is a highly productive
facility, generating more than 250 papers annually in international, peer-reviewed journals. However, the DiRAC2 hardware is now at least 5 years old and is therefore at significant risk of failure. The loss of any one of the DiRAC2 services
would have a potentially disastrous impact on the research communities which rely on it to deliver their scientific research. The main
purpose of the requested funding for the DiRAC2.5x project is to replace the ageing DiRAC2 hardware at Durham, Edinburgh and Leicester
while taking advantage of recent hardware advances to provide some new capabilities (e.g. I/O acceleration using flash storage) as prototypes for
the proposed DiRAC3 services.

DiRAC2.5x builds on the success of the DiRAC HPC facility and will provide the resources needed to support cutting-edge research
during 2018 in all areas of science supported by STFC. While the funding is required to "keep the lights on", the science programme will continue to be
world-leading. Examples of the projects which will benefit from this investment include:

(i) lattice quantum chromodynamics (QCD) calculations of the properties of fundamental particles from first principles;
(ii) improving the potential of experiments at CERN's Large Hadron Collider for discovery of new physics by increasing the accuracy of theoretical predictions for rare processes involving the fundamental constituents of matter known as quarks;
(iii) simulations of the merger of pairs of black holes which generate gravitational waves such as those recently discovered by the LIGO consortium;
(iv) the most realistic simulations to date of the formation and evolution of galaxies in the Universe;
(v) the accretion of gas onto supermassive black holes, the most efficient means of extracting energy from matter and the engine which drives galaxy formation and evolution;
(vi) new models of our own Milky Way galaxy calibrated using new data from the European Space Agency's Gaia satellite;
(vii) detailed simulations of the interior of the Sun and of planetary interiors;
(viii) the formation of stars in clusters: for the first time it will be possible to follow the formation of stars many times more massive than the Sun.

Planned Impact

The anticipated impact of the DiRAC2.5x HPC facility aligns closely with the recently published UK Industrial Strategy. As such, many of our key impacts will be driven by
our engagements with industry. Each service provider for DiRAC2.5x has a local industrial strategy to deliver increased levels of industrial returns over the next three years.
The "Pathways to impact" document which is attached to this proposal describes the overall industrial strategy for DiRAC2.5x, including our strategic goals and key performance indicators.

Publications

 
Description We have calculated the hadronic vacuum polarisation contribution to the anomalous magnetic moment of the muon.
There is a tension between theory and experiment with on-going experiments in the USA and in Japan. This has the potential
to indicate new physics if, as both theory and experiment race to reduce their errors, the discrepancy persists and becomes more significant.
Exploitation Route No
Sectors Other

 
Description We worked on MPI performance in an IAAA collaboration with Hewlett Packard Enterprise. This successfully influenced Intel to improve the MPI stack for their Omnipath products.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software),Electronics
 
Description Intel ATI codesign project 
Organisation Intel Corporation
Department Intel Corporation (Jones Farm)
Country United States 
Sector Private 
PI Contribution I lead the Alan Turing Institute / Intel codesign project. Two Intel engineers are placed in Edinburgh to work with me. We have developed profiling tools for machine learning packages, which have yielded insight into the architectural requirements of deep learning; this insight has been propagated to Intel. The tools have been used to study reduced precision floating point formats, and specific new instruction set extensions have been proposed to Intel.
Collaborator Contribution Two Intel engineers are placed in Edinburgh to work with me. We have developed profiling tools for machine learning packages, which have yielded insight into the architectural requirements of deep learning; this insight has been propagated to Intel. The tools have been used to study reduced precision floating point formats, and specific new instruction set extensions have been proposed to Intel.
Impact Paper with Intel MPI team. Multidisciplinary: particle physics, computing science and electronic engineering.
Start Year 2016
 
Description Intel IPAG QCD codesign project 
Organisation Intel Corporation
Department Intel Corporation (Jones Farm)
Country United States 
Sector Private 
PI Contribution We have collaborated with Intel Corporation since 2014, with $720k of total direct funding, starting as an Intel Parallel Computing Centre and expanding to close direct collaboration with the Intel Pathfinding and Architecture Group.
Collaborator Contribution We have performed detailed optimisation of QCD codes (Wilson, Domain Wall, Staggered) on Intel many-core architectures. We have investigated memory system and interconnect performance, particularly on Omnipath, Intel's latest interconnect hardware. We found serious performance issues and worked with Intel to plan a solution, which has been verified and is available as beta software. It will reach general availability in the Intel MPI 2019 release and will allow threaded concurrent communications in MPI for the first time. A joint paper on the resolution was written with the Intel MPI team. In the same paper, the QCD programming techniques were applied to machine learning gradient reduction in the Baidu Research all-reduce library, demonstrating a 10x gain for this critical step in machine learning in clustered environments. We are also working with Intel to verify future architectures intended to deliver exascale performance in 2021.
Impact We have performed detailed optimisation of QCD codes (Wilson, Domain Wall, Staggered) on Intel many-core architectures. We have investigated memory system and interconnect performance, particularly on Omnipath, Intel's latest interconnect hardware. We found serious performance issues and worked with Intel to plan a solution, which has been verified and is available as beta software. It will reach general availability in the Intel MPI 2019 release and will allow threaded concurrent communications in MPI for the first time. A joint paper on the resolution was written with the Intel MPI team. In the same paper, the QCD programming techniques were applied to machine learning gradient reduction in the Baidu Research all-reduce library, demonstrating a 10x gain for this critical step in machine learning in clustered environments.
Start Year 2014
 
Title FP16-S7E8 MIXED PRECISION FOR DEEP LEARNING AND OTHER ALGORITHMS 
Description We demonstrated that a new non-IEEE 16-bit floating point format is the optimal choice for machine learning training and proposed corresponding instructions. 
IP Reference US20190042544 
Protection Patent application published
Year Protection Granted 2019
Licensed Yes
Impact We demonstrated that a new non-IEEE 16-bit floating point format is the optimal choice for machine learning training and proposed corresponding instructions. Intel filed this with the US Patent Office. This IP is owned by Intel under the terms of the Intel Turing strategic partnership contract. As a co-inventor, I have been named on the patent application. The proposed format has been announced as planned for use in future Intel architectures. This collaboration with Turing emerged from an investment in Edinburgh by the Intel Pathfinding and Architecture Group in codesign with lattice gauge theory simulations. Intel hired DiRAC RSEs Kashyap and Lepper and placed them in Edinburgh to work with me on machine learning codesign through the Turing programme.
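
As an illustration only, the following minimal Python sketch shows how a 16-bit format of this kind can be obtained from an IEEE 32-bit float. It assumes that "S7E8" denotes one sign bit, an 8-bit exponent and a 7-bit significand, so that the 16-bit value is simply the upper half of the 32-bit encoding. The function names and the round-to-nearest-even choice are hypothetical and are not taken from the patent application, and the proposed instructions are not reproduced here.

```python
import math
import struct


def to_s7e8_bits(x: float) -> int:
    """Encode x as a 16-bit pattern: 1 sign, 8 exponent, 7 significand bits.

    Assumption: the layout is the upper 16 bits of an IEEE float32.
    x must fit in float32 range (struct raises OverflowError otherwise).
    """
    if math.isnan(x):
        # Return a quiet NaN pattern so rounding cannot turn NaN into a number.
        return 0x7FC0
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits that are discarded.
    bias = 0x7FFF + ((bits32 >> 16) & 1)
    return ((bits32 + bias) >> 16) & 0xFFFF


def from_s7e8_bits(bits16: int) -> float:
    """Decode a 16-bit pattern by padding the low 16 bits with zeros."""
    return struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))[0]


def round_trip(x: float) -> float:
    """Value of x after being stored in the 16-bit format."""
    return from_s7e8_bits(to_s7e8_bits(x))
```

Under this assumed layout the format keeps the full float32 exponent range, so a value such as 1e38 stays finite after `round_trip`, whereas IEEE half precision cannot represent anything above 65504. The cost is precision: only about two to three significant decimal digits survive.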
 
Description Panel discussion on machine learning and future HPC at the Intel HPC developer conference. 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Invited by Intel as a panel expert on the future of HPC and machine learning at their annual HPC developer conference, which is widely attended by industry and the research lab sector. Note: Boyle is second from left in the photograph on the Intel web page linked below.
Year(s) Of Engagement Activity 2017
URL https://www.intel.com/content/www/us/en/events/hpcdevcon/overview.html
 
Description Talk on MPI optimisation on Intel stand at Supercomputing 2017 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Decision influence: I influenced Intel to modify, update and release optimisations to their MPI library for the Intel Omnipath interconnect. I coauthored a paper on this topic.
Year(s) Of Engagement Activity 2017
URL http://inspirehep.net/record/1636204
 
Description Talks presented on this activity at Intel Xeon Phi User Group conferences. 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact Presented work in several Intel Xeon Phi User Group meetings.
Year(s) Of Engagement Activity 2016,2017