Software development support for DiRAC
Lead Research Organisation:
University of Edinburgh
Department Name: Sch of Physics and Astronomy
Abstract
We propose to partially mitigate the shortfall in projected DiRAC computing resources over the next year, arising from the delay to DiRAC-3, by continuing a code optimisation effort that accelerates DiRAC's scientific codes on current architectures and will also give a sustained, long-term benefit to scientific throughput on future machines. The effort will provide profiling, benchmarking and optimisation services to maximise DiRAC's scientific return by ensuring the most efficient possible use of computing resources. This continues a successful first year of such effort, supported by the DiRAC Technical Working Group grant with additional support from the University of Edinburgh. The first year focused on representing the DiRAC workload as benchmarks to assess future supercomputing needs, and this activity has brought significant benefits to the community. Examples include new vectorisation, cache optimisation and MPI parallelisation strategies for codes of the VIRGO, UKQCD, UKMHD and Planck cosmology collaborations.
DiRAC contains some areas of key international strength in high-performance software engineering. DiRAC's expertise has been the foundation upon which the Alan Turing Institute has secured an internationally unique strategic partnership with Intel's HPC architecture group. There is a good opportunity for STFC to enable the rest of the DiRAC community to exploit this knowledge base.
Planned Impact
The proposed work will disseminate best practice in High Performance Computing software engineering throughout the theoretical particle physics, astronomy and nuclear physics communities in the UK. This will develop skills in the research community that are highly transferable to the computations required for simulations in advanced manufacturing and computer-aided design. A proportion of the community's young researchers leave academia each year as a natural part of the turnover in university research, especially given the limited number of research positions. These researchers carry their skills into other jobs, and this transfer is a key element feeding UK industry, from the space and satellite sector through hard engineering to the financial sector.
The DiRAC project has close ties with the highest levels of research and development in the computing industry. We designed a key component of the IBM BlueGene/Q system as part of a unique academic-industrial joint project with the IBM Watson Laboratory. The design has powered leading scientific computing installations around the world, from laboratories in the USA, Japan, Italy and Germany to our own national Hartree Centre.
More recently, DiRAC has been competitively awarded three Intel Parallel Computing Centres, won several international supercomputing awards, and developed a close codesign project with Intel on future HPC architectures. This latter effort promises to influence a vast swathe of modern computing over the next five years. Such improvements would affect all consumers of computational hardware, in particular those doing numerical simulation such as the advanced manufacturing sector (e.g. Rolls-Royce, BAE, McLaren, Shell, Jaguar Land Rover), as well as much of academic research in the physical sciences.
The Intel-Alan Turing Institute strategic partnership, with an HPC architecture team embedded in the ATI, is built on this foundation of deep collaboration with theoretical particle physicists. DiRAC's technical director is the codesign leader for the ATI and will transfer DiRAC's best-practice codesign techniques to a number of other subject domains across HPC and data science.
The Alan Turing Institute, with DiRAC's codesign leader playing a central role, is actively engaging with external partners such as Intel, the Met Office, Shell and even McLaren Racing, propagating vectorisation techniques developed in QCD codes to the important area of finite element modelling.
The present proposal provides resources for the theoretical particle physics, astronomy and nuclear physics communities to interact with the codesign knowledge centre. There is a real opportunity to influence engineering decisions that optimise codes for products and products for codes. One recurring example in computer architecture is the trade-off between throughput and accuracy in vectorised reciprocal square root instructions. These instructions are central to evaluating the inverse-square law that dominates the gravitational part of astronomy simulations, and we have the opportunity to give Intel definitive statements about the right balance.
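A minimal Python sketch can make this trade-off concrete. It is an illustration under stated assumptions, not DiRAC or Intel code: a cheap reciprocal square root estimate accurate to a few percent (emulated here with the well-known `0x5f3759df` bit trick, standing in for a hardware vector instruction) is refined with Newton-Raphson steps, each costing a few extra multiplies and roughly squaring the relative error, and the result is then used for an inverse-square acceleration. The names `rsqrt_estimate`, `rsqrt`, `accel` and `max_rel_error` are hypothetical.

```python
import math
import struct


def rsqrt_estimate(x: float) -> float:
    """Cheap, few-percent-accurate 1/sqrt(x) for x > 0.

    Hypothetical stand-in for a hardware rsqrt instruction: applies the
    well-known 0x5f3759df bit trick to the float32 encoding of x.
    """
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    i = 0x5F3759DF - (i >> 1)
    return struct.unpack("<f", struct.pack("<I", i))[0]


def rsqrt(x: float, newton_steps: int) -> float:
    """Refine the estimate with Newton-Raphson: y <- y * (1.5 - 0.5 * x * y * y).

    Each step roughly squares the relative error, at the cost of extra multiplies.
    """
    y = rsqrt_estimate(x)
    for _ in range(newton_steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def accel(dx, dy, dz, mass, newton_steps, G=1.0):
    """Inverse-square acceleration towards a mass at offset (dx, dy, dz).

    a = G m r_vec / |r|^3 = G m r_vec * rsqrt(r^2)^3, so one rsqrt per pair.
    """
    r2 = dx * dx + dy * dy + dz * dz
    inv_r = rsqrt(r2, newton_steps)
    s = G * mass * inv_r * inv_r * inv_r
    return (s * dx, s * dy, s * dz)


def max_rel_error(newton_steps, samples):
    """Worst relative error of rsqrt over the sample points, versus math.sqrt."""
    return max(abs(rsqrt(x, newton_steps) * math.sqrt(x) - 1.0) for x in samples)


if __name__ == "__main__":
    xs = [10.0 ** (k / 50.0) for k in range(-300, 301)]  # 1e-6 .. 1e6
    for steps in range(3):
        print(f"Newton steps: {steps}  max relative error: {max_rel_error(steps, xs):.2e}")
```

Running the script prints the worst-case relative error over the sampled inputs for zero, one and two refinement steps. In this framing, the architectural question is how accurate the hardware estimate must be for a gravity kernel to need as few refinement steps as possible.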
Similarly, we hope to provide useful information on the requirements for data motion, from the complexities of cache organisation and algorithms to interconnect requirements. Nowhere is this more pressing than in addressing the enormous challenges presented by the Square Kilometre Array, perhaps one of the leading Big Data problems in scientific endeavour in the near future.
Our work can also lead to cross-fertilisation into the nascent fields of machine learning and data science through the Alan Turing Institute.
Publications
Cossu G
(2021)
Nonperturbative Infrared Finiteness in a Superrenormalizable Scalar Quantum Field Theory.
in Physical Review Letters
Hill A
(2021)
The morphology of star-forming gas and its alignment with galaxies and dark matter haloes in the EAGLE simulations
in Monthly Notices of the Royal Astronomical Society
Talbot R
(2021)
Blandford-Znajek jets in galaxy formation simulations: method and implementation
in Monthly Notices of the Royal Astronomical Society
Pontzen A
(2021)
EDGE: a new approach to suppressing numerical diffusion in adaptive mesh simulations of galaxy formation
in Monthly Notices of the Royal Astronomical Society
Buzzo M
(2021)
Recovering the origins of the lenticular galaxy NGC 3115 using multiband imaging
in Monthly Notices of the Royal Astronomical Society
Changeat Q
(2021)
An Exploration of Model Degeneracies with a Unified Phase Curve Retrieval Analysis: The Light and Dark Sides of WASP-43 b
in The Astrophysical Journal
Bourne M
(2021)
AGN jet feedback on a moving mesh: gentle cluster heating by weak shocks and lobe disruption
in Monthly Notices of the Royal Astronomical Society
Allanson O
(2021)
Electron Diffusion and Advection During Nonlinear Interactions With Whistler-Mode Waves
in Journal of Geophysical Research: Space Physics
Czakon M
(2021)
NNLO QCD corrections to leptonic observables in top-quark pair production and decay
in Journal of High Energy Physics
Porth L
(2021)
Fast estimation of aperture-mass statistics - II. Detectability of higher order statistics in current and future surveys
in Monthly Notices of the Royal Astronomical Society
Benitez-Llambay A
(2021)
The Tail of Late-forming Dwarf Galaxies in ΛCDM
in The Astrophysical Journal Letters
Koudmani S
(2021)
A little FABLE: exploring AGN feedback in dwarf galaxies with cosmological simulations
in Monthly Notices of the Royal Astronomical Society
Woss A
(2021)
Decays of an exotic 1⁻⁺ hybrid meson resonance in QCD
in Physical Review D
Jackson R
(2021)
The origin of low-surface-brightness galaxies in the dwarf regime
in Monthly Notices of the Royal Astronomical Society
Andrade T
(2021)
GRChombo: An adaptable numerical relativity code for fundamental physics
in Journal of Open Source Software
Baraffe I
(2021)
Two-dimensional simulations of solar-like models with artificially enhanced luminosity I. Impact on convective penetration
in Astronomy & Astrophysics
Drewes N
(2021)
On the Dynamics of Low-viscosity Warped Disks around Black Holes
in The Astrophysical Journal
Foster C
(2021)
The MAGPI survey: Science goals, design, observing strategy, early results and theoretical framework
in Publications of the Astronomical Society of Australia
Igoshev A
(2021)
Combined analysis of neutron star natal kicks using proper motions and parallax measurements for radio pulsars and Be X-ray binaries
in Monthly Notices of the Royal Astronomical Society
Owens A
(2021)
ExoMol line lists - XLI. High-temperature molecular line lists for the alkali metal hydroxides KOH and NaOH
in Monthly Notices of the Royal Astronomical Society
Cao K
(2021)
Studying galaxy cluster morphological metrics with mock-X
in Monthly Notices of the Royal Astronomical Society
Poncelet R
(2021)
NNLO QCD study of polarised W+W- production at the LHC
in Journal of High Energy Physics
Chakraborty B
(2021)
Improved |V_cs| determination using precise lattice QCD form factors for D → Kℓν
in Physical Review D
Trujillo-Gomez S
(2021)
The kinematics of globular cluster populations in the E-MOSAICS simulations and their implications for the assembly history of the Milky Way
in Monthly Notices of the Royal Astronomical Society
Olsen K
(2021)
sígame v3: Gas Fragmentation in Postprocessing of Cosmological Simulations for More Accurate Infrared Line Emission Modeling
in The Astrophysical Journal
Dobbs C
(2021)
The properties of clusters, and the orientation of magnetic fields relative to filaments, in magnetohydrodynamic simulations of colliding clouds
in Monthly Notices of the Royal Astronomical Society
Thomas N
(2021)
The radio galaxy population in the simba simulations
in Monthly Notices of the Royal Astronomical Society
Fossati M
(2021)
MUSE analysis of gas around galaxies (MAGG) - III. The gas and galaxy environment of z = 3-4.5 quasars
in Monthly Notices of the Royal Astronomical Society
Mukherjee S
(2021)
SEAGLE - II. Constraints on feedback models in galaxy formation from massive early-type strong-lens galaxies
in Monthly Notices of the Royal Astronomical Society
Fiteni K
(2021)
The relative efficiencies of bars and clumps in driving disc stars to retrograde motion
in Monthly Notices of the Royal Astronomical Society
Šoltinskí T
(2021)
The detectability of strong 21 centimetre forest absorbers from the diffuse intergalactic medium in late reionisation models
in Monthly Notices of the Royal Astronomical Society
Hughes D
(2021)
Double-diffusive Magnetic Layering
in The Astrophysical Journal
Beckett A
(2021)
The relationship between gas and galaxies at z < 1 using the Q0107 quasar triplet
in Monthly Notices of the Royal Astronomical Society
Karunakaran A
(2021)
Satellites around Milky Way Analogs: Tension in the Number and Fraction of Quiescent Satellites Seen in Observations versus Simulations
in The Astrophysical Journal Letters
Horst L
(2021)
Multidimensional low-Mach number time-implicit hydrodynamic simulations of convective helium shell burning in a massive star
in Astronomy & Astrophysics
Mellor T
(2021)
Artificial Symmetries for Calculating Vibrational Energies of Linear Molecules
in Symmetry
Rogers J
(2021)
Photoevaporation versus core-powered mass-loss: model comparison with the 3D radius gap
in Monthly Notices of the Royal Astronomical Society
Raj A
(2021)
Disk Tearing: Implications for Black Hole Accretion and AGN Variability
in The Astrophysical Journal
Young A
(2021)
Chemical signatures of a warped protoplanetary disc
in Monthly Notices of the Royal Astronomical Society
Hergt L
(2021)
Bayesian evidence for the tensor-to-scalar ratio r and neutrino masses m_ν: Effects of uniform versus logarithmic priors
in Physical Review D
Hernández-Aguayo C
(2021)
Galaxy formation in the brane world I: overview and first results
in Monthly Notices of the Royal Astronomical Society
Elsender D
(2021)
The statistical properties of protostellar discs and their dependence on metallicity
in Monthly Notices of the Royal Astronomical Society
Nixon C
(2021)
Partial, Zombie, and Full Tidal Disruption of Stars by Supermassive Black Holes
in The Astrophysical Journal
Clough K
(2021)
Continuity equations for general matter: applications in numerical relativity
in Classical and Quantum Gravity
Gronow S
(2021)
Double detonations of sub-M_Ch CO white dwarfs: variations in Type Ia supernovae due to different core and He shell masses
in Astronomy & Astrophysics
Czakon M
(2021)
NNLO QCD predictions for W+c-jet production at the LHC
in Journal of High Energy Physics
Radia M
(2021)
Anomalies in the gravitational recoil of eccentric black-hole mergers with unequal mass ratios
in Physical Review D
Raste J
(2021)
Implications of the z > 5 Lyman-α forest for the 21-cm power spectrum from the epoch of reionization
in Monthly Notices of the Royal Astronomical Society
Buividovich P
(2021)
Static magnetic susceptibility in finite-density SU(2) lattice gauge theory
in The European Physical Journal A
Jones C
(2021)
Fully developed anelastic convection with no-slip boundaries
in Journal of Fluid Mechanics
| Description | DiRAC RSEs were hired by Intel and work with DiRAC team members to analyse the requirements of machine learning. We found that IEEE FP16 is not optimal for machine learning and that a new floating-point format, bfloat16, is more effective. We co-authored a patent application with Intel. |
| Exploitation Route | We demonstrated the format to Intel and co-authored a patent application. It is being released for broad use as part of Intel's Cooper Lake architecture. |
| Sectors | Aerospace Defence and Marine Digital/Communication/Information Technologies (including Software) Electronics Financial Services and Management Consultancy Transport Other |
| URL | http://appft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&p=1&u=%2Fnetahtml%2FPTO%2Fsearch-bool.html&r=0&f=S&l=50&TERM1=Boyle&FIELD1=IN&co1=AND&TERM2=Kashyap&FIELD2=IN&d=PG01 |
| Description | Our software helped Intel improve their driver software for Omni-Path networks. DiRAC RSEs were hired by Intel to work with the Turing Institute on machine learning. |
| First Year Of Impact | 2018 |
| Sector | Digital/Communication/Information Technologies (including Software),Electronics |
| Impact Types | Economic |
| Description | Intel ATI codesign project |
| Organisation | Intel Corporation |
| Department | Intel Corporation (Jones Farm) |
| Country | United States |
| Sector | Private |
| PI Contribution | I lead the Alan Turing Institute / Intel codesign project. Two Intel engineers are placed in Edinburgh to work with me. We have developed profiling tools for machine learning packages, which have yielded insight into the architectural requirements of deep learning that has been passed on to Intel. These tools have been used to study reduced-precision floating-point formats, and specific new instruction set extensions have been proposed to Intel. |
| Collaborator Contribution | Two Intel engineers are placed in Edinburgh to work with me. We have developed profiling tools for machine learning packages, which have yielded insight into the architectural requirements of deep learning that has been passed on to Intel. These tools have been used to study reduced-precision floating-point formats, and specific new instruction set extensions have been proposed to Intel. |
| Impact | Paper with the Intel MPI team. Multidisciplinary: particle physics, computing science and electronic engineering. |
| Start Year | 2016 |
| Description | Intel IPAG QCD codesign project |
| Organisation | Intel Corporation |
| Department | Intel Corporation (Jones Farm) |
| Country | United States |
| Sector | Private |
| PI Contribution | We have collaborated with Intel Corporation since 2014, with $720k of total direct funding, starting as an Intel Parallel Computing Centre and expanding to direct close collaboration with the Intel Pathfinding and Architecture Group. |
| Collaborator Contribution | We have performed detailed optimisation of QCD codes (Wilson, Domain Wall, Staggered) on Intel many-core architectures. We have investigated memory system and interconnect performance, particularly on Intel's latest interconnect hardware, Omni-Path. We found serious performance issues and worked with Intel to plan a solution, which has been verified and is available as beta software. It will reach general availability in the Intel MPI 2019 release and will allow threaded concurrent communications in MPI for the first time. A joint paper on the fix was written with the Intel MPI team; the paper also applied the same QCD programming techniques to machine learning gradient reduction in the Baidu Research allreduce library, demonstrating a 10x gain for this critical step of machine learning in clustered environments. We are also working with Intel on verifying future architectures intended to deliver exascale performance in 2021. |
| Impact | We have performed detailed optimisation of QCD codes (Wilson, Domain Wall, Staggered) on Intel many-core architectures. We have investigated memory system and interconnect performance, particularly on Intel's latest interconnect hardware, Omni-Path. We found serious performance issues and worked with Intel to plan a solution, which has been verified and is available as beta software. It will reach general availability in the Intel MPI 2019 release and will allow threaded concurrent communications in MPI for the first time. A joint paper on the fix was written with the Intel MPI team; the paper also applied the same QCD programming techniques to machine learning gradient reduction in the Baidu Research allreduce library, demonstrating a 10x gain for this critical step of machine learning in clustered environments. This collaboration has been renewed annually in 2018, 2019 and 2020. Two DiRAC RSEs were hired by Intel to work on the Turing collaboration. |
| Start Year | 2016 |
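The gradient-reduction step mentioned above is an all-reduce: every node ends up with the sum of all nodes' gradients. As background only, the following is a minimal serial Python simulation of the ring all-reduce algorithm (a reduce-scatter phase followed by an all-gather phase) that the Baidu Research library implements. It does not reproduce the threaded, concurrent MPI communication optimisation described in the joint paper; `ring_allreduce` is a hypothetical name and the "ranks" are simply list entries.

```python
def ring_allreduce(buffers):
    """Simulate a sum all-reduce over n ranks arranged in a ring.

    buffers[r] is rank r's gradient vector (all the same length). Returns a
    new list in which every rank holds the element-wise sum; inputs are not modified.
    """
    n = len(buffers)
    length = len(buffers[0])
    data = [list(b) for b in buffers]
    # Split each vector into n contiguous chunks (some may be empty if length < n).
    bounds = [(c * length // n, (c + 1) * length // n) for c in range(n)]

    # Phase 1, reduce-scatter: in step s, rank r sends chunk (r - s) mod n to
    # rank r + 1, which adds it into its own copy of that chunk.
    for s in range(n - 1):
        sends = []
        for r in range(n):
            c = (r - s) % n
            lo, hi = bounds[c]
            sends.append((r, c, data[r][lo:hi]))  # snapshot: all sends in a step are simultaneous
        for r, c, payload in sends:
            dst = (r + 1) % n
            lo, _ = bounds[c]
            for i, v in enumerate(payload):
                data[dst][lo + i] += v
    # Now rank r holds the fully reduced chunk (r + 1) mod n.

    # Phase 2, all-gather: in step s, rank r forwards chunk (r + 1 - s) mod n
    # to rank r + 1, which overwrites its copy with the reduced values.
    for s in range(n - 1):
        sends = []
        for r in range(n):
            c = (r + 1 - s) % n
            lo, hi = bounds[c]
            sends.append((r, c, data[r][lo:hi]))
        for r, c, payload in sends:
            dst = (r + 1) % n
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload
    return data
```

Each rank sends and receives about 2(n-1)/n of its buffer in total, independent of the number of ranks, which is why the ring scheme suits large gradient buffers; in a real implementation each step's transfers would be point-to-point messages between neighbouring nodes.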
| Title | FP16-S7E8 MIXED PRECISION FOR DEEP LEARNING AND OTHER ALGORITHMS |
| Description | We demonstrated that a new non-IEEE 16-bit floating-point format is the optimal choice for machine learning training, and proposed instructions for it. |
| IP Reference | US20190042544 |
| Protection | Patent application published |
| Year Protection Granted | 2019 |
| Licensed | Yes |
| Impact | We demonstrated that a new non-IEEE 16-bit floating-point format is the optimal choice for machine learning training, and proposed instructions for it. Intel filed this with the US Patent and Trademark Office. This IP is owned by Intel under the terms of the Intel-Turing strategic partnership contract. As a co-inventor, I have been named on the patent application. The proposed format has been announced as planned for use in future Intel architectures. This collaboration with Turing emerged from an investment in Edinburgh by the Intel Pathfinding and Architecture Group in codesign with lattice gauge theory simulations. Intel hired DiRAC RSEs Kashyap and Lepper and placed them in Edinburgh to work with me on machine learning codesign through the Turing programme. |
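The FP16-S7E8 layout named in the title keeps float32's sign bit and 8 exponent bits but only 7 explicit significand bits, the format known as bfloat16. A minimal Python sketch illustrates the resulting trade-off against IEEE FP16 (5 exponent bits, 10 significand bits): far wider dynamic range, less precision. It assumes round-to-nearest-even conversion from float32; that rounding mode is an assumption here, not taken from the patent. The function names are hypothetical, and IEEE FP16 is emulated with the standard library `struct` module's `'e'` format.

```python
import math
import struct


def float32_bits(x: float) -> int:
    """Bit pattern of x rounded to IEEE float32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def float_to_bf16_bits(x: float) -> int:
    """Round x (via float32) to bfloat16: 1 sign, 8 exponent, 7 significand bits.

    Assumes round-to-nearest, ties-to-even on the 16 discarded low bits.
    """
    bits = float32_bits(x)
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040  # NaN: force a quiet NaN so truncation cannot yield inf
    lsb = (bits >> 16) & 1  # lowest bit that survives truncation
    return (bits + 0x7FFF + lsb) >> 16  # add rounding bias, keep the top 16 bits


def bf16_bits_to_float(h: int) -> float:
    """Widen a bfloat16 bit pattern to float32 by appending 16 zero bits."""
    return struct.unpack("<f", struct.pack("<I", h << 16))[0]


def bf16_round(x: float) -> float:
    return bf16_bits_to_float(float_to_bf16_bits(x))


def fp16_round(x: float) -> float:
    """IEEE FP16 round trip (1 sign, 5 exponent, 10 significand bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


if __name__ == "__main__":
    for value in (1e-8, 1.0 + 2**-7, 1.0 + 2**-8, 3.0e4):
        print(f"{value!r:>22}  fp16: {fp16_round(value)!r:>24}  bf16: {bf16_round(value)!r}")
```

For example, a small gradient value such as 1e-8 underflows to zero in IEEE FP16 but survives in bfloat16, while 1 + 2^-8 is exact in FP16 but rounds to 1.0 in bfloat16. Because bfloat16 shares float32's exponent range, conversion is a simple truncation of the low 16 bits plus rounding.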
| Description | Panel discussion on machine learning and future HPC at the Intel HPC Developer Conference. |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Industry/Business |
| Results and Impact | Invited by Intel as a panel expert on the future of HPC and machine learning at their annual HPC Developer Conference, which is widely attended by industry and the research laboratory sector. Boyle is second from left in the photograph on the Intel web page linked below. |
| Year(s) Of Engagement Activity | 2017 |
| URL | https://www.intel.com/content/www/us/en/events/hpcdevcon/overview.html |
| Description | Talk on MPI optimisation on the Intel stand at Supercomputing 2017 |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Industry/Business |
| Results and Impact | Decision influence: I influenced Intel to modify, update and release optimisations to their MPI library for the Intel Omni-Path interconnect, and co-authored a paper on this topic. |
| Year(s) Of Engagement Activity | 2017 |
| URL | http://inspirehep.net/record/1636204 |
| Description | Talks presented on this activity at Intel Xeon Phi User Group conferences. |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Industry/Business |
| Results and Impact | Presented work at several Intel Xeon Phi User Group meetings. |
| Year(s) Of Engagement Activity | 2016,2017 |
