Maintaining memory order dependencies through LLVM optimisation passes.
Lead Research Organisation:
University of Kent
Department Name: Sch of Computing
Abstract
Modern processors have complex memory models that can exhibit unwanted
behaviours if not correctly used. Many modern compilers do not provide
strong guarantees on the execution of programs in certain racy blocks of
code; this creates a set of weak memory behaviours in low level applications,
which can be unexpected, or are poorly modelled. [Batty et al.(2015)].
Some compilers make conservative judgements about the way these memory
models work, and emit machine code which is not as well optimised as it
could be. In the case of LLVM, optimisation passes have been analysed and
tested, but haven't been verified with respect to weak memory behaviours,
and they don't strictly reflect the x86 TSO memory model.
In a system being carefully designed around x86-TSO, such as the "readcopy-update"
primitive in the Linux kernel. The code which implements
this is core to the functionality of Linux and is called very frequently. A
performance boost could be obtained by omitting some of the fences present
in the conservative output of the LLVM C compiler (Clang).
The Linux RCU (read-copy-update) primitive is a mechanism for cheap
locking in the Linux Kernel. It has been implemented by Paul McKenney,
it relies on ordering preserved through dependencies in C code. This is
relies heavily in on the the dependencies preserved by GCC. LLVM does
not preserve these dependencies, and instead Linux kernels compiled under
LLVM have to include a collection of "fences" which are more than likely
slower than preserving dependencies.
By exploring changes to LLVM and it's optimisations I will ensure and
verify the dependency preservation of the compiler, and to be able to implement
a fence-free LLVM compiled Linux RCU primitive.
An ultimate goal of this research would be to be able to implement Linux
RCU without fences and measure the performance increase. I have a few
lines of exploration planned around this research.
There has been some recent work to verify optimisations in LLVM by
[Chakraborty and Vafeiadis(2016)], but they still rely on the insertion of
"fences" into code, rather than relying on the preservation of dependencies
as [McKenney(2016)] suggests. As previously discussed, the removal of
these fences should provide a measurable performance increase to the Linux
Kernel
behaviours if not correctly used. Many modern compilers do not provide
strong guarantees on the execution of programs in certain racy blocks of
code; this creates a set of weak memory behaviours in low level applications,
which can be unexpected, or are poorly modelled. [Batty et al.(2015)].
Some compilers make conservative judgements about the way these memory
models work, and emit machine code which is not as well optimised as it
could be. In the case of LLVM, optimisation passes have been analysed and
tested, but haven't been verified with respect to weak memory behaviours,
and they don't strictly reflect the x86 TSO memory model.
In a system being carefully designed around x86-TSO, such as the "readcopy-update"
primitive in the Linux kernel. The code which implements
this is core to the functionality of Linux and is called very frequently. A
performance boost could be obtained by omitting some of the fences present
in the conservative output of the LLVM C compiler (Clang).
The Linux RCU (read-copy-update) primitive is a mechanism for cheap
locking in the Linux Kernel. It has been implemented by Paul McKenney,
it relies on ordering preserved through dependencies in C code. This is
relies heavily in on the the dependencies preserved by GCC. LLVM does
not preserve these dependencies, and instead Linux kernels compiled under
LLVM have to include a collection of "fences" which are more than likely
slower than preserving dependencies.
By exploring changes to LLVM and it's optimisations I will ensure and
verify the dependency preservation of the compiler, and to be able to implement
a fence-free LLVM compiled Linux RCU primitive.
An ultimate goal of this research would be to be able to implement Linux
RCU without fences and measure the performance increase. I have a few
lines of exploration planned around this research.
There has been some recent work to verify optimisations in LLVM by
[Chakraborty and Vafeiadis(2016)], but they still rely on the insertion of
"fences" into code, rather than relying on the preservation of dependencies
as [McKenney(2016)] suggests. As previously discussed, the removal of
these fences should provide a measurable performance increase to the Linux
Kernel
People |
ORCID iD |
Mark Batty (Primary Supervisor) | |
Simon Cooksey (Student) |
Studentship Projects
Project Reference | Relationship | Related To | Start | End | Student Name |
---|---|---|---|---|---|
EP/R513246/1 | 01/10/2018 | 30/09/2023 | |||
1773013 | Studentship | EP/R513246/1 | 01/10/2016 | 31/03/2020 | Simon Cooksey |
Description | The mathematical complexity of simulating how multi-threaded programs work is higher than we expected. To better understand these programs, complex mathematical models have been derived. These models are often impenetrable, even by experts. It was identified early on in this project that to succeed in critiquing and further develop these models that they must be executable and scrutable. Building fast software to evaluate many small programs under each model gives intuitions about the mechanisms that each artefact provides, as well as giving a coarse grained (but useful) set of points for comparison. |
Exploitation Route | Further development of complex models can be achieved using our findings. |
Sectors | Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Electronics,Energy |
Description | I have been involved with developing industry specifications for memory consistency models which will be published as general technical documentation for all programmers. |
First Year Of Impact | 2018 |
Sector | Aerospace, Defence and Marine,Digital/Communication/Information Technologies (including Software),Electronics,Energy |
Impact Types | Societal |
Description | Attendance at ISO C/C++ Meetings (SG1). |
Geographic Reach | Multiple continents/international |
Policy Influence Type | Membership of a guideline committee |
Description | Fixing the thin-air problem: ISO disemination |
Amount | £60,458 (GBP) |
Organisation | The Research Institute on Verified Trustworthy Software Systems |
Sector | Academic/University |
Country | United Kingdom |
Start | 04/2020 |
End | 03/2021 |
Description | PrideMM web interface |
Amount | £41,000 (GBP) |
Organisation | The Research Institute on Verified Trustworthy Software Systems |
Sector | Academic/University |
Country | United Kingdom |
Start | 08/2018 |
End | 03/2019 |
Description | Programming Lanugages Mentoring Workshop grant |
Amount | £750 (GBP) |
Organisation | Association for Computing Machinery |
Sector | Charity/Non Profit |
Country | United States |
Start | 01/2017 |
End | 01/2017 |
Description | Industrial Research Internship |
Organisation | NVIDIA |
Country | Global |
Sector | Private |
PI Contribution | Provided mathematical modelling of industrial hardware specifications. |
Collaborator Contribution | Industrial internship to work on research related to my thesis. |
Impact | Submitted a paper to leading academic conference. |
Start Year | 2018 |
Title | MRDer |
Description | Evaluator for first denotational semantics of C/C++ which forbids thin-air executions. |
Type Of Technology | Software |
Year Produced | 2019 |
Impact | Part of work published at ESOP 2020. |
Title | PrideMM |
Description | Evaluates small programs, called litmus tests, under a cutting edge mathematical model of concurrency. |
Type Of Technology | Software |
Year Produced | 2018 |
Impact | Contributed to a paper seen at TACAS 2019. Cited by James Riely and Alan Jeffery in 2019 Journal version of Event Structures Work. Drove further development of more simple Thin-Air free memory models. |
Description | Speaking at ISO/WG21/SG1 standards meeting |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | Described cutting edge research into memory consistency models, and agreed strategy for developing this model further. WG21 agreed that our work shows promise as a future direction for the international standard of C++ concurrency and would like to see more. |
Year(s) Of Engagement Activity | 2019,2020 |
URL | https://github.com/cplusplus/papers/issues/554 |