HAMLET: Hardware Enabled Meta-Tracing (ext.)

Lead Research Organisation: King's College London
Department Name: Informatics

Abstract

As our software systems grow in size and complexity, increasingly diverse users
have different wants and needs from their languages: the right language for a
statistician (e.g. R) is different from that of someone who formally verifies
safety properties (e.g. OCaml), which is different again from someone creating
user-facing apps (e.g. JavaScript). However, different languages inhabit
different silos and interactions between them are crude and slow. Language
composition has long been touted as the solution to this problem, allowing
languages to be used together in a fine-grained way, but has traditionally
struggled to live up to this promise. In the Lecture Fellowship, my team and I showed
that large, messy, real-world languages can be composed together, even allowing
different languages to be intermingled within a single line of code. We were
able to make the performance of such multi-lingual programs close to their
mono-language constituents, showing that language composition's promise is real.

However, in the course of this research, an unexpected problem became apparent:
Virtual Machines (VMs), the systems used to make many languages run fast (and
which are crucial to the good performance of language composition), do not
perform as expected. In the largest VM experiment to date, we showed
that VMs perform incorrectly in around 60% of cases. Attempts to fix existing
VMs have largely failed, because the problems are so deeply embedded that they
cannot be teased out, even after careful examination. This is a significant
problem for language composition, for which VMs are a foundational pillar.

This Fellowship Extension thus aims to show that VMs can have good, predictable
performance and that they are a suitable foundational pillar for language
composition. However, we cannot expect to create a traditional VM: these often
consume tens, hundreds, or thousands of person years of effort. Instead, my team
and I will create a new meta-tracing VM system, since history shows that these
can be created in a small number of person years. Fortunately for us,
meta-tracing has also been shown to be the fastest way to run multi-lingual
programs, so it is a natural fit. We will rigorously benchmark the new
meta-tracing system we create from the beginning of, and throughout, its
development. This will enable us to observe performance regressions soon after
they occur, allowing us to fix them quickly.

We will also take the opportunity to address one of meta-tracing's biggest
weaknesses: its slow warm-up, that is, the time between a program starting and
JIT compilation completing. Tracing currently involves a software interpreter
interpreting a software interpreter, with a 100-200x overhead when a loop is
traced. We will use the Processor Trace (PT) feature found in recent x86 chips
to move the software part of meta-tracing into hardware, giving a roughly 100x
speed-up to this critical phase of the system. That will also allow us to be
more aggressive in optimising other parts of the tracer that currently cause
poor warm-up.

At the end of this Fellowship Extension, alongside traditional research papers,
we will produce an open-source release of our new meta-tracing system. This will
allow others to build on our work, be that for language composition, or simply
to make individual languages run fast.

Planned Impact

My Fellowship Extension's chief medium- and long-term impact will be to make
language composition a reality by making composed VMs perform well. This will
unlock language composition's potential to solve two hard problems: gradually
migrating systems from an old to a new language; and making the existing use of
multiple languages faster. Both have the potential to significantly improve the
way that end users develop software.

In order to realise this impact, we absolutely require a system which allows VMs
with predictable performance to be created. Thus the open-source hardware
meta-tracer we will create (Yorick) is crucial to Hamlet's impact. Alongside
this, a significant work package element in Hamlet's final year will evaluate
Yorick's effectiveness when used for language composition, giving pointers to
how it might need to be adjusted to be more effective. It will also be crucial
to make Yorick a self-sustaining open-source system, with a vibrant community,
so that Yorick is useful beyond the Fellowship's end: we have a detailed plan to
produce gradually more mature releases throughout Hamlet's duration, and to
engage with the open-source community through open development methods.

In the long term, language composition will need to be taken up by industry in
order for it to reach its full potential. Although it is hard to plan a concrete
route to realising this impact before the underlying technology is ready, Hamlet
will perform several actions to prepare the way. First, we will make use of the
project partners as a good mechanism for initial dissemination of Hamlet's
concepts and results. Second, we will run a workshop in Hamlet's last 6 months
to engage with a wide community of interested researchers and users. Third, the
open-source release of Yorick will reduce the barriers to long-term use by
industry. Fourth, the Fellow will publicise Hamlet's research results on his
blog and other social media, going beyond traditional venues to reach a wide
audience.

Finally, Hamlet will have a significant impact on the Fellow and, in particular,
the RAs. Hamlet contains a comprehensive training plan to enhance their skills and
profile. When Hamlet concludes, they will be well placed to pursue whatever job
(academic or industrial) they desire.
