LECTURE: LanguagE ComposiTion UnifiEd

Lead Research Organisation: King's College London

Department Name: Informatics

Abstract

General-purpose programming languages are the basis of virtually all current software systems. However, most large problems do not naturally decompose onto the strengths of any single programming language. Consider a government department processing economic data. Its data collection team might prefer to use Python; its data analysis team R; and its front-end team Java. However, since functionality is intermingled -- the front-end may need to both process and analyse data, for example -- it is currently impractical to use different languages in such a way. Instead, such users are typically forced to pick a single language, even though it is inappropriate for much of what they need.

Language composition is one answer to this long-standing problem: it aims to allow users to mix languages (programming languages and/or domain-specific languages) in a fine-grained manner, so that each part of a problem can be expressed using the language most suited to it. However, existing language composition approaches tend either to be safe but limited in expressivity (e.g. Converge) or flexible but unsafe (e.g. MetaBorg/Stratego). This unpalatable choice is in no small part caused by the division of programming into three main phases: editing, compilation, and execution. Each phase has its own tool(s), forcing a language's semantics to be duplicated across multiple systems. Various problems result: discrepancies in, or approximations of, language semantics amongst tools cause subtle bugs; information loss between tools stymies debugging; and encodings lead to poor performance.

The Lecture fellowship hypothesises that language composition needs a unified tool environment, analogous to the IDEs used for programming languages. This will free language composition from the many practical burdens imposed by spreading it over multiple, distinct tools. We call the tooling required a "foundry": an environment in which new languages are "cast" by composing existing languages together; the resulting languages can then be used to write programs. Foundries can be thought of as a combination of a customisable editor and compiler, providing end-users with a single environment in which they can compose languages, write programs in the result, compile those programs to a suitable target, and execute and debug them. Ultimately, we envisage end-users choosing amongst different foundries in the same way that they can choose amongst multiple IDEs. Once a foundry has been chosen and downloaded, the user will select from already implemented language components to create their own composed language. They will then edit programs as naturally as they can in current IDEs.

Tratt's existing language composition research (language boxes, which make editing composed programs a reality, and VM composition, which makes running composed programs feasible) provides pillars upon which to build, but many areas remain untackled, and it is to these that Lecture addresses itself. Languages will need to be defined as a series of sub-components (e.g. syntax, implementation, type definitions). Intra-language mappings will relate syntax to implementations. Inter-language mappings will allow different languages to interact statically (e.g. to build type checkers) and dynamically (e.g. to exchange run-time data). Tooling will need to be developed to express each of these and to present the result to users in a comprehensible fashion. Alongside these are important design questions, such as what the sensible levels of granularity for language components are, and how foundries can best be structured.

Planned Impact

By improving the languages we have at our disposal to express software, Lecture will benefit several different groups. We identify two in particular.

Language designers are the initial target for Lecture's output. Although programming language design can appear simple, it is a tricky task, consisting of many small, subtly interlocking decisions: seemingly inconsequential decisions taken early on often turn out later to be crucial in determining adoption. Since programming languages are currently designed and implemented almost entirely from scratch, creating a new language is a lengthy and risky enterprise, where small mistakes can ruin years of hard work. Furthermore, even if every decision taken is a good one, designing a language which people wish to use is difficult: many good language design ideas have faded into obscurity because the languages they were part of were considered insufficiently mature to be worth investigating.

Lecture will therefore give language designers and implementers new means to experiment with languages, and to reduce their production costs significantly. By breaking languages into components, Lecture will increase the likelihood of reuse and reduce the chances of well-understood problems creeping into language design. Lecture may even remove the need to design a new general-purpose language when the only aim is to add, remove, or alter a single aspect of an existing language.

The second target for Lecture is software developers. If Lecture is able to realise the potential of language composition, it will open up entirely new avenues for software development and change how software is built. Different developers working on a single system will be able to improve their efficiency by using the languages best suited to their purposes. In the longer term, language composition may empower experts in non-computing domains to write parts of software, removing one of the most painful bottlenecks in software development. Health-care professionals, for example, may be able to specify relevant software behaviour directly instead of mediating through developers. Fine-grained language composition will also open up new possibilities for migrating software gradually to newer languages, without the 'big bang' that often hinders such efforts currently.

To realise Lecture's impact, we will focus on community appearances, publicly available and accessibly written articles, and uptake by respected organisations. We will also hold two workshops: the first focused on researchers in relevant areas including language designers; the second on potential early adopters.

Funded Value:

£953,404

Funded Period:

Jul 14 - Jul 19

Funder:

EPSRC

Project Status:

Closed

Project Category:

Fellowship

Project Reference:

EP/L02344X/1

Principal Investigator:

Laurence Tratt

Research Subject:

Info. & commun. Technol. (100%)

Research Topic:

Fundamentals of Computing (50%)

Software Engineering (50%)

Organisations

People	ORCID iD
Laurence Tratt (Principal Investigator / Fellow)

Publications

Erdweg S (2015) Evaluating and comparing language workbenches. Computer Languages, Systems & Structures.

Ó Cinnéide M (2016) An experimental search-based approach to cohesion metric evaluation. Empirical Software Engineering.

Barrett E (2017) Virtual machine warmup blows hot and cold. Proceedings of the ACM on Programming Languages.

Barrett E (2016) Fine-grained Language Composition: A Case Study.

Bolz CF (2016) Making an Embedded DBMS JIT-friendly.

Diekmann L (2020) Don't Panic! Better, Fewer, Syntax Errors for LR Parsers.

Diekmann L (2019) Default disambiguation for online parsers.

Berger M (2017) Modelling Homogeneous Generative Meta-Programming.

Key Findings

Description	Language composition was previously thought difficult for small languages and impractical for medium-sized languages. In this Fellowship, my team and I have shown that it is in fact possible to compose large real-world languages in a natural way, opening up a number of new academic and commercial possibilities. Whilst carrying out this research, we also discovered substantial problems with the performance of many major programming language implementations, which suggests there is considerable potential to improve this situation in the future.
Exploitation Route	Please see above.
Sectors	Digital/Communication/Information Technologies (including Software)


Further Funding

Description	HAMLET: Hardware Enabled Meta-Tracing (ext.)
Amount	£922,997 (GBP)
Funding ID	EP/S020861/1
Organisation	Engineering and Physical Sciences Research Council (EPSRC)
Sector	Public
Country	United Kingdom
Start	07/2019
End	07/2022


Collaboration

Description	Faster real-world tracing JIT compilers
Organisation	Cloudflare, Inc.
Country	United States
Sector	Private
PI Contribution	Our initial work on benchmarking VM performance (which led to our OOPSLA '17 paper) meant that we were able to show Cloudflare how our research ideas could be used to improve JIT-compiling VMs. In response, they have funded this 12-month project with 1 Research Associate (RA).
Collaborator Contribution	Substantial funding; access to engineer time; and benchmark examples.
Impact	No direct outputs yet.
Start Year	2017


Description	Lexical Cross-Language Interoperability
Organisation	Oracle Corporation
Department	Oracle EMEA
Country	United States
Sector	Private
PI Contribution	This funding is supporting 1 Research Associate for 1 year, investigating how different languages used in composed programs interoperate, considering such issues as debugging and profiling.
Collaborator Contribution	We have regular meetings with Oracle Labs (mostly telecons, but also in-person meetings) where we share expertise and experiences in both directions.
Impact	Ongoing.
Start Year	2015


Description	Visualizing Cross-Language Execution
Organisation	Oracle Corporation
Department	Oracle Corporation UK Ltd
Country	United Kingdom
Sector	Private
PI Contribution	I am the PI on this project investigating the visualisation of program execution in the face of language composition.
Collaborator Contribution	We have regular meetings with two members of Oracle Labs; they have provided ideas, and data to validate our ideas.
Impact	Not currently applicable.
Start Year	2016


Software and Technical Products

Title	Krun
Description	Krun is a state-of-the-art software benchmark runner. It rigorously benchmarks software, controlling more confounding variables than any previous tool.
Type Of Technology	Software
Year Produced	2017
Open Source License?	Yes
Impact	Too early to say.


Title	PyHyp
Description	PyHyp is the first large-scale language composition of two real-world languages: in this case PHP and Python. It consists of a fast virtual machine to execute PyHyp programs and special editor support via the Eco tool to make writing composed programs practical.
Type Of Technology	Software
Year Produced	2016
Open Source License?	Yes
Impact	Too early to say.
URL	http://soft-dev.org/pubs/files/pyhyp/


Title	SQPyte
Description	SQPyte takes the widely used SQLite database and converts part of it into a fast meta-tracing virtual machine. When called regularly from a programming language, SQPyte is generally significantly faster than SQLite.
Type Of Technology	Software
Year Produced	2016
Open Source License?	Yes
Impact	The SQLite project contacted us to find out what they could learn from SQPyte. This discussion is ongoing.
URL	http://soft-dev.org/pubs/files/sqpyte/


Title	grmtools
Description	A new parsing framework including advanced error recovery.
Type Of Technology	Software
Year Produced	2018
Open Source License?	Yes
Impact	None yet.
URL	https://soft-dev.org/src/grmtools/


Title	warmup_stats
Description	warmup_stats analyses data produced by VM software experiments, identifies important patterns in the data, and aggregates and presents it to the user in a usable form.
Type Of Technology	Software
Year Produced	2017
Open Source License?	Yes
Impact	Too early to say.
