LECTURE: LanguagE ComposiTion UnifiEd

Lead Research Organisation: King's College London
Department Name: Informatics

Abstract

General purpose programming languages are the basis of virtually all current software systems. However, most large problems do not naturally decompose onto the strengths of any single programming language. Consider a government department processing economic data. Its data collection team might prefer to use Python; its data analysis team R; and its front-end team Java. However, since functionality is intermingled -- the front-end may need to both process and analyse data, for example -- it is currently impractical to use different languages in such a way. Instead, such users are typically forced to pick a single language, even though it is inappropriate for much of what they need.

Language composition is one answer to this long-standing problem: it aims to allow users to mix languages (programming languages and / or domain specific languages) in a fine-grained manner, so that each part of a problem can be expressed using the language most suited to it. However, existing language composition approaches tend either to be safe but limited in expressivity (e.g. Converge) or flexible but unsafe (e.g. MetaBorg/Stratego). This unpalatable choice is in no small part caused by the division of programming into three main phases: editing, compilation, and execution. Each phase has its own tool(s), forcing a language's semantics to be duplicated over multiple systems. Various problems result: discrepancies, or approximations, of language semantics amongst tools cause subtle bugs; information loss between tools stymies debugging; and encodings lead to poor performance.

The Lecture fellowship hypothesises that language composition needs a unified tool environment, equivalent to programming language IDEs. This will free language composition from the many practical burdens imposed by spreading it over multiple, distinct tools. We call the tooling required a "foundry", an environment for new languages to be "cast" by composing languages together; the resultant languages can then be used to write programs. Foundries can therefore be thought of as a combination of a customisable editor and compiler, providing a single environment to end-users that will allow them to compose languages, write programs in the result, compile them into a suitable end target, execute, and debug them. Ultimately, we envisage end-users choosing amongst different foundries in the same way that they can choose amongst multiple IDEs. Once a foundry has been chosen and downloaded, the user will then select between already implemented language components to create their own composed language. They will then edit programs in as natural a fashion as they can in current IDEs.

Tratt's existing language composition research -- language boxes make editing composed programs a reality; and VM composition makes running composed programs feasible -- provides pillars upon which to build, but many areas remain untackled and it is to those that Lecture addresses itself. Languages will need to be defined as a series of sub-components (e.g. syntax, implementation, type definitions). Intra-language mappings will relate syntax to implementations. Inter-language mappings will allow different languages to interact statically (e.g. to build type checkers) and dynamically (e.g. to exchange run-time data). Tooling will need to be developed to express each of these and to present the result to users in a comprehensible fashion. Alongside these are important design issues: what are sensible granularity levels for language components? how can foundries be best structured? etc.
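The component structure described above can be illustrated with a minimal sketch. It is not Lecture's design: the class names (Language, Foundry), the two toy languages, and the converter functions are all hypothetical, chosen only to show how sub-components, an intra-language mapping, and inter-language mappings for run-time data might fit together.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class Language:
    """A hypothetical language defined as a series of sub-components."""
    name: str
    # Syntax: construct name -> illustrative textual pattern.
    syntax: Dict[str, str] = field(default_factory=dict)
    # Implementation: semantic action name -> Python callable.
    implementation: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    # Intra-language mapping: syntax construct -> semantic action.
    intra_mapping: Dict[str, str] = field(default_factory=dict)

    def run(self, construct: str, *args: Any) -> Any:
        # Follow the intra-language mapping from syntax to implementation.
        action = self.intra_mapping[construct]
        return self.implementation[action](*args)


@dataclass
class Foundry:
    """A hypothetical container that composes languages."""
    languages: Dict[str, Language] = field(default_factory=dict)
    # Inter-language mappings: (source, target) -> run-time data converter.
    converters: Dict[Tuple[str, str], Callable[[Any], Any]] = field(
        default_factory=dict
    )

    def add_language(self, lang: Language) -> None:
        self.languages[lang.name] = lang

    def add_converter(self, src: str, dst: str, fn: Callable[[Any], Any]) -> None:
        self.converters[(src, dst)] = fn

    def call(self, caller: str, callee: str, construct: str, *args: Any) -> Any:
        # Convert arguments into the callee's representation, run the
        # construct there, then convert the result back for the caller.
        to_callee = self.converters[(caller, callee)]
        to_caller = self.converters[(callee, caller)]
        result = self.languages[callee].run(construct, *[to_callee(a) for a in args])
        return to_caller(result)


# Two toy languages: "listy" represents sequences as lists, "tuply" as tuples.
listy = Language(name="listy")
tuply = Language(
    name="tuply",
    syntax={"sum": "sum <seq>"},
    implementation={"add_all": lambda xs: (sum(xs),)},
    intra_mapping={"sum": "add_all"},
)

foundry = Foundry()
foundry.add_language(listy)
foundry.add_language(tuply)
foundry.add_converter("listy", "tuply", tuple)
foundry.add_converter("tuply", "listy", list)

result = foundry.call("listy", "tuply", "sum", [1, 2, 3])
print(result)  # [6]
```

Keeping the converters in the foundry rather than in either language means each language component stays unaware of the others, which is one possible answer to the granularity question raised above; other structurings are equally plausible.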

Planned Impact

By improving the languages we have at our disposal to express software, Lecture will benefit several different groups. We identify two in particular.

Language designers are the initial target for Lecture's output. Although programming language design can appear to be simple, it is a tricky task, consisting of many small, subtly interlocking, decisions: seemingly inconsequential decisions taken early on often turn out later to be crucial in determining adoption. Since programming languages are currently designed and implemented almost entirely from scratch, creating a new language is a lengthy and risky enterprise, where small mistakes can ruin years of hard work. Furthermore, even if every decision taken is a good one, designing a language which people wish to use is difficult: many good language design ideas have faded into obscurity because the languages they were a part of were considered insufficiently mature to be worth investigating.

Lecture will therefore give language designers and implementers new means to experiment with languages, and to reduce their production costs significantly. By breaking languages into components, Lecture will increase the likelihood of reuse and reduce the chances of well understood problems creeping into language design. Lecture may even remove the need for general purpose language design where that is instigated purely to add, remove, or alter a single aspect of a previous language.

The second target for Lecture is software developers. If Lecture is able to realise the potential of language composition, it will open up entirely new avenues for software development and change the way software is developed. Different developers working on a single system will be able to improve their efficiency by using the languages best suited to their purposes. In the longer term, language composition may empower experts in non-computing domains to write parts of software, removing one of the most painful bottlenecks in software development. Health-care professionals, for example, may be able to specify relevant software behaviour directly instead of mediating through developers. Fine-grained language composition will also open up new possibilities for migrating software gradually to newer languages, without the 'big bang' that often hinders such efforts currently.

To realise Lecture's impact, we will focus on community appearances, publicly available and accessibly written articles, and uptake by respected organisations. We will also hold two workshops: the first focused on researchers in relevant areas including language designers; the second on potential early adopters.

Publications

Erdweg S (2015) Evaluating and comparing language workbenches in Computer Languages, Systems & Structures

Ó Cinnéide M (2016) An experimental search-based approach to cohesion metric evaluation in Empirical Software Engineering

Barrett E (2017) Virtual machine warmup blows hot and cold in Proceedings of the ACM on Programming Languages

 
Description Language composition was previously thought difficult for small languages and impractical for medium-sized languages. In this Fellowship, my team and I have shown that it is in fact possible to compose large real-world languages in a natural way, opening up a number of new academic and commercial possibilities. Whilst carrying out this research, we also discovered substantial problems with the performance of many major programming language implementations, which suggests there is considerable potential to improve this situation in the future.
Exploitation Route Please see above.
Sectors Digital/Communication/Information Technologies (including Software)

 
Description HAMLET: Hardware Enabled Meta-Tracing (ext.)
Amount £922,997 (GBP)
Funding ID EP/S020861/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 07/2019 
End 07/2022
 
Description Faster real-world tracing JIT compilers 
Organisation Cloudflare, Inc.
Country United States 
Sector Private 
PI Contribution Our initial work on benchmarking VM performance (which led to our OOPSLA '17 paper) meant that we were able to show Cloudflare how our research ideas could be used to improve JIT compiling VMs. They have funded this 12 month project with 1 RA in response.
Collaborator Contribution Substantial funding; access to engineer time; and benchmark examples.
Impact No direct outputs yet.
Start Year 2017
 
Description Lexical Cross-Language Interoperability 
Organisation Oracle Corporation
Department Oracle EMEA
Country United States 
Sector Private 
PI Contribution This funding is supporting 1 Research Associate for 1 year investigating how different languages used in composed programs interoperate, considering such issues as debugging and profiling.
Collaborator Contribution We have regular meetings with Oracle Labs (mostly telecons, but also in-person meetings) where we share expertise and experiences in both directions.
Impact Ongoing.
Start Year 2015
 
Description Visualizing Cross-Language Execution 
Organisation Oracle Corporation
Department Oracle Corporation UK Ltd
Country United Kingdom 
Sector Private 
PI Contribution I am the PI on this project investigating the visualisation of program execution in the face of language composition.
Collaborator Contribution We have regular meetings with two members of Oracle Labs; they have provided ideas, and data to validate our ideas.
Impact Not currently applicable.
Start Year 2016
 
Title Krun 
Description Krun is a state-of-the-art software benchmark runner. It rigorously benchmarks software, controlling more confounding variables than any previous tool. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Too early to say. 
 
Title PyHyp 
Description PyHyp is the first large-scale language composition of two real-world languages: in this case PHP and Python. It consists of a fast virtual machine to execute PyHyp programs and special editor support via the Eco tool to make writing composed programs plausible. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Too early to state. 
URL http://soft-dev.org/pubs/files/pyhyp/
 
Title SQPyte 
Description SQPyte takes the widely used SQLite database and converts part of it into a fast meta-tracing virtual machine. When called regularly from a programming language, SQPyte is generally significantly faster than SQLite. 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact The SQLite project contacted us to find out what they can learn from SQPyte. This discussion is ongoing. 
URL http://soft-dev.org/pubs/files/sqpyte/
 
Title grmtools 
Description A new parsing framework including advanced error recovery. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact None yet. 
URL https://soft-dev.org/src/grmtools/
 
Title warmup_stats 
Description warmup_stats analyses data produced from VM software experiments, identifies important patterns in the data, and aggregates and presents it in usable form to the user. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Too early to say.