PLanCompS: Programming Language Components and Specifications

Lead Research Organisation: Royal Holloway, University of London
Department Name: Computer Science


Software comes in many different shapes and sizes: ranging from games and social networking systems through databases and office software to control system for vehicles and medical instrumentation. Regardless of its purpose, software is almost always written in high-level programming languages, which are significantly easier to use than the low-level machine languages which can be executed directly by computers.Before a program written in a high-level language can be run on a particular computer, the language needs to have been implemented on that computer. The implementation could be a compiler, which translates high-level programs to machine code; alternatively, it might directly interpret them, simulating their intended behaviour.Many hundreds of programming languages have been designed and implemented since the 1950s, and dozens are currently in widespread use. Major ones introduced since 1995 include Java, C#, Python, Ruby, OCaml, Delphi, and VBScript. Older languages evolve to incorporate new features: new versions of Fortran, Cobol, Ada, C++, Scheme and Haskell appear at intervals ranging from one to 10 years. New programming languages are continually being designed and implemented, with the aim of making it easier (or at least quicker and cheaper) for programmers to write useful software. So-called domain-specific languages (DSLs) are designed for use in a particular sector, such as banking or engineering, or particular application areas, e.g., interactive web pages; they are often obtained by extending general-purpose languages with features that correspond closely to standard concepts or notation in the targeted sector.The documentation of a language design is called a language specification. This usually consists of a succinct formal grammar, determining the syntax of the language (i.e., which sequences of characters are allowed as programs, and how they are to be grouped into meaningful phrases), together with a lengthy informal explanation of their semantics (i.e., the required behaviour when programs are run), written in a natural language such as English. Unfortunately, such explanations are inherently imprecise, open to misinterpretation, and not amenable to validation.This project will employ innovative techniques for specifying the semantics of languages completely formally. The main novelty will be the creation of a large collection of reusable components of language specifications. Each component will correspond to a fundamental programming construct, or funcon, with fixed semantics. Translation of program phrases to combinations of funcons determines the semantics of the programs, and specifying this translation is much simpler - and much less effort - than specifying their formal semantics directly. The project will test and demonstrate the advantages of this component-based approach to language specification using case studies involving specification of major programming languages (including C# and Java) and DSLs.Sophisticated tools and an integrated development environment will be designed and implemented by the project to support creation and validation of component-based language specifications. The tools will support automatic generation of correct prototype implementations directly from specifications, allowing programs to be run according to their formal semantics. This will encourage language designers to experiment with different designs before initiating a costly manual implementation of a particular design, which may lead to development of better languages.Funcon and language specifications will be stored in an open-access repository, and a digital library interface will support browsing and searching in the repository. The library will also provide access to digital copies of existing formal specifications of programming languages using previous approaches.


10 25 50
publication icon
Scott E (2019) Derivation representation using binary subtree sets in Science of Computer Programming

publication icon
Van Binsbergen L (2019) Executable component-based semantics in Journal of Logical and Algebraic Methods in Programming

publication icon
Scott E (2013) GLL parse-tree generation in Science of Computer Programming

publication icon
Van Binsbergen L (2018) GLL parsing with flexible combinators

publication icon
Scott E (2018) GLL syntax analysers for EBNF grammars in Science of Computer Programming

publication icon
Johnstone A (2014) Modular grammar specification in Science of Computer Programming

publication icon
Johnstone A (2015) Principled software microengineering in Science of Computer Programming

publication icon
Afroozeh A (2013) Software Language Engineering

Description This research has shown that formally specified 'Fundamental Constructs' may be used to define the meaning (or 'semantics') of programming languages by software engineers who lack detailed training in formal semantics. A significant result is that we now have a quite-high language in which to discuss the implementation of programming language processors which is quite easy to understand, yet which may be expanded into a fully formal semantics with unambiguous mathematical foundations.

The project also made advances in syntax analysis of programming languages, demonstrating the practicality of a new algorithm called GLL. Within the project, new variants of GLL. These ideas have now been taken up within the software engineering community, where syntax analysis is used to help find insecure code idioms and to measure code quality.
Exploitation Route We shall ourselves move forward with tooling designed to be used by typical software engineers which allows straightforward construction of simple languages and deeper analysis of new programming language constructs in the context of existing languages.

We also expect other groups, some affiliated with this project, to implement these fundamental constructs as an abstraction layer within their formal semantics systems.

Advances have also been made in generalised parsing: in particular we have developed a new method for capturing all possible lexicalisations of a string without requiring a parser to operate at character level (which can be very inefficient). This work is presently being prepared for publication; the PhD student attached to the project has just received his award on the basis of this work.

We expect these tools to significantly improve the productivity and reliability of system design based on Domain Specific Languages, and Meta-Model Driven Design. Our overall hope is that in the future all new language designs will be prototyped using formal specification, and that small Domain Specific Languages will be directly executable from such concise specifications. This will resolve a difficult software maintenance problem since modifying language processors is presently widely felt to be a difficult task.
Sectors Digital/Communication/Information Technologies (including Software),Electronics

Description The tooling developed within this project has been used to implement the software for Leverhulme Project Grant 'Notions and Notations: Babbage's Language of Thought'. This project has featured in a BBC documentary on Ada Lovelace, and in the Ada 200 symposium at Oxford in December 2015. The Plan-28 project also hopes to make use of this material - see their blog at and video from the Ada symposium at (item 15). In the software engineering domain, generalised parsing is now achieving widespread traction for software metrics, trait detection and testing. In all of these areas we see evidence that GLL parsing, as developed within this award and as part of an ongoing research effort, is the algorithm of choice. As evidence we note (1) the keynote speaker at the 2019 Software Language Engineering in Athens who noted these developments as part of a broad review of the field; (2) the development of GLL parsing framework for the Rust language ( which we are told is used as part of the Rust developers grammar tests, (3) the use of GLL parsing within the Sonarsource toolkit ( and Sonarsource builds tools to measure code quality and security in software systems composed from disparate languages and dialects. Generalised parsing is an enabling technology for them. Their toolkit is widely used by many major companies:
First Year Of Impact 2014
Sector Creative Economy,Digital/Communication/Information Technologies (including Software),Education
Impact Types Cultural

Description TU/e research partnership 
Organisation Eindhoven University of Technology
Country Netherlands 
Sector Academic/University 
PI Contribution Joint supervision of students; development of new theory; joint publications
Collaborator Contribution Joint supervision of students; development of new theory; joint publications
Impact Joint papers (see portfolio). Joint submission of EU Marie Curie proposal (unsuccessful)
Title ART translator generator tool 
Description ART is a generalised parser generator which implements the GLL algorithm in its various forms. It is under active development; thevfirst full release (as opposed to beta releases used by our immediate collaboarators in Eindhoven and Swansea) will be in 2014. 
Type Of Technology Software 
Year Produced 2013 
Open Source License? Yes  
Impact ART has been used in a prototype model-based engineering tool under development in Eindhoven, and in use by their commercial collaborators (company confidential).