PLanCompS: Programming Language Components and Specifications

Lead Research Organisation: Swansea University
Department Name: College of Science

Abstract

Software comes in many different shapes and sizes, ranging from games and social networking systems through databases and office software to control systems for vehicles and medical instrumentation. Regardless of its purpose, software is almost always written in high-level programming languages, which are significantly easier to use than the low-level machine languages that computers can execute directly.

Before a program written in a high-level language can be run on a particular computer, the language needs to have been implemented on that computer. The implementation could be a compiler, which translates high-level programs to machine code; alternatively, it might interpret them directly, simulating their intended behaviour.

Many hundreds of programming languages have been designed and implemented since the 1950s, and dozens are currently in widespread use. Major ones introduced since the early 1990s include Java, C#, Python, Ruby, OCaml, Delphi, and VBScript. Older languages evolve to incorporate new features: new versions of Fortran, Cobol, Ada, C++, Scheme and Haskell appear at intervals ranging from one to ten years. New programming languages are continually being designed and implemented, with the aim of making it easier (or at least quicker and cheaper) for programmers to write useful software. So-called domain-specific languages (DSLs) are designed for use in a particular sector, such as banking or engineering, or in particular application areas, e.g., interactive web pages; they are often obtained by extending general-purpose languages with features that correspond closely to standard concepts or notation in the targeted sector.

The documentation of a language design is called a language specification.
This usually consists of a succinct formal grammar, determining the syntax of the language (i.e., which sequences of characters are allowed as programs, and how they are to be grouped into meaningful phrases), together with a lengthy informal explanation of the semantics of programs (i.e., their required behaviour when run), written in a natural language such as English. Unfortunately, such explanations are inherently imprecise, open to misinterpretation, and not amenable to validation.

This project will employ innovative techniques for specifying the semantics of languages completely formally. The main novelty will be the creation of a large collection of reusable components of language specifications. Each component will correspond to a fundamental programming construct, or funcon, with fixed semantics. Translation of program phrases to combinations of funcons determines the semantics of the programs, and specifying this translation is much simpler, and takes much less effort, than specifying their formal semantics directly. The project will test and demonstrate the advantages of this component-based approach to language specification using case studies involving specification of major programming languages (including C# and Java) and DSLs.

Sophisticated tools and an integrated development environment will be designed and implemented by the project to support creation and validation of component-based language specifications. The tools will support automatic generation of correct prototype implementations directly from specifications, allowing programs to be run according to their formal semantics. This will encourage language designers to experiment with different designs before initiating a costly manual implementation of a particular design, which may lead to development of better languages.

Funcon and language specifications will be stored in an open-access repository, and a digital library interface will support browsing and searching in the repository.
The library will also provide access to digital copies of existing formal specifications of programming languages using previous approaches.
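
The component-based idea described in the abstract can be illustrated with a small sketch. It is not taken from the project's repository or notation: the funcon names (`lit`, `bound`, `add`, `mul`, `if_else`, `let_in`), the tuple-based syntax trees, and the `translate` and `run` helpers are all hypothetical, chosen only to show how a per-language translation can be a thin layer over components whose semantics is fixed once.

```python
# Illustrative sketch only. Every name here is a hypothetical stand-in,
# not a definition from the project's funcon collection.

# --- Reusable components: each has fixed semantics, defined once. ---
# A component is modelled as a function from an environment (a dict
# mapping names to values) to a value.

def lit(n):
    return lambda env: n

def bound(name):
    return lambda env: env[name]

def add(f, g):
    return lambda env: f(env) + g(env)

def mul(f, g):
    return lambda env: f(env) * g(env)

def if_else(cond, then, other):
    # Only the selected branch is evaluated.
    return lambda env: then(env) if cond(env) else other(env)

def let_in(name, value, body):
    # Evaluate `value`, then evaluate `body` with `name` bound to it.
    return lambda env: body({**env, name: value(env)})

# --- Language-specific part: map syntax trees to component terms. ---
def translate(ast):
    tag = ast[0]
    if tag == "num":
        return lit(ast[1])
    if tag == "var":
        return bound(ast[1])
    if tag == "+":
        return add(translate(ast[1]), translate(ast[2]))
    if tag == "*":
        return mul(translate(ast[1]), translate(ast[2]))
    if tag == "if":
        return if_else(*(translate(a) for a in ast[1:4]))
    if tag == "let":    # ("let", name, expr, body)
        return let_in(ast[1], translate(ast[2]), translate(ast[3]))
    if tag == "where":  # ("where", body, name, expr): different syntax, same component
        return let_in(ast[2], translate(ast[3]), translate(ast[1]))
    raise ValueError(f"unknown construct: {tag!r}")

def run(ast, env=None):
    return translate(ast)(env or {})

body = ("+", ("var", "x"), ("*", ("num", 2), ("var", "x")))
print(run(("let", "x", ("num", 3), body)))    # 9
print(run(("where", body, "x", ("num", 3))))  # 9
```

In this sketch the `let` and `where` forms differ only in syntax, so both translate to the same `let_in` component and neither needs a semantic definition of its own. That is the kind of reuse the component-based approach aims for, shown here at toy scale.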

Planned Impact

Ideas for the design of new languages often originate in academia, but implementation is usually left to commercial companies. Sometimes, companies themselves take the initiative, gaining a competitive advantage by developing a language especially well-suited to their particular needs or practices; a notable example is Ericsson, which supports the development of Erlang, an advanced language for distributed system implementation. Other major languages developed primarily in the private sector include FORTRAN (IBM), Java (Sun Microsystems), and C# and F# (Microsoft). The prime example of a major language commissioned by a government agency is Ada, which the US DoD mandated for all its new software from 1987 to 1997; the design and implementation of Ada was carried out by a team at CII Honeywell Bull, including several prominent academic researchers.

Regardless of whether practitioners involved in programming language development are in academia, commercial companies, or government agencies, our project will ensure that they have the opportunity to benefit from its results, as follows.

* Language designers: Designers of programming languages will benefit hugely from our research. Our component-based approach and our highly accessible formal specification frameworks will let them record and change tentative language design decisions efficiently. Our tool support will let them generate and experiment with prototype implementations, bringing aspects of agile modelling to the language design process, and supporting language evolution. Our digital library will provide the opportunity to browse and search for existing components that can be reused without further effort, and to access several complete specifications of existing major programming languages and domain-specific languages.

* Language specifiers: Currently, practitioners specify language syntax formally, using context-free grammars. However, they generally give only informal specifications of language semantics, i.e., the intended compile-time and run-time behaviour of programs, even in normative language standards. They will benefit from being able to specify semantics formally, which will allow their specifications to be validated, and from the high degree of reusability inherent in our component-based approach, which will dramatically reduce the cost of formal specification. Our digital library will ensure that language specifiers can access our results. We will include major case studies in it, to demonstrate how our approach scales up, and we will analyse the degree of reuse that we obtain.

* Language implementers: When the designers of a language provide a complete formal specification, the implementers benefit by being able to refer to it to ascertain details of the intended interpretation. Our tools will also allow implementers to check their outputs against those of a generated prototype implementation. For domain-specific languages, implementations generated directly from specifications may often be efficient enough for practical use, thus reducing the need for costly manual implementation.

* Programmers: The benefit of our research to programmers is indirect, stemming from potential improvements to the quality and efficiency of language design and implementation. However, even minor language improvements could help to avoid costly bugs and delays during software development, and thus have significant economic and societal impact in the long run.

* Research assistants and PhD students: Our RAs and research students will acquire expertise in practical language specification techniques, and in the use of advanced tools, as well as co-authoring scientific articles and reports on our work. This will make them highly qualified for industrial positions that involve design and implementation of domain-specific languages, as well as for academic careers.
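
The benefit to language implementers, checking an implementation's outputs against those of a generated prototype, amounts to differential testing. The following is a minimal sketch under stated assumptions: `reference_eval` merely plays the role of a prototype derived from a specification, `candidate_eval` plays the role of a hand-written implementation, and the rule that division truncates toward zero is an invented example, not taken from any real language standard or from the project's tools.

```python
# Hypothetical sketch of checking an implementation against a prototype.
# Both evaluators and the division rule are made up for illustration.

def reference_eval(e):
    """Stand-in for a prototype generated from a formal specification."""
    if isinstance(e, int):
        return e
    op, a, b = e
    x, y = reference_eval(a), reference_eval(b)
    if op == "+":
        return x + y
    if op == "-":
        return x - y
    if op == "/":
        # Assumed rule for this example: division truncates toward zero.
        q = abs(x) // abs(y)
        return q if (x >= 0) == (y >= 0) else -q
    raise ValueError(f"unknown operator: {op!r}")

def candidate_eval(e):
    """Stand-in for a hand-written implementation under test."""
    if isinstance(e, int):
        return e
    op, a, b = e
    x, y = candidate_eval(a), candidate_eval(b)
    if op == "+":
        return x + y
    if op == "-":
        return x - y
    if op == "/":
        # Deliberate discrepancy: Python's // rounds toward negative infinity.
        return x // y
    raise ValueError(f"unknown operator: {op!r}")

def find_mismatches(programs):
    """Return the programs on which the two evaluators disagree."""
    return [p for p in programs if reference_eval(p) != candidate_eval(p)]

tests = [("+", 1, 2), ("/", 7, 2), ("/", -7, 2), ("-", 3, ("/", 9, -4))]
for program in find_mismatches(tests):
    print(program, reference_eval(program), candidate_eval(program))
```

The two evaluators agree on the first two test programs and disagree on the two that divide operands of opposite sign, which is exactly the kind of detail that informal prose specifications tend to leave open to misinterpretation.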

Publications

Bach Poulsen C (2014) Programming Languages and Systems

Bach Poulsen C (2017) Flag-based big-step semantics in Journal of Logical and Algebraic Methods in Programming

Mosses P (2019) Software meta-language engineering and CBS in Journal of Computer Languages

Mosses P (2015) Semantics of programming languages: Using Asf+Sdf in Science of Computer Programming

 
Description

The project has developed a collection of formally defined fundamental programming constructs (funcons). Each funcon is a potentially highly reusable component of definitions of programming languages. The project has also developed a unified meta-notation (CBS) for defining funcons and programming languages, and modular notions of program equivalence. To support practical use of CBS, the project has developed an IDE, which includes generation of translators from programming languages to funcons, and funcon interpreters.

The usability of CBS and its tool support has been tested with case studies including Caml Light and C#. The Caml Light case study was highly successful, and has already been published. The C# case study revealed some unanticipated issues regarding the many implementation-oriented details of the official language reference, as well as difficulties with defining each language construct independently.

Exploitation Route

The work on the project is continuing after the end of funding. A beta-release of a repository of funcon definitions and language specifications has now been made available; it is currently being extended to cover multithreading and distributed processes. Development of tool support is continuing.

When the C# case study has been completed and validated, the collection of funcons and the case studies based on them will be made freely available in a digital library, for use by language developers. The high reusability of the funcon definitions, together with the available tool support and IDE for CBS, should facilitate uptake. The material could also be adopted for use in teaching concepts and theory of programming languages.
Sectors: Digital/Communication/Information Technologies (including Software), Education

URL: https://plancomps.github.io/CBS-beta/