The Software Sustainability Institute: Phase 2

Lead Research Organisation: University of Edinburgh
Department Name: Edinburgh Parallel Computing Centre

Abstract

Modern research is impossible without software. From short, thrown-together temporary scripts to solve a specific problem, through an abundance of complex spreadsheets analysing collected data, to the hundreds of software engineers and millions of lines of code behind international efforts such as the Large Hadron Collider and the Square Kilometre Array, there are few areas of research where software does not have a fundamental role.

Further, this is not just research based on the "traditional" users of computational infrastructure. Data science and "big data" would not be possible without software to access, analyse, visualise, send and store that data. Software use is also not restricted to the physical sciences: research software is used evenly across all disciplines, with 68% of researchers reporting that their research would be impossible without research software. The capacity of all researchers, in the UK and worldwide, to generate new insights depends on the availability of research software and the ability of researchers to use it (better software, better research).

During the next phase of the Institute, we will focus on the domains of each of the Research Councils (particularly BBSRC, EPSRC and ESRC) and help their researchers gain the most from available services to improve the sustainability, engineering, reuse, quality and recognition of software.

To deliver this, the Institute's work is split into five themes:
- Community: bringing people together via events and networks to identify, understand and facilitate solutions for common challenges;
- Policy: research into the social, economic and technical drivers of the research software community, understanding its needs, and then working to enact the required changes through campaigns;
- Research Software: working directly with researchers who are developing software to ensure it meets the needs of reliable, reproducible and reusable research;
- Training: coordinating, defining, and delivering training on software development and data science skills to UK research organisations, and working to build a sustained training platform;
- Communications: ensuring the work of the Institute is disseminated to the widest possible audience, and working with collaborators to amplify the impact of our work.

With the help of additional studies extending the work above, we will conduct research-council specific campaigns tailored to the needs of the researchers to help increase the uptake of services (not just those offered by the Institute, but those funded by the research councils) that are already available and define services which are lacking. This will not only help researchers from all research domains acquire the skills they need for modern research (helping researchers help themselves), but will also ensure that the lessons learned within one domain are transferred to the others - keeping the UK at the forefront of world-leading research.

Planned Impact

The SSI's aim is to impact the research community through effective use and sustainability of software - Better Software Better Research - to foster global economic performance, prevent wasteful reinvention, improve returns on initial research investments and ultimately improve the competitiveness of the UK by making our researchers more innovative and productive.

Our very structure, organisation and work plan are based on five pathways to achieve this impact: research community engagement; stakeholder policy; sustaining key software; capacity building through training; and communications with all stakeholders, including the public. Our model of delivering software sustainability through collaboration and knowledge transfer between engineers and researchers achieves value and impact. A range of established schemes - Fellowships, workshops, engagement, consultancy, training, networks - ensures that researchers and research software engineers benefit.

Academic researchers will benefit from software - their own and that developed by others - that can be relied on and be used as the basis of their research. Fellowships and access to training in computational and data skills present opportunities to become more effective researchers. Researchers and developers who make software will have a facility to assist them with the maintenance, expansion, exploitation and community development of their codes for the benefit of themselves and others in the UK, and to make a wider impact internationally.

Commercial and public sector researchers will have access to more robust software from the research sector, with the potential and incentive to re-contribute. We will maximise the ability of researchers to take up software developed by others for the benefit of the UK as a whole. With our industry partners we will pursue the exploitation of software to gain additional value from the commercial and public sector. The commercial sector will benefit from access to people who have gained skills required in industry, improving the ease with which researchers and software engineers can transfer to the sector.

Policy makers will benefit from our:
- research into the demographics of the research software community and its economic impact, and by our representation of this community to inquiries organised by policy stakeholders nationally (e.g. UK government, Research Councils and other funding organisations) and internationally (e.g. NSF, NIH);
- contribution to specific policies relating to software, including software in scholarly communication and software management and accreditation;
- support of software which is used to define policy, e.g. software to support decision makers in the areas of climate change, social mobility and changing populations, and pollution policy.

The wider public will benefit:
- directly from interacting with the research software community via our online debates, gaining a greater understanding of the impact of research software, its role in research, and the challenges it faces;
- indirectly from the research made possible through the use of reliable and reproducible software, e.g. in biofuels, fusion energy and drug discovery where we have worked directly with leading research groups.

We will leverage our connections with international organisations to amplify the impact of the software we help sustain. We will promote the importance of software sustainability using our strong presence in: technical standards bodies; scientific standards initiatives; major scientific networks; major international projects impacting UK communities in Europe and the USA; and major international initiatives. We will build on our experience and encourage and assist key scientific software groups to adopt better development methods, bring together islands of expertise to create critical mass in the community, foster the integration of similar software products, and facilitate a fuller dialogue between developers and users.
 
Description We have developed new methods and materials for improving and sustaining research software through mechanisms including advice and guidance, training, policy, consultancy and community building. This has led to an improvement in the adoption of good practice for developing research software, and a cultural change in the way that software is regarded in research.
Exploitation Route Our findings are made available under Creative Commons and Open Source licenses to allow for maximum reuse. Some of our findings have been picked up by the media as part of articles, by other organisations involved in computational infrastructure, and by international initiatives such as Software Carpentry and Data Carpentry.
Sectors Digital/Communication/Information Technologies (including Software)

URL http://www.software.ac.uk/
 
Description Our findings are being used by industry to understand career paths for Research Software Engineers. This includes the use of data collected as part of our international survey of Research Software Engineers, which has been used in the 2017 Research Software Engineer State of the Nation report, as well as the administrative and logistical support provided to help with the establishment of the UK RSE Association, Society of Research Software Engineering, and similar initiatives in Germany, the Netherlands, Canada and the USA.
First Year Of Impact 2017
Sector Digital/Communication/Information Technologies (including Software)
Impact Types Societal, Economic, Policy & public services

 
Description Chair of EPSRC e-Infrastructure Strategic Advisory Team (Neil Chue Hong)
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
 
Description Computational Science Centre for Research Communities (Steering Committee)
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
 
Description FAIR in Practice report (Carole Goble)
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
Impact Better understanding of the underlying practices of research, demonstrating considerable evidence that good practices exist in findability, accessibility, interoperability and reusability. In many cases these were both well established (over many years) and continually improving.
URL https://zenodo.org/record/1245568
 
Description First Signatory to European Open Science Cloud Declaration
Geographic Reach Europe 
Policy Influence Type Participation in a national consultation
URL https://ec.europa.eu/research/openscience/pdf/list_of_institutions_endorsing_the_eosc_declaration.pd...
 
Description Membership (Carole Goble) of ESFRI DIGIT Working Group
Geographic Reach Europe 
Policy Influence Type Participation in an advisory committee
URL https://www.esfri.eu/strategy-working-group-data-computing-and-digital-research-infrastructures
 
Description NIH Request for Information on Strategies for NIH Data Management, Sharing, and Citation
Geographic Reach North America 
Policy Influence Type Participation in a national consultation
 
Description OECD Expert Group on Digital Skills for Science (Neil Chue Hong)
Geographic Reach Multiple continents/international 
Policy Influence Type Participation in an advisory committee
URL https://www.innovationpolicyplatform.org/digital-skills-data-intensive-science-oecd-project
 
Description Open Research Data Task Force (Carole Goble and David De Roure)
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
URL https://assets.publishing.service.gov.uk/government/uploads/system/uploads/attachment_data/file/7750...
 
Description Turning FAIR into reality - RSE Case Study
Geographic Reach Europe 
Policy Influence Type Citation in other policy documents
Impact Showed how the Software Sustainability Institute's support for the Research Software Engineering workforce has led to better software engineering support for research software, improving accessibility of research services to the public.
URL https://publications.europa.eu/en/publication-detail/-/publication/7769a148-f1f6-11e8-9982-01aa75ed7...
 
Description UKRI e-Infrastructure Advisory Board (Carole Goble)
Geographic Reach National 
Policy Influence Type Participation in an advisory committee
 
Description UNESCO/INRIA Expert Group On Software Source Code As Heritage For Sustainable Development (Neil Chue Hong)
Geographic Reach Multiple continents/international 
Policy Influence Type Participation in an advisory committee
URL https://en.unesco.org/foss/paris-call-software-source-code
 
Description Research Data Shared Service Pilots
Amount £30,000 (GBP)
Organisation Jisc 
Sector Public
Country United Kingdom
Start 08/2017 
End 07/2018
 
Description Standard Research
Amount £80,263 (GBP)
Funding ID EP/N028902/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2016 
End 01/2019
 
Description The UK Software Sustainability Institute: Phase 3
Amount £6,599,477 (GBP)
Funding ID EP/S021779/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 12/2018 
End 11/2023
 
Title International RSE Survey 
Description In 2018 we created a single consolidated survey for all countries (rather than one survey for each country), to assess Research Software Engineers across the world. 
Type Of Material Improvements to research infrastructure 
Year Produced 2018 
Provided To Others? Yes  
Impact This enables us to better understand, compare and improve the working conditions for RSEs across the world. 
URL https://github.com/softwaresaved/international-survey/
 
Title 2016 UK RSE Survey 
Description Anonymised release of data on the demographics, job satisfaction, and practices of Research Software Engineers collected via survey in 2016. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact Better understanding of job satisfaction and skills of research software engineers in the UK. 
URL https://github.com/softwaresaved/international-survey/blob/master/analysis/2016/uk/data/public_data....
 
Title 2017 International RSE Survey 
Description Public release of data relating to the 2017 international surveys of research software engineers. 
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Not known at present. 
URL https://zenodo.org/record/1194669#.WqahZ5PFKL4
 
Title 2017 UK RSE Survey 
Description Anonymised release of data on the demographics, job satisfaction, and practices of Research Software Engineers in the UK collected via survey in 2017. 
Type Of Material Database/Collection of data 
Year Produced 2017 
Provided To Others? Yes  
Impact Better understanding of practices and working conditions for UK RSEs. Comparison against similar surveys in Germany, the Netherlands, South Africa and the USA.
URL https://github.com/softwaresaved/international-survey/blob/master/analysis/2017/uk/data/public_data....
 
Title 2018 International RSE Survey 
Description Public data related to the 2018 International Research Software Engineers Survey. We publish the results in the form of notebooks. All surveys have an attached 'public.csv' file. These files have been cleaned of all sensitive data; consequently, the Jupyter notebooks show some results that are not contained in 'public.csv'. The base questions for the survey were tailored to meet the requirements of each country. They covered ten subjects:
- Demographics: traditional social and economic questions, such as gender, age, salary and education.
- Coding: how much code do RSEs write, how often, and for whom.
- Employment: questions about where RSEs work and in which disciplines.
- Current contract: understanding stability of employment by questioning the type of employment contract RSEs receive.
- Previous employment: understanding routes into the profession and the reasons for choosing it.
- Collaboration and training: who RSEs work with, how many people they work with, and the training they conduct.
- Publications: do RSEs contribute to publications and are they acknowledged?
- Sustainability and tools: testing, bus factor, technical handover, and which tools they are using.
- Job satisfaction: what do RSEs think about their job and their career?
- Network: how do RSEs meet and gain representation?
These subjects are not necessarily investigated in this order, nor are they necessarily published in this order.
Type Of Material Database/Collection of data 
Year Produced 2018 
Provided To Others? Yes  
Impact Not known at present. 
URL https://github.com/softwaresaved/international-survey/
 
Title Software used in research based on combined surveys 
Description The combined results of five surveys run by the Software Sustainability Institute between 2014 and 2016. The data relate to 1261 survey participants who were asked "What software do you use in your research?". The data are described here: https://www.software.ac.uk/blog/2016-08-13-quick-and-dirty-analysis-software-being-used-research-python-matlab-and-r 
Type Of Material Database/Collection of data 
Year Produced 2016 
Provided To Others? Yes  
Impact Use of data in other studies. 
URL https://zenodo.org/record/60276#.WMkpaxKLSis
 
Title The Software Sustainability Institute's Collaborations Workshop 2015 (CW15) attendees computational tools dataset 
Description Contains the question, raw data, and cleaned data for producing the most used software word cloud for those who attended the Software Sustainability Institute's Collaborations Workshop 2015 (CW15) held at the Oxford e-Research Institute, Oxford, UK from 25-27 March 2015 
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact Unknown. 
URL https://zenodo.org/record/19828#.WMkp_hKLSis
 
Description Data Carpentry 
Organisation Data Carpentry
Country United States 
Sector Charity/Non Profit 
PI Contribution Coordination of Data Carpentry training events in the UK. Training of Data Carpentry instructors. Contribution of training material.
Collaborator Contribution Production of training materials. Provision of central administrative infrastructure.
Impact Multi-disciplinary. Training of hundreds of researchers in basic data management and analysis skills.
Start Year 2015
 
Description International Coalition on Science Gateways 
Organisation The National eResearch Collaboration Tools and Resources project
Sector Public 
PI Contribution We have entered into the International Coalition on Science Gateways, administered by Nectar in Australia, as part of our work with Nectar and the US Science Gateways Institute to promote best practice in the development of science gateways.
Collaborator Contribution Our partners are some of the primary supporters of science gateways in their respective countries. Their contributions have been through running of requirement workshops and development of best practice which we are able to utilise.
Impact Paper submitted but not yet published.
Start Year 2016
 
Description Software Carpentry Foundation 
Organisation Software Carpentry Foundation
Country United States 
Sector Charity/Non Profit 
PI Contribution We are acting as the UK coordinators for Software Carpentry courses to teach researchers computing skills. Neil Chue Hong and Carole Goble were invited to join the board of the SCF.
Collaborator Contribution The SCF provides materials and organises instructor training. We have therefore benefitted from the resources developed by partners within the Software Carpentry Foundation.
Impact Over 30 workshops and 1000 learners trained in the UK across multiple disciplines.
Start Year 2012
 
Description Software Preservation Network 
Organisation Software Preservation Network
Country United States 
Sector Charity/Non Profit 
PI Contribution Collaboration on Software Preservation Network's Training and Education Working Group to develop resources around software preservation. Also input into SPN's future strategic direction through steering committee.
Collaborator Contribution Collaboration on Software Preservation Network's Training and Education Working Group to develop resources around software preservation.
Impact This collaboration is multi-disciplinary. Outputs have not yet been published.
Start Year 2018
 
Description The Carpentries 
Organisation The Carpentries
Country United States 
Sector Charity/Non Profit 
PI Contribution Coordination of Carpentries training events in the UK. Training of Carpentries instructors. Contribution of training material. Facilitation of development of new courses for social sciences and life sciences.
Collaborator Contribution Production of training materials. Provision of central administrative infrastructure. Governance of open source materials production. Organisation of international workshops.
Impact Multi-disciplinary. Training of hundreds of researchers in basic software engineering and data management and analysis skills.
Start Year 2018
 
Title Checklist for a Software Management Plan (source code) (Version 1.0) 
Description A Software Management Plan (SMP) can help you to define a set of structures and goals to understand your research software including what you are going to develop; who the software is for (even if it is just for yourself); how you will deliver your software to its intended users; how it will help them; and how you will assess whether it has helped them, and contributed to research, in the ways that you intended. An SMP also helps you to understand how you can support those who wish to, or do, use your research software; how your software relates to other artefacts in your research ecosystem; and how you will ensure that your software remains available beyond the lifetime of your current project. This checklist will help you to write an SMP. It consists of sections that cover the key elements that an SMP should include. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Checklist implemented in France via OPIDoR: https://dmp.opidor.fr/ 
URL https://github.com/softwaresaved/software-management-plans
 
Title CodeMeta Schema Version 1.0 
Description CodeMeta contributors are creating a minimal metadata schema for science software and code, in JSON and XML. The goal of CodeMeta is to create a concept vocabulary that can be used to standardize the exchange of software metadata across repositories and organizations. CodeMeta started by comparing the software metadata used across multiple repositories, which resulted in the CodeMeta Metadata Crosswalk. That crosswalk was then used to generate a set of software metadata concepts, which were arranged into a JSON-LD context for serialization (see codemeta.jsonld, or an example CodeMeta document). 
Type Of Technology Software 
Year Produced 2016 
Open Source License? Yes  
Impact Ability to crosswalk between metadata generated by popular code repositories and catalogues. 
URL https://github.com/CodeMeta/codemeta
 
Title International RSE Survey analysis software 
Description Collaboration tool to create surveys and analyse data about Research Software Engineers around the world. This software is used to create and analyse international surveys. It uses CSV files to store questions and answers that are later transformed into a LimeSurvey TSV file. The analyses use Python and are shared as Jupyter notebooks. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact In 2016 the Software Sustainability Institute ran the first survey of Research Software Engineers (RSEs) - the people who write code in academia. This produced the first insight into the demographics, job satisfaction, and practices of RSEs. To support and broaden this work, the Institute will run the UK survey every year and - it is hoped - will expand the survey so that insight and comparison can be made across different countries. Ultimately, we hope that these results, the anonymised versions of which will all be openly licensed, will act as a valuable resource to understand and improve the working conditions for RSEs. In 2017 we also surveyed Canadian RSEs and added four further countries: Germany, the Netherlands, South Africa and the USA. Our thanks to our partners: Scott Henwood (Canada), Stephan Janosch and Martin Hammitzsch (Germany), Ben van Werkhoven and Tom Bakker (Netherlands), Anelda van der Walt (South Africa) and Daniel Katz and Sandra Gesing (USA). In 2018 we worked differently and created a single survey for all countries (rather than one survey for each country). 
URL https://github.com/softwaresaved/international-survey
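
The description above says that questions and answers are stored in CSV files and later transformed into a LimeSurvey TSV file. The following is a minimal sketch of the generic CSV-to-TSV step only, using the Python standard library. The column names (`code`, `question`, `answer_type`) and the sample rows are hypothetical, and the sketch does not reproduce the columns that LimeSurvey's own import format requires, nor the project's actual conversion code.

```python
import csv
import io


def csv_to_tsv(csv_text: str) -> str:
    """Convert comma-separated text to tab-separated text, row by row."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical question file: the column names are illustrative only.
questions_csv = (
    "code,question,answer_type\n"
    "edu1,What is your highest qualification?,single choice\n"
    'soft1,"Which tools do you use (e.g. Git, CI)?",free text\n'
)

questions_tsv = csv_to_tsv(questions_csv)
print(questions_tsv)
```

Using the `csv` module rather than a plain string replacement keeps quoted fields that contain commas intact, which matters for free-text survey questions.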
 
Title ResearchFish outcome analysis software 
Description This is a first pass at understanding software-related research outcomes recorded in ResearchFish. The data used has not been released, but can be downloaded from Gateway to Research. The forthcoming release will include data as well as software. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Unknown at present. 
URL https://github.com/softwaresaved/ResearchFish
 
Title Software Assessment Framework 
Description The Software Assessment Framework is a pilot implementation to make it easier for developers to understand the "quality" of a piece of research software, which in turn will allow them to improve software reuse and increase recognition for good software development practice. 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Insights achieved from this pilot implementation led to our acceptance as a founding partner of the Community Health Analytics Open Source Software (CHAOSS) initiative hosted by the Linux Foundation. 
URL https://github.com/softwaresaved/software-assessment-framework
 
Title Software Deposit Guidance for Researchers (source code) (Version 1.0) 
Description Research software is an integral part of the modern research ecosystem. Taken together, research software, alongside data, facilities, equipment and an overarching research question, can be viewed as a research activity or experiment, worthy of publication. Conversely, a publication can be considered as a narrative that describes how the research objects are used together to answer the research question. Depositing research software into a digital repository can offer significant benefits. By depositing not just papers but also software and data sets, researchers can store a more complete record of this ecosystem for future use by both the researchers who undertook the research and the wider research community. Making research software available allows other researchers to inspect, replicate, reproduce and reuse the research, as manifested in the software, in the short term and to inspect it, for the historical record, in the long term. It allows research software to remain available beyond the lifetime of any current project, or a researcher's current employment at a specific institution. Digital repositories can also provide unique persistent digital identifiers for software, which can be cited and help researchers to get attribution and credit for their research software when it is used by others. The Software Sustainability Institute, funded by Jisc, developed a set of complementary guides covering the main aspects of depositing software into digital repositories. These guides are intended for researchers, principal investigators and research leaders, and research data and digital repository managers. This deposit holds the sources of these guides, used to generate PDF and HTML (online) versions of the guides, and a PDF of the index page of the online version of the guides. 
Type Of Technology Software 
Year Produced 2018 
Open Source License? Yes  
Impact Cited in guidance from international research communities. 
URL https://github.com/softwaresaved/software-deposit-guidance
 
Title Software Management Plans 
Description An extension of the DMPOnline webtool to allow for the creation and management of software management plans. 
Type Of Technology Webtool/Application 
Year Produced 2016 
Impact This has enabled the widely referenced DMPOnline tool developed and hosted by the Digital Curation Centre to be applied to software, and forms the basis of upcoming guidance for research funder software calls. 
URL https://www.software.ac.uk/software-management-plans
 
Title lowFAT 
Description The Software Sustainability Institute's low effort Fellowship Administration Tool (lowFAT) https://softwaresaved.github.io/lowfat/ 
Type Of Technology Software 
Year Produced 2017 
Open Source License? Yes  
Impact Unknown at present. 
URL https://github.com/softwaresaved/lowfat
 
Title uCONFLY 
Description uCONFLY is an unconference resource management system. It provides document templates for a range of resource types and a means to allow event attendees to create documents based upon the templates made available to them. The production deployment of uCONFLY is hosted at https://uconfly.org/. 
Type Of Technology Webtool/Application 
Year Produced 2018 
Impact Used in various unconference workshops - still in beta phase of rollout. 
URL https://uconfly.org/
 
Description Big Bang Fair 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Schools
Results and Impact Participation in the Big Bang Fair, a national schools science event attended by tens of thousands. Creation of materials providing guidance on ways to learn coding for school children and parents. Approximately 800 people picked up promotional materials.
Year(s) Of Engagement Activity 2016, 2017
 
Description Dagstuhl Workshop: Implementing FAIR Data Infrastructures 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact "Open Science" reflects a fundamental paradigm shift in making scientific research more accessible and reusable. The Open Science movement has recently gained strong momentum worldwide, with increasing demand for reliable and sustainable research data infrastructures that enable researchers to cooperate on data and share results. On the European level, for example, the European Open Science Cloud is being developed. In this context the so-called FAIR principles appear to be becoming a common and widely accepted conceptual basis for future research data infrastructures.

These principles describe the core characteristics of data use, but they do not define or suggest any technological implementations. Thus, there is still a great lack of models, infrastructures and services available showing how FAIR research data infrastructures can be implemented. Computer science can greatly contribute to this important field.

The interdisciplinary Dagstuhl Perspectives Workshop "Implementing FAIR Data Infrastructures" aims at bringing together computer scientists with digital infrastructure experts from different domains. The central goal is to discuss the key elements required for the transition of scientific e-infrastructures and services to Open Science from the perspective of computer science as well as from different stakeholder perspectives. The workshop will discuss requirements, common mechanisms, and best practices for FAIR data management platforms and services, with a particular focus on innovative tools and methods for sharing research data and on principles of how research data can be represented in an interoperable way to foster linking and reusing data across community borders.

The Workshop will publish a manifesto of recommendations and an inventory of tools & services for implementing FAIR data principles and other targeted aspects of Open Science in future research data infrastructures and data management services. It will further shape the role that the field of computer science has to play in advancing Open Science practices.
Year(s) Of Engagement Activity 2018
URL https://www.dagstuhl.de/en/program/calendar/semhp/?semnr=18472