Socio-technical resilience in software development (STRIDE)

Lead Research Organisation: The Open University
Department Name: Faculty of Sci, Tech, Eng & Maths (STEM)

Abstract

This project (STRIDE) addresses the issue of how to make software development more resilient to constant changes of technology, staff, methods, requirements, expectations, regulations and more. The specific problem for this project is to characterise how automation can best be used to improve socio-technical resilience. The solution, based on interdisciplinary research, will be to provide: instruments for organisations to assess their resilience; and case studies, best practices, guidance and a concrete example (from automated fault localisation) to understand how humans and tools can best work together. In addition, we will advocate for a positive image for software engineering.

STRIDE will investigate resilience and automation in the socio-technical system that supports software development, a system that includes people (engineers, users, managers), technical infrastructure (tools, development environments), processes (lean, requirements elicitation) and artefacts (code, wiki, coding standards). Breakdowns in socio-technical systems can cause significant disruption, and Resilience Engineering aims to avoid them by emphasising what works, so that resilience can be preserved. From this perspective, resilience is defined as the productive tension between stability and change, always with the aim of producing systems that are "safe". This view of socio-technical systems is pertinent to modern software engineering, where change has become endemic: changing requirements, advanced technologies, complex infrastructure and new security threats. In addition to the constantly changing environment, software production is increasingly being automated, which requires repeated rebalancing of this tension. But what is the relationship between resilience and automation?

While improvements to software development brought by automation are vital to keeping software safe and secure, automation is not a silver bullet. It is said that "Making a system safer involves coupling the capabilities of humans with the technology they work with so that they can stay in control". What does that mean for software development? Is there something fundamentally human that needs to be retained as part of the software development process? And if so, how can a productive and resilient balance between human control and automation be maintained in the context of constantly increasing automation? How can automation be used to increase socio-technical resilience and what will be the impact on resilience of different levels of automation?

STRIDE aims to address these and related questions. The project will determine and operationalise factors that indicate socio-technical resilience (STR) of software development, drawing on social psychology and resilience engineering, and grounding the research in the concrete development task of automated fault localisation. We will engage with representatives of two developer communities: commercial software engineers and professional end user developers who represent two different development environments. This work will have particular implications for improving STR and the pace and nature of automation in the software development lifecycle.

Planned Impact

Engagement and impact are a substantive thread running through each work package of the proposed project and will span a range of disciplines and practitioner communities. Academic disciplines include software engineering, resilience engineering and social psychology; practitioner communities include software engineers in commercial settings, professional end user developers, and project managers. Our advocacy activities will impact a wider range of stakeholders including the general public.

We will use traditional channels such as journals, conferences and workshops to achieve academic impact. Our non-academic impact is distinctive in aiming for high impact not only in commercial software engineering but also in research software engineering, an under-researched community of software developers. Research software engineers, and professional end user developers more generally (e.g. in accountancy, insurance, nuclear engineering etc), write software to enable other activity to take place. Research software engineers, for example, are typically PhD students and postdocs writing software to enable science to progress. In addition to addressing the general difficulties common to all software development projects, research software must represent, manipulate, and provide data for complex theoretical constructs. This research will help research software engineers and those they work with to recognise and improve their socio-technical resilience.

Through our advisory board and advocacy activities with a wider range of stakeholders and with the general public, the impact of our project will extend beyond the research results themselves. We will engage through media (TV, blogs, online learning resources), practitioner events (MeetUps, conferences, specialist workshops) and our links with policymakers (the Software Sustainability Institute, National Cyber Security Centre) to promote a rounded view of software engineering and what it achieves. Sometimes people see only the problems connected with the software we use and forget the wonderful things that software enables; we will attempt to redress that image.
 
Title Fireside chat in Information Matters 
Description This is an interview between Shalini Urs and Helen Sharp discussing socio-technical resilience in software engineering, and the role of professional developers in keeping systems secure. 
Type Of Art Film/Video/Animation 
Year Produced 2022 
Impact Ideas internal to the project evolved through this discussion. 
URL https://informationmatters.org/2022/09/building-socio-technical-resilience-in-software-development-e...
 
Title STRIDE Lightning talk 
Description Lightning talk slide describing the results of a study examining how research software engineers' working conditions have been affected by the COVID-19 pandemic 
Type Of Art Artefact (including digital) 
Year Produced 2021 
Impact Dissemination to RSEs, leading to good engagement with the survey in 2022 
URL https://ssi-cw.figshare.com/articles/presentation/STRIDE_-_Caroline_Jay/14331242
 
Description The feasibility of applying resilience engineering (RE) approaches and terminology to qualitative data of everyday professional practice, in both air traffic management (ATM) and software development, has been explored. This application of RE differs from that traditionally used because it focuses on individuals and teams rather than organisational or industry-level activity. The results so far have indicated that applying RE this way can identify elements of potential resilient practice. Furthermore, once these elements have been identified they can be used to query potential automation decisions, and the impact of automation decisions on resilient practice.
Exploitation Route Academic colleagues may use the advances to apply the techniques in their own research. Through the WREN project this work may help inform automation decisions in ATM.
Sectors Digital/Communication/Information Technologies (including Software)

 
Title STRIDE Research Software Engineering COVID-19 interview study dataset and materials. 
Description This dataset contains results from an interview study deployed between April and June 2020 to understand the changing situation in research software engineering work environments as a result of the COVID-19 pandemic. The study took place over an eight-week period, during which 17 self-identified research software engineers (RSEs) recorded their thoughts about the impact of the pandemic on their work and lifestyles. Each weekly entry included a series of questions based on the agile software engineering retrospective, a technique used within agile teams to look back on previous work. The first week followed a basic retrospective format, asking participants to assess what went well and didn't go well, and to identify areas that could be improved going forward. To encourage ongoing participation, questions in subsequent weeks were adapted from creative retrospective plans designed by agile practitioners. An invitation to take part was issued via various international RSE social media channels in two batches, resulting in 11 participants starting in the week commencing on the 6th of April, and six starting in the week of the 20th of April. In total, 17 participants responded to the invitation; 15 agreed to participate after the first week. Participants were sent an email each week inviting them to complete a diary entry for a total of eight weeks; data were collected through a survey deployed via JISC's Online Surveys. The consent form and a pdf of the first week of questions are included in the materials to provide an example of how the survey was administered. The entry week and questions are reported in full in columns A and B of the spreadsheet, respectively. To avoid identification of individuals, demographic information and some contextual information have been redacted. Redactions are indicated by *** in the response. The study was conducted as part of the STRIDE project: https://stride.org.uk. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
Impact The results were used as a basis for the workshop paper at CSCW 2021, and for conversations at Research Software Engineer gatherings 
URL https://figshare.manchester.ac.uk/articles/dataset/STRIDE_Research_Software_Engineering_COVID-19_int...
 
Description WREN 
Organisation National Air Traffic Services Limited
Country United Kingdom 
Sector Private 
PI Contribution We are applying resilience engineering ideas to ATM
Collaborator Contribution Access to ATM information and discussions of resilience engineering and its application
Impact So far we have progressed understanding of how to apply resilience engineering theories and approaches to everyday operational contexts, rather than at organisational or industry level, which is more common. We have developed a framework for applying these ideas and have engaged in discussions and data gathering sessions within ATM and with the internal R&D team. The framework results have been encouraging. Building on this work we have applied elements of this framework within the wider work in STRIDE that focuses on software engineering. A publication due out in May captures the application of RE to software engineering. The final report of the WREN project is currently being written. Key outcomes include:
• A framework for identifying and linking evidence of resilient performance to automation decisions
• Demonstration and recommendation about different techniques that can be used to support this process
• Identified challenges in representing human activity within automation decisions
Start Year 2021
 
Description IT 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact Presentation to software development professionals regarding the project and asking for their engagement with it.
Year(s) Of Engagement Activity 2022
 
Description WREN 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Professional Practitioners
Results and Impact We have conducted seven sessions of different lengths with ATM professionals in the course of our collaboration through WREN. Each session included between 5 and 8 individuals.
Year(s) Of Engagement Activity 2022, 2023