Interdisciplinary Design and Evaluation of Dependability (INDEED)

Lead Research Organisation: City, University of London
Department Name: Centre for Software Reliability

Abstract

Computers increasingly play vital roles in organisations - e.g., hospitals or factories - which thus become computer-based systems. The dependability of these systems is a major societal concern. In response, EPSRC funded the Dependability Interdisciplinary Research Collaboration (DIRC) between City, Edinburgh, Lancaster, Newcastle and York universities. DIRC was based on the premise that dependability must be studied not as a purely technical issue, but as a socio-technical property of the combination of a computing system with the environments in which it is procured, developed and used. DIRC thus assembled a world-class interdisciplinary team of computer scientists, psychologists, sociologists and statisticians, which has achieved substantial results through a rare degree of collaboration between engineering and social sciences.

INDEED will build on DIRC's results to address important challenges in extending these results and combining them with current practices, to ensure a real, long-term impact on the design and evaluation of dependable systems. It will apply a multidisciplinary approach in four major research activities:

Timing and Structure. This work will further develop DIRC's time band concept for reasoning about processes that unfold on different time scales, from microseconds to days, within a system. We will define an appropriate descriptive language, and extend it to deal with probabilistic relationships between events in different time bands. We will then build a software tool to use in case studies, to validate the use of time bands in structuring dependable systems.

Adaptation and diversity. This activity will help designers and assessors of socio-technical systems to address some of the hard problems caused by the difficulty of predicting how people adapt to computers. We will give designers greatly enhanced abilities to analyse quantitatively, control and exploit the phenomena of adaptation and diversity, which although often recognised in informal terms need more thorough and formal treatment. Our focus will be data-rich, knowledge-intensive activities that are increasingly supported by automation.

Responsibility and trust. Inappropriate allocation or perception of responsibilities, and inappropriate levels of trust in the various system components, are important causes of failure in computer-based systems. This work will support the modelling, management and analysis of responsibility and trust during the design and deployment of such systems, by developing the necessary notations, techniques and software tools.

Confidence and Uncertainty in dependability cases. A case is the web of evidence and reasoning through which system dependability is assessed. DIRC defined confidence-based cases, which describe dependability claims together with the degree of confidence that can be had in them. We will produce methods for detailing and structuring cases, using the results of work on time bands; guidance for using more diverse evidence and arguments towards increasing confidence; new interdisciplinary understanding of the factors causing people to trust a case less (or more) than its contents warrant.

These activities are integrated into a coherent programme of work. An integration mechanism is the use of real-world case studies where we work with our partners in the project (Voca, British Energy, CAA and Qinetiq) to challenge and validate our research.

Publications

 
Description The work falls into two areas:

1. Guaranteed conservatism in safety claims

This research thread has concentrated on devising means whereby claims can be made about the safety of critical software-based systems that are guaranteed to be conservative (but practically usable).

(Littlewood and Rushby 2012) represents an important breakthrough in the assessment of the reliability of an important class of fault tolerant systems. Many systems, such as nuclear reactor protection systems, have a 1-out-of-2 architecture: if either one of the two channels works correctly then the system works correctly. A major problem in assessing the reliability of these systems arises from the fact that the two channels cannot be assumed to fail independently of one another, so it is not permissible simply to multiply the two channel pfds to obtain the system pfd (probability of failure on demand). You need to know how dependently the channels fail. It turns out that assessing the level of this dependence is as hard as simply assessing the system reliability as a black box, which is known to be infeasible (by orders of magnitude) for the levels of reliability needed for some critical applications. In this work we examined a special - but plausible - architecture in which one of the channels can be made sufficiently simple that it is "possibly perfect". So, for one channel (assumed to be complex) a pfd claim will be made, but for the other a pnp (probability not perfect) claim will be made. We proved that the product of these can be used as an estimate of the system pfd, and this is guaranteed to be conservative. The Associate Editor of TSE (Michael Jackson) wrote: "This paper is excellent ... an important contribution to the field." One reviewer wrote: "The importance of [the authors'] achievement cannot be overestimated ... I consider this to be a watershed paper." We apply the result to other important systems, e.g. the common monitored systems in modern aircraft, and to real-time applications.
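As a small illustration of the product rule described above, the following sketch computes the conservative system pfd claim from a pfd claim for the complex channel and a pnp claim for the possibly perfect channel. The function name and the numerical values are hypothetical, chosen only to show the arithmetic; the conditions under which the bound holds are those established in the cited paper and are not checked here.

```python
def conservative_system_pfd(pfd_a: float, pnp_b: float) -> float:
    """Conservative claim for the pfd of a 1-out-of-2 system.

    pfd_a: claimed probability of failure on demand of the complex channel A.
    pnp_b: claimed probability that the simple channel B is not perfect.
    Returns the product, used as an upper bound on the system pfd.
    """
    for name, p in (("pfd_a", pfd_a), ("pnp_b", pnp_b)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {p}")
    return pfd_a * pnp_b


# Hypothetical claims, for illustration only.
bound = conservative_system_pfd(pfd_a=1e-3, pnp_b=1e-2)
print(f"claimed system pfd is no worse than {bound:.1e}")
```

Note that the second argument is a claim about perfection of channel B, not a pfd: the product of two channel pfds would still require an independence assumption.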

Related work (Littlewood and Povyakalo 2013) addressed the difficult problem of epistemic uncertainty in this model. The problem here concerns the subjective beliefs of a Bayesian assessor about the two parameters of the model: pfd of channel A; pnp of channel B. It is unlikely that they will be able to represent their beliefs in the required bivariate probability distribution: in particular, experts would find it hard (if not impossible) to express the dependence between beliefs about pfd and pnp. In this work we show how to avoid this problem of dependence. Experts need only express their marginal beliefs about pfd and pnp separately. The price paid here is further conservatism.

Related work involves different strands of probabilistic modelling with the common aim of providing conservative ways of reasoning about system safety in the presence of uncertainty and incompleteness of information. For example, in (Bishop, Bloomfield et al. 2011) we treat the important problem of Bayesian assessment when the assessor's subjective prior belief is limited in extent. Again, the results here are guaranteed to be conservative, but not so conservative as to be unusable.
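One simple instance of this style of conservative reasoning can be sketched as follows. It is an assumed illustration, not a reproduction of the formalism in the cited paper: suppose the assessor is willing to state only that they are (1 - doubt) confident that the pfd is no larger than y. The most pessimistic belief consistent with that statement puts the remaining probability mass on certain failure, which gives a bound on the probability of failure on a randomly chosen demand. The function and parameter names are hypothetical.

```python
def conservative_failure_probability(y: float, doubt: float) -> float:
    """Worst-case probability of failure on a random demand.

    Assumed belief: P(pfd <= y) >= 1 - doubt, and nothing more.
    Worst case: mass (1 - doubt) at pfd = y and mass doubt at pfd = 1,
    so the expected pfd is at most (1 - doubt) * y + doubt.
    """
    for name, p in (("y", y), ("doubt", doubt)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1], got {p}")
    return (1.0 - doubt) * y + doubt


# Hypothetical figures: 99.9% confidence that pfd <= 1e-4.
claim = conservative_failure_probability(y=1e-4, doubt=1e-3)
print(f"conservative probability of failure on demand: {claim:.4e}")
```

With these figures the doubt dominates the claim, which shows why the confidence attached to a bound matters as much as the bound itself.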

In (Littlewood and Povyakalo 2013) we consider a different approach to the problem of assessing the reliability of a 1-out-of-2 system when failures of the two channels cannot be assumed to be independent. An informal approach to this problem assesses the channel pfds conservatively and then multiplies these together in the hope that the conservatism will be sufficient to overcome any possible dependence between the channel failures. In this work we place this kind of reasoning on a rigorous footing. Our rigorous formalism places strong constraints upon the claims that can be made via this kind of "trade-off" reasoning.
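To see why multiplying channel pfds can be optimistic, the sketch below uses a demand-difficulty model in the style of Eckhardt and Lee. This is an assumption made for illustration, not the formalism of the cited paper. Both channels are assumed to fail independently on any given demand, but some demands are harder than others for both channels. The demand profile is hypothetical.

```python
# Hypothetical demand profile: each entry is
# (probability that a demand falls in this class,
#  probability that a single channel fails on a demand of this class).
demand_classes = [
    (0.99, 0.0),  # easy demands: neither channel ever fails
    (0.01, 0.1),  # hard demands: each channel fails 10% of the time
]

# Marginal pfd of one channel, averaged over the demand profile.
channel_pfd = sum(p * theta for p, theta in demand_classes)

# Naive system pfd, assuming the channels fail independently overall.
naive_system_pfd = channel_pfd ** 2

# System pfd under the model: the channels are independent only
# conditionally on the demand class, so failure probabilities are
# squared within each class before averaging.
model_system_pfd = sum(p * theta ** 2 for p, theta in demand_classes)

print(f"channel pfd:      {channel_pfd:.1e}")
print(f"naive system pfd: {naive_system_pfd:.1e}")
print(f"model system pfd: {model_system_pfd:.1e}")
```

Under these assumed figures each channel has a pfd of about 10^-3, yet the system pfd is about 10^-4 rather than the 10^-6 that naive multiplication suggests, so each channel pfd would have to be assessed very conservatively before the product became a safe claim.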

(Littlewood and Rushby 2012) and (Bishop, Bloomfield et al. 2011) were each singled out by the Editor-in-Chief of IEEE Trans Software Engineering to be the "Spotlight Paper" for the issue in which they were published.

References

Bishop, P., R. Bloomfield, B. Littlewood, A. Povyakalo and D. Wright (2011). "Towards a formalism for conservative claims about the dependability of software-based systems." IEEE Trans Software Engineering 37(5): 708-717.

Littlewood, B. and A. Povyakalo (2013). "Conservative bounds for the pfd of a 1-out-of-2 software-based system based on an assessor's subjective probability of "not worse than independence"." IEEE Trans Software Engineering 39(12): 1641-1653.

Littlewood, B. and A. Povyakalo (2013). "Conservative Reasoning about the Probability of Failure on Demand of a 1-out-of-2 Software-Based System in Which One Channel Is 'Possibly Perfect'." IEEE Trans Software Engineering 39(11): 1521-1530.

Littlewood, B. and J. Rushby (2012). "Reasoning about the reliability of diverse two-channel systems in which one channel is 'possibly perfect'." IEEE Trans Software Engineering 38(5): 1178-1194.

2. Decision support and human behaviour

We continued the study, begun in DIRC, of a decision aid computer system for medical use. We developed statistical methods for analysing the effects of computer alerting systems (http://openaccess.city.ac.uk/1743/), produced novel analyses of the detailed decisions of clinicians (on visual features rather than whole cases) (http://openaccess.city.ac.uk/1583/), and developed a theory of the underlying cognitive mechanisms (http://openaccess.city.ac.uk/384/) applicable to the many current applications of computer alerts. We obtained new insight into the limits to the effectiveness of such tools in reducing errors, and studied the conditions under which they even cause some errors.

We studied the interaction between human behaviour and security algorithms in the case of a cryptography-based e-voting system. We developed a preliminary assurance case to address the practical need (before adoption of a specific e-voting system) for a complete case demonstrating that the system as a whole has a sufficiently high probability of exhibiting the desired properties when used in an actual election. We showed a possible organisation of a case in terms of four main requirements - accuracy, privacy, termination and 'trustedness' - and showed some of the detailed organisation that such a case should have, the diverse kinds of evidence that need to be gathered, and some of the difficulties that would arise (http://openaccess.city.ac.uk/1592/).


In the study of the effectiveness of design diversity, we extended previous models to cover non-independent development processes for diverse versions (http://openaccess.city.ac.uk/3027/). This gave us a rigorous way of framing claims and open questions about how best to pursue diversity, and about the effects - negative and positive - of commonalities between developments, from specification corrections to the choice of test cases. We obtained three theorems that, under specific scenarios, identify preferences between alternative ways of seeking diversity. The insights obtained addressed non-intuitive issues, including how expected system reliability may be improved by creating intentional "negative" dependencies.
Exploitation Route The work informs:
- the development of assurance and safety cases in critical industries by industry and regulators
- the understanding of the technical and socio-technical aspects of system assurance
- the architecture of safety-critical systems
Sectors Aerospace, Defence and Marine; Digital/Communication/Information Technologies (including Software); Energy; Financial Services, and Management Consultancy; Manufacturing, including Industrial Biotechnology; Transport

URL http://www.city.ac.uk/centre-for-software-reliability/research
 
Description We have had a long and close involvement with the UK nuclear industry for almost 20 years. The fact that they have provided unbroken funding for 17 years for our work on diversity attests to its value to them. The industry used diversity in design and implementation long before there were computer systems playing critical safety protection roles. With the introduction of software-based systems, there were attempts to argue that simple notions of independence of failure could be used to support claims for very small system pfd (probability of failure on demand): e.g. for a 1-out-of-2 system, the system pfd could be claimed to be better than 10^-6 if each channel were better than 10^-3. We showed rigorously that such claims could not be justified, indeed that they were likely to be too optimistic [1, 2]. Our work allowed - indeed required - regulators to impose stricter requirements on nuclear licensees in making safety cases for multi-channel software-based protection systems.

Most recently, our results concerning the limitations of what can be claimed for diversity have had an important role in the discussions between regulators and licensees concerning the safety of the protection systems of the proposed new UK reactors. In a recent communication to Littlewood (full text available on request), Mr Bob Jennings of ONR states: "A team based at City University led by Professor B Littlewood has been at the forefront of research work on independence and diversity of nuclear safety systems ... [this work] has helped to transform many of the fundamental assumptions and mathematical models underpinning the safety analysis of many complex safety critical systems. The Office for Nuclear Regulation (ONR) has used the outcomes from the research work led by Professor Littlewood to assist in its assessment of complex safety systems. For example ONR's work on the assessment of the UK EPR and UK AP1000 was informed by the work undertaken at City University. [It] helped ONR to frame some very challenging questions the result of which was the introduction of a number of significant design changes to improve the safety of both reactors. For example one of the design improvements was the introduction of the Non-Computerised Safety Systems (NCSS) which considerably improved the diversity of the safety systems proposed for the UK EPR."

Complementing these important "negative" results has been extensive work on what might be done in the face of these limitations. For example, we have provided guidance on means to achieve diversity between channels (albeit falling short of guaranteeing failure independence) [4]. In addition, from our extensive probabilistic modelling work we have many results that allow system pfd claims to be justified - essentially these results are provably conservative claims based on assumptions that fall short of independence. For example, in the case of a 1-out-of-2 system in which one channel is "possibly perfect" we have proved that the system pfd is better than the product of channel A's pfd and channel B's pnp [6]: this kind of reasoning may be used in the EPR (European Pressurised Reactor) safety case, where a simple possibly-perfect third channel is to be added to the originally proposed system. The nuclear industry wants our DISPO project results to be disseminated to practising safety engineers, and will be funding an extensive technology transfer programme under the project, starting in 2014.

The beneficiaries of this research are, most obviously, regulators and licensees of UK nuclear plant. But outside the UK, our diversity work has received recognition in the US nuclear industry. In addition, of course, there is benefit to the general public, not only in ensuring that reasoning about the safety of nuclear plant is rigorous and valid, but that it is seen to be so in order that safety claims are widely - and justifiably - believed.
First Year Of Impact 2012
Sector Aerospace, Defence and Marine; Energy
Impact Types Economic; Policy & public services

 
Description Brookhaven National Laboratory
Amount £30,000 (GBP)
Organisation Brookhaven National Laboratory 
Sector Public
Country United States
Start  
 
Description Civil Aviation Authority (CAA)
Amount £20,000 (GBP)
Organisation Department of Transport 
Department Civil Aviation Authority (CAA)
Sector Public
Country United Kingdom
Start 01/2014 
End 03/2015
 
Description EDF Energy
Amount £110,000 (GBP)
Funding ID XXX 
Organisation EDF Energy 
Sector Private
Country United Kingdom
Start  
 
Description Leverhulme Trust
Amount £239,175 (GBP)
Funding ID F/00 353/H 
Organisation The Leverhulme Trust 
Sector Charity/Non Profit
Country United Kingdom
Start  
 