Approximate product-forms and reversed processes for performance analysis (APROPOS)

Lead Research Organisation: Imperial College London
Department Name: Dept of Computing

Abstract

The need for models in the quantitative design of complex computer and communicaton systems is indisputable but their construction is hampered by the lack of a uniform methodology for building large models from smaller component-models - mirroring the design process of the systems themselves. Various specific, often ad hoc, techniques have been developed over the last four decades in response to contemporary design features, beginning with queueing networks that modeled multiprogrammed mainframe systems through to network-calculus descriptions of mobile networks and stochastic models of telecommunication systems and the internet. Unfortunately, such models are prone to exponential (or higher) growth in computational complexity, a phenomenon often referred to as state-space explosion .The present proposal aims to provide techniques and tools for deriving efficient, mainly approximate, solutions to models of modern networks. As in almost all analyses of such complex systems, the aim is to find separable solutions, which allow subsystems (components) to be solved separately and their solutions to then be combined in a simple way. More specifically, we propose to develop the recent significant results obtained by the proposer and Dr Andrea Marin through a three-year project supporting Marin as the RA. The said results have led to a number of research and tutorial papers that have been accepted and/or presented at the highest quality international venues in our research area. It is the proposer's view that the research is at a knee in an upward curve and the opportunity to work directly with a rising-star such as Marin is one not to be missed if at all possible; in addition to the above papers and tutorials, Marin has won best paper awards at two international conferences for his work in this area in the last 12 months. Furthermore, the presentation at Sigmetrics 2010 in New York was well received and led to several discussions with leading international researchers and plans for specific collaboration - see the Case for Support. The theoretical research proposed will supplement existing analytical techniques, such as queueing network modeling (QNM), which are still relevant but are lacking in expressive power for modeling today's systems. For example, Stochastic Petri Nets (SPNs) are suitable for describing virtual resource systems, as used in cloud computing for example, which is not the case for standard queueing networks. Similarly, (non-standard) networks with batch movements are important in models of energy-efficient systems (see the Case for Support), but in general do not have separable (or other efficient) solutions; preliminary results relating to this will be presented as a poster at Performance'10 in November. Last, optimisation is facilitated by the ability of our unique approach, using the Reversed Compound Agent Theorem (RCAT), to perturb specifications so as to create separable solutions, admitting the possibility of searching for a best-fit product-form solution.Based on these theoretical and practical developments, the first objective of the proposed project is to enhance the RCAT-approach to product-forms and semi-product-forms for application in models with SPN specifications and in networks with batch-movements. Perhaps even more importantly, the probability density function of the response time of tasks in passing along a path will be investigated in a new class of networks; preliminary results have already been obtained and will also be presented in the aforementioned poster. This work is a substantial advance on established approaches, such as Boxma and Daduna's as well as that of the proposed PI previously. We believe that three years of uninterrupted collaboration between the proposer and Marin in the AESOP research group will attain the goals summarised above and listed under Objectives .

Planned Impact

Although the focus of the proposal is on theoretical research, we also anticipate that benefits will arise from general exploitation of the efficient, compositional, performance analysis methodology that is implicit in the whole approach. In the longer term, its specific application in models of real systems, abstractions of which we consider as case studies in Task 5, will assist users and developers of such systems in carrying out efficient management and optimisation of performance. Further downstream, indirect benefits would also then accrue to the competitiveness of the UK ICT industry, the environment and the public at large, in part through the training of people to contribute to the knowledge economy. In particular, the project will provide excellent training in research for the RA as well as for PhD students (funded from other sources), who should emerge from the project with a good knowledge of the theory and practice of performance engineering. System designers and capacity planners in industry will be able to apply compositional techniques of the kind we use and propose to construct rapidly performance models for a wide variety of complex distributed systems, e.g. computer and communication systems, workflow processing systems (such as patient-flow in hospitals) and sensor networks. Our RCAT-based methodology can then be applied to either solve the models efficiently (in a separable manner) or to give guidance on how the model might be modified in such a way as to yield an efficient, approximate solution. This represents a significant advance over traditional analysis methods, either based on solution of explicit Markov chains (computationally intractable for large systems) or queueing network theory (highly restricted in its application). Cloud technologies in particular need to make scheduling decisions in real time and so require models that quantify performance rapidly, on-the-fly. Reasonable approximations are usually quite adequate and our methodology appears ideal for this purpose. As capacity planning consultants to a wide range of businesses, our industrial partner Metron will be in the best position to exploit the more practical output directly in the longer term. Communication of project outputs to a wider audience and engagement of relevant interested third parties with the project will take place through a variety of channels: (i) Events will be arranged locally, including the hosting of the primary international conference of our research field, the Joint Sigmetrics/Performance 2012 conference; (ii) Metron will host a workshop during the second half of the project, which will feature a selection of national and international participants from both academia and industry; (iii) A (virtual) project web server will be set up to promote and communicate the goals of the project, and to encourage wider collaboration; (iv) Project results and case studies will be published openly in major international journals, as referred to in Academic beneficiaries ; (v) Dissemination will also occur through travelling to relevant research institutions throughout the world (see Academic beneficiaries ). As outlined in their letter of support, we have set up six monthly meetings with Metron at which we will review progress and obtain input into future objectives. Further, Metron will assist us with integrating our techniques into their Athene capacity planning software, as well as in the provision of real-world test case scenarios. The PI has a good record of managing research grants and knowledge transfer activities, having been an investigator on over 20 EPSRC grants (15 as PI). These included in particular the grant Performance Modelling Of Client-Server Systems (PERMOCS) (GR/L06744/01), which featured Metron as a project partner. Outputs from that project were jointly published with Metron, and were incorporated into subsequent versions of the Athene toolset.

Publications

10 25 50
 
Description Key findings fall into 2 categories:
(1) Product-form solution; (2) Response time distributions in queues and networks of queues. The former provides efficient solutions for performance metrics in networks of finite capacity resources with contention (usually queues or a generalisation thereof) and the second provides detailed information about the delays in such systems. For example, rather than just estimating the average time it will take for a transaction to complete, the proportion of messages that fail to reach their destination in a pre-specified time can be predicted. Such information is often essential in meeting SLA requirements.

Regarding (1), the project has greatly extended the domain of networks with tractable product-form solutions in Markovian networks. These solutions can be constructed automatically by using a technology based on the RCAT theorem proved previously by the PI, and in fact the theorem itself has been generalised. These advances are clearly explained in the papers cited in the project output.

Regarding (2), it has proved difficult to find product-form extensions to the results for Jackson-type queueing networks that are overtake-free - results that were first obtained in the 1980s. A theorem was obtained with an interesting notion of forwards- and backwards-overtaking that appeared to be a generalisation. However, it turned out that in order to satisfy the conditions of the theorem, a network has to be tantamount to an overtake-free Jackson network! Although we did not prove this result, since there seemed little value in this, we could not find a concrete example that was more general.

Nevertheless, our work on G-networks (with negative customers) and batches (of both positive and negative customers) proved fruitful: we found new product-forms (via the methods of (1)) as well as response-time distributions in a tandem of 2 such queues. Indeed, this work is still in progress, although no longer funded specifically for the last year (i.e. after this project ended).

The tangible impact of the work is better, more efficient performance models which are ever-more important as the complexity of modern computers, networks and storage systems increases, along with rapidly growing workload. Last year a keynote address was given by the PI at the UK Performance Engineering Workshop (see publications).
Exploitation Route Queueing - or contention for shared resources - is a way of life. Any technology that can help with more efficient logistics is of benefit to the associated users. Therefore the findings will be of benefit to, and can be taken forward by any of the sectors identified below, where they can be customised appropriately.
Sectors Communities and Social Services/Policy,Digital/Communication/Information Technologies (including Software),Education,Healthcare,Transport