Intelligent Performance Optimisation of Virtualised Data Storage Systems (iPODS)

Lead Research Organisation: Imperial College London
Department Name: Computing

Abstract

Two key R&D questions emerge from the recent, unprecedentedly rapid growth in both data and storage capacity: how best to map data onto physical disk devices, and on what factors to base the choice of this mapping? From a user perspective, it is important to ensure that an adequate quality of service (as dictated by application requirements) is delivered at reasonable cost. Additionally, since the total cost of ownership of disk storage is dominated by administration and management activities, ease of management and autonomous operation are vital.

This technology pull has led naturally to the development and widespread adoption of virtualised storage infrastructures that incorporate intelligent storage fabrics. The physical resources underlying such systems are organised into storage tiers, each of which delivers a different cost/capacity ratio against a certain quality of service. Besides providing a single point of control and a uniform view of storage system components across all tiers, an important management challenge for the intelligent storage fabric is to place data on the most appropriate tier and then migrate it from one tier to another as its access profile evolves. Device selection and data placement within tiers are also critical. For example, to support the performance requirements of video streaming applications, it may be necessary to stripe video data across a number of RAID sub-systems, leveraging not only the capacity of the storage devices but also the performance of several RAID controllers.

A popular platform for implementing high-performance virtualised storage systems is the Storage Area Network (SAN): a high-speed, special-purpose network that interconnects different kinds of storage devices with associated data servers. Several commercial vendors offer SAN-based storage solutions, including IBM, NetApp, EMC, Hitachi and Compellent. According to the published literature, the mechanisms for fabric intelligence in these systems are relatively simple, with inter-tier migration policies that are centred on capacity utilisation and failure recovery, and that are not sensitive to any dimension of the access profile other than access frequency. The most sophisticated tiered SAN available today offers fixed-interval, block-level data migration based on access frequency; all data within a tier is subject to a single static protection level, and each tier has separate, static address spaces for live data and snapshots. Consequently, data-specific quality of service cannot be guaranteed and space utilisation is potentially inefficient; large enterprises are therefore reluctant to adopt storage virtualisation for mission-critical applications.

The focus of the present proposal is to develop more sophisticated fabric intelligence that can autonomously and transparently migrate data across tiers, and organise data within tiers, to deliver the required quality of service in terms of factors such as response time, availability, reliability, resilience, storage cost and power utilisation. This composite goal entails both the provision of intelligent data placement and migration strategies and the development of performance evaluation tools to assess their benefits quantitatively.

The project is backed by two industrial partners who have committed senior technical staff to help us validate our work in a realistic context. The news agency Reuters will provide the focus of our primary case study by helping us to understand their data architecture and storage-related quality-of-service requirements. The storage development team at IBM (Hursley), who design and implement Storage Area Network controllers, will supply I/O workload traces, host a project PhD student for six months, and give us insights into the operation of state-of-the-art SAN-based storage solutions.
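To make the intended style of fabric intelligence concrete, the sketch below shows a simplified QoS-aware inter-tier placement policy in Python. The tier hierarchy, cost figures, QoS targets and thresholds are illustrative assumptions for exposition only, not the project's actual algorithm.

    # Illustrative sketch only: a simplified inter-tier placement policy that
    # considers a per-item QoS target as well as access frequency. The tier
    # names, response times, costs and thresholds below are hypothetical.

    from dataclasses import dataclass

    @dataclass
    class Tier:
        name: str
        mean_response_ms: float   # typical response time the tier delivers
        cost_per_gb: float        # relative storage cost

    # Hypothetical three-tier hierarchy, fastest/most expensive first.
    TIERS = [
        Tier("ssd",  mean_response_ms=0.5,  cost_per_gb=10.0),
        Tier("fc",   mean_response_ms=5.0,  cost_per_gb=3.0),
        Tier("sata", mean_response_ms=12.0, cost_per_gb=1.0),
    ]

    def choose_tier(target_response_ms: float, accesses_per_hour: float) -> Tier:
        """Place data on the cheapest tier that still meets its QoS target;
        rarely accessed data is demoted to the cheapest tier regardless."""
        if accesses_per_hour < 1.0:       # cold data: cost wins over speed
            return TIERS[-1]
        candidates = [t for t in TIERS if t.mean_response_ms <= target_response_ms]
        # Among tiers meeting the target, pick the cheapest; if none can
        # meet it, fall back to the fastest tier available.
        return min(candidates, key=lambda t: t.cost_per_gb) if candidates else TIERS[0]

    # Example: hot data with a 2 ms response-time target lands on the SSD tier.
    print(choose_tier(target_response_ms=2.0, accesses_per_hour=500).name)

A real fabric would re-evaluate such decisions continuously as the access profile evolves, triggering transparent migration when the chosen tier changes; the point here is simply that the policy weighs QoS targets and cost jointly rather than access frequency alone.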
 
Description We have developed new techniques for large-scale data management. Specifically, we have investigated how to lay out data in complex multi-tiered file systems so that a user-specified level of quality of service can be delivered automatically, without manual intervention. The algorithms and techniques have been implemented in the form of a new file system for Linux. We have also developed new techniques for modelling response time in storage systems, especially disk arrays, combining simulation, analytical modelling and benchmarking.
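As a flavour of the analytical component of this modelling work, the sketch below computes the mean response time of a single storage device treated as an M/M/1 queue, a standard textbook approximation. It is an illustration of the general approach only, not the project's actual disk-array model.

    # Textbook M/M/1 approximation for mean response time, for illustration:
    # R = 1 / (mu - lambda), with arrival rate lambda and service rate mu
    # both in requests per second. Real disk arrays need richer models.

    def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
        """Mean response time of a stable M/M/1 queue (lambda < mu)."""
        if arrival_rate >= service_rate:
            raise ValueError("unstable queue: arrival rate must be below service rate")
        return 1.0 / (service_rate - arrival_rate)

    # Example: a device serving 200 req/s on average, offered 150 req/s,
    # has a mean response time of 1/(200-150) = 0.02 s, i.e. 20 ms.
    print(mm1_response_time(arrival_rate=150.0, service_rate=200.0))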
Exploitation Route Manufacturers of storage systems will be able to use our algorithms and techniques for data layout. Our modelling work will help others to predict the performance of their storage systems.
Sectors Digital/Communication/Information Technologies (including Software)

 
Title Quality-of-Service-aware File System 
Description A new kind of file system that automatically places data across the tiers of a multi-tiered storage system so as to deliver a user-specified level of quality of service.
Type Of Technology Software 
Year Produced 2011 
Impact The student used the technical knowledge gained from the creation of this artefact to secure a senior position at Citrix.