Intelligent Performance Optimisation of Virtualised Data Storage Systems (iPODS)

Lead Research Organisation: Imperial College London
Department Name: Dept of Computing


Two key R&D questions emerge from the recent unprecedently rapid growth in both data and storage capacity: how best to map data onto physical disk devices, and on what factors to base the choice of this mapping? From a user perspective, it is important to ensure that an adequate quality of service (as dictated by application requirements) is delivered at reasonable cost. Additionally, since the total cost of ownership of disk storage is dominated by administration and management activities, ease-of-management and autonomous operation are vital. The technology pull outlined above has led naturally to the development and widespread adoption of virtualised storage infrastructures that incorporate intelligent storage fabrics. The physical resources underlying such systems are organised into storage tiers, each of which delivers a different cost/capacity ratio against a certain quality of service. Besides providing a single point of control and uniform view of storage system components across all tiers, an important management challenge for the intelligent storage fabric is to place data onto the most appropriate tier and then migrate it from one tier to another as the access profile evolves. Device selection and data placement within tiers is also critical. For example, to support the performance requirements of video streaming applications, it may be necessary to stripe video data across a number of RAID sub-systems, leveraging not only the capacity of the storage devices but also the performance of several RAID controllers. A popular platform for implementing high-performance virtualised storage systems is the Storage Area Network (SAN). This is a high-speed special-purpose network that interconnects different kinds of storage devices with associated data servers. Several commercial vendors offer SAN-based storage solutions including IBM, NetApp, EMC, Hitachi and Compellent. According to published literature, the mechanisms for fabric intelligence in these systems are relatively simple with inter-tier migration policies that are centred on capacity utilisation and failure recovery and that are not sensitive to any dimension of the access profile other than access frequency. The most sophisticated tiered SAN available today offers fixed-interval block-level data migration based on access frequency. All data within a tier is subject to a single static protection level and each tier has separate, static address spaces for live data and snapshots. Consequently, data-specific quality of service cannot be guaranteed, and space utilisation is potentially inefficient; large enterprises are therefore reluctant to adopt storage virtualisation for mission-critical applications. The focus of the present proposal is to develop more sophisticated fabric intelligence that is able to autonomously and transparently migrate data across tiers and organise data within tiers to deliver the required quality of service in terms of factors such as response time, availability, reliability, resilience, storage cost and power utilisation. This composite goal entails both the provision of intelligent data placement and migration strategies as well as the development of performance evaluation tools to assess their benefits quantitatively.The project is backed by two industrial partners who have committed senior technical staff to help us to validate our work in a realistic context. The news agency Reuters will provide the focus of our primary case study by helping us to understand their data architecture and storage-related quality of service requirements. The storage development team at IBM (Hursley), who design and implement Storage Area Network controllers, will provide us with I/O workload traces, will host a project PhD student for six months and will provide us with insights into the operation of state-of-the-art SAN-based storage solutions.
Description We have developed new techniques for large-scale data management. Specifically we have researched how to lay out data in complex multi-tiered file systems so that a particular level of user-specified quality of service can be given to data automatically, and without manual user intervention. The algorithms and techniques have been implemented in the form of a new filesystem for Linux. We have also developed new techniques for the modelling of response time in storage systems, especially disk arrays, combining simulation, analytical models and benchmarking.
Exploitation Route Manufacturers of storage systems will be able to use our algorithms and techniques for data layout. Our modelling efforts will help others to predict the performance of their storage systems.
Sectors Digital/Communication/Information Technologies (including Software)

Title Quality-of-Service-aware File System 
Description A new kind of file system which automatically locates data within a multi-tiered file system so as to give the data a user-specified level of quality of service. 
Type Of Technology Software 
Year Produced 2011 
Impact The student used the technical knowledge gained from the creation of this artefact to secure a senior position at Citrix.