Optimal Scheduling of Scientific Application Workflows for Cloud-augmented Grid Infrastructures

Lead Research Organisation: University of Westminster
Department Name: School of Electronics and Computer Science

Abstract

Research scientists need cloud computing to support their computational requirements flexibly, but they also need user-friendly tools before they will engage with it. Demanding computational requirements in science research have been addressed through grid computing, which until recently has been based on a fixed, physical, wide-area infrastructure. Physical grid resources can, however, become overloaded, so scientists may find that the performance they need at a particular moment cannot be delivered in real time.

User-friendly tools and environments are required to support grid computing in science communities. P-GRADE is a web-based portal, co-developed by the investigators, for designing, submitting and monitoring workflows on the grid. It is highly graphical, and thus very appealing to a wide range of non-computing specialists. Familiarity matters to users: in order to develop the potential of cloud computing for the same community of science researchers, it is highly desirable, from their perspective, to retain as much continuity of the computational support environment as possible. The P-GRADE environment has been shaped by user requirements and aspirations over many years, and its usability has already earned it a high degree of acceptance within the research science community. In any case, redesigning a user interface from scratch is a very costly development activity.

Interoperability between service grids and cluster grids was achieved by the investigators in the EU FP7-funded EDGeS project, so that users of each community can now access the resources of the other. Grid computing can also be extended onto cloud resources. The investigators have recently developed solutions for extending both service and cluster grids with virtual cloud resources.

The usability of the tools and environment developed for the grid has been evaluated experimentally by the investigators in collaboration with biological research groups at the University of Westminster and Imperial College. These collaborative experiments have been very successful: the research biologists found the environments highly acceptable to work in and have actively adopted them for their live research programmes. The collaborations also confirmed that the grid solutions significantly enhanced computational performance. The combination of high usability and enhanced computational performance has enabled significant shifts in biological experimental methodology, leading to increased research productivity.

Nevertheless, the performance achievable on a fixed grid is limited by the physical infrastructure on which it is implemented. Cloudbursting, i.e. the ability to provide additional compute capacity on demand at specific times to cope with unpredictable peaks in research computing, is an attractive aspect of cloud computing with the potential to break through the performance constraints of current fixed grids. It is proposed to attach a cloud infrastructure to existing fixed grids to create a mixed grid infrastructure comprising both existing physical and virtualised cloud resources. The tools and environment will then be re-engineered to exploit cloudbursting in the mixed grid and achieve even greater performance. Finally, the performance of the mixed physical/virtual grid will be evaluated, including statistics on resource utilisation, to inform the development of a costed performance model.

Planned Impact

1. Contribution to Knowledge

1.1 Scientific Advances and Techniques
The project will demonstrate, through a number of case studies, the practical ability to deliver grid computing "in the cloud" that directly supports research and can accelerate scientific advances. It is expected that the technologies and methodologies exemplified in this pilot will spread: initially to closely related research activities (for example, other biological research areas that could benefit from extensive use of molecular modelling packages), and from there to other research domains using different software packages. The delivered cloud computing technology and the inter-disciplinary collaborative processes applied in the project will in themselves represent scientific advances in computer science and information systems, respectively. The latter contribution is particularly important because inter-disciplinary communication has hitherto been a significant barrier to the development and deployment of true eScience. Complex computational processes require considerable computing expertise, yet computer experts rarely have a good understanding of, for example, molecular biology, which complicates communication with life scientists. Building on previous collaborations between the proposers and science groups at Imperial College and the University of Westminster, the project will deliver exemplars of collaborative working and disseminate this knowledge widely through conferences, publications and other events.

2. People

2.1 Skills
Through the proposed pilot project, the proposers expect to develop new projects and new PhD programmes steeped in the interdisciplinary nature of the engagement. Interdisciplinary engagements are relatively rare in academia, for a variety of funding and cultural reasons. In this case, for the reasons outlined above, the interdisciplinary aspects of the project are vital and, to be sustainable, must become ingrained in the research culture, at least for the next few years while the cloud computing paradigm becomes established. For the proposers, this ethos will be focussed through biotechnology, biomedicine and environmental science.

3. Economy

3.1 Products and Procedures
The project is fundamentally committed to delivering sustainable interdisciplinary methodologies that can, in time, be transferred to industry. Industry is far more committed to interdisciplinary team working, so this should be a natural transition. Nevertheless, the team will need to engage with a relatively wide range of industries, from IT companies through to biomedical and healthcare verticals. The computing and scientific academic research partners will work on this jointly to identify the most effective routes to exploitation, supported by their respective university knowledge transfer organisations.

4. Society

4.1 Health
Healthcare is a direct beneficiary of the proposed pilot, since the demonstrator case studies will deliver advances in this field.

4.2 Culture
The move to cloud computing represents a major paradigm shift in IT and is relatively new to most people. The ingrained culture of doing everything yourself on your own PC must and will change as the benefits become increasingly obvious. The technology and processes of the proposed pilot are instrumental in helping to achieve this transition across all walks of research and, ultimately, in society in general. Particular examples include citizen software developers (crowdsourcing).

Publications

Stephen Winter (Co-Author) (2012) Application Repository and Science Gateway for Molecular Docking Simulations in IWSG-Life 2012: 4th International Workshop on Science Gateways for Life Sciences, 23 - 25 May, 2012, Amsterdam, The Netherlands, in conjunction with Healthgrid Conference 2012 and featuring EGI.eu, 2012.

Stephen Winter (Co-Author) (2011) Development of user friendly, high throughput screening for ligands and inhibitors of carbohydrate modifying enzymes in Proc. 2nd glyco-bioinformatics Beilstein-Institut symposium, 27th June - 1st July 2011, Potsdam, Germany

Stephen Winter (2014) Buttressing volatile desktop grids with cloud resources within a reconfigurable environment service for workflow orchestration in Journal of Cloud Computing: Advances, Systems and Applications

 
Description: Research scientists need high-performance distributed computing infrastructures (DCIs) to support their computational requirements flexibly. Demanding computational requirements for research in science have been addressed through grid computing, which has been based on a fixed, physical, wide-area infrastructure. Physical grid resources can, however, become overloaded, and scientists may then find that the performance they need at a particular moment cannot be met. Cloudbursting is the ability to provide additional compute capacity in real time to cope with unpredictable peak demands in grid-based research computing. However, provision of reliable HPC resources is not the only computational barrier for research scientists. Experience has shown that most scientists (with some notable exceptions) have little time and insufficient motivation to engage with the arcane tools, technologies and languages routinely used by computer scientists; conversely, IT specialists have no time to gain deep knowledge of every scientific domain they come into contact with. To engage effectively with a high-performance DCI, scientific communities therefore require appropriate user-friendly tools.
An overarching aim of the project was to secure the engagement of scientists in non-computer science fields with high-performance cloud computation, and in that spirit, the project successfully delivered:
• A cloud-augmented grid platform for scientists to exploit cloud computing in their research
• A usable toolset for exploiting cloudbursting within a hybrid DCI comprising a cloud-augmented desktop grid platform
• A scheduler model for optimising workload across the hybrid DCI
A key part of the rationale was to explore and develop ways to enhance the performance of existing grid solutions significantly yet economically, through cloudbursting. The grid infrastructure selected for the project was an enterprise desktop grid: an inexpensive yet powerful resource that exploits underutilised processing cycles. Desktop grids are highly suited to the problem domain (molecular docking) yet, by their nature, have performance limitations, because the desktop machines are volatile and contribute cycles only while their owners leave them idle.
To help overcome the communication-related barriers to engagement, the project chose to use P-GRADE, a highly graphical web-based science gateway co-developed by the investigators, for designing, submitting and monitoring computational workflows on the grid and other DCIs. It is intuitive to use and therefore appeals to a wide range of non-computing specialists. Even so, the project still required a close working relationship between the computer science partners and the bioscience partners.
A final interesting conclusion from the project is that, for many bioscientists, many aspects of computing, not just HPC engagement, remain a fundamental barrier. For example, the AutoDock docking tool, used extensively in this project, is scarcely understood by the majority of biologists, even those who might benefit enormously from using it. Ironically, therefore, by reducing the barriers to cloud and grid computing, the project has also helped to reduce the barriers to engaging with AutoDock.
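To make the cloudbursting idea concrete, the following is a minimal sketch of a threshold-style bursting rule for a cloud-augmented desktop grid. It is not the project's scheduler model, which is not described here. It assumes, hypothetically, that every docking job takes the same time on any worker, that only a known fraction of desktop machines is idle, and that the cloud can supply at most a fixed number of instances. The function name `cloud_instances_needed` and all of its parameters are illustrative.

```python
import math


def cloud_instances_needed(pending_jobs: int,
                           desktop_workers: int,
                           availability: float,
                           job_minutes: float,
                           deadline_minutes: float,
                           max_cloud: int) -> int:
    """Hypothetical cloudbursting rule: how many cloud workers to add.

    Assumes identical jobs of job_minutes each, one job at a time per worker,
    and that only a fraction `availability` of desktop workers is idle.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    # How many jobs a single worker can finish back-to-back before the deadline.
    jobs_per_worker = math.floor(deadline_minutes / job_minutes)
    if jobs_per_worker == 0:
        raise ValueError("deadline is shorter than a single job")
    # Workers needed in total to clear the queue by the deadline.
    workers_required = math.ceil(pending_jobs / jobs_per_worker)
    # Desktop workers we can expect to be idle (volatile resources).
    effective_desktops = math.floor(desktop_workers * availability)
    # Burst only the shortfall, capped by the cloud quota.
    shortfall = max(0, workers_required - effective_desktops)
    return min(shortfall, max_cloud)


if __name__ == "__main__":
    # 1000 docking jobs, 100 desktops of which about half are idle,
    # 30-minute jobs, a 4-hour deadline and a quota of 200 cloud instances.
    print(cloud_instances_needed(1000, 100, 0.5, 30, 240, 200))  # prints 75
```

The rule bursts only the shortfall, so the inexpensive desktop capacity is used first and paid cloud capacity fills the gap. A practical scheduler would also have to cope with desktops disappearing mid-job, heterogeneous job run times and cloud instance start-up latency.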
Exploitation Route: The findings have contributed to a number of ongoing projects that are lowering the barriers that keep scientists in non-computing fields from engaging with HPC in the cloud.
Sectors: Aerospace, Defence and Marine; Agriculture, Food and Drink; Chemicals; Creative Economy; Education; Energy; Environment; Healthcare; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology; Transport

 
Description: The findings have contributed to ongoing projects developing science gateways and cloud-based support for SMEs.
First Year of Impact: 2012
Sector: Agriculture, Food and Drink; Environment; Manufacturing, including Industrial Biotechnology; Pharmaceuticals and Medical Biotechnology
Impact Types: Societal; Economic