EPSRC Centre for Doctoral Training in Cloud Computing for Big Data

Lead Research Organisation: Newcastle University
Department Name: Sch of Computing

Abstract

Cloud computing offers the ability to acquire vast, scalable computing resources on-demand. It is revolutionising the way in which data is stored and analysed. The dynamic, scalable approach to analysis offered by cloud computing has become important due to the growth of "big data": the large, often complex, datasets now being created in almost all fields of activity, from healthcare to e-commerce.

Unfortunately, due to a lack of expertise, the full potential of cloud computing for extracting knowledge from big data has rarely been achieved outside a few large companies; as a result, many organisations fail to realise their potential to be transformed through extracting more value from the data available to them.

UK industry faces a huge skills gap in this area as the demand for big data staff has risen exponentially (912%) over the past five years from 400 advertised vacancies in 2007 to almost 4,000 in 2012 (e-skills UK, Jan 2013). In addition, the demand for big data skills will continue to outpace the demand for standard IT skills, with big data vacancies forecast to increase by around 18% per annum in comparison with 2.5% for IT. Over the next five years this equates to a 92% rise in the demand for big data skills with around 132K new jobs being created in the UK (e-skills UK, Jan 2013).

While characteristics such as size, data dependency and the nature of business activity will affect the potential for organisations to realise business benefits from big data, organisations don't have to be big to have big data issues. The problems and benefits are as true for many SMEs as they are for big business, which inevitably broadens and increases the demand for cloud and big data skills. Further, even when security concerns prevent the use of external "public" clouds for certain types of data, organisations are applying the same approaches to their own internal IT resources, using virtualisation to create "private" clouds for data analysis.

Addressing these challenges requires expert practitioners who can bridge the design of scalable algorithms and the underlying theory in the modelling and analysis of data. It is perhaps not surprising that these skills are in short supply: traditional undergraduate and postgraduate courses produce experts in one or the other of these areas, but not both.

We therefore propose to create a multi-disciplinary CDT to fill this significant gap. It will produce multi-disciplinary experts in the mathematics, statistics and computing science of extracting knowledge from big data, with practical experience in exploiting this knowledge to solve problems across a range of application domains.

Based on a close collaboration between the School of Computing Science and the School of Mathematics and Statistics at Newcastle University, the CDT will address market requirements and overcome the existing skills barriers.

The student intake will be drawn from graduates in computing science, mathematics and statistics. Initial training will provide the core competencies that the students will require. They will then collaborate in group projects that teach them to address real research challenges drawn from application domains, before moving on to their individual PhD topics. The PhD topics will be designed to allow the students to focus deeply on a real-world problem whose solution requires an advance in the underlying computing, maths and statistics. To reinforce this focus, they will spend time on a placement hosted by an industrial or applied academic partner facing that problem. Their PhD research will therefore deepen their knowledge of the field and teach them how to exploit it to solve challenging problems.

Working in the new, custom-designed Cloud Innovation Centre, the students will derive continuous benefit from being co-located with researchers, industry experts and their fellow students, immersing them in a group with a wide range of skills, knowledge and experience.

Planned Impact

The CDT will have impact in a range of areas:


Industrial and Public Sector Impact

The Centre's main impact will be made through its graduates: it will develop highly skilled researchers with the theoretical and practical skills to transform existing organisations, and create successful new companies.

We have already obtained commitment from 30 partner organisations both large and small, regional, national and international, who wish to work closely with the CDT (as evidenced by the letters of support). Impact on them will come through students working on projects specified by partners, students being placed with partners during their PhD, and ultimately through students moving into positions of influence in organisations when they graduate.

The norm for all software developed in the CDT will be to release it as open source so that it can be exploited by industry. In our experience this can attract companies and be a catalyst for productive collaboration: code from our previous projects has been widely used internationally.


Economic Impact

The global cloud computing market is expected to grow from $38 billion in 2010 to $121 billion in 2015 (M&M, 2013). Working productively with partners will maximise the chances of economic impact, which will come through organisations using their newfound skills, expertise and tools to realise their potential to transform themselves.

UK industry faces a huge skills gap in this area. Demand for big data staff has risen exponentially (912%) over the past five years from 400 advertised vacancies in 2007 to almost 4,000 in 2012 (e-skills UK, Jan 2013). Over the next five years analysts forecast a 92% rise in the demand for big data skills with around 132K new jobs being created in the UK (e-skills UK, Jan 2013). The CDT will provide expert practitioners to fill this gap.

Newcastle City Council is setting up the £2M cloud business engagement facility, which will be co-located with the CDT, because it believes that it can transform the local economy by up-skilling existing workers. This investment brings funding for CPD, cloud events and other outreach activities that will disseminate the knowledge developed in the CDT.


Societal Impact

We will build on the knowledge and pathways created in the Social Inclusion through the Digital Economy Hub (SiDE: 2009-15), which is tackling big data challenges across a range of areas of societal importance, e.g. healthcare and mobility for older people. We will build on our existing, long-term relationships with SiDE partners; maintain our links with organisations that represent disadvantaged groups; and work directly with users through the 3,000-person User Pool created by the SiDE project.

The CDT also has a strong set of investigators tackling key healthcare challenges through the use of cloud computing in medicine, biology and neuroscience. These subjects are now under a deluge of data, and increasingly researchers (including those in the pool of potential supervisors for this CDT) are using cloud computing to extract knowledge from it.

An annual public engagement open-day will disseminate the CDT's work to a diverse audience.


Academic Impact

Academic impact will come from the graduating students (some of whom will stay in academia), ideas (through publications), the publication of open source software and our delivery of training courses to other CDTs and researchers.

The placing of CDT students at our overseas partner universities (Berkeley and PUCRS, Brazil; please see letters of support) will provide a way for our students' research to have direct international impact.
