Leveraging the Google Cloud to Estimate Individual Level CO2 Emissions Linked to the School Commute

Lead Research Organisation: University of Liverpool
Department Name: Geography and Planning


Internationally, rates of active transport to school (e.g. cycling or walking) are in decline, and the corollary switch to less sustainable modes of travel is linked with negative effects on the environment through increased emissions, with increasing traffic congestion around schools, and with negative health impacts related to lower physical activity levels or pollutant exposure. In a UK context, schools account for 15% of total public sector emissions (DCFS, 2010), which in England is estimated to be the equivalent of around 9.4 million tonnes of CO2 per year (SDC, 2006). Around 7% (658k tonnes) of this total is associated with the pupil-school commute, and as such there are significant environmental benefits to pupils adopting more sustainable travel behaviours.
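The commute figure quoted above follows directly from the two numbers in the text. A minimal check of that arithmetic, with variable and function names chosen here purely for illustration:

```python
# Figures quoted in the text (SDC, 2006); names are illustrative only.
SCHOOL_EMISSIONS_TONNES = 9_400_000  # annual CO2-equivalent emissions from schools in England
COMMUTE_SHARE = 0.07                 # share attributed to the pupil-school commute


def commute_emissions_tonnes(total_tonnes: float, share: float) -> float:
    """Return the tonnes of CO2 attributable to the school commute."""
    return total_tonnes * share


commute_tonnes = commute_emissions_tonnes(SCHOOL_EMISSIONS_TONNES, COMMUTE_SHARE)
print(f"{commute_tonnes / 1000:.0f}k tonnes")
```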

This research project creates a geographically sensitive model, with national coverage, of CO2 emissions linked with the school commute. This involves the integration of a variety of public sector "big data", including the origins, destinations and mode choices of around 7.5 million pupils, and estimates of the emission characteristics of cars registered within very small geographic areas. These data are integrated to create a geographically sensitive estimate of an individual pupil's contribution of CO2 related to their journey. The computational burden of processing such large data, and especially of estimating routes to school at a transport network level (road, rail etc), is great. The Google cloud environment is utilised in this research to reduce this computational burden.
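As a rough illustration of how these inputs could combine, the sketch below multiplies a pupil's routed distance by an emission factor that depends on mode and, for car journeys, on the pupil's home area. It is a simplified sketch and not the project's model: the area codes, emission factors, mode list and function names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical small-area car emission factors (g CO2 per km), keyed by an
# illustrative area code. Real values would be derived from registered vehicles.
CAR_FACTOR_BY_AREA = {"A001": 140.0, "A002": 185.0}

# Hypothetical per-passenger factors for non-car modes (g CO2 per km).
OTHER_MODE_FACTOR = {"walk": 0.0, "cycle": 0.0, "bus": 60.0}


@dataclass(frozen=True)
class PupilJourney:
    home_area: str   # small area containing the pupil's home
    mode: str        # recorded mode of travel to school
    route_km: float  # one-way routed network distance


def journey_emissions_g(journey: PupilJourney) -> float:
    """Estimate one-way CO2 emissions (grams) for a single pupil journey."""
    if journey.mode == "car":
        # Car journeys use the factor for the pupil's own small area.
        factor = CAR_FACTOR_BY_AREA[journey.home_area]
    else:
        factor = OTHER_MODE_FACTOR[journey.mode]
    return journey.route_km * factor


pupils = [
    PupilJourney("A001", "car", 3.0),
    PupilJourney("A002", "car", 2.0),
    PupilJourney("A002", "walk", 0.8),
]
for pupil in pupils:
    print(pupil.home_area, pupil.mode, journey_emissions_g(pupil))
```

Keeping the car factor local to the home area is what makes the estimate geographically sensitive in this sketch; a single national factor would give both car journeys the same rate per kilometre.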

Given the spatial diversity of population characteristics and circumstance, alongside differences in local infrastructure and policy, a 'one size fits all' approach to tackling the issue of emissions linked to the school commute is unlikely to be as fruitful as interventions tailored to local context. With the increasing availability of cloud computing in an era of public sector "big data", localised and geographically intelligent modelling approaches are increasingly accessible to the social sciences. However, technical challenges aside, there are also critical ethical concerns related to data disclosure and privacy that need to be addressed. As such, this project both establishes technical procedures and makes recommendations about the ethical use of cloud technology within the context of sensitive individual level data.

For the first time, the ambitious spatial modelling techniques presented in this research integrate geographically localised input parameters and control for geographical context in the calibration of emissions linked to the school commute, enabling outputs to be explored down to the level of an individual. This research will map the geography of mode choice and emissions, and will also measure how influences on these patterns vary spatially.

Planned Impact

Who will benefit from this research?

This project is of substantial policy relevance to a variety of key stakeholders looking at the impact of policy decisions on levels of CO2 emissions related to the school commute and, additionally, on the drivers of more active forms of transport, given their associated health benefits. For the first time, geographical heterogeneity in the influences and outputs of a CO2 / mode choice model will be explored, giving policy makers a better indication of the geography of likely responses to their interventions. A by no means comprehensive list of the stakeholders whom we envisage this project benefiting includes: the Department for Education, the Department for Transport, local authorities and school management; and, from a cyber infrastructure perspective, Google and other cloud based analytical providers.

In addition to beneficiaries of the substantive output of this grant, from a methodological perspective, social science researchers will also gain a better understanding of how sensitive data can be used within a cloud computing (specifically Google) environment. Furthermore, researchers interested in routing applications using the Google service will be provided with additional guidance on how these APIs can be practically exploited in their work.

How will they benefit from this research?

The model output from this research has a wide range of benefits. For example, within the theme 'Health and Wellbeing', given a specific policy scenario, the model would enable exploration of the loss or adoption of active transport; or, given the fine geographic scales at which the model can be applied, it would be possible to examine street level exposure to vehicular pollutants. This information could, for example, be used to create "walking bus" routes that minimise this exposure. Furthermore, within the context of 'Environment, Energy and Resilience', it would be possible to expand the literature on the influences on mode choice by examining how these vary geographically. This would be of relevance when designing policies that encourage more sustainable environmental practices in a switch to a low carbon economy.

Specific deliverables which would be of relevance to a wider [non academic] community include:

Policy facing website - will include dissemination of software / code and results
Best practice guide on using sensitive public sector "Big Data" with Google cloud services
Training workshop on using cloud services for social science
R software package enabling the use of the Google transport API within R


Description This grant has developed a geocomputational model that combines street and transport network level routing with geographically disaggregate CO2 vehicle estimates to create a new localised emissions model linked with the commute to school. This work also demonstrated how multiple disparate data sources drawn from providers including the Department for Education and the Department for Transport can be integrated.

A substantive finding that has emerged from this work is that non-geographically sensitive models can either over or under predict emission estimates for local contexts when compared to the geographically specified models developed by this research. Although at a national scale emission levels are reasonably comparable to those from non-geographically specified models (the averages smooth away local differences), if policies are to be implemented at a sub regional or local level, then we argue that alternative methods of emission estimation are required.
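The smoothing effect described above can be shown with a toy calculation. The sketch below compares area-specific emission factors against a single distance-weighted national factor; the two areas, their distances and their factors are invented for illustration, and the national totals match exactly only because the national factor is derived from the same toy data.

```python
# Hypothetical inputs per area: total car kilometres on the school commute and
# a local emission factor (g CO2 per km). All values are invented.
AREAS = {
    "A001": {"car_km": 1000.0, "local_factor": 120.0},
    "A002": {"car_km": 1000.0, "local_factor": 180.0},
}


def national_factor(areas: dict) -> float:
    """Distance-weighted average emission factor across all areas."""
    total_km = sum(a["car_km"] for a in areas.values())
    total_g = sum(a["car_km"] * a["local_factor"] for a in areas.values())
    return total_g / total_km


def compare_estimates(areas: dict) -> dict:
    """Return, per area, the uniform-model estimate minus the local-model estimate (g)."""
    uniform = national_factor(areas)
    return {
        code: a["car_km"] * uniform - a["car_km"] * a["local_factor"]
        for code, a in areas.items()
    }


differences = compare_estimates(AREAS)
for code, diff in differences.items():
    label = "over" if diff > 0 else "under"
    print(f"{code}: uniform model {label}-predicts by {abs(diff):.0f} g")
print("net national difference:", sum(differences.values()))
```

In this toy case the uniform model over-predicts in one area and under-predicts in the other by the same amount, so the national totals agree while the local estimates do not.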

Secondly, the modelling framework was applied to examine how rates of active travel and emissions could be impacted through targeted or blanket policy change. It was found that, for both active travel and emissions, targeting policy at specific schools could have as much impact as more diffuse policy that targeted multiple schools. As such, potential cost savings could be attained by focusing interventions on those groups most in need.

In parallel, this grant also explored challenges related to how sensitive public sector data could be processed in cloud computing environments. In the case of pupil data, it was found to be very difficult (if not impossible) to process the data in the cloud while complying fully with DfE data sharing agreements and relevant Data Protection Act constraints. Alternative "offline" methods using open source libraries were therefore used to calculate routes, and the output distances were found to be comparable to those generated through cloud based routing engines. Constraints on cloud computing within this context are reported here: http://geographicdatascience.com/cloud/2014/03/15/Personal-data-in-the-cloud/.
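To illustrate what offline routing involves in principle, the sketch below runs Dijkstra's shortest path algorithm over a toy road network held in memory and compares the result with a distance assumed to have come from a cloud routing engine. It does not reproduce the open source libraries used in the project, which are not named here; the network, node names and the cloud distance are all hypothetical.

```python
import heapq


def shortest_distance_km(graph: dict, origin: str, destination: str) -> float:
    """Dijkstra's algorithm over a dict-of-dicts graph with edge lengths in km."""
    best = {origin: 0.0}
    queue = [(0.0, origin)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == destination:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for neighbour, length in graph.get(node, {}).items():
            candidate = dist + length
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return float("inf")  # destination not reachable


# Hypothetical toy network: nodes are junctions, weights are link lengths in km.
TOY_NETWORK = {
    "home": {"junction_a": 0.5, "junction_b": 1.25},
    "junction_a": {"home": 0.5, "junction_b": 0.5, "school": 1.5},
    "junction_b": {"home": 1.25, "junction_a": 0.5, "school": 0.75},
    "school": {"junction_a": 1.5, "junction_b": 0.75},
}

offline_km = shortest_distance_km(TOY_NETWORK, "home", "school")
cloud_km = 1.8  # hypothetical distance from a cloud routing engine for the same trip
relative_difference = abs(offline_km - cloud_km) / cloud_km
print(f"offline {offline_km} km, cloud {cloud_km} km, difference {relative_difference:.1%}")
```

Because the whole calculation runs locally, no pupil origin or destination needs to leave the secure environment, which is the property that matters for the data sharing constraints described above.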

Finally, various methods of interactive street level visualisation were considered, with the final results presented in a Transport Map Book for each local authority (see RCUK Narrative Impact section).
Exploitation Route The model specification and code developed by this research are entirely open and available on GitHub (https://github.com/alexsingleton/routing). It would be feasible for others to refine this model in various ways (as outlined in our published outputs), or to implement it as is, for either national emissions estimates or local case study analysis.

Further extension and linkage will likely occur through the Consumer Data Research Centre (ESRC Phase 2 Big Data centres) as the routing methodology developed within this project will be of utility for both the estimation of emissions linked with freight transit, and also other application areas such as accessibility of store types to different consumer groups.

We also expect the dissemination of results to local authority communities to continue through download of the atlases developed by this project.
Sectors Communities and Social Services/Policy,Education,Energy,Environment,Government, Democracy and Justice,Transport

Description The findings from this work relate to the creation of a geographically sensitive and micro scale model that estimates levels of emissions linked to school commuting; and additionally, explore the considerations required if attempting to implement such a model within a cloud computing environment (e.g. Google cloud), specifically the constraints of data protection legislation. The outputs from the national model are incorporated into a series of transport map books (http://www.alex-singleton.com/r/2014/09/09/Transport-Map-Book/) which are available for each local authority district (LAD). 349 downloads have been recorded to date, with the top 10 LAD as follows: Exeter; Liverpool; Oxford; Bristol; York; Sheffield; Cornwall; Barking and Dagenham; Cambridge; Teignbridge. There have been direct queries about the transport atlas and its reuse from two further local authorities, Wealdon (Malcolm Harris, Policy Officer) and Lewisham (Ronan Smyth, Statistics and Research Officer). Oxford also link to these materials on their website - http://www.oxford.gov.uk/transportstats. Furthermore, the DfT (Daryl Lloyd, Head of Road Safety Statistics) have been in contact about future collaborations, specifically in the area of MSc dissertation projects. The processing of data within cloud infrastructure, and the constraints on its use in the social sciences (related to the Data Protection Act), were presented as a short report (http://geographicdatascience.com/cloud/2014/03/15/Personal-data-in-the-cloud/) and integrated into a training course (16 Participants: splits - 38% academic staff; 37% academic student; 12.5% public sector; 12.5% commercial).
First Year Of Impact 2014
Sector Education,Environment,Government, Democracy and Justice,Transport
Impact Types Policy & public services

Description Course: R for Google Map Making 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact This course provided training on using R for spatial analysis where there may be a component of cloud computing. Course materials are located at http://rpubs.com/nickbearman/r-google-map-making and https://github.com/nickbearman/r-google-map-making-20140708

None in particular, although it was noted that the material on cloud computing was useful.
Year(s) Of Engagement Activity 2014
URL https://speakerdeck.com/nickbearman/using-google-maps-with-r-tue-8th-july-10-45am-4pm
Description Invited talk: Transformative Research in Geographic Information Science 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact As part of a panel session, this talk stimulated debate around the merits of open geographic information science.

Year(s) Of Engagement Activity 2014
URL https://speakerdeck.com/alexsingleton/transformative-research-in-geographic-information-science
Description Invited talk: What is so Big about Big Data? Some Observations on Open Data and Systems 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Very lively debate after the talk around definitions of big data

After the talk, we had further discussion about the appropriate use of the terms "Atlas" versus "Map Book".
Year(s) Of Engagement Activity 2014
URL https://speakerdeck.com/alexsingleton/what-is-so-big-about-big-data-some-observations-on-open-data-a...
Description Report: Personal Administrative Data in the Cloud 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other academic audiences (collaborators, peers etc.)
Results and Impact Around 50 page views of the article.

Year(s) Of Engagement Activity 2014
URL http://geographicdatascience.com/cloud/2014/03/15/Personal-data-in-the-cloud/
Description Transport Atlas 
Form Of Engagement Activity A magazine, newsletter or online publication
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Different atlases have been downloaded 345 times. Generated interest on Twitter and engagement with a number of local authorities.

Year(s) Of Engagement Activity 2014
URL http://www.alex-singleton.com/r/2014/09/09/Transport-Map-Book/