BIG data methods for improving windstorm FOOTprint prediction (BigFoot)

Lead Research Organisation: University of Exeter
Department Name: Mathematics

Abstract

Windstorms can cause great damage to property and infrastructure. The windstorm footprint (a map of maximum wind gust speed over 3 days) is an important summary of the hazard and is of great relevance to the insurance industry and to infrastructure providers. Windstorm footprints are conventionally estimated from meteorological data and numerical weather model analyses. However, there are several less structured data sources that could contribute to the estimation of windstorm footprints and, more importantly, raise the spatial resolution of our estimates. This matters because small-scale meteorological phenomena, such as sting jets, are not well resolved by current methods.

We propose to exploit three additional sources of data (and possibly others during the course of the project). The three sources identified so far are amateur observations available through the Met Office Weather Observations Website (WOW), comments made on social media, and video recorded on social media or CCTV. Amateur meteorological observations are currently collected by the Met Office but not used in producing the footprint estimates. We will investigate whether we can use them in the estimation of the storm footprint; a useful by-product will be estimates of the uncertainty for each WOW station. Social media platforms, such as Twitter or Instagram, often carry comments on windstorms. These range from remarks on how windy it is to reports of damage produced by storms. In some cases the geographical location of the message is provided by the device, but in others it has to be inferred. Very large numbers of messages are posted on social media every day, and it should be possible to use these to provide more detailed modelling of footprints. In addition to text, social media also carries images and video. Video is also recorded extensively in the form of CCTV. Video recordings of, say, trees blowing in the wind include information on the strength of the windstorm. We will analyse such recordings to produce information on wind velocity and gust velocity.

Bringing together large quantities of diverse data is a complex procedure. We will develop, test, and compare two approaches in modern data science: statistical process modelling and machine learning. Both methods will aim to synthesise all the data into an estimate of the windstorm footprint (and its associated uncertainty). The former will concentrate on producing a map more like the current estimates based on the maximum gust speed, while the latter, data-based methods will concentrate more on mapping the damage caused by the storm. Once we have estimates of the windstorm footprint from both social media and the modelling, we will compare these with the standard products and, in consultation with stakeholders, establish any improvements.

Planned Impact

This project will develop innovative new methods for exploiting big data to improve understanding of windstorm hazard. Because of the potential to damage property and infrastructure, windstorm hazard is a major source of risk for Europe with average insured losses in the UK exceeding £500 million per year (Prudential Regulation Authority, 2015). Furthermore, extreme storm risk is likely to increase with climate change and is clearly identified as an area of research that urgently needs more attention for UK climate change adaptation (Climate Change Risk Assessment report, 2017).

This project will enable storm risk to be more reliably quantified, which will allow better decisions to be made to improve societal resilience to this hazard. The ability to exploit new sources of data that have good spatial coverage in areas with high exposure (e.g. urban postcode regions) will create new types of windstorm footprint that are better predictors of damage and loss. This new product will enable various industries (e.g. re/insurers and power supply companies) to make better-informed decisions about windstorm risk. For example, footprints with more detail in areas with high exposure will provide more reliable benchmarks for back-testing the catastrophe models that are used to price property and business insurance. In addition, improved spatial representation of windstorm footprints will improve understanding of smaller-scale features in extreme wind gust speeds (e.g. sting jets) and help organisations such as the Met Office to improve weather prediction models and design improved observation networks for extreme weather.

In summary, this project will benefit the following groups:

* The meteorological community - new information about how current numerical models and observation networks may be misrepresenting spatial structure in extreme wind speeds, e.g. mesoscale sting jet features, will enable the Met Office to improve their weather prediction models and observation networks;
* Other hazard communities - the innovative new data science tools able to exploit increasing amounts of unstructured data will be useful for other hazards, e.g. extreme precipitation required for flood modelling and estimation;
* The data science community - exciting new spatial big data methodologies developed via close interdisciplinary collaboration of statistical and computer scientists;
* The catastrophe re/insurance community - improved hazard footprint estimates will enable more reliable risk quantification and pricing for businesses and individuals;
* The wider business community - improved knowledge of storm risk will help to improve resilience to windstorm damage. This is particularly important for critical infrastructure such as power lines, communication networks, transport and offshore facilities but could also inform regulations for more resilient buildings able to withstand the increasing risk from windstorms with climate change;
* Policymakers - better understanding of the likely impact of storms will inform decision-making and planning to deliver greater resilience and climate change adaptation;
* The general public - will ultimately benefit from this project through the UK being better adapted and more resilient to extreme storms.
 
Title Data for Ideological biases in social sharing of online information about climate change 
Description This repository contains an anonymised dataset to support the paper "Ideological biases in social sharing of online information about climate change" by Tristan J.B. Cann, Iain S. Weaver and Hywel T.P. Williams, submitted for publication in PLOS ONE. The files contain the following:

* tweet_ids - A list of all tweet IDs used in the study.
* coded_urls - A list of the (up to) five most common URLs from each of the 75 most common domains. Where these were not social media sites and content was available, they were graded for political and climate bias by the human coders.
* domain_bias_grades - A list of domains and the final bias scores assigned to them following the standardisation process we applied to the scores received from our coders. The first line of this file is a header labelling the four columns as political bias, climate change bias, political standard deviation and climate change deviation.

The networks folder contains subfolders for each of the seven weeks studied. Three files are provided for each week:

* week_x_bipartite_edges - A list of source, target pairs to define edges in the bipartite user-URL network. Source and target give the user and URL node IDs respectively. Pairs are not guaranteed to be unique, and duplicates should increment the edge weight.
* week_x_url_labels - A list of expanded URLs given in the order corresponding to the edge list described above.
* week_x_user_labels - A list of anonymised user IDs given in the order corresponding to the edge list for this week. These anonymised numeric user identifiers are consistent across weeks for cross-referencing.
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://figshare.com/articles/dataset/Data_for_Ideological_biases_in_social_sharing_of_online_inform...
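
The description states that repeated source, target pairs in a week_x_bipartite_edges file should increment the edge weight. A minimal sketch of that rule is given here, assuming the pairs have already been parsed into (user node ID, URL node ID) tuples. The file's on-disk format is not specified in the description, so parsing is left out, and the function name and sample pairs are hypothetical.

```python
from collections import Counter

def weighted_edges(pairs):
    """Collapse repeated (user_id, url_id) pairs into edge weights.

    Each occurrence of a pair adds 1 to that edge's weight, following the
    rule in the dataset description that duplicates increment the weight.
    """
    return Counter(pairs)

# Hypothetical pairs for illustration only; not taken from the dataset.
pairs = [(0, 0), (0, 1), (0, 0), (2, 1)]
weights = weighted_edges(pairs)

for (user_id, url_id), weight in sorted(weights.items()):
    print(user_id, url_id, weight)
```

A Counter keeps the sketch short; the resulting (user, URL, weight) triples could then be passed to whichever graph library is preferred.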
 