Global Surface Air Temperature (GloSAT)

Lead Research Organisation: University of Southampton
Department Name: Sch of Electronics and Computer Sci

Abstract

Surface temperature is the longest instrumental record of climate change and the measure used in the Paris Climate Agreement that aims to 'prevent dangerous anthropogenic interference with the climate system'. The Agreement defines an ambition to limit global temperature change to 1.5C or 2C above pre-industrial levels. The Intergovernmental Panel on Climate Change (IPCC) used a baseline of 1850-1900 for its definition of 'pre-industrial' as this is when existing instrumental records begin. It has been estimated that global temperatures may have already increased by 0.0-0.2C by this time, but this is uncertain due to lack of data. However, even using the 1850-1900 baseline, existing temperature datasets disagree on the amount of warming to date and this disagreement implies more than 20% uncertainty in the allowed carbon budget to meet the goals of the Paris Agreement solely due to uncertainty in observed surface temperature change. These differences between temperature datasets arise mostly from two structural uncertainties: the use of sea surface temperatures (SST) rather than air temperatures over the oceans, especially ice-covered regions, and differences in data coverage and interpolation strategies. This project addresses both.

To best inform decision-makers, records of temperature change must be as accurate, consistent, and long as possible. Existing global datasets start in 1850 or later, but we will extend the record a further 70 years back to the late 18th century. Current knowledge of this period comes from instrumental measurements in Europe, palaeo-proxies (tree-rings, corals or ice cores), and climate models. We will dramatically extend the spatial coverage of the early measured record in this 70-year period, which is important for understanding natural climate variability and the climate response to different radiative forcings. For example, the longer record includes 5 large volcanic eruptions and extra cycles of multi-decadal climate oscillations. The new record will allow us to better disentangle the contributions of anthropogenic and natural factors to the climate system and quantify the effect humans have already had on Earth's temperature, and hence on future climate.

A major inconsistency has been past use of air temperature over land but SST over oceans. Recent advances mean we can produce a marine air temperature record to construct the first global air temperature dataset over ocean, land and ice, stretching back to the late 18th century. Our dataset will be independent from SST, currently the most uncertain component of global temperature. We will improve land, marine and cryosphere air temperature observations to make them more homogeneous and extend the global record further back in time. This requires fundamental research to better understand the bias and noise characteristics of historical observations and develop new error models. We will adopt sophisticated statistical techniques to allow the estimation of air temperature everywhere, even when there are gaps in the observations. We will expand the historical climate record with new ship's logbook and weather station digitisations focused on early data, sparse periods and regions, and the interfaces between land, ocean and ice. We will engage the public in the digitisation effort building on recent successful citizen science initiatives.

We will analyse the new surface air temperature record to better understand how temperatures have changed since the late 18th century. This longer record will give a better understanding of natural climate variations, both variability generated internally within the climate system and that due to external forcing factors such as volcanic eruptions and solar changes. This improved understanding of natural variability will enable us to more cleanly isolate the characteristic "fingerprints" of man-made climate change, allowing us to more confidently detect and attribute human-induced changes.

Planned Impact

The most recent Climate Change Public Attitude Tracking Survey in the UK found that 74% of respondents were either 'very' or 'fairly' concerned about climate change. Global political concern over the issue is reflected by the Paris Agreement negotiated by 196 parties at the 21st Conference of Parties in 2015. The UK has committed to reducing greenhouse emissions by 80% by 2050. Clean growth is a key element of the government's industrial strategy.

The most visible measure of climate change for both policymakers and the public is the historical temperature record, which measures how global temperatures have changed over the past 170 years, and shows periods of rapid warming in the early 20th century and for the whole of the last 50 years. However the record is not long enough to give a complete picture of temperature change since before the start of the industrial revolution, or how exceptional recent temperature change is in comparison to pre-industrial conditions. The current historical temperature record mixes air and water temperatures in a way which creates confusion when comparing with climate model predictions.

This project will produce a new global surface air temperature dataset starting in around 1780 - at least 70 years longer than any other and stretching back almost to the start of the industrial revolution. The length of the record, the use of air temperatures for both land and oceans, and improvements to our understanding of the historical measurements will make this new dataset the benchmark for understanding human impact on the climate for both policymakers and the general public.

Temperature data underpins national and international policy on climate change. The UN Framework Convention on Climate Change (UNFCCC) process will benefit from improved quantification of temperature change since the pre-industrial period to feed into their regular 'global stocktake' which will monitor progress towards achieving the aims of the Paris Agreement to limit global temperature change to 1.5 or 2C above pre-industrial levels.

A range of datasets with differing spatial and temporal resolutions will be produced to represent improving data coverage over time, and incorporating new analyses of the level of confidence in the data at any point in time. This information will feed into national & international climate assessments, such as the annual State of the Climate Report, the UK Climate Change Risk Assessments, and future Intergovernmental Panel on Climate Change reports. The climate science community will also benefit from an improved understanding of natural climate variations, such as those due to volcanic eruptions and internal ocean-atmosphere interactions: the new dataset will be widely used to compare to other observational datasets, weather model reanalyses and climate model simulations. This will improve our understanding of past climate change and weather extremes, and implications for future change, providing data for risk management in both government and industry.

An important part of GloSAT will be the digitisation of historical observations from their current paper or scanned image formats. Much of this will be achieved through public engagement and citizen science digitisation, building on the highly successful WeatherRescue.org project, which has already rescued more than 2.5 million weather observations using thousands of volunteers. The rescued observations will be added to international weather observation databases such as ICOADS, ISTI, ISPD and the Copernicus Climate Data Store for the entire climate community to use. The use of citizen science for digitisation will also be exploited as an opportunity to engage the public in science and to communicate climate science.

Publications

 
Title GloSAT Historical Measurement Table Dataset 
Description Dataset containing 500 scanned historical measurement table documents from ship logs and land measurement stations. Data spans 200 years and has been sampled using a maximum variation sampling strategy to capture as many different styles of measurement table as possible. The documents are a mixture of handwritten, typed and combinations of the two. Tables are bordered, semi-bordered or borderless. Table region (header, body, full table, cells) annotations are provided in a way designed to allow finer-grained table detection and table structure recognition models to be trained and tested.
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? No  
Impact Dataset is currently under peer review and will be released once published. This dataset will bring a significant new resource to the academic table detection community and promote research into finer-grained table analysis than has been attempted previously. This dataset will allow the GloSAT data rescue team to build NLP models to process hundreds of thousands of previously inaccessible historical measurement logbook pages. That will in turn provide better datasets for GloSAT climate change models, which will in the next 2-3 years contribute directly to revised and improved climate change predictions (as input to UK and global governmental decision making around climate change issues).
URL https://github.com/stuartemiddleton/glosat_table_dataset
 
Title rafamestre/Multimodal-USElecDeb60To16: v1.0.0 
Description Dataset and code for the Multimodal USElecDeb60To16 dataset, released in the paper "Augmenting pre-trained language models with audio feature embedding for argumentation mining in political debates", published in the Findings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL) in 2023.
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact None so far 
URL https://zenodo.org/record/7628465
 
Description Collaboration with Kew Gardens for tech transfer (via funded interns) of NLP research around Illegal Wildlife Trade 
Organisation Royal Botanic Gardens
Country United Kingdom 
Sector Public 
PI Contribution I have supervised 3 University of Southampton Computer Science interns (working at Kew Gardens) and acted as expert advisor to Kew Gardens. This is part of an ongoing collaboration to deliver tech transfer of NLP research (originating from FloraGuard project) to Kew Gardens and development of proof of concept downstream applications by Kew Gardens which hopefully will create significant impact on how Illegal Wildlife Crime is managed by the UK and internationally (e.g. one of our longer term targets is adoption of our approach by the UK's National Wildlife Crime Unit).
Collaborator Contribution Kew have secured funding (source DEFRA grants) for downstream work around Illegal Wildlife Trade. This includes budget for myself (consultancy - I am named on the grants) and 3 interns (£6000 each for Easter/summer work, £18000 total). More is planned in future. Kew Gardens are also collaborating as experts in a locally funded (£10k) project to extract ingredients (focus illegal wildlife trade) from online auction images - this work on multimodal NLP is feeding into GloSAT grant work.
Impact Collaboration is multi-disciplinary (Computer Science + Social Science + Conservation Science). Software output >> Crawler software + Intelligence Visualization software [open source via github]
Start Year 2021
 
Title GloSAT table detection and table structure recognition models 
Description Algorithms for table detection, automatically detecting measurement table regions within scanned document images - a CascadeTabNet based model with (a) finetuning (b) post-processing. Algorithms for table structure recognition, automatically detecting measurement table rows/columns/cells within scanned document images - a CascadeTabNet based model with (a) finetuning (b) post-processing - projection histogram method.
Type Of Technology Software 
Year Produced 2021 
Impact These algorithms will be used to segment scanned images of historical documents containing measurements from ship logs and land station logs. These algorithms will encourage the table detection academic community to develop finer-grained models allowing extraction of more semantically relevant information from tables within documents. Code will be released open source once work is published (submission under peer review now). The algorithms will also be used by the GloSAT project to develop NLP models able to rebuild the textual information contained within historical air temperature measurement tables, which will in the next 2-3 years allow GloSAT climate models to improve the predictions presented to UK government and governments worldwide.
URL https://github.com/stuartemiddleton/glosat_table_dataset
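
The record above names a "projection histogram method" as a post-processing step for table structure recognition but gives no detail. The following is a minimal sketch of the general technique only, not the GloSAT implementation: it assumes a binarised table crop (1 = ink, 0 = background) and treats runs of non-empty rows and columns as row and column bands. The function names `projection`, `content_bands` and `split_table`, and the `min_ink` threshold, are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Band = Tuple[int, int]  # half-open [start, end) pixel interval


def projection(image: List[List[int]], axis: int) -> List[int]:
    """Sum ink pixels: axis=1 gives one total per row, axis=0 one total per column."""
    if axis == 1:
        return [sum(row) for row in image]
    return [sum(col) for col in zip(*image)]


def content_bands(profile: List[int], min_ink: int = 1) -> List[Band]:
    """Return runs of consecutive positions whose ink count is at least min_ink."""
    bands: List[Band] = []
    start = None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i  # a band of content begins here
        elif value < min_ink and start is not None:
            bands.append((start, i))  # a whitespace gap closes the band
            start = None
    if start is not None:
        bands.append((start, len(profile)))
    return bands


def split_table(image: List[List[int]]) -> Tuple[List[Band], List[Band]]:
    """Estimate row and column bands of a table crop from its two projections."""
    rows = content_bands(projection(image, axis=1))
    cols = content_bands(projection(image, axis=0))
    return rows, cols


# Toy 5x7 "table" with two text rows and two text columns (assumed data).
toy = [
    [1, 1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1, 1, 0],
]
row_bands, col_bands = split_table(toy)
print(row_bands)  # [(0, 2), (3, 5)]
print(col_bands)  # [(0, 2), (4, 6)]
```

Candidate cells are the cross product of the row and column bands. On real scans the threshold would likely need tuning for noise, and bordered tables may call for detecting ruling-line peaks rather than whitespace gaps.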
 
Title Multimodal USElecDeb60To16 
Description Software and models that accompany USElecDeb60To16 dataset published at EACL 2023 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact None so far 
URL https://doi.org/10.5281/zenodo.5653503
 
Description Invited Seminar for University of Sheffield Computer Science Dept 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact Invited virtual seminar on my NLP research to Computer Science Dept (~50 people attended, reached ~100 via on demand viewing). Academics and postgraduate students attended.

Part of an ongoing engagement I have with academics from the University of Sheffield NLP research group, which my team of PDRAs and PhD students visits about 3 times a year.
Year(s) Of Engagement Activity 2021
 
Description Invited Talk to International Atmospheric Circulation Reconstructions over the Earth (ACRE) initiative 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Presentation of my work on Automatic Reading of Tabulated Data for Data Rescue to ACRE 2021. Triggered dialogue and collaboration (data sharing) between UK's Met Office and US NOAA in context of GloSAT project.
Year(s) Of Engagement Activity 2021
URL http://www.met-acre.org/
 
Description Invited talk to local secondary school Embley - Independent Day & Boarding School in Hampshire 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach Local
Primary Audience Schools
Results and Impact 50 pupils (good gender mix) attended for a school seminar presenting my NLP research, which sparked questions and discussion afterwards. Very successful, students were inspired and I was invited back to give a talk next year.
Year(s) Of Engagement Activity 2023
 
Description Presentation of GloSAT project to UoS and NERC Senior Leadership Team 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Policymakers/politicians
Results and Impact Presented GloSAT project to senior leadership of NERC and University of Southampton (UoS).

Audience included Robyn Thomas (NERC Associate Director, Operations and Research Careers), Hannah Lacey (NERC Public Engagement Programme Manager), Mark Spearing (UoS interim Vice-Chancellor) and other senior leadership people.

Raised awareness for GloSAT and sparked some excitement around the possibilities of this project.
Year(s) Of Engagement Activity 2019
 
Description Turing @ Southampton showcase event - expert talk on 'Human in the loop AI' 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Turing showcase event hosted by University of Southampton but attended by UK-wide Turing community. Reach was 50-100 people (physically and online). My expert talk led to about 10 follow-on discussions with postgrads and early career researchers, and I was invited to give the talk again at another local early career researcher event in 2023.
Year(s) Of Engagement Activity 2022