Deep Neural Networks for Real-Time Spectroscopic Analysis

Lead Research Organisation: Newcastle University
Department Name: Sch of Natural & Environmental Sciences

Abstract

Scientific breakthroughs are often strongly associated with technological developments, which enable the measurement of matter to an increased level of detail. A modern revolution is underway in X-ray spectroscopy (XS), driven by the transformative effect of next-generation, high-brilliance light sources, e.g. Diamond Light Source and the European X-ray Free Electron Laser, and by the emergence of laboratory-based X-ray spectrometers. Alongside instrumental and methodological developments, the advances enabled in X-ray absorption (XAS) and (non-)resonant emission (XES and RXES/RIXS) spectroscopies are having far-reaching effects across the natural sciences. However, these new kinds of experiments, and their ever-higher resolution and data acquisition rates, have brought acutely into focus a new challenge: how do we efficiently and accurately analyse these data to ensure that the valuable quantitative information encoded in each spectrum can be extracted?

The high information content of an X-ray spectrum demands detailed theoretical treatments to link the spectroscopic observables to the underlying geometric, electronic and spin structure. However, this is a far from trivial task. A prime example is found in the XS of disordered systems, e.g. in operando catalysts, in which the spectrum represents an average signal recorded from many inequivalent absorption sites. The disorder of the system must be modelled for a quantitative analysis, but treating theoretically every possible chemically inequivalent absorption site (or even sampling a meaningful number of such sites) is computationally challenging, resource-intensive, and time-consuming. It is presently out of reach for the majority of XS end-users and, for the most complex systems, even for expert theoreticians. In addition, it is not always apparent to end-users: a) how to apply the most appropriate theoretical treatments, or b) where more insight might be attainable from the data by their application. Consequently, the status quo is to rely heavily on empirical rules, e.g. the scaling of absorption edge position with oxidation state, or to collect reference spectra and fit the absorption profile with linear combinations of these. As long as this status quo is unchallenged, many XS experiments remain useful for little more than fingerprinting, and a wealth of valuable quantitative information is left unexploited, ultimately limiting our understanding.
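The linear-combination approach mentioned above can be illustrated with a minimal sketch, assuming the measured and reference spectra are already normalised and sampled on a common energy grid. It fits non-negative weights with SciPy's `nnls`; the function name `linear_combination_fit` is hypothetical, and this fit is not one of the tools developed in this project.

```python
import numpy as np
from scipy.optimize import nnls

def linear_combination_fit(measured, references):
    """Fit a measured spectrum as a non-negative weighted sum of reference spectra.

    measured:   array of shape (n_energies,)
    references: array of shape (n_energies, n_refs), one reference spectrum per column,
                on the same energy grid as `measured`.
    Returns the weights rescaled to sum to 1 (fractions) and the residual norm of the
    unscaled fit.
    """
    weights, residual = nnls(np.asarray(references, float), np.asarray(measured, float))
    total = weights.sum()
    if total > 0:
        weights = weights / total  # express weights as fractions of the mixture
    return weights, residual
```

The fit only reproduces the profile as a mixture of the chosen references, which is why, as the text notes, it yields little more than fingerprinting information.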

The objective of this fellowship proposal is to develop and subsequently equip researchers with easy-to-use, computationally inexpensive, and accessible tools for the fast and automated analysis and prediction of XS. We will optimise and deploy deep neural networks (DNNs) capable of providing instantaneous predictions of XS for arbitrary absorption sites, introducing a step change in the ease and accuracy of the XS data analysis workflow. Using DNNs, it is possible to reduce the time taken to predict XS data from hours or days to seconds, democratise data analysis, open the door to the development of new high-throughput XS experiments, and allow end users to better plan and utilise their beamtime allocations by facilitating on-the-fly 'real-time' analysis and diagnostics for XS data.

Publications

Milne CJ (2023) Disentangling the evolution of electrons and holes in photoexcited ZnO nanoparticles. in Structural dynamics (Melville, N.Y.)

Penfold T (2024) Exploring the Influence of Approximations for Simulating Valence Excited X-ray Spectra in The Journal of Physical Chemistry A

Verma S (2023) Uncertainty quantification of spectral predictions using deep neural networks. in Chemical communications (Cambridge, England)

 
Description Computational spectroscopy has emerged as a critical tool for researchers looking to achieve both qualitative and quantitative interpretations of experimental spectra. Over the past decade, increased interactions between experiment and theory have created a positive feedback loop that has stimulated developments in both domains. In particular, the increased accuracy of calculations has led to them becoming an indispensable tool for the analysis of spectroscopies across the electromagnetic spectrum. This progress is especially well demonstrated for short-wavelength techniques, e.g. core-hole (x-ray) spectroscopies, whose prevalence has increased following the advent of modern x-ray facilities including third-generation synchrotrons and x-ray free-electron lasers. While calculations based on well-established wavefunction or density-functional methods continue to dominate the greater part of spectral analyses in the literature, emerging developments in machine-learning algorithms are beginning to open up new opportunities to complement these traditional techniques with fast, accurate, and affordable 'black-box' approaches.

In this award, we have developed the XANESNET code. Based on machine learning techniques, XANESNET decodes the dense information content of modern X-ray spectra satisfactorily while remaining fast, affordable, and accessible enough to appeal to researchers.

Our XANESNET code addresses two fundamental challenges: the so-called forward (property/structure-to-spectrum) and reverse (spectrum-to-property/structure) mapping problems. The forward mapping approach resembles that used by computational researchers, in the sense that an input structure is used to generate a spectral observable. In this area, the objective of XANESNET is to supplement and support the analysis provided by first-principles quantum mechanical simulations. The reverse mapping problem is perhaps the more natural of the two, as it has a clear connection to the problem that X-ray spectroscopists face day-to-day in their work: how can a measurement/observable be interpreted? Here we seek to provide methodologies that allow the direct extraction of properties from a recorded spectrum.
Exploitation Route Our findings and underlying code can be used by:
i) Experimentalists seeking to interpret their results
ii) Theoreticians seeking to develop their understanding and improve the performance of machine learning models.
Sectors Energy

Environment

 
Description UK High-End Computing Consortium for X-ray Spectroscopy (HPC-CONEXS)
Amount £371,871 (GBP)
Funding ID EP/X035514/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 01/2023 
End 12/2026
 
Title A Deep Neural Network for Valence-to-Core X-ray Emission Spectroscopy 
Description This data is used to extend our XANESNET deep neural network (DNN) to predict the lineshape of first-row transition metal K-edge valence-to-core X-ray emission (VtC-XES) spectra. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Data set for machine learning for X-ray emission. 
URL https://data.ncl.ac.uk/articles/dataset/A_Deep_Neural_Network_for_Valence-to-Core_X-ray_Emission_Spe...
 
Title Accurate, Affordable, and Generalisable Machine Learning Simulations of Transition Metal X-ray Absorption Spectra using the XANESNET Deep Neural Network 
Description Data Supporting the Publication of the same title. 
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Training sets for machine learning for X-ray absorption spectroscopy. 
URL https://data.ncl.ac.uk/articles/dataset/Accurate_Affordable_and_Generalisable_Machine_Learning_Simul...
 
Title An On-the-Fly Deep Neural Network for Simulating Time-Resolved Spectroscopy: Predicting the Ultrafast Ring Opening Dynamics of 1,2-Dithiane 
Description Revolutionary developments in ultrafast light source technology are enabling experimental spectroscopists to probe the structural dynamics of molecules and materials on the femtosecond timescale. The capacity to investigate ultrafast processes afforded by these resources accordingly inspires theoreticians to carry out high-level simulations which facilitate the interpretation of the underlying dynamics from ultrafast experiments. In this Article, we implement a Deep Neural Network (DNN) to convert excited-state molecular dynamics simulations into time-resolved spectroscopic signals. Our DNN is trained on-the-fly from first-principles theoretical data obtained from a set of time-evolving molecular dynamics simulations. The train-test process iterates for each time-step of the dynamics data until the network can predict spectra with sufficient accuracy to replace the computationally intensive quantum chemistry calculations required to produce them, at which point it simulates the time-resolved spectra for longer timescales. The potential of this approach is demonstrated by probing the dynamics of the ring opening of 1,2-dithiane using sulphur K-edge X-ray absorption spectroscopy. The benefits of this strategy will be more apparent for simulations of larger systems, which carry a greater computational burden, making this approach applicable to the study of a diverse range of complex chemical dynamics. 
Type Of Material Database/Collection of data 
Year Produced 2023 
Provided To Others? Yes  
Impact Training sets for the on-the-fly deep neural network used to simulate time-resolved spectroscopy of the ultrafast ring-opening dynamics of 1,2-dithiane. 
URL https://data.ncl.ac.uk/articles/dataset/An_On-the-Fly_Deep_Neural_Network_for_Simulating_Time-Resolv...
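The train-test loop described in the record above can be sketched as follows. This is a minimal sketch, not the published implementation: a closed-form ridge regression stands in for the DNN, `reference_spectrum` is a hypothetical placeholder for the expensive quantum chemistry calculation, and the RMSE tolerance `tol` is an assumed accuracy criterion.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1e-6):
    """Closed-form ridge regression (stand-in for the DNN); last weight row is the bias."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ Y)

def predict(W, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

def on_the_fly(descriptors, reference_spectrum, tol=1e-3):
    """Train on each new time step until the surrogate is accurate, then predict.

    descriptors: list of (n_frames, n_features) arrays, one per time step.
    reference_spectrum: hypothetical expensive callable,
        (n_frames, n_features) -> (n_frames, n_energies).
    Returns the spectra for every step and the step at which the surrogate passed the
    test (None if it never did); later steps are predicted rather than calculated.
    """
    X_seen, Y_seen = [], []
    model, switch_step, spectra = None, None, []
    for t, X in enumerate(descriptors):
        if switch_step is not None:
            spectra.append(predict(model, X))  # cheap surrogate prediction
            continue
        Y = reference_spectrum(X)  # expensive first-principles calculation
        if model is not None:
            # Test on this unseen time step before adding it to the training data.
            rmse = np.sqrt(np.mean((predict(model, X) - Y) ** 2))
            if rmse < tol:
                switch_step = t
        X_seen.append(X)
        Y_seen.append(Y)
        model = fit_ridge(np.vstack(X_seen), np.vstack(Y_seen))  # retrain on all seen steps
        spectra.append(Y)
    return spectra, switch_step
```

Testing each new step before it joins the training set is what lets the loop decide when the surrogate can replace the reference calculations for the remaining trajectory.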
 
Title Training Data for Machine Learning Models 
Description A machine learning model is only as good as the training set used to develop it. Here you can find the training sets used in our work in a format compatible with the XANESNET code.

Structure to Spectrum Training Sets

XANES of the 3d transition metals (XAS-3dtm): Our reference datasets comprise site geometries ('samples') of first-row transition metal (Ti-Zn) complexes harvested from the transition metal Quantum Machine (tmQM) dataset. The dataset for each first-row transition metal comprised all of the structures from the tmQM dataset containing that element, as extracted from the 2020 release of the Cambridge Structural Database (CSD) and subsequently optimised at the GFN2-xTB level of theory. The tmQM dataset was initially generated by applying seven filters to exclude: (i) all structures except those containing a single transition metal; (ii) all structures not containing a minimum of one C and one H atom (allowing only these other elements: B, Si, N, P, As, O, S, Se, F, Cl, Br, and I); (iii) the structure of all extraneous molecular components beyond that of the transition metal complex (e.g. counter-ions); (iv) all polymeric structures; (v) all structures without three-dimensional coordinates; (vi) all structures with disordered atoms; and (vii) all structures with charges greater than +1 or less than -1. Full details of the construction and composition of the tmQM dataset can be found here. For the XAS set, the K-edge XANES spectra ('labels') for these structures were calculated using multiple scattering theory (MST) as implemented in the FDMNES package. We have developed nine independent reference datasets, one for each first-row transition metal (Ti-Zn) X-ray absorption edge; the number of samples in each reference dataset ranges from ~1100 (V) to ~8660 (Ni). Further details can be found in CD Rankine and TJ Penfold, Journal of Chemical Physics 156 (2022) 164102. 
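The tmQM exclusion filters listed above can be expressed as a simple predicate. This is a minimal sketch assuming a hypothetical per-structure record (element list, total charge, and boolean flags for polymeric, 3D-coordinate and disorder status); it is not the tmQM authors' code. Filter (iii) strips counter-ions rather than rejecting a structure, so it is left out, and the transition-metal set (groups 3 to 12) is an assumption.

```python
# Assumed transition-metal set (groups 3-12); not taken from the tmQM code.
TRANSITION_METALS = {
    "Sc", "Ti", "V", "Cr", "Mn", "Fe", "Co", "Ni", "Cu", "Zn",
    "Y", "Zr", "Nb", "Mo", "Tc", "Ru", "Rh", "Pd", "Ag", "Cd",
    "La", "Hf", "Ta", "W", "Re", "Os", "Ir", "Pt", "Au", "Hg",
}
# Non-metal elements permitted by filter (ii), including the mandatory C and H.
ALLOWED_OTHERS = {"C", "H", "B", "Si", "N", "P", "As", "O", "S", "Se", "F", "Cl", "Br", "I"}

def passes_filters(s: dict) -> bool:
    """Return True if a (hypothetical) structure record survives filters (i), (ii), (iv)-(vii)."""
    elements = s["elements"]
    metals = [e for e in elements if e in TRANSITION_METALS]
    if len(metals) != 1:  # (i) exactly one transition-metal atom
        return False
    others = {e for e in elements if e not in TRANSITION_METALS}
    if not {"C", "H"} <= others or not others <= ALLOWED_OTHERS:  # (ii)
        return False
    if s["polymeric"] or not s["has_3d_coords"] or s["disordered"]:  # (iv), (v), (vi)
        return False
    return -1 <= s["charge"] <= 1  # (vii) charge within [-1, +1]
```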
Valence to Core XES of the 3d transition metals (XES-3dtm): For the XES training data, the VtC-XES spectra ('labels') for all of the structures in our reference datasets were calculated using a quasi-one-electron approach implemented in the ORCA quantum chemistry package. All VtC-XES spectrum calculations used the TPSSh exchange and correlation density functional and the def2-SVP basis set. The light-matter interaction was described using the electric dipole, magnetic dipole, and electric quadrupole transition moments. After calculation, each VtC-XES spectrum was broadened using a Voigt function containing Lorentzian and Gaussian components. The Gaussian component reflects the limited experimental resolution of VtC-XES and had a fixed width of 1.5 eV in all cases. The Lorentzian component reflects core-hole lifetime broadening and is the sum of the 1s and 2p core-hole lifetime broadenings for each element, i.e. it is element-dependent. Further details can be found in TJ Penfold and CD Rankine, Molecular Physics 121 (2023) e2123406.

XANES of the Sulphur K-edge including electronic character (S-Kedge): Our reference datasets comprise X-ray absorption site geometries ('samples') of small organic molecules containing a single sulphur atom, extracted from the GDB-13 dataset. All molecules have fewer than 25 atoms, giving a total of 134,877 sulphur K-edge spectra. The sulphur K-edge XAS spectra ('labels') for all of the structures in our reference datasets were calculated using restricted excitation window time-dependent density functional theory (REW-TDDFT) as implemented in the ORCA quantum chemistry package. All calculations were performed using the BP86 exchange and correlation functional. Scalar relativistic effects were described using a second-order Douglas-Kroll-Hess (DKH) Hamiltonian. In all calculations a DKH-def2-TZVP basis set was used. 
The light-matter interaction was described using the electric dipole, magnetic dipole, and electric quadrupole transition moments [debeer2008prediction]. After calculation, each spectrum was broadened using a Gaussian function with a fixed width of 1.0 eV in all cases. A final pre-processing step scaled the target spectra of each reference dataset independently into the 0-1 interval, by dividing through by the largest calculated cross-section in that reference dataset. The structures are provided in the form of the input p-DOS+wACSF descriptor with a length of 121: the first 80 elements are associated with the electronic p-DOS descriptor, while the remaining 41 elements are associated with the nuclear wACSF descriptor. Further details: to be confirmed.

Spectrum to Structure Training Sets

Our reference dataset contains 36,657 spectrum-structure pairs. The structures are in the form of the wACSF descriptor and can be read as described in the file. This dataset incorporates 77 elements of the periodic table and molecules with a coordination number, defined as the number of atoms within 2.5 Angstrom of the absorbing atom, between 2 and 16. The Fe K-edge XANES spectra ('labels') for these structures were calculated using multiple-scattering theory (MST) within the muffin-tin approximation as implemented in the FEFF package. The calculations used a self-consistent potential and full multiple scattering up to a radius of 6 Angstrom around the absorbing atom. After calculation, the absorption cross-sections were resampled via interpolation onto 475 points over an energy range of 7112-7160 eV. This dataset also includes 22 spectrum-structure pairs associated with experimental data, to assess the performance of the network when applied to experimental data. Further details can be found in T David, NKN Aznan, K Garside and TJ Penfold, Digital Discovery 2 (2023) 1461-1470. 
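The two Fe K-edge preprocessing quantities described above, the coordination number (atoms within 2.5 Angstrom of the absorber) and the resampling onto 475 points between 7112 and 7160 eV, can be sketched as follows. This assumes Cartesian coordinates in Angstrom, an increasing calculated energy axis, and linear interpolation (the record says only "via interpolation"); the function names are hypothetical.

```python
import numpy as np

def coordination_number(coords, absorber_index, cutoff=2.5):
    """Count atoms within `cutoff` Angstrom of the absorbing atom (absorber excluded)."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords - coords[absorber_index], axis=1)
    d[absorber_index] = np.inf  # do not count the absorber itself
    return int(np.sum(d <= cutoff))

# Common energy grid: 475 points spanning 7112-7160 eV.
ENERGY_GRID = np.linspace(7112.0, 7160.0, 475)

def resample(energies, cross_section, grid=ENERGY_GRID):
    """Linearly interpolate a calculated spectrum onto the common grid.

    np.interp requires `energies` to be increasing; values outside the calculated
    range are clamped to the end points.
    """
    return np.interp(grid, energies, cross_section)
```

A shared grid matters because the network's output layer has a fixed number of points, so every label must be sampled at the same energies.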
Delta-Learning Training Sets

This reference dataset comprises 1124 X-ray absorption site geometries of rhodium complexes harvested from the transition metal Quantum Machine (tmQM) dataset. The Rh L3-edge spectra for all of the structures in our reference datasets were calculated using restricted excitation window time-dependent density functional theory (REW-TDDFT) as implemented in the ORCA quantum chemistry package. All spectra were computed twice, using the BLYP and B3LYP exchange and correlation density functionals, and the difference between the two simulations was used for training. The choice of functional systematically influences the absolute transition energies; therefore, before taking the difference, all spectra calculated using BLYP and B3LYP were shifted by +19.5 and -5.5 eV, respectively, to match the absolute energy of the experimental white line. While a constant spectral shift applied to the whole training set could be a limitation for other types of spectroscopy, in X-ray spectroscopy the transitions derive from core orbitals, which are not involved in bonding and remain largely unchanged between molecules, so this approach ensures consistency for each sample. Scalar relativistic effects were described using a second-order Douglas-Kroll-Hess (DKH) Hamiltonian [45]. In all calculations an aug-cc-pVTZ-DK basis set was used for Rh, and a DKH-def2-SVP basis set for all other elements. The light-matter interaction was described using the electric dipole, magnetic dipole, and electric quadrupole transition moments [44]. After calculation, each spectrum was broadened using a Gaussian function with a fixed width of 1.5 eV in all cases. Further details can be found in L Watson, T Pope, RM Jay, A Banerjee, P Wernet and TJ Penfold, Structural Dynamics 10 (2023) 064101. 
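The delta-learning target described above can be sketched as follows. This assumes each calculation is available as stick transitions (energies in eV, intensities), that the quoted 1.5 eV Gaussian width is a full width at half maximum, and that the target is B3LYP minus BLYP; the record states only that "the difference" was used. Function names are hypothetical.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # convert FWHM to std. deviation

def gaussian_broaden(stick_energies, stick_intensities, grid, fwhm=1.5):
    """Sum of area-normalised Gaussians, one centred on each transition."""
    sigma = fwhm * FWHM_TO_SIGMA
    diff = np.asarray(grid)[:, None] - np.asarray(stick_energies)[None, :]
    g = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    return g @ np.asarray(stick_intensities, dtype=float)

def delta_target(blyp_sticks, b3lyp_sticks, grid, fwhm=1.5):
    """Difference spectrum (assumed B3LYP - BLYP) after the fixed functional-dependent shifts."""
    e_blyp, i_blyp = blyp_sticks
    e_b3lyp, i_b3lyp = b3lyp_sticks
    blyp = gaussian_broaden(np.asarray(e_blyp) + 19.5, i_blyp, grid, fwhm)     # +19.5 eV shift
    b3lyp = gaussian_broaden(np.asarray(e_b3lyp) - 5.5, i_b3lyp, grid, fwhm)   # -5.5 eV shift
    return b3lyp - blyp
```

Because both spectra are aligned to the same experimental white line before subtraction, the network learns the lineshape correction between functionals rather than a trivial energy offset.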
Type Of Material Database/Collection of data 
Year Produced 2022 
Provided To Others? Yes  
Impact Makes developing and building upon the progress we have made easier. 
URL https://gitlab.com/team-xnet/training-sets
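Two preprocessing steps from the training-data record above can be sketched together: the Voigt broadening used for XES-3dtm (a fixed 1.5 eV Gaussian plus an element-dependent Lorentzian) and the per-dataset scaling into the 0-1 interval by the largest cross-section. This assumes the quoted widths are full widths at half maximum and that transitions are supplied as stick energies and intensities; the function names are hypothetical, and the Lorentzian width must be supplied by the caller because the record does not list per-element values.

```python
import numpy as np
from scipy.special import voigt_profile

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def voigt_broaden(stick_energies, stick_intensities, grid, lorentz_fwhm, gauss_fwhm=1.5):
    """Sum of area-normalised Voigt profiles, one per transition."""
    sigma = gauss_fwhm * FWHM_TO_SIGMA  # Gaussian standard deviation
    gamma = 0.5 * lorentz_fwhm          # voigt_profile takes the Lorentzian half width
    diff = np.asarray(grid)[:, None] - np.asarray(stick_energies)[None, :]
    return voigt_profile(diff, sigma, gamma) @ np.asarray(stick_intensities, dtype=float)

def scale_dataset(spectra):
    """Scale all spectra of one reference dataset into [0, 1] by the dataset-wide maximum.

    Assumes non-negative intensities, as for calculated cross-sections.
    """
    spectra = np.asarray(spectra, dtype=float)
    return spectra / spectra.max()
```

Scaling by the dataset-wide maximum, rather than normalising each spectrum individually, preserves the relative intensities between samples within a dataset.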
 
Description J. Olof Johansson 
Organisation University of Edinburgh
Country United Kingdom 
Sector Academic/University 
PI Contribution Experimental data, using X-ray free-electron lasers, to study time-resolved X-ray absorption and emission of small Mn molecular magnets.
Collaborator Contribution Performing the experiments and providing the data.
Impact In progress.
Start Year 2023
 
Description Jon Marangos on Organic Photovoltaics 
Organisation Imperial College London
Country United Kingdom 
Sector Academic/University 
PI Contribution We are performing quantum chemistry calculations of the X-ray absorption spectra at the N and F K-edges to follow the photoexcited dynamics of emerging photovoltaic materials. The corresponding measurements are performed at the LCLS X-ray Free Electron Laser.
Collaborator Contribution Performing the experiments at the LCLS X-ray Free Electron Laser.
Impact In progress
Start Year 2024
 
Description Philippe Wernet 
Organisation Lund University
Country Sweden 
Sector Academic/University 
PI Contribution A collaboration on the analysis of time-resolved X-ray data has emerged, focusing on applying our machine learning models to time-resolved data.
Collaborator Contribution Provided data.
Impact In progress
Start Year 2023
 
Description Prof. Jenny Lockard 
Organisation Rutgers University
Country United States 
Sector Academic/University 
PI Contribution Theory and computations using machine learning models for the analysis of X-ray spectroscopy.
Collaborator Contribution Providing experimental data for analysis and interpretation.
Impact Paper in preparation
Start Year 2023
 
Title Open Training Sets for Machine Learning Models on X-ray spectroscopy 
Description A machine learning model is only as good as the training set used to develop it. Here you can find the training sets used in our work in a format compatible with the XANESNET code. In total 5 training sets, with over 1 million spectra and structures are provided. 
Type Of Technology Software 
Year Produced 2023 
Open Source License? Yes  
Impact None known at present. 
URL https://gitlab.com/team-xnet/training-sets
 
Title XANESNET 
Description We have developed and deployed a deep neural network, XANESNET, for predicting the lineshape of first-row transition metal K-edge X-ray absorption near-edge structure (XANES) spectra. XANESNET predicts the spectral intensities using only information about the local coordination geometry of the transition metal complexes, encoded in a feature vector of weighted atom-centred symmetry functions. 
Type Of Technology Software 
Year Produced 2022 
Open Source License? Yes  
Impact We think that the theoretical simulation of X-ray spectroscopy (XS) should be fast, affordable, and accessible to all researchers. The popularity of XS is on a steep upward trajectory globally, driven by advances at, and widening access to, high-brilliance light sources such as synchrotrons and X-ray free-electron lasers (XFELs). However, the high resolution of modern X-ray spectra, coupled with ever-increasing data acquisition rates, brings into focus the challenge of accurately and cost-effectively analysing these data. Decoding the dense information content of modern X-ray spectra demands detailed theoretical calculations that capture the complexity of the underlying physics satisfactorily but that are, at the same time, fast, affordable, and accessible enough to appeal to researchers. This is a tall order, but we are using deep neural networks to make it a reality. 
URL https://gitlab.com/team-xnet/xanesnet
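XANESNET encodes the local coordination geometry as weighted atom-centred symmetry functions (wACSFs). The sketch below computes only radial, atomic-number-weighted terms with a cosine cutoff, in the spirit of the wACSF descriptor; angular terms are omitted, and the cutoff radius, number of shells, shell centres and `eta` are hypothetical values rather than XANESNET's defaults.

```python
import numpy as np

def cosine_cutoff(r, r_cut):
    """Smoothly decays from 1 at r = 0 to 0 at r = r_cut; zero beyond the cutoff."""
    return np.where(r < r_cut, 0.5 * (np.cos(np.pi * r / r_cut) + 1.0), 0.0)

def radial_wacsf(coords, atomic_numbers, absorber_index=0, r_cut=6.0, n_radial=16, eta=4.0):
    """Radial wACSF-style descriptor centred on the absorbing atom.

    Each component is sum_j Z_j * exp(-eta * (r_j - mu_k)^2) * f_c(r_j) over neighbours j,
    for a set of shell centres mu_k (hypothetical, evenly spaced).
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(atomic_numbers, dtype=float)
    r = np.linalg.norm(coords - coords[absorber_index], axis=1)
    keep = np.arange(len(r)) != absorber_index  # exclude the absorber itself
    r, z = r[keep], z[keep]
    centres = np.linspace(0.5, r_cut, n_radial)
    g = np.exp(-eta * (r[None, :] - centres[:, None]) ** 2)  # (n_radial, n_neighbours)
    return g @ (z * cosine_cutoff(r, r_cut))                  # (n_radial,)
```

Because the descriptor depends only on interatomic distances and atomic numbers, it is invariant to rotations and translations of the complex, and atoms beyond the cutoff do not contribute.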
 
Title webCONEXS 
Description We have released a web portal where researchers who are not experts in simulation can perform some very simple XANES simulations. Through the web portal, three codes can be run: FDMNES, ORCA and Quantum ESPRESSO. Instructions on how to access webCONEXS and how to run the different codes are available at Diamond Light Source when users are granted beamtime. To run webCONEXS, you need a federal ID and password, and must have had a proposal accepted on one of the spectroscopy beamlines at some point in the past. 
Type Of Technology Webtool/Application 
Year Produced 2023 
Open Source License? Yes  
Impact Too early to tell.