An EPSRC National Research Facility to facilitate Data Science in the Physical Sciences: The Physical Sciences Data-science Service (PSDS)

Lead Research Organisation: University of Southampton
Department Name: Sch of Chemistry

Abstract

The modern physical scientist cannot perform research without generating significant quantities of data, drawing on related or prior data, carrying out substantial data analysis and integrating results with other data. This requires a range of skills and resources that are not available to the majority of physical scientists. There is therefore an urgent need in the physical sciences to provide access to data and to integrate it with data science approaches. This in turn requires building a new skills base that enables and empowers researchers to work in a data-science way.

The Physical Sciences Data-science Service (PSDS) will provide a single place where existing databases, open data sources and work-in-progress data can be stored and searched in a unified way. This will make it trivial to find and combine different types of physical sciences data, from structural details to measured physical properties of materials. It will also allow new experimental data to be compared instantly with, and placed in the context of, data that is already available.

This is just the start, however. There is enormous potential for performing data science across all of these data, for example using Machine Learning and Artificial Intelligence approaches, which are becoming a new avenue of research in their own right.

It is vital that data science becomes a routine tool for all physical scientists. For many this will mean learning new skills. The PSDS will therefore develop a training programme around the four main competencies (statistics, programming/tools, computational methods & data visualisation) required to perform data science. Identified links with networks and postgraduate training will enable PSDS users to gain deeper skills in various aspects of data science.

The long-term aim is for the PSDS, and therefore data science, to become a seamless, key part of the research infrastructure for physical scientists.

Planned Impact

The purpose of the service is to facilitate new data science approaches to research across a broad cross-section of the physical sciences research community. The impact on the communities and disciplines within the physical sciences will be very significant. By providing data resources and analysis processes that would not otherwise be available to researchers, the service will open up entirely new results and avenues of research. This will be further enhanced by the ability to blend and analyse data across different resources. This is not just a matter of breaking down silos, but of fuelling far-reaching research. The vision is that large aggregations of data can be generated and systematically analysed using machine learning or artificial intelligence approaches. This would effect a paradigm shift in the way physical sciences discovery and analysis are performed, resulting in new science that simply could not be achieved by traditional methods.

Early career researchers will be significantly impacted by this service. Our training programme will be aimed particularly at this audience, taking those with a sound but traditional physical science background and exposing them to new, emergent methods that promise to revolutionise the subject area. It is researchers at the start of their careers who are best placed to take advantage of this new approach, and our training programme is founded on a data science training framework designed specifically to develop researchers in this new direction: we will provide the first rungs on the ladder and link into networks and training schemes that can develop trainees further. We expect a new type of researcher to result from this approach.

The providers of data resources will be particularly impacted. In an open environment where new aggregations of data can be assembled on the fly for a specific research question, there is enormous opportunity. Data that is currently siloed, and therefore useful only to narrow communities, will become available for exploitation in a multitude of previously unimagined ways. The ability to mine or systematically analyse these new collections of data provides further opportunities for data providers as well as users.

These new approaches, which liberate data and make it available for large-scale systematic analysis, will ultimately benefit the materials science and chemical manufacturing industries. Time to discovery will be reduced, optimisation and efficiency in manufacture will be enabled, and new, more elaborate property-driven products will be engineered. UK high-value industry needs many more data-science-minded researchers, and this service will provide a low-barrier-to-entry route for a large number of traditionally educated researchers to engage with data science.

Publications

 
Title AI3SD Video: A Career in Chemistry & Beyond 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 28 external views in addition to being part of our Skills4Scientists Series. This was a collaboration between AI3SD and PSDS. 
URL https://eprints.soton.ac.uk/451125/
 
Title AI3SD Video: All's Fair in love and data management 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the first talk in the Skills4Scientists #1 - Research Data Management Session, which focussed on several areas of good data management practice. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 46 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL http://eprints.soton.ac.uk/id/eprint/450266
 
Title AI3SD Video: Building your professional contacts - Networking for Scientists and/or Introverts 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the third talk in the Skills4Scientists #6 - Careers 1 Session, which focussed on several areas of careers advice that will be useful as you complete your studies and begin your career. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 23 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/451153/
 
Title AI3SD Video: Collaborative Data Management 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #1 - Research Data Management Session, which focussed on several areas of good data management practice. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 50 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL http://eprints.soton.ac.uk/id/eprint/450268
 
Title AI3SD Video: Collaborative Reports/Presentations 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the fourth talk in the Skills4Scientists #5 - Posters, Presentations & Reports Session, which focussed on several areas of communicating your research: presentations, posters and reports. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 18 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450846/
 
Title AI3SD Video: Cultivating your Web Presence 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #6 - Careers 1 Session, which focussed on several areas of careers advice that will be useful as you complete your studies and begin your career. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 28 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450840/
 
Title AI3SD Video: Data Analysis Case Study 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the third talk in the Skills4Scientists #4 - Intro to Python 2 Session, a follow-on from our Intro to Python 1 course, which focussed on working further with the core elements of Python and performing data analysis using Jupyter notebooks and Anaconda. The course is designed to let you follow along with the content and examples as it goes, and course material is also provided so you can revisit it after the live event. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 78 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450568/
 
Title AI3SD Video: Data Generation, Data Standards and Metadata Capture in Drug Discovery 
Description Biomedical research and drug discovery are based on a continuous cycle of scientific findings being made, refined, and translated into new treatments. However, in recent years it has become clear that only a fraction of all published research findings are actually reproducible, causing waste and delays in our efforts to bring new drugs to patients. The answer lies in changing the way we generate and capture data, including experimental metadata. Especially in light of the increasing role of Artificial Intelligence in drug discovery, it is critical to rethink the way we approach data generation, since data are the most important input for AI-driven drug discovery. The talk will address recent advances in data and metadata capture based on fully automated experimentation and novel data standards. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact Forms part of our YouTube Channel. Has received 166 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. 
URL https://eprints.soton.ac.uk/447526/
 
Title AI3SD Video: Data legislation, personal and non-personal data, ethical issues and protecting your IP rights 
Description Scientists and researchers handle vast amounts of data in the course of their work, and in recent years technology and computational power have revolutionised the ability to create, store and analyse data. Scientific research increasingly requires scientists to be skilled in computational analysis and able to work with algorithms, as they deal with ever larger data sets. The handling of data in science brings with it legal considerations in relation to data protection and intellectual property rights. This talk will give an overview of the data legislation that applies to scientists and researchers, including guidance from data protection authorities and decisions taken by courts; the differences between personal and non-personal data; the ethical issues involved in the use of algorithms; and how you can protect your intellectual property rights. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact Forms part of our YouTube Channel. Has received 99 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. 
URL http://eprints.soton.ac.uk/id/eprint/447091
 
Title AI3SD Video: Data publication - a personal tale 
Description In this talk, I will discuss the theory and practice of data publication both from the perspective of an academic journal editor, but also as a scientific researcher who created datasets, and who got scooped. I'll touch on the importance of data management and data citation, and give an overview of how data publication has grown over the past years, and where we want to be heading in the future. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact Forms part of our YouTube Channel. Has received 47 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. 
URL https://eprints.soton.ac.uk/447368/
 
Title AI3SD Video: Digitising your Chemistry for Recordability, Shareability and Reproducibility 
Description Mark's talk focused on three specific areas in which you can digitise your data and workflows to improve your productivity, make your data easier to discover and make your research more reproducible. These tips were broken down into smaller areas in which you could implement them, with examples taken from the chemistry and life sciences domains. The three main areas covered by Mark's tips were: binning the old-fashioned write-up, collecting data throughout the whole experiment, and sharing your data in accessible and transferable formats. The talk gave examples using the Digital Glassware™ products offered by DeepMatter, in addition to ways to incorporate the tips in different systems. Mark concluded his talk by commenting on the large proportion of science that is currently irreproducible and the ways in which human interaction introduces opportunities for error. These tips aim to address these issues, increasing the reproducibility of science and reducing errors. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact Forms part of our YouTube Channel. Has received 200 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. 
URL https://eprints.soton.ac.uk/447531/
 
Title AI3SD Video: Ethical data management - balancing individual privacy and public benefit 
Description This talk will cover aspects of ethical data management, focussing on the key issues of participant consent, data minimisation, and data anonymisation, using examples from health sciences and engineering. Content within the talk aims to cover: big picture issues (societal benefits to data sharing versus individual right to privacy), relevant legislation (GDPR, DPA 2018 and FoIA 2000), what happens when things go wrong, managing risk via informed consent, data minimisation and anonymisation (formal, statistical and functional) and best practice guidelines and tools. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact Forms part of our YouTube Channel. Has received 54 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. 
URL http://eprints.soton.ac.uk/id/eprint/447090
 
Title AI3SD Video: GitHub & LaTeX Demo 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the third talk in the Skills4Scientists #3 - Version Control and LaTeX Session, which focussed on teaching the basics of LaTeX and version control. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 92 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450565/
 
Title AI3SD Video: Giving your Open Data the best chance to realise its potential 
Description Chris is not a researcher, but he has worked with a lot of them over 23 years at the University of Southampton. He has seen a lot of hard work on open data fail to achieve its potential. A common issue is that, rather than face the reality of what is going wrong, it is easier to invest more in the aspects of your dataset and data service that are good than to identify and fix the aspects that are bad. Such "hygiene factors" don't have to be perfect, but they must all be good enough. Failure in any one may prevent your data from achieving its potential, no matter how well you do on other factors. Chris will give some examples of the most common open data hygiene factors, and some tips from the public sector open data community on how to address them pragmatically. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact Forms part of our YouTube Channel. Has received 65 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. 
URL https://eprints.soton.ac.uk/447527/
 
Title AI3SD Video: Intro to Ethics 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the first talk in the Skills4Scientists #7 - Ethical Research event. This talk provided a brief introduction to some of the areas to consider regarding the ethical conduct of research. It is not formal ethics training; if you wish to learn more about ethics, you should contact the relevant departments at your institution. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 31 external views in addition to being part of our Summer Seminar Series. This was the first online series we created to engage with our Network members worldwide during the COVID-19 pandemic. This series helped launch our YouTube channel which now has over 500 subscribers, and greatly increased our Network membership. 
URL https://eprints.soton.ac.uk/451137/
 
Title AI3SD Video: Intro to LaTeX 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the first talk in the Skills4Scientists #3 - Version Control and LaTeX Session, which focussed on teaching the basics of LaTeX and version control. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 117 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450562/
 
Title AI3SD Video: Introduction to Git 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #3 - Version Control and LaTeX Session, which focussed on teaching the basics of LaTeX and version control. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 76 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450563/
 
Title AI3SD Video: LaTeX in Overleaf 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the fourth talk in the Skills4Scientists #3 - Version Control and LaTeX Session, which focussed on teaching the basics of LaTeX and version control. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 67 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450567/
 
Title AI3SD Video: Linked Data - Examples and Heuristics 
Description With their inherent flexibility and robustness to change, the decentralised, interconnected knowledge graphs that lie at the heart of semantic web technologies are ideally suited to the challenge of converting the messy, often incomplete, and internally heterogeneous datasets of the Humanities into machine-processable data. Although a matter of some debate, the reuse and adoption of known ontologies, schemas, and taxonomies in disparate projects across the Arts, Humanities, and Social Sciences landscape has been increasing steadily, particularly over the last decade. This talk will describe the practical approaches and heuristics of such Linked Data projects, commenting on the effect of political, institutional, and socio-cultural factors on their planning, implementation, and evaluation. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact Forms part of our YouTube Channel. Has received 78 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. 
URL https://eprints.soton.ac.uk/447528/
 
Title AI3SD Video: Love notes to the future: the importance of metadata 
Description Isobel's talk focused on the importance of good research data management and how it can pay off in the future. The talk discussed four main aspects of data management: the data management plan, data storage, finding your data and sharing your data. Good data management matters: ultimately, managing your data well will save you time and effort in the future, making the data easier to find, use, and distribute to others later on. A core part of data management is the data management plan, which should set out in full what data is going to be gathered, how it is going to be catalogued and in what formats (paper/electronic/physical) it is going to be stored. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact Forms part of our YouTube Channel. Has received 119 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. 
URL https://eprints.soton.ac.uk/447529/
 
Title AI3SD Video: Pitfalls and Gotchas with bioactivity data 
Description John's talk focused on his experiences working with bioactivity data and drug discovery research, along with some of the problems and errors that people have encountered when working in this area. In the past, researchers could reasonably know 'most' of the research within a field, but research now happens at a much larger scale, with more participants and more data, yet much of the groundwork for good data sharing and reusability has not been laid. As a result there is a lot of messy data out there: inaccessible data, cryptic data and poorly described data. John talks about some of the bioactivity resources that are available to researchers and some of the successes these data sources have had in handling large amounts of data. He gives plenty of tips, with examples, on things to look out for when examining chemical and biological data. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact Forms part of our YouTube Channel. Has received 131 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. 
URL https://eprints.soton.ac.uk/447530/
 
Title AI3SD Video: Presenting in Person & Online 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #5 - Posters, Presentations & Reports Session, which focussed on several areas of communicating your research: presentations, posters and reports. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 19 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450844/
 
Title AI3SD Video: Producing a good Poster 
Description This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. This series was primarily aimed at final year undergraduates / early stage PhD students. This video was the third talk in the Skills4Scientists #5 - Posters, Presentations & Reports Session, which focussed on several areas of communicating your research: presentations, posters and reports. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 44 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450845/
 
Title AI3SD Video: Publishing and Citing Data in Practice 
Description Sharing data, whether openly or on a more restricted basis, is increasingly expected of researchers in many areas. This is incredibly valuable for the scientific community but does take more work than simply putting the results in a drawer after the paper is published, so how can we make sure that the originators of data get full credit for their labour while ensuring that the data continues to be accessible in the long term? Data citation has a big role to play in the answer to this question, and this talk will give you an overview of the principles of data citation and how to implement them in practice. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact Forms part of our YouTube Channel. Has received 611 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. 
URL https://eprints.soton.ac.uk/447369/
 
Title AI3SD Video: Setup, environments, installing packages, intro to Jupyter 
Description This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas, including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 122 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL http://eprints.soton.ac.uk/id/eprint/450248
 
Title AI3SD Video: Skills4Scientists - Poster & Careers Symposium - Poster Compilation 
Description This video forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas, including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video is a compilation of posters presented at the Skills4Scientists Posters & Careers Symposium. These poster presentations are predominantly from summer students involved in the AI3SD 2021 summer internship program. Higher resolution versions of the posters are available on the poster symposium website: https://www.ai3sd.org/s4s-symposium20... Not all poster presenters requested a recording of their talk. The following poster recordings are included in this compilation video:
Poster 1 - Nearer the nearsightedness principle: Large-scale quantum chemical calculations - Andras Vekassy (University of Southampton)
Poster 3 - Combining Ultrasonic Methods and Machine Learning Techniques to Assess Baked Products Quality - Erhan Gulsen (University of Nottingham)
Poster 4 - Interactive Knowledge-Based Solvent Selection Tool - Hewan Zewdu (University of Nottingham)
Poster 5 - CV in High Throughput Chemistry - Jamie Longino (University of Strathclyde)
Poster 9 - Dewetting in Thin Liquid Films: Using Sparse Optimization to Learn Evolution Equations - Aspen Fenzl (University of Sheffield)
Poster 12 - Creating a merged dataset and its exploration with different Machine Learning algorithms - Maximilian Hoffman (Freie Universität Berlin)
Poster 14 - Bayesian optimisation in Chemistry - Rubaiyat Khondaker (University of Cambridge)
Poster 15 - A deep neural network for generation of functional organic materials - Rhyan Barrett (University of Warwick)
Thank you to our sponsors Optibrium (https://www.optibrium.com/) and Dotmatics (https://www.dotmatics.com/), who supported this event. These poster presentations were live cartooned by ErrantScience (errantscience.com); the cartoon is also available on our YouTube Channel. 
Sections Intro: (0:00)
Andras Vekassy - Nearer the nearsightedness principle: Large-scale quantum chemical calculations: (0:17)
Erhan Gulsen - Combining Ultrasonic Methods and Machine Learning Techniques to Assess Baked Products Quality: (06:11)
Hewan Zewdu - Interactive Knowledge-Based Solvent Selection Tool: (12:09)
Jamie Longino - CV in High Throughput Chemistry: (16:52)
Aspen Fenzl - Dewetting in Thin Liquid Films: Using Sparse Optimization to Learn Evolution Equations: (21:21)
Maximilian Hoffman - Creating a merged dataset and its exploration with different Machine Learning algorithms: (27:31)
Rubaiyat Khondaker - Bayesian optimisation in Chemistry: (34:33)
Rhyan Barrett - A deep neural network for generation of functional organic materials: (40:06)
Further details from this series can be found at: https://www.ai3sd.org/skills4scientists 
Type Of Art Film/Video/Animation 
Year Produced 2022 
Impact Forms part of our YouTube Channel. Has received 34 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/451464/
 
Title AI3SD Video: Typing, Variables, Data Types & Functions 
Description This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas, including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #4 - Intro to Python 2 Session, a follow-on from our Intro to Python 1 course, which focussed on working further with the core elements of Python and performing data analysis using Jupyter notebooks and Anaconda. This course is designed to allow you to follow along with the content and examples as the course goes, but you will also be provided with course material to allow you to cover it again after the live event. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 50 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450614/
 
Title AI3SD Video: Using RDKit 
Description This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas, including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 2691 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450309/
 
Title AI3SD Video: Writing a CV 
Description This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas, including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video was the first talk in the Skills4Scientists #6 - Careers 1 Session, which focussed on several areas of careers advice that will be useful to you as you complete your studies and begin your careers. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 47 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450847/
 
Title AI3SD Video: Writing a good Abstract & Best Practices for Scientific Communication 
Description This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas, including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video was the first talk in the Skills4Scientists #5 - Posters, Presentations & Reports Session, which focussed on several areas of communication for your research: presentations, posters and reports. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 36 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/450843/
 
Title AI3SD Video: Writing an ethics application 
Description This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas, including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #7 - Ethical Research Session, which focussed on several areas of ethical research, including why ethics is important and how to write an ethics application. 
Type Of Art Film/Video/Animation 
Year Produced 2021 
Impact Forms part of our YouTube Channel. Has received 34 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. 
URL https://eprints.soton.ac.uk/451154/
 
Title The (long) journey from supporting information to Publishing and Finding FAIR data in chemistry 
Description Electronic supporting information (ESI) had its origins in the early to mid 1990s and has evolved in a highly ad hoc manner since then. The concept of FAIR data arose about five years ago, in part to try to rationalise the chaotic state of ESI. The talk will illustrate these developments with a case study showing how one (either human or AI) might use the properties of FAIR to "F"ind some highly focused chemical spectroscopic and computational data. I will conclude by trying to unpick some of the supporting infrastructures which enable this, and how the creators of the data facilitate this by using metadata to describe and then publish the data. The talk incorporates some elements of FAIR by having its own metadata and its own persistent identifier (as a DOI): https://doi.org/ff6g so that you can yourself Find, Access, Interoperate, Re-use and Cite it as appropriate. 
Type Of Art Film/Video/Animation 
Year Produced 2020 
Impact Part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. 
URL https://data.hpc.imperial.ac.uk/resolve/?doi=7629
 
Description Physical Sciences Data Infrastructure (PSDI) Phase 1 Pilot
Amount £1,002,306 (GBP)
Funding ID EP/W032252/1 
Organisation Engineering and Physical Sciences Research Council (EPSRC) 
Sector Public
Country United Kingdom
Start 11/2021 
End 03/2022
 
Title ChASe tool developed and deployed as a service. It compares the cost of chemical reagents across all major suppliers, ensuring academics can get their materials for research in an efficient manner - both timely and cost effective. 
Description ChASe tool developed and deployed as a service. It compares the cost of chemical reagents across all major suppliers, ensuring academics can get their materials for research in an efficient manner - both timely and cost effective. 
Type Of Material Improvements to research infrastructure 
Year Produced 2020 
Provided To Others? Yes  
Impact Economic and efficiency benefits for PSDS users. 
URL https://www.psds.ac.uk/node/91
 
Title Propersea 
Description Propersea is a newly developed online resource designed to provide predictions for a range of molecular and physico-chemical properties for small molecules. There are 20 predicted properties, including: melting point, boiling point, density, logP, solubility, and polarizability. It will also predict the IUPAC name for the molecule; this feature required additional research and specialised code development. Propersea is integrated into the PSDS tool suite. 
Type Of Material Improvements to research infrastructure 
Year Produced 2021 
Provided To Others? Yes  
Impact Novel algorithm to derive IUPAC names from chemical identifiers based on machine learning and natural language processing. 
URL https://www.psds.ac.uk/propersea
 
Title Additional file 1 of Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier 
Description Additional file 1. 100-molecule test set including InChI and IUPAC names from PubChem, ACD/I-Labs, ChemAxon, Mestrelab and the transformer presented in the current paper. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Translating_the_InChI_adap...
 
Title Additional file 2 of Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier 
Description Additional file 2. Training script for OpenNMT-py. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_2_of_Translating_the_InChI_adap...
 
Title Additional file 3 of Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier 
Description Additional file 3. InChI character vocabulary needed for training the model with OpenNMT. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_3_of_Translating_the_InChI_adap...
 
Title Additional file 4 of Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier 
Description Additional file 4. IUPAC character vocabulary needed for training the model with OpenNMT. 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://springernature.figshare.com/articles/dataset/Additional_file_4_of_Translating_the_InChI_adap...
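Additional files 3 and 4 are character vocabularies: the set of characters the model can read (InChI) and write (IUPAC name). A minimal Python sketch of how such a vocabulary might be counted from a corpus that has already been split into space-separated characters is shown below. The "token<TAB>count" layout, most frequent first, is an assumption about the file format rather than a description of the published files; in OpenNMT-py 2.0 this step is normally done by its own onmt_build_vocab tool. The function names and the toy corpus are hypothetical.

```python
from collections import Counter
from pathlib import Path


def build_char_vocab(tokenized_lines):
    """Count the space-separated character tokens across all lines of a corpus."""
    counts = Counter()
    for line in tokenized_lines:
        counts.update(line.split())
    return counts


def vocab_lines(counts):
    """Format counts as 'token<TAB>count', most frequent first (assumed layout)."""
    return [f"{token}\t{n}" for token, n in counts.most_common()]


def write_vocab(counts, path):
    """Write the vocabulary to a UTF-8 text file, one token per line."""
    Path(path).write_text("\n".join(vocab_lines(counts)) + "\n", encoding="utf-8")


if __name__ == "__main__":
    # Hypothetical toy corpus: methane and ethane InChIs, split into characters.
    corpus = [
        "I n C h I = 1 S / C H 4 / h 1 H 4",
        "I n C h I = 1 S / C 2 H 6 / c 1 - 2 / h 1 - 2 H 3",
    ]
    for line in vocab_lines(build_char_vocab(corpus))[:5]:
        print(line)
```

The same function works for the IUPAC side, since both vocabularies are built from character-split text in the same way.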
 
Title IUPAC Naming Algorithm 
Description PSDS has developed a novel machine learning model for the generation of IUPAC names. It is a sequence-to-sequence model that predicts a molecule's IUPAC name from its InChI (International Chemical Identifier) string. The model uses two stacks of transformers in an encoder-decoder architecture, a setup similar to the neural networks used in state-of-the-art machine translation. Unlike neural machine translation, which usually tokenizes input and output into words or sub-words, our model processes the InChI and predicts the IUPAC name character by character. The model was trained on a dataset of 10 million InChI/IUPAC name pairs freely downloaded from the National Library of Medicine's online PubChem service and tested on a dataset of 200,000 compounds. Training took seven days on a Tesla K80 GPU, and the model achieved a test set accuracy of 90.7%. 
Type Of Material Computer model/algorithm 
Year Produced 2021 
Provided To Others? Yes  
Impact Integrated into the PSDS offering and made available for use by the Academic Community. 
URL https://www.psds.ac.uk/propersea
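The IUPAC Naming Algorithm entry states that the model reads the InChI and writes the IUPAC name one character at a time. A minimal Python sketch of that character-level pre- and post-processing, plus a whole-name accuracy score, is given below. Two points are assumptions rather than details from the source: literal spaces in IUPAC names (as in "acetic acid") are replaced by a hypothetical placeholder token "_" so that they survive whitespace tokenization, and "test set accuracy" is taken to mean the fraction of names predicted exactly. All function names are hypothetical.

```python
SPACE_TOKEN = "_"  # hypothetical placeholder for a literal space in an IUPAC name


def to_char_tokens(text):
    """Split a string into single characters separated by spaces.

    Literal spaces become SPACE_TOKEN so they are not lost when the
    sequence is later split on whitespace.
    """
    return " ".join(SPACE_TOKEN if ch == " " else ch for ch in text)


def from_char_tokens(tokens):
    """Invert to_char_tokens: drop the separators and restore literal spaces."""
    return "".join(" " if tok == SPACE_TOKEN else tok for tok in tokens.split())


def exact_match_accuracy(predicted, reference):
    """Fraction of predicted names that are identical to the reference names."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference lists must have the same length")
    if not reference:
        raise ValueError("reference list is empty")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


if __name__ == "__main__":
    print(to_char_tokens("acetic acid"))  # a c e t i c _ a c i d
    print(from_char_tokens(to_char_tokens("acetic acid")))  # acetic acid
    print(exact_match_accuracy(["methane", "ethane"], ["methane", "ethanol"]))  # 0.5
```

Exact match is a strict measure: a single wrong locant or bracket counts the whole name as wrong, which is why it is a natural choice for a naming task.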
 
Title InChI to IUPAC name machine learning model 
Description This is a machine learning model that predicts IUPAC names from InChI. It was trained on a dump of PubChem's database and has a transformer encoder-decoder architecture.
Instructions
Requires: Python >= 3.6, PyTorch == 1.6.0
1. Install OpenNMT-py version 2.0.0:
pip install OpenNMT-py==2.0.0
2. Prepare the InChI strings to be translated by splitting each into individual characters separated by whitespace and saving them in a text file. To predict multiple IUPAC names, put one InChI per line (see example.inchi for reference).
3. Perform the prediction with the supplied model file:
onmt_translate --beam_size 10 --length_penalty wu --alpha 1.0 --model inchi2iupac_step_259200.pt --src <input file> --max_length 300 --output <output file>
 
Type Of Material Database/Collection of data 
Year Produced 2021 
Provided To Others? Yes  
URL https://zenodo.org/record/5081158
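Steps 2 and 3 of the InChI to IUPAC name instructions can be scripted. The Python sketch below writes the character-split input file and assembles the onmt_translate call using only the flags given in those instructions. The input and output file names are hypothetical placeholders, the helper names are our own, and the command is only executed if onmt_translate is on the PATH and the model file inchi2iupac_step_259200.pt is present in the working directory.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

MODEL_FILE = "inchi2iupac_step_259200.pt"  # model file named in step 3


def write_src_file(inchis, path):
    """Step 2: write one InChI per line, its characters separated by single spaces."""
    lines = [" ".join(inchi.strip()) for inchi in inchis]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


def translate_cmd(model, src, output):
    """Step 3: build the onmt_translate argument list with the documented flags."""
    return [
        "onmt_translate",
        "--beam_size", "10",
        "--length_penalty", "wu",
        "--alpha", "1.0",
        "--model", str(model),
        "--src", str(src),
        "--max_length", "300",
        "--output", str(output),
    ]


if __name__ == "__main__" and shutil.which("onmt_translate") and Path(MODEL_FILE).exists():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "input.inchi"      # hypothetical file name
        out = Path(tmp) / "predictions.txt"  # hypothetical file name
        write_src_file(["InChI=1S/CH4/h1H4"], src)
        subprocess.run(translate_cmd(MODEL_FILE, src, out), check=True)
        print(out.read_text(encoding="utf-8"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file paths. Because the model predicts names character by character, the output lines may also come back character-separated and need joining into plain names.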
 
Description Delivery of PSDS 
Organisation Science and Technology Facilities Council (STFC)
Country United Kingdom 
Sector Public 
PI Contribution Southampton provides community expertise and engagement, development of new communities, training, management and direction.
Collaborator Contribution STFC provides platform development, platform delivery and support, license negotiation, management and strategic direction.
Impact Functioning service
Start Year 2019
 
Description AI4SD, PSDS & PSDI Skills4Scientists Series 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Undergraduate students
Results and Impact This series was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI4SD), the Physical Sciences Data-Science Service (PSDS), and the Physical Sciences Data Infrastructure (PSDI). It was initially run over summer 2021 and aimed to educate scientists and improve their skills in a range of areas, including research data management, Python, version control, ethics, and career development. The first iteration of this series was primarily aimed at final year undergraduates / early stage PhD students.

This series has now been run again in 2022 and 2023 and is in further development for 2024 to create a flipped/blended learning course, and to make a wide range of materials available online alongside the initial video content.
Year(s) Of Engagement Activity 2021,2022,2023
URL https://eprints.soton.ac.uk/453198/
 
Description Failed it to Nailed it 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Other audiences
Results and Impact A series of workshops around research data management.
Year(s) Of Engagement Activity 2020
URL https://www.ai3sd.org/ai3sd-events/
 
Description Failed it to Nailed it Seminar Series 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Industry/Business
Results and Impact The 'Failed it to Nailed it - Getting Data Sharing Right' series comprises events run by the Artificial Intelligence for Scientific Discovery Network+ (AI3SD), the Cell Press Patterns Journal and the Physical Sciences Data-Science Service (PSDS). These events grew out of the data sharing survey we ran in 2020.
Year(s) Of Engagement Activity 2020
URL https://www.ai3sd.org/ai3sd-online-seminar-series/data-seminar-series-2020/
 
Description Hackathon and Training Workshop - machine learning in Chemistry 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach Regional
Primary Audience Postgraduate students
Results and Impact Twenty people, a mixture of students and researchers, attended a workshop hosted in collaboration with the Artificial Intelligence for Scientific Discovery Network. The two-day workshop combined presentations and tutorials with a data hackathon, in which attendees formed teams and worked on one of a number of challenges under the supervision of several advisers. The event was very well received: at the end, attendees reported an increased awareness of sourcing, processing and analysing data, and all said they would attend similar events in the future.
Year(s) Of Engagement Activity 2019
 
Description Machine Learning for Atomistic Modelling Autumn School 2023 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact This was a three-day, in-person event at the Daresbury Laboratory: a machine learning for materials training course run by the Physical Sciences Data Infrastructure (PSDI) initiative in collaboration with PSDS, AI4SD, STFC-SCD and CCP5. The training was targeted at PhD students, in particular those in the Materials and Molecular Simulations field, and aimed to introduce attendees to the latest machine learning methods applied to the atomistic simulation of materials.
The training comprised a number of talks and practical sessions, focusing on the basics of machine learning, machine learning interatomic potentials and graph neural networks. Attendees also had the opportunity to present a poster on their work. Overall, the school was very well received, with requests to run it as a yearly event.
Year(s) Of Engagement Activity 2023
URL https://www.psdi.ac.uk/event/machine-learning-autumn-school-2023/
 
Description Presentation at the American Chemical Society 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Professional Practitioners
Results and Impact Building a Physical Sciences Data-science Service to support FAIR data, given by Brian Matthews. ACS Spring Meeting 2021, session INF008 Framing FAIR: Scientific Research Data Sharing Policies, Frameworks and Principles
Year(s) Of Engagement Activity 2021
 
Description RSC Faraday Meeting 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact A talk introducing the Physical Sciences Data-science Service was given at the Faraday Division Chemistry Software Tools meeting. The meeting addressed a range of students and researchers; the talk raised awareness of the new Service and was followed by a short question and discussion session.
Year(s) Of Engagement Activity 2018
 
Description Requirements analysis with National Research Facilities 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact Representatives from National Research Facilities in the UK joined a workshop with PSDI to discuss their data needs and requirements. Several sessions were run, with active discussion among participants. Follow-up discussions have since been held about further activities to explore with PSDI.
Year(s) Of Engagement Activity 2022