An EPSRC National Research Facility to facilitate Data Science in the Physical Sciences: The Physical Sciences Data-science Service (PSDS)
Lead Research Organisation:
University of Southampton
Department Name: Sch of Chemistry
Abstract
The modern physical scientist cannot perform their research without generating significant quantities of data, drawing on related or prior data, performing substantial data analysis and integrating the results with other data. This requires a range of skills and resources that are not available to the majority of physical scientists. There is therefore an urgent need in the physical sciences to provide access to data and to integrate them with data science approaches. This requires building a new skills base that enables and empowers researchers to work in a data science way.
The Physical Sciences Data-science Service (PSDS) will provide a single place where existing databases, open data sources and data that is still being worked on can be stored and searched in a unified way. This means that it will become trivial to find and combine different types of physical sciences data, from structural details to measured physical properties of materials. It will also make it possible to compare experimental data instantly with data that is already available, placing new results in context.
This is just the start, however. There is enormous potential for performing data science across all of these data, for example using Machine Learning and Artificial Intelligence approaches, which are becoming a new avenue of research in their own right.
It is vital that data science becomes a routine tool for all physical scientists. For many this will mean learning new skills. The PSDS will therefore develop a training programme around the four main competencies (statistics, programming/tools, computational methods & data visualisation) required to perform data science. Identified links with networks and postgraduate training will enable PSDS users to gain deeper skills in various aspects of data science.
The long-term aim is for the PSDS, and therefore data science, to become a seamless, key part of the research infrastructure for physical scientists.
Planned Impact
The purpose of the service is to facilitate new data science approaches to research for a broad cross-section of the physical sciences research community. The impact on the communities and disciplines within the physical sciences will be very significant. By providing data resources and analysis processes that would not otherwise be available to the researcher, the service will open up entirely new results and avenues of research. This will be further enhanced by the ability to blend and analyse data across different resources. This is not just a matter of breaking down silos, but of fuelling far-reaching research. The vision is that large aggregations of data can be generated and systematically analysed using machine learning or artificial intelligence approaches. This would effect a paradigm shift in the way physical sciences discovery and analysis are performed, resulting in new science that simply could not be achieved by traditional methods.
Early career researchers will be significantly impacted by this service. Our training programme will be particularly aimed at this audience, taking those with a sound but traditional physical science background and exposing them to new, emergent methods that promise to revolutionise the subject area. It is the younger generation of researchers that is best placed to take advantage of this new approach, and our training programme is founded on a data science training framework designed specifically to develop researchers in this new direction: we will provide the first rungs on the ladder and link into networks and training schemes that can develop trainees further. We expect a new type of researcher to result from this approach.
The providers of data resources will be particularly impacted. In an open environment where new aggregations of data can be developed on the fly depending on a specific research question, there is enormous opportunity. Data that is currently siloed and therefore only useful to narrow communities will now be available for exploitation in a multitude of new ways that were previously unimagined. The ability for these new collections of data to be mined or systematically analysed provides further opportunity for data providers as well as users.
These new approaches, which liberate data and make it available for large-scale systematic analysis, will ultimately benefit the materials science and chemical manufacturing industries. Time to discovery will be reduced, optimisation and efficiency in manufacturing will be enabled, and new and more elaborate property-driven products will be engineered. UK high-value industry needs many more data-science-minded researchers, and this service will provide a 'low barrier to entry' route for a large number of traditionally educated researchers to engage.
Publications
Allender CJ
(2020)
The Role of Growth Directors in Controlling the Morphology of Hematite Nanorods.
in Nanoscale research letters
Bernet T
(2024)
Modeling the Thermodynamic Properties of Saturated Lactones in Nonideal Mixtures with the SAFT-γ Mie Approach.
in Journal of chemical and engineering data
Coles S
(2020)
Taking FAIR on the ChIN: The Chemistry Implementation Network
in Data Intelligence
Gramlich A
(2022)
Plasma Nitriding of an Air-Hardening Medium Manganese Forging Steel
in HTM Journal of Heat Treatment and Materials
Greenacre VK
(2022)
Tungsten(VI) selenide tetrachloride, WSeCl4 - synthesis, properties, coordination complexes and application of [WSeCl4(SenBu2)] for CVD growth of WSe2 thin films.
in Dalton transactions (Cambridge, England : 2003)
Handsel J
(2021)
Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier.
in Journal of cheminformatics
Title | AI3SD Video: A Career in Chemistry & Beyond |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 28 external views in addition to being part of our Skills4Scientists Series. This was a collaboration between AI3SD and PSDS. |
URL | https://eprints.soton.ac.uk/451125/ |
Title | AI3SD Video: All's Fair in love and data management |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. This video was the first talk in the Skills4Scientists #1 - Research Data Management Session, which focussed on several areas of good data management practice. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 46 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | http://eprints.soton.ac.uk/id/eprint/450266 |
Title | AI3SD Video: Building your professional contacts - Networking for Scientists and/or Introverts |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. This video was the third talk in the Skills4Scientists #6 - Careers 1 Session, which focussed on several areas of careers advice that will be useful to you as you complete your studies and begin your career. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 23 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/451153/ |
Title | AI3SD Video: Collaborative Data Management |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #1 - Research Data Management Session, which focussed on several areas of good data management practice. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 50 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | http://eprints.soton.ac.uk/id/eprint/450268 |
Title | AI3SD Video: Collaborative Reports/Presentations |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. This video was the fourth talk in the Skills4Scientists #5 - Posters, Presentations & Reports Session, which focussed on several areas of communicating your research: presentations, posters and reports. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 18 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450846/ |
Title | AI3SD Video: Cultivating your Web Presence |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #6 - Careers 1 Session, which focussed on several areas of careers advice that will be useful to you as you complete your studies and begin your career. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 28 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450840/ |
Title | AI3SD Video: Data Analysis Case Study |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. This video was the third talk in the Skills4Scientists #4 - Intro to Python 2 Session, a follow-on from our Intro to Python 1 course that focused on working further with the core elements of Python and performing data analysis using Jupyter notebooks and Anaconda. The course is designed so that you can follow along with the content and examples as it runs, and you will also be provided with course material so that you can cover it again after the live event. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 78 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450568/ |
Title | AI3SD Video: Data Generation, Data Standards and Metadata Capture in Drug Discovery |
Description | Biomedical research and drug discovery are based on a continuous cycle of scientific findings being made, refined, and translated into new treatments. However, over recent years it has become clear that only a fraction of all published research findings are actually reproducible, causing waste and delays in our efforts to bring new drugs to patients. The answer is changing the way we generate and capture data, including experimental metadata. Especially in light of the increasing role of Artificial Intelligence in drug discovery, it is critical to rethink the way we approach data generation as the most important input for AI-driven drug discovery. The talk will address these recent advances in data and metadata capture based on fully automated experimentation and novel data standards. |
Type Of Art | Film/Video/Animation |
Year Produced | 2020 |
Impact | Forms part of our YouTube Channel. Has received 166 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. |
URL | https://eprints.soton.ac.uk/447526/ |
Title | AI3SD Video: Data legislation, personal and non-personal data, ethical issues and protecting your IP rights |
Description | Scientists and researchers handle vast amounts of data in the course of their work, and in recent years technology and computational power have revolutionised the ability to create, store and analyse data. Scientific research now increasingly requires scientists to be skilled in computational analysis and able to work with algorithms, as they deal with ever larger data sets. The handling of data in science brings with it legal considerations relating to data protection and intellectual property rights. This talk will give an overview of the data legislation that applies to scientists and researchers, including guidance from data protection authorities and decisions taken by courts, the differences between personal and non-personal data, the ethical issues involved in the use of algorithms, and how you can protect your intellectual property rights. |
Type Of Art | Film/Video/Animation |
Year Produced | 2020 |
Impact | Forms part of our YouTube Channel. Has received 99 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. |
URL | http://eprints.soton.ac.uk/id/eprint/447091 |
Title | AI3SD Video: Data publication - a personal tale |
Description | In this talk, I will discuss the theory and practice of data publication both from the perspective of an academic journal editor, but also as a scientific researcher who created datasets, and who got scooped. I'll touch on the importance of data management and data citation, and give an overview of how data publication has grown over the past years, and where we want to be heading in the future. |
Type Of Art | Film/Video/Animation |
Year Produced | 2020 |
Impact | Forms part of our YouTube Channel. Has received 47 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. |
URL | https://eprints.soton.ac.uk/447368/ |
Title | AI3SD Video: Digitising your Chemistry for Recordability, Shareability and Reproducibility |
Description | Mark's talk focused on three specific areas in which you can digitise your data and workflows to improve your productivity, make your data more discoverable and make your research more reproducible. These tips were broken down into smaller areas in which you could implement them, with examples taken from the chemistry and life sciences domains. The three main areas covered by Mark's tips were: binning the old-fashioned write-up, collecting data throughout the whole experiment, and sharing your data in accessible and transferable formats. The talk gave examples using the Digital Glassware™ products offered by DeepMatter, in addition to ways of incorporating the tips into different systems. Mark concluded his talk by commenting on the large proportion of science that is currently irreproducible and the ways in which human interaction introduces opportunities for error. These tips aim to address these issues, increasing the reproducibility of science and reducing errors. |
Type Of Art | Film/Video/Animation |
Year Produced | 2020 |
Impact | Forms part of our YouTube Channel. Has received 200 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. |
URL | https://eprints.soton.ac.uk/447531/ |
Title | AI3SD Video: Ethical data management - balancing individual privacy and public benefit |
Description | This talk will cover aspects of ethical data management, focussing on the key issues of participant consent, data minimisation, and data anonymisation, using examples from health sciences and engineering. Content within the talk aims to cover: big picture issues (societal benefits to data sharing versus individual right to privacy), relevant legislation (GDPR, DPA 2018 and FoIA 2000), what happens when things go wrong, managing risk via informed consent, data minimisation and anonymisation (formal, statistical and functional) and best practice guidelines and tools. |
Type Of Art | Film/Video/Animation |
Year Produced | 2020 |
Impact | Forms part of our YouTube Channel. Has received 54 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. |
URL | http://eprints.soton.ac.uk/id/eprint/447090 |
Title | AI3SD Video: GitHub & LaTeX Demo |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. This video was the third talk in the Skills4Scientists #3 - Version Control and LaTeX Session, which focussed on teaching the basics of LaTeX and version control. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 92 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450565/ |
Title | AI3SD Video: Giving your Open Data the best chance to realise its potential |
Description | Chris is not a researcher, but he's worked with a lot of them over 23 years at the University of Southampton. He's seen a lot of hard work on open data fail to achieve the potential it could have. A common issue is that rather than face the reality of what's going wrong, it's easier to invest more in the aspects of your dataset and data service that are good than to identify and fix aspects that are bad. Such "hygiene factors" don't have to be perfect but they must all be good enough. Failure in any one may lead to failure of your data to achieve its potential, no matter how well you do on other factors. Chris will give some examples of the most common open data hygiene factors, and some tips from the public sector open data community on how to address them pragmatically. |
Type Of Art | Film/Video/Animation |
Year Produced | 2020 |
Impact | Forms part of our YouTube Channel. Has received 65 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. |
URL | https://eprints.soton.ac.uk/447527/ |
Title | AI3SD Video: Intro to Ethics |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. This video was the first talk in the Skills4Scientists #7 - Ethical Research event. It provided a brief introduction to some of the areas to consider in the ethical conduct of research. It is not formal ethics training; if you wish to learn more about ethics, you should contact the relevant departments at your institution. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 31 external views in addition to being part of our Summer Seminar Series. This was the first online series we created to engage with our Network members worldwide during the COVID-19 pandemic. This series helped launch our YouTube channel which now has over 500 subscribers, and greatly increased our Network membership. |
URL | https://eprints.soton.ac.uk/451137/ |
Title | AI3SD Video: Intro to LaTeX |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. This video was the first talk in the Skills4Scientists #3 - Version Control and LaTeX Session, which focussed on teaching the basics of LaTeX and version control. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 117 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450562/ |
Title | AI3SD Video: Introduction to Git |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #3 - Version Control and LaTeX Session, which focussed on teaching the basics of LaTeX and version control. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 76 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450563/ |
Title | AI3SD Video: LaTeX in Overleaf |
Description | This talk forms part of the Skills4Scientists Series, which was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). The series ran over summer 2021 and aimed to educate scientists and improve their skills in a range of areas including research data management, Python, version control, ethics, and career development. It was primarily aimed at final year undergraduates / early stage PhD students. This video was the fourth talk in the Skills4Scientists #3 - Version Control and LaTeX Session, which focussed on teaching the basics of LaTeX and version control. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 67 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450567/ |
Title | AI3SD Video: Linked Data - Examples and Heuristics |
Description | With their inherent flexibility and robustness to change, the decentralised interconnected knowledge graphs that lie at the heart of semantic web technologies are ideally suited to the challenge of converting the messy, often incomplete and internally heterogeneous datasets of the Humanities into machine-processable data. Although a matter of some debate, the reuse and adoption of known ontologies, schemas and taxonomies by disparate projects across the Arts, Humanities and Social Sciences landscape has been steadily increasing, particularly over the last decade. This talk will describe the practical approaches and heuristics of such Linked Data projects, commenting on the effect of political, institutional and socio-cultural factors on their planning, implementation and evaluation. |
Type Of Art | Film/Video/Animation |
Year Produced | 2020 |
Impact | Forms part of our YouTube Channel. Has received 78 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. |
URL | https://eprints.soton.ac.uk/447528/ |
Title | AI3SD Video: Love notes to the future: the importance of metadata |
Description | Isobel's talk focused on the importance of good research data management and how this can pay off in the future. The talk discussed four main aspects of data management: the data management plan, data storage, finding your data and sharing your data. Good data management matters: managing your data well will ultimately save you time and effort, making your data easier to find, use and distribute to others later on. A core part of data management is the data management plan, which should set out in full what data is going to be gathered, how it is going to be catalogued and in what formats (paper/electronic/physical) it is going to be stored. |
Type Of Art | Film/Video/Animation |
Year Produced | 2020 |
Impact | Forms part of our YouTube Channel. Has received 119 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. |
URL | https://eprints.soton.ac.uk/447529/ |
Title | AI3SD Video: Pitfalls and Gotcha's with bioactivity data |
Description | John's talk focused on his experiences working with bioactivity data and drug discovery research, along with some of the problems and errors that people have encountered when working in this sphere. In the past, researchers could reasonably know 'most' of the research within a field. Now research is carried out at a much larger scale, with more participants and more data, but much of the groundwork for good data sharing and reusability has not been laid. As a result, there is a lot of messy data out there: inaccessible data, cryptic data and poorly described data. John talks about some of the bioactivity resources that are available to researchers and some of the successes these data sources have had in handling large amounts of data. He gives plenty of tips about things to look out for when examining chemical and biological data, illustrated with examples. |
Type Of Art | Film/Video/Animation |
Year Produced | 2020 |
Impact | Forms part of our YouTube Channel. Has received 131 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. |
URL | https://eprints.soton.ac.uk/447530/ |
Title | AI3SD Video: Presenting in Person & Online |
Description | This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate and improve scientists' skills in a range of areas including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #5 - Posters, Presentations & Reports Session, which focussed on several areas of communication for your research: presentations, posters and reports. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 19 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450844/ |
Title | AI3SD Video: Producing a good Poster |
Description | This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate and improve scientists' skills in a range of areas including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video was the third talk in the Skills4Scientists #5 - Posters, Presentations & Reports Session, which focussed on several areas of communication for your research: presentations, posters and reports. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 44 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450845/ |
Title | AI3SD Video: Publishing and Citing Data in Practice |
Description | Sharing data, whether openly or on a more restricted basis, is increasingly expected of researchers in many areas. This is incredibly valuable for the scientific community but does take more work than simply putting the results in a drawer after the paper is published, so how can we make sure that the originators of data get full credit for their labour while ensuring that the data continues to be accessible in the long term? Data citation has a big role to play in the answer to this question, and this talk will give you an overview of the principles of data citation and how to implement them in practice. |
Type Of Art | Film/Video/Animation |
Year Produced | 2020 |
Impact | Forms part of our YouTube Channel. Has received 611 external views in addition to being part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. |
URL | https://eprints.soton.ac.uk/447369/ |
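As an illustration of the data citation principles this talk covers, the sketch below assembles a citation string from a handful of metadata fields. It loosely follows the element order DataCite recommends (creator, year, title, version, publisher, identifier). The `DatasetRecord` class, the `format_citation` helper and every value in the example are hypothetical, including the DOI `10.1234/example.5678`. This is a sketch under those assumptions, not a prescribed citation style.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """Minimal, hypothetical metadata for a published dataset."""
    creators: List[str]
    year: int
    title: str
    publisher: str
    doi: str
    version: Optional[str] = None


def format_citation(record: DatasetRecord) -> str:
    """Build a human-readable data citation ending in a resolvable DOI link."""
    parts = [f"{'; '.join(record.creators)} ({record.year}). {record.title}."]
    if record.version:
        # Citing the exact version lets readers retrieve the data that was used.
        parts.append(f"Version {record.version}.")
    parts.append(f"{record.publisher}.")
    # The persistent identifier is what keeps the citation resolvable long-term.
    parts.append(f"https://doi.org/{record.doi}")
    return " ".join(parts)


example = DatasetRecord(
    creators=["Smith, J.", "Jones, A."],
    year=2020,
    title="Example spectroscopy dataset",
    publisher="Example Repository",
    doi="10.1234/example.5678",
    version="1.0",
)
print(format_citation(example))
```

Rendering the DOI as a full `https://doi.org/` link makes the citation both human-readable and machine-actionable. This gives the originators of the data credit and also points to a stable location for the data.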
Title | AI3SD Video: Setup, environments, installing packages, intro to Jupyter |
Description | This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate and improve scientists' skills in a range of areas including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 122 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | http://eprints.soton.ac.uk/id/eprint/450248 |
Title | AI3SD Video: Skills4Scientists - Poster & Careers Symposium - Poster Compilation |
Description | This video forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate and improve scientists' skills in a range of areas including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video is a compilation of posters presented at the Skills4Scientists Posters & Careers Symposium. These poster presentations are predominantly from summer students involved in the AI3SD 2021 summer internship program. Higher resolution versions of the posters are available on the poster symposium website: https://www.ai3sd.org/s4s-symposium20... Not all poster presenters requested a recording of their talk. The following poster recordings are included in this compilation video:
Poster 1 - Nearer the nearsightedness principle: Large-scale quantum chemical calculations - Andras Vekassy (University of Southampton)
Poster 3 - Combining Ultrasonic Methods and Machine Learning Techniques to Assess Baked Products Quality - Erhan Gulsen (University of Nottingham)
Poster 4 - Interactive Knowledge-Based Solvent Selection Tool - Hewan Zewdu (University of Nottingham)
Poster 5 - CV in High Throughput Chemistry - Jamie Longino (University of Strathclyde)
Poster 9 - Dewetting in Thin Liquid Films: Using Sparse Optimization to Learn Evolution Equations - Aspen Fenzl (University of Sheffield)
Poster 12 - Creating a merged dataset and its exploration with different Machine Learning algorithms - Maximilian Hoffman (Freie Universität Berlin)
Poster 14 - Bayesian optimisation in Chemistry - Rubaiyat Khondaker (University of Cambridge)
Poster 15 - A deep neural network for generation of functional organic materials - Rhyan Barrett (University of Warwick)
Thank you to our sponsors Optibrium (https://www.optibrium.com/) and Dotmatics (https://www.dotmatics.com/), who supported this event. These poster presentations were live-cartooned by ErrantScience (errantscience.com), and the live cartoon is also available on our YouTube Channel.
Sections:
Intro (0:00)
Andras Vekassy - Nearer the nearsightedness principle: Large-scale quantum chemical calculations (0:17)
Erhan Gulsen - Combining Ultrasonic Methods and Machine Learning Techniques to Assess Baked Products Quality (06:11)
Hewan Zewdu - Interactive Knowledge-Based Solvent Selection Tool (12:09)
Jamie Longino - CV in High Throughput Chemistry (16:52)
Aspen Fenzl - Dewetting in Thin Liquid Films: Using Sparse Optimization to Learn Evolution Equations (21:21)
Maximilian Hoffman - Creating a merged dataset and its exploration with different machine learning algorithms (27:31)
Rubaiyat Khondaker - Bayesian optimisation in Chemistry (34:33)
Rhyan Barrett - A deep neural network for generation of functional organic materials (40:06)
Further details from this series can be found at: https://www.ai3sd.org/skills4scientists |
Type Of Art | Film/Video/Animation |
Year Produced | 2022 |
Impact | Forms part of our YouTube Channel. Has received 34 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/451464/ |
Title | AI3SD Video: Typing, Variables, Data Types & Functions |
Description | This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate and improve scientists' skills in a range of areas including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #4 - Intro to Python 2 Session, which was a follow-on from our Intro to Python 1 course, with a focus on working further with the core elements of Python and performing data analysis using Jupyter notebooks and Anaconda. This course is designed to allow you to follow along with the content and examples as the course progresses, and you will also be provided with course material so that you can cover it again after the live event. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 50 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450614/ |
Title | AI3SD Video: Using RDKit |
Description | This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate and improve scientists' skills in a range of areas including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 2691 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450309/ |
Title | AI3SD Video: Writing a CV |
Description | This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate and improve scientists' skills in a range of areas including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video was the first talk in the Skills4Scientists #6 - Careers 1 Session, which focussed on several areas of careers advice that will be useful to you as you complete your studies and begin your careers. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 47 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450847/ |
Title | AI3SD Video: Writing a good Abstract & Best Practices for Scientific Communication |
Description | This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate and improve scientists' skills in a range of areas including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video was the first talk in the Skills4Scientists #5 - Posters, Presentations & Reports Session, which focussed on several areas of communication for your research: presentations, posters and reports. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 36 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/450843/ |
Title | AI3SD Video: Writing an ethics application |
Description | This talk forms part of the Skills4Scientists Series, which has been organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI3SD) and the Physical Sciences Data-Science Service (PSDS). This series ran over summer 2021 and aimed to educate and improve scientists' skills in a range of areas including research data management, Python, version control, ethics, and career development. This series is primarily aimed at final year undergraduates / early stage PhD students. This video was the second talk in the Skills4Scientists #7 - Ethical Research Session, which focussed on several areas of ethical research, including why ethics is important and how to write an ethics application. |
Type Of Art | Film/Video/Animation |
Year Produced | 2021 |
Impact | Forms part of our YouTube Channel. Has received 34 external views in addition to being part of our Skills4Scientists Series which was aimed at educating final year undergraduates / early stage PhD students. |
URL | https://eprints.soton.ac.uk/451154/ |
Title | The (long) journey from supporting information to Publishing and Finding FAIR data in chemistry |
Description | Electronic supporting information (ESI) had its origins in the early to mid 1990s and has evolved in a highly ad hoc manner since then. The concept of FAIR data arose about five years ago, in part as an attempt to rationalise the chaotic state of ESI. The talk will illustrate these developments with a case study showing how one (either a human or an AI) might use the properties of FAIR to "F"ind some highly focused chemical spectroscopic and computational data. I will conclude by trying to unpick some of the supporting infrastructure that enables this, and how the creators of the data facilitate it by using metadata to describe and then publish the data. The talk itself incorporates some elements of FAIR by having its own metadata and its own persistent identifier (as a DOI): https://doi.org/ff6g, so that you can yourself Find, Access, Interoperate, Re-use and Cite it as appropriate. |
Type Of Art | Film/Video/Animation |
Year Produced | 2020 |
Impact | Part of our Failed it to Nailed it Seminar Series. This was a collaboration between AI3SD, PSDS and Patterns to educate on and discuss different aspects of data sharing based on a survey run in 2020. |
URL | https://data.hpc.imperial.ac.uk/resolve/?doi=7629 |
Description | Physical Sciences Data Infrastructure (PSDI) Phase 1 Pilot |
Amount | £1,002,306 (GBP) |
Funding ID | EP/W032252/1 |
Organisation | Engineering and Physical Sciences Research Council (EPSRC) |
Sector | Public |
Country | United Kingdom |
Start | 11/2021 |
End | 03/2022 |
Title | ChASe tool developed and deployed as a service. It compares the cost of chemical reagents across all major suppliers, ensuring academics can get their materials for research in an efficient manner - both timely and cost effective. |
Description | ChASe tool developed and deployed as a service. It compares the cost of chemical reagents across all major suppliers, ensuring academics can get their materials for research in an efficient manner - both timely and cost effective. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2020 |
Provided To Others? | Yes |
Impact | Economic and efficiency benefits for PSDS users. |
URL | https://www.psds.ac.uk/node/91 |
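The kind of comparison ChASe performs can be illustrated with a minimal sketch. The supplier names, prices and pack sizes below are hypothetical. Normalising every quote to a price per gram is an assumption about how such a comparison might be made, because the actual ChASe implementation is not described here. The sketch also ignores delivery time, which the description mentions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Quote:
    """One supplier's listed price for a pack of a reagent (hypothetical data)."""
    supplier: str
    pack_size_g: float
    price_gbp: float

    @property
    def price_per_g(self) -> float:
        # Normalise to a unit price so that different pack sizes are comparable.
        return self.price_gbp / self.pack_size_g


def rank_by_unit_price(quotes: List[Quote]) -> List[Quote]:
    """Return quotes ordered from cheapest to most expensive per gram."""
    if not quotes:
        raise ValueError("no quotes supplied")
    return sorted(quotes, key=lambda q: q.price_per_g)


quotes = [
    Quote("supplier_a", pack_size_g=25, price_gbp=40.0),    # 1.60 GBP/g
    Quote("supplier_b", pack_size_g=100, price_gbp=120.0),  # 1.20 GBP/g
    Quote("supplier_c", pack_size_g=5, price_gbp=9.0),      # 1.80 GBP/g
]
for q in rank_by_unit_price(quotes):
    print(f"{q.supplier}: {q.price_per_g:.2f} GBP/g")
```

Ranking by unit price rather than pack price avoids favouring a small, cheap-looking pack that is more expensive per gram. A fuller tool would also need to weigh lead time and the quantity actually required.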
Title | Propersea |
Description | Propersea is a newly developed online resource designed to provide predictions for a range of molecular and physico-chemical properties for small molecules. There are 20 predicted properties, including: melting point, boiling point, density, logP, solubility, and polarizability. It will also predict the IUPAC name for the molecule; this feature required additional research and specialised code development. Propersea is integrated into the PSDS tool suite. |
Type Of Material | Improvements to research infrastructure |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Novel algorithm to derive IUPAC names from chemical identifiers based on machine learning and natural language processing. |
URL | https://www.psds.ac.uk/propersea |
Title | Additional file 1 of Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier |
Description | Additional file 1. 100-molecule test set including InChI and IUPAC names from PubChem, ACD/I-Labs, ChemAxon, Mestrelab and the transformer presented in the current paper. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_1_of_Translating_the_InChI_adap... |
Title | Additional file 2 of Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier |
Description | Additional file 2. Training script for OpenNMT-py. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_2_of_Translating_the_InChI_adap... |
Title | Additional file 3 of Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier |
Description | Additional file 3. InChI character vocabulary needed for training the model with OpenNMT. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_3_of_Translating_the_InChI_adap... |
Title | Additional file 4 of Translating the InChI: adapting neural machine translation to predict IUPAC names from a chemical identifier |
Description | Additional file 4. IUPAC character vocabulary needed for training the model with OpenNMT. |
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://springernature.figshare.com/articles/dataset/Additional_file_4_of_Translating_the_InChI_adap... |
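Additional files 3 and 4 are character vocabularies for the InChI and IUPAC sides of the model. The minimal sketch below shows how such a vocabulary could be derived. It assumes a simple `token<TAB>count` line format. The exact format OpenNMT-py expects, and how the published files were produced, should be checked against the files themselves. `build_char_vocab` and `write_vocab` are hypothetical helpers.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_char_vocab(strings: Iterable[str]) -> List[Tuple[str, int]]:
    """Count characters across all strings, most frequent first."""
    counts: Counter = Counter()
    for s in strings:
        # Whitespace is dropped because it separates tokens in the model input;
        # how the original model encodes spaces inside IUPAC names is not stated.
        counts.update(ch for ch in s if not ch.isspace())
    # Sort by descending count, then by character, for a deterministic order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def write_vocab(vocab: List[Tuple[str, int]], path: str) -> None:
    """Write one 'token<TAB>count' line per character (assumed format)."""
    with open(path, "w", encoding="utf-8") as fh:
        for token, count in vocab:
            fh.write(f"{token}\t{count}\n")


vocab = build_char_vocab(["InChI=1S/CH4/h1H4"])
print(vocab[:3])
```

The same function can be applied separately to the InChI strings and the IUPAC names, which gives one vocabulary per side of the encoder-decoder model, matching the split between files 3 and 4.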
Title | IUPAC Naming Algorithm |
Description | PSDS has developed a novel machine learning model for the generation of IUPAC names. It is a sequence-to-sequence model that predicts the IUPAC name from a molecule's InChI (International Chemical Identifier) string. The model uses two stacks of transformers in an encoder-decoder architecture, a setup similar to the neural networks used in state-of-the-art machine translation. Unlike neural machine translation, which usually tokenizes input and output into words or sub-words, our model processes the InChI and predicts the IUPAC name character by character. The model was trained on a dataset of 10 million InChI/IUPAC name pairs freely downloaded from the National Library of Medicine's online PubChem service and tested on a dataset of 200,000 compounds. Training took seven days on a Tesla K80 GPU, and the model achieved a test set accuracy of 90.7 %. |
Type Of Material | Computer model/algorithm |
Year Produced | 2021 |
Provided To Others? | Yes |
Impact | Integrated into the PSDS offering and made available for use by the Academic Community. |
URL | https://www.psds.ac.uk/propersea |
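The description reports a test set accuracy of 90.7 % but does not define the metric. This sketch assumes it means whole-name exact match, which the source does not state. Under that assumption, accuracy is the fraction of test compounds whose predicted name equals the reference name. On the 200,000-compound test set, 90.7 % would then correspond to 181,400 names reproduced exactly. `exact_match_accuracy` is a hypothetical helper, and the example names are illustrative only.

```python
from typing import Sequence


def exact_match_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of predictions identical to the reference name (after trimming whitespace)."""
    if len(predicted) != len(reference):
        raise ValueError("prediction and reference lists differ in length")
    if not reference:
        raise ValueError("empty test set")
    # A single wrong character (e.g. a locant) makes the whole name count as wrong.
    correct = sum(p.strip() == r.strip() for p, r in zip(predicted, reference))
    return correct / len(reference)


predicted = ["methane", "ethanol", "propan-1-ol"]
reference = ["methane", "ethanol", "propan-2-ol"]
print(f"{exact_match_accuracy(predicted, reference):.3f}")  # 0.667
```

Exact match is a strict criterion for character-level generation, because a name with one misplaced locant names a different compound. A character-level similarity score would be more lenient, but it would be less meaningful chemically.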
Title | InChI to IUPAC name machine learning model |
Description | This is a machine learning model that predicts IUPAC names from InChI. It was trained on a dump of PubChem's database and has a transformer encoder-decoder architecture.
Instructions
Requires: Python >= 3.6, PyTorch == 1.6.0
1. Install OpenNMT-py version 2.0.0:
2. Prepare the InChI to be translated by splitting each one into individual characters separated by whitespace and saving them in a text file. You can predict multiple IUPAC names by having one InChI per line (see example.inchi for reference).
3. Perform the prediction with the supplied model file:
|
Type Of Material | Database/Collection of data |
Year Produced | 2021 |
Provided To Others? | Yes |
URL | https://zenodo.org/record/5081158 |
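Step 2 of the instructions above (splitting each InChI into whitespace-separated characters, one InChI per line) can be sketched in Python as follows. The helper names are hypothetical. Whether the `InChI=` prefix should be kept depends on how the model was trained, so compare the output against the supplied example.inchi before running the prediction in step 3.

```python
from pathlib import Path
from typing import Iterable


def inchi_to_char_tokens(inchi: str) -> str:
    """Split one InChI into single characters separated by single spaces."""
    return " ".join(inchi.strip())


def write_model_input(inchis: Iterable[str], path: str) -> None:
    """Write one character-split InChI per line; blank entries are skipped."""
    lines = [inchi_to_char_tokens(i) for i in inchis if i.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")


# Methane; write_model_input(["InChI=1S/CH4/h1H4", ...], "input.inchi")
# would write a whole batch in the same form, one molecule per line.
print(inchi_to_char_tokens("InChI=1S/CH4/h1H4"))
# I n C h I = 1 S / C H 4 / h 1 H 4
```

Character-level splitting mirrors the model's design, which reads the InChI one character at a time rather than as words or sub-words.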
Description | Delivery of PSDS |
Organisation | Science and Technology Facilities Council (STFC) |
Country | United Kingdom |
Sector | Public |
PI Contribution | Southampton provides community expertise and engagement, development of new communities, training, management and direction. |
Collaborator Contribution | STFC provides platform development, platform delivery and support, licence negotiation, management and strategic direction. |
Impact | Functioning service |
Start Year | 2019 |
Description | AI4SD, PSDS & PSDI Skills4Scientists Series |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Undergraduate students |
Results and Impact | This series was organised as a joint venture between the Artificial Intelligence for Scientific Discovery Network+ (AI4SD), the Physical Sciences Data-Science Service (PSDS), and the Physical Sciences Data Infrastructure (PSDI). It was initially run over summer 2021 and aimed to educate and improve scientists' skills in a range of areas including research data management, Python, version control, ethics, and career development. The first iteration of this series was primarily aimed at final year undergraduates / early stage PhD students. The series has since been run again in 2022 and 2023 and is being further developed for 2024 into a flipped/blended learning course, with a wide range of materials to be made available online alongside the initial video content. |
Year(s) Of Engagement Activity | 2021,2022,2023 |
URL | https://eprints.soton.ac.uk/453198/ |
Description | Failed it to Nailed it |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Other audiences |
Results and Impact | A series of workshops around research data management. |
Year(s) Of Engagement Activity | 2020 |
URL | https://www.ai3sd.org/ai3sd-events/ |
Description | Failed it to Nailed it Seminar Series |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Industry/Business |
Results and Impact | This 'Failed it to Nailed it - Getting Data Sharing Right' series is a series of events run by the Artificial Intelligence for Scientific Discovery Network+ (AI3SD), the Cell Press Patterns Journal and the Physical Sciences Data-Science Service (PSDS). These events are a product of the data sharing survey we ran in 2020. |
Year(s) Of Engagement Activity | 2020 |
URL | https://www.ai3sd.org/ai3sd-online-seminar-series/data-seminar-series-2020/ |
Description | Hackathon and Training Workshop - machine learning in Chemistry |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | Regional |
Primary Audience | Postgraduate students |
Results and Impact | Twenty people, a mixture of students and researchers, attended a workshop hosted in collaboration with the Artificial Intelligence for Scientific Discovery Network. The two-day workshop combined presentations and tutorials with a data hackathon, in which attendees formed teams and, under the supervision of several advisers, worked to address one of a number of set challenges. The event was very well received, and at the end attendees reported increased awareness of sourcing, processing and analysing data. All attendees said they would attend similar events in the future. |
Year(s) Of Engagement Activity | 2019 |
Description | Machine Learning for Atomistic Modelling Autumn School 2023 |
Form Of Engagement Activity | Participation in an activity, workshop or similar |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | This was a three-day, in-person event at the Daresbury Laboratory: a machine learning for materials training course run by the Physical Sciences Data Infrastructure (PSDI) initiative in collaboration with PSDS, AI4SD, STFC-SCD and CCP5. The training was targeted at PhD students, in particular those in the Materials and Molecular Simulations field, and aimed to introduce attendees to the latest methods of machine learning applied to atomistic simulation of materials. It comprised a number of talks and practical sessions focusing on the basics of machine learning, machine learning interatomic potentials and graph neural networks. Attendees also had the opportunity to present a poster on their work. Overall, the school was very well received, with requests to run it as a yearly event. |
Year(s) Of Engagement Activity | 2023 |
URL | https://www.psdi.ac.uk/event/machine-learning-autumn-school-2023/ |
Description | Presentation at the American Chemical Society |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | International |
Primary Audience | Professional Practitioners |
Results and Impact | Building a Physical Sciences Data-science Service to support FAIR data, given by Brian Matthews. ACS Spring Meeting 2021, session INF008 Framing FAIR: Scientific Research Data Sharing Policies, Frameworks and Principles |
Year(s) Of Engagement Activity | 2021 |
Description | RSC Faraday Meeting |
Form Of Engagement Activity | A talk or presentation |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Postgraduate students |
Results and Impact | A talk introducing the Physical Sciences Data Science Service was given at the Faraday Division Chemistry Software tools meeting. The meeting addressed a range of students and researchers, and the talk raised awareness of the new Service, followed by a short question and discussion session. |
Year(s) Of Engagement Activity | 2018 |
Description | Requirements analysis with National Research Facilities |
Form Of Engagement Activity | A formal working group, expert panel or dialogue |
Part Of Official Scheme? | No |
Geographic Reach | National |
Primary Audience | Professional Practitioners |
Results and Impact | Representatives from National Research Facilities in the UK joined a workshop with PSDI to discuss their data needs and requirements. Several sessions were run with active discussion among participants. Follow-up discussions have since been held about further activities to be explored with PSDI. |
Year(s) Of Engagement Activity | 2022 |