CP2K For Emerging Architectures And Machine Learning

Lead Research Organisation: University of Lincoln
Department Name: School of Maths and Physics

Abstract

The CP2K software (www.cp2k.org) is a highly efficient and parallelizable open-source atomistic simulation tool able to calculate the energies and forces (as well as other properties) of large collections of atoms at a variety of levels of theory. This makes it a prime candidate for use in novel machine learning and other data-driven areas of research, as well as in more traditional materials science work. Indeed, CP2K was one of the most intensively used codes on ARCHER and has an extensive and growing user base on ARCHER2. CP2K was one of the acceptance test codes evaluated on ARCHER2, where it underperformed relative to the average of the test suite, highlighting the need for the code to be refactored and optimised for the ARCHER2 hardware and for emerging systems like Bede. In addition to the clear need to tune CP2K to the ARCHER2 architecture, there are two other main drivers for our bid: (1) to support the UK user base of ~200 researchers through user meetings and workshops, and (2) to develop CP2K so that it can become the favoured density functional theory (DFT) engine for the rapidly expanding community applying machine learning (ML) methods to materials and structure prediction.

Previously, we obtained funding to support the CP2K community and develop the CP2K code. The "CP2K-UK" grant ran from 2013 to 2018 and helped build a large, connected and productive community of CP2K users and developers in the UK. Annual meetings typically attracted around 100 attendees, demonstrating clear demand from CP2K users. Usage of CP2K on national supercomputers grew significantly during this period.

Currently, we are experiencing a dramatic shift in the way that the materials modelling community tackles scientific problems as ML and artificial intelligence transform more areas of research. For machine learning there are two application scenarios: (1) machine learning interaction potentials, and (2) machine learning molecular/materials properties. For (1), CP2K can provide energies and forces, and for (2) CP2K can provide a range of properties that can be learned. We will address these scenarios by providing software for:

- Rapid and efficient sampling of high-quality ab initio data

- Easy and reproducible environments for developing and utilizing ML potentials

- Clear documentation of workflows and scientific method

- Better integration with materials databases, allowing data mining of results

CP2K is particularly well placed to address these challenges through its intrinsic efficiency in generating data. This efficiency also means that less energy is used when generating training data and building ML models, which supports the UK's net zero targets. Because CP2K is open source, with a development base that has been sustained and growing for over a decade and a clear code development ethos, it is readily amenable to integration with other software and libraries.

We will support and expand this extremely successful community and develop a suite of tools for CP2K that enables highly efficient, reproducible, and flexible workflows on the current and next generation of UK hardware, serving the emerging generation of materials modellers who rely upon ML methods.

We will provide community-led improvements to CP2K and develop a flexible and robust ML potential work environment encompassing CP2K and partner ML codes. The community will be grown through a series of hands-on workshops that involve both local and international experts and use both traditional presentations and online learning materials. We will also extend our activities to industrial partners, including Johnson Matthey.
