Building a world-class tool for automated data cleaning

Lead Research Organisation: University College London
Department Name: Physics and Astronomy

Abstract

Background: SherlockML is a general-purpose platform for state-of-the-art data science. It enables one-click deployment of cloud servers of arbitrary size for computation-intensive tasks, easy collaboration, and access to a wide range of open-source and proprietary tools, all with a convenient user experience.

Aims: In this project, you would build from scratch a new tool for SherlockML that improves both the workflow of data scientists and the quality of their work. In particular, you would use statistical methods (e.g., Bayesian inference) and machine learning (e.g., undirected models) to build a general library for data cleaning. The problem admits solutions at varying levels of sophistication, up to a highly sophisticated implementation of the most general version of the product.

Workplan: We would follow an agile methodology with regular (at least daily) check-ins, ensuring that the project is well scoped and that you receive all the support you need from our engineers, data scientists, and machine learning researchers. We would expect you to work at ASI's offices in Central London, an exciting entrepreneurial environment with great resources and interesting people.

Outcomes: The output of this project would be a data-cleaning library to be productionised into SherlockML. We expect that this project is well suited to a 6-month internship and that you will be able to build a high-impact product from top to bottom. We also expect that you will learn a lot from your colleagues, as well as from our many knowledge-sharing initiatives.

Studentship Projects

Project Reference   Relationship   Related To     Start        End          Student Name
ST/P006736/1                                      01/10/2017   30/09/2024
2043125             Studentship    ST/P006736/1   01/04/2018   30/09/2018   Ashwin Chopra