The Analysis of Automation by Means of Automation: A Machine Learning Approach to Job Tasks.

Lead Research Organisation: University of Warwick
Department Name: Economics


Computers and automation have had a massive impact on working life since the 1980s and the latest technological changes strongly suggest that this trend will continue. Inventions such as the self-driving car, general improvements in 'drone'-based robotics, and the emergence of 'expert systems' for medical diagnosis create the prospect that automation will soon affect jobs that were previously resilient to being replaced by machines. In turn, the nature of earnings and job demand could change in unexpected ways that could cause upheaval for our welfare system and general social cohesion. In particular, even more low-skill jobs could be replaced by machines and jobs in the middle and upper end of the wage distribution could face new threats. The process of offshoring could also be accelerated as new technology reduces the need for physical co-location and face-to-face interactions in services.

This project provides a new analysis of automation that is itself based on automatic, computer driven classification techniques. Since 1938, US job experts have been compiling and regularly updating rich, detailed task descriptions across approximately 12,000 jobs. These job descriptions and their associated numerical 'characteristics scores' have been the basis of a new 'job tasks' view of the labour market. This job tasks view has split jobs into bundles of tasks best summarised by firstly 'routine' tasks (ones that can be codified and programmed into a machine of some type) and secondly 'nonroutine' tasks (those tasks that require advanced manual or analytical skills that cannot yet be programmed into a computer). In turn, this task view has had much success in explaining the pattern of jobs in the labour market.

The analysis that this project will put forward will be based directly on the text job descriptions in these expert-written job databases. Machine learning methods associated with the 'Big Data' approach will be used to decompose the job description text to discover the underlying, latent structure of the tasks that make up our occupations. For example, these methods will allow us to pick up the diffusion or spread of phrases and concepts associated with computers across occupations. These statistical methods are based on the principle of automated 'pattern recognition' of clustered and repeated text, so in an ironic twist automated methods will themselves be used to study the structure of automation in the economy. The main contribution of this approach is to provide measures of what has been changing within occupations. Recent analysis of the labour market has shown that much change has been occurring at the 'within group' level, that is, dispersion inside specific educational, industry or occupational categories. The methods used in this project will measure in detail what has been changing within occupations by studying the pattern of tasks as suggested by the text job descriptions.
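As a rough illustration of the kind of text measure described above, the following is a minimal sketch and not the project's actual method. It builds a small term-document matrix and computes the share of job descriptions in each wave that mention a computer-related term. The descriptions, the wave years, the term list and all function names are hypothetical and chosen only for illustration; the latent-structure decomposition itself is not shown.

```python
from collections import Counter
import re

# Hypothetical toy descriptions, NOT taken from the actual job databases.
DESCRIPTIONS = [
    {"wave": 1977, "occupation": "bookkeeper",
     "text": "Records transactions in ledger and computes totals by hand."},
    {"wave": 1977, "occupation": "typist",
     "text": "Types letters and reports from drafts using typewriter."},
    {"wave": 1991, "occupation": "bookkeeper",
     "text": "Enters transactions into computer and reviews software reports."},
    {"wave": 1991, "occupation": "typist",
     "text": "Types letters and reports using computer keyboard."},
]

# Assumed list of computer-related terms, chosen for illustration only.
COMPUTER_TERMS = {"computer", "software", "keyboard"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (vocabulary, rows) with one row of term counts per document."""
    counts = [Counter(tokenize(d["text"])) for d in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


def computer_share_by_wave(docs, terms=COMPUTER_TERMS):
    """Share of descriptions in each wave that mention any term in `terms`."""
    totals, hits = Counter(), Counter()
    for d in docs:
        totals[d["wave"]] += 1
        if terms & set(tokenize(d["text"])):
            hits[d["wave"]] += 1
    return {wave: hits[wave] / totals[wave] for wave in sorted(totals)}


if __name__ == "__main__":
    vocab, rows = term_document_matrix(DESCRIPTIONS)
    print(len(vocab), "terms across", len(rows), "descriptions")
    print(computer_share_by_wave(DESCRIPTIONS))
```

In a fuller analysis, a matrix of this kind would be the input to a topic model or similar decomposition. Exact word matching is used here only for simplicity, so a word such as "computes" does not count as a computer-related term.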

The project also features a novel dissemination plan. The usual tools of academic publication and presentation will be deployed along with a comprehensive media campaign to raise public awareness of the research. The Principal Investigator has a strong track record in both these areas of academic publication and public engagement. This will be enhanced by new publicly available data that will be supported with a data/policy blog and online code repository. The aim will be to bring the 'open source' code-sharing culture into the social sciences where these practices have not yet taken root.

Planned Impact

The project will have a diverse set of beneficiaries both inside and outside academia. In the following, I list these beneficiaries and describe in detail how they will benefit.
A special advantage of this project is that it has academic benefits that accrue both in terms of the content of the research (that is, the focus on the economics of job tasks) and in terms of methodology (namely, the high-profile implementation of 'Big Data' machine learning techniques as an example for applications on other social and economic topics).
(i) Content-Based Benefits
The project will provide new knowledge about the changing structure of job tasks and the role of technology in the modern labour market. This has been a major front for labour economics research since the early 2000s when the key study by Autor, Levy and Murnane (2003) was first published. A literature on tasks represented by over 2,000 citations has emerged since this work and this proposal outlines a research design that will make major advances in the measurement of tasks. These text-derived measures of tasks will be released as publicly available data for researchers to use in diverse applications related to inequality, the future of low-skill work and the nature of offshoring, to name three prominent areas.
(ii) Methodology
The project will be a major, high-profile and accessible implementation of 'Big Data' machine learning techniques in economics. Importantly, the PI has outlined a comprehensive dissemination plan that will maximize the accessibility of the work across economics and other social sciences. Code will be provided and promoted in a specialist policy and data-oriented blog, a tool which is popular in the heavily open source data science community but less common in the social sciences. Furthermore, this will be supported by an extensive programme of academic presentations across fields and disciplines.
(iii) Interdisciplinary Engagement
The proposal shows a unique and well-considered plan for engagement with the core 'data science' disciplines of statistics and computer science. The PI has undertaken serious 'proof-of-concept' programming work to study the feasibility of the project and make himself aware of the technical demands that will be faced in this project. This has been achieved through personal research and through an emerging network that the PI has been building at Warwick with the relevant data science departments and centres. This FRL will allow the PI to capitalize on this work and build an important bridge between the disciplines.

The main public beneficiaries will be those policy agencies whose concerns cover the performance of the labour market. The following represents three agencies where the PI has had personal experience as an invited research presenter.
- Department for Work and Pensions (DWP): This department is responsible for welfare policies that are closely related to the economic prospects for workers in the low-skill labour market. As automation has developed, it has displaced the jobs of many low-skill workers. Changes in the scope of automation are set to increase this challenge and the research in this project will assist in identifying these trends.
- Department for Business, Innovation and Skills (BIS): This department deals in part with business-related trade and its interaction with skill demand in the economy. One major current policy challenge is offshoring, that is, the transfer of work to non-UK locations. Again, progress in automation and communication is set to increase the future scope for offshoring, with fewer jobs being constrained by the need for local geographical interactions.
- OECD: As the developed world's primary economic think tank, the OECD is also concerned with offshoring and low-skill work. The PI has an extensive network at the OECD and this research will be of detailed interest to them across policy areas.


Description Please note that I will update this section in May/June 2018. A key paper from the project is being finalised and will represent the core parts of the key findings. Anyone reading this before that time should feel free to contact me about the findings of the project or any other issues.
Exploitation Route See above - this section will be updated in May/June 2020.
Sectors Communities and Social Services/Policy, Government, Democracy and Justice

Description Since the award has only recently ended, the non-academic impacts are still in progress. However, the methodological tools developed in the project are well positioned to make such an impact. This extends across two dimensions. Firstly, the classification model that was developed will play a role in 'letting text speak to data', that is, building a bridge between text information and formal economic data. Secondly, the findings on the changing structure of tasks in the labour market will inform debates about the impact of automation on work.
First Year Of Impact 2018
Sector Communities and Social Services/Policy, Government, Democracy and Justice
Impact Types Economic, Policy & public services

Title Digitized Occupational Information 
Description This project has digitized four major waves of the Dictionary of Occupational Titles (DOT), a major information source describing thousands of occupations. The information has also been consolidated into a term-document matrix database for study with machine learning text analysis tools. This database and the associated code will be made available at the publication stage of the award outputs.
Type Of Material Database/Collection of data 
Provided To Others? No  
Impact The database will represent the first easily accessible digital version of historical occupation information. Currently, the majority of research on this topic has been focused on the 1991 version of the Dictionary of Occupational Titles (DOT). Making the historical data available will facilitate the analysis of occupational changes over time. 
Description Visiting Researcher, Alan Turing Institute 
Organisation Alan Turing Institute
Country United Kingdom 
Sector Academic/University 
PI Contribution I was part of the new Visiting Researcher scheme at the Alan Turing Institute (ATI), which is designed to foster cross-disciplinary links between statistics, computer science and the social sciences. I worked on this grant project (automation) as well as general methodological work on applying machine learning / natural language tools to social science problems. My research was presented internally at Turing and I participated in workshops that focused on the design of Turing's new economic and social data science research programme.
Collaborator Contribution The ATI provided office space and a forum to present my work. Most importantly, the ATI provided a network of researchers beyond the social sciences that I would not have been easily able to tap into otherwise. This enabled me to develop a deep network amongst statistics and computer science researchers at UCL and Warwick in particular, as well as the in-house staff at ATI.
Impact na
Start Year 2017
Description Meetings with DWP and ONS data experts and policy analysts 
Form Of Engagement Activity A formal working group, expert panel or dialogue
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Professional Practitioners
Results and Impact I participated in the Turing Institute-HSBC Economic Data Science Start-Up Workshop. This was a meeting of academics and data/policy professionals from the economics, statistics and computer science disciplines. Specifically, it was well attended by analysts and policy development staff from the Office for National Statistics (ONS) and the Department for Work and Pensions (DWP). I discussed my research on automation and the matching of text information with formal economic data with these staff and other Turing researchers. Two developments emerged from this. Firstly, I was contacted by ONS and DWP staff for further discussions that were centred directly on the research conducted for my award. In particular, these analysts / policy development staff were interested in the 'general purpose' applications of the methods that were developed as part of my grant research. Secondly, I was asked by the Alan Turing Institute (ATI) to be the Theme Leader for their 'Changing Nature of Work' research programme, which included some attached funding.
Year(s) Of Engagement Activity 2017