Development and enhancement of Longitudinal Education Outcomes (LEO) data
Lead Research Organisation:
UNIVERSITY COLLEGE LONDON
Department Name: Learning and Leadership
Abstract
Understanding how much individuals and society benefit from different education and training courses is vital for governments weighing up investments in education and skills. Access to data with rich information on education, training and earnings is crucial to estimating these benefits. A sample large enough to show whether the benefits vary across groups (e.g. by socio-economic background) or areas of the country is equally important for informing policy decisions, such as the extent to which investment in education and skills for disadvantaged individuals or those living in 'left-behind' areas will help 'level up' the country.
We have access to such data in England, known as the Longitudinal Education Outcomes (LEO) data, which links together education records, benefit records and tax records. These data have provided crucial new insight into how much individuals and areas benefit from higher education, for example. But, to date, access to these data has been restricted to a relatively small number of individuals and organisations. The data are becoming more widely available, but the number and complexity of the datasets included as part of LEO present a substantial barrier to new users: they have to invest a lot of time in understanding the data before they can use them effectively, and some important research questions may go unanswered as a result.
Moreover, the data could be even more useful if we were able to incorporate additional information. For example, if we could include information on the places where individuals work - and who works with whom - then we could understand how much investment in education and training benefits people's colleagues, and the businesses in which they work. Similarly, if we were able to link in information about which individuals applied to university, and where, and compare this to the offers they received and where they went, we could understand more about the role of individual preferences and university decisions in generating the strong links evident between socio-economic background, education choices and later outcomes.
Our project will fill both of these gaps. Specifically, it will:
1. Enhance existing LEO data by:
a. Creating a simplified and consistent set of variables summarising important pieces of information from the data, such as measures of educational attainment, employment and earnings, that researchers can use to help get them started with their analysis.
b. Linking in new contextual data, such as about the areas in which individuals live.
c. Sharing documentation, code and metadata for the newly created variables from a. and b.
d. Creating and running an online forum through which current and potential users can find information about the data and future developments, and seek help from other users.
e. Providing introductory and advanced training events to build capacity in use of the data.
2. Link in new data, including on the places where individuals work and, for those who applied, information on their university applications and offers, and:
a. Incorporate these data into each of the elements outlined under 1. for existing LEO data, i.e. produce documentation and consistent variables for these new data; merge in additional relevant contextual data (e.g. on the 'quality' of the higher education institutions applied for and attended); and build awareness and capacity in use of these new data by incorporating information into the online forum and providing bespoke training events and resources.
b. Undertake new research to demonstrate the value of these new data in addressing important policy-relevant questions, such as on the link between education and business productivity, and whether policies which give lower university entry offers to students from more disadvantaged backgrounds are effective in improving outcomes for these individuals.
| Description | Our grant has supported, and continues to support, the development of a range of training and capacity building resources to increase knowledge and understanding of education administrative data, with a particular focus on the Longitudinal Education Outcomes dataset. We have already delivered training to over 200 members of the research community, with around 35-40% coming from non-academic audiences - primarily central government departments, including the Department for Education, but also policymakers from Northern Ireland, local government, and a range of third sector organisations. Running our courses online means that we can also more easily reach stakeholders from around the country, including Manchester, Edinburgh, Bristol and Swansea, and we have provided follow-up support to a number of participants, enabling them to move forward with data applications. The partnership we have developed with the LEO programme team at the Department for Education - and in particular the secondment arrangements that we worked to put in place - has enabled us to build new partnerships and collaborations, including with other parts of the Department, increasing the impact of our work on this grant. For example, we have been an integral part of the team working to understand the key requirements of a new administrative dataset to measure the Opportunity Mission's key metric of intergenerational income mobility, and were instrumental in identifying the chosen solution, which is now being implemented. Our collaborations with the former Unit for Future Skills (now Skills England) and the HE access team are further evidence of the opportunities and potential for impact generated by being embedded within the Department, which this grant facilitated. These activities would likely not have occurred in the absence of this grant. |
| First Year Of Impact | 2023 |
| Sector | Education |
| Impact Types | Societal Policy & public services |
| Description | Development and delivery of introduction to GRADE course |
| Geographic Reach | National |
| Policy Influence Type | Influenced training of practitioners or researchers |
| Impact | We improved the knowledge of participants on our course, enabling them to apply for and use GRADE data more effectively, with the potential to increase the quantity and quality of research in the public benefit. |
| URL | https://www.eventbrite.com/cc/cepeo-adruk-administrative-data-training-courses-3888843 |
| Description | Development and delivery of introduction to LEO course |
| Geographic Reach | National |
| Policy Influence Type | Influenced training of practitioners or researchers |
| Impact | We improved the knowledge of participants on our course, enabling them to apply for and use LEO data more effectively, with the potential to increase the quantity and quality of research in the public benefit. |
| URL | https://www.eventbrite.com/cc/cepeo-adruk-administrative-data-training-courses-3888843 |
| Description | Development and delivery of introduction to NPD and its linked data course |
| Geographic Reach | National |
| Policy Influence Type | Influenced training of practitioners or researchers |
| Impact | We improved the knowledge of participants on our course, enabling them to apply for and use NPD, LEO, GRADE and GUiE data more effectively, with the potential to increase the quantity and quality of research in the public benefit. |
| URL | https://www.ucl.ac.uk/ioe/departments-and-centres/centres/centre-education-policy-and-equalising-opp... |
| Description | Development and delivery of more detailed introduction to NPD course |
| Geographic Reach | National |
| Policy Influence Type | Influenced training of practitioners or researchers |
| Impact | We improved the knowledge of participants on our course, enabling them to apply for and use NPD data more effectively, with the potential to increase the quantity and quality of research in the public benefit. |
| URL | https://www.eventbrite.com/cc/cepeo-adruk-administrative-data-training-courses-3888843 |
| Description | Invited to sit on ADRUK youth transitions community catalyst steering group |
| Geographic Reach | National |
| Policy Influence Type | Participation in a guidance/advisory committee |
| Description | Invited to sit on Data Access and Engagement Programme advisory group |
| Geographic Reach | National |
| Policy Influence Type | Participation in a guidance/advisory committee |
| Description | Invited to sit on LEO and PPMD Integration and Development Project Board |
| Geographic Reach | National |
| Policy Influence Type | Participation in a guidance/advisory committee |
| Description | Investing In Digital Skills For Research: Education Administrative Data And BeYond (IDS-READY) |
| Amount | £407,876 (GBP) |
| Funding ID | UKRI306 |
| Organisation | United Kingdom Research and Innovation |
| Sector | Public |
| Country | United Kingdom |
| Start | 09/2024 |
| End | 03/2027 |
| Description | Collaboration with the Department for Education's HE access team |
| Organisation | Department for Education |
| Country | United Kingdom |
| Sector | Public |
| PI Contribution | We are updating analysis of the role of prior attainment in driving inequalities in access to higher education. Previous work undertaken by our team is still regarded as the 'go-to' source of information on this for the Department, despite being about 10 years old. We are adding to the team's evidence base for policy development by updating these findings and also extending them to consider mature learners. |
| Collaborator Contribution | Staff at DfE facilitated access to more recent years of HESA data than are available via the Secure Research Service, and are providing the conduit through which these results can be shared with ministers and policy officials in the Department. |
| Impact | The work is still ongoing. The collaboration is multi-disciplinary, including both analysts (social scientists) and policy officials (disciplines unknown). |
| Start Year | 2024 |
| Description | Collaboration with the former Unit for Future Skills |
| Organisation | Department for Education |
| Country | United Kingdom |
| Sector | Public |
| PI Contribution | We are providing quality assurance for a new linked administrative-survey dataset - the Longitudinal Education Outcomes dataset linked to the Annual Survey of Hours and Earnings - as well as contributing new research findings on a topic of interest to the Unit for Future Skills (now Skills England). |
| Collaborator Contribution | Staff at the Unit for Future Skills facilitated access to the LEO-ASHE data and shared their knowledge and understanding of the data, and code, to support us with our work. |
| Impact | The work is still ongoing. The collaboration is primarily with economists. |
| Start Year | 2024 |
| Description | Partnership with the Department for Education LEO Programme team |
| Organisation | Department for Education |
| Country | United Kingdom |
| Sector | Public |
| PI Contribution | All members of the UCL team funded by this grant are on part-time secondment to the LEO Programme team at the Department for Education (DfE), who are responsible for the development and sharing of LEO data. We are fully embedded within the LEO Programme team, with our work on this grant supporting them to deliver their objectives to enhance the usability and use of LEO data amongst the external research community. With the arrival of the new government, this team's remit expanded to consider how to develop data to capture the Opportunity Mission's key metric for success, which our team are also supporting them to deliver. |
| Collaborator Contribution | The LEO Programme team at DfE are responsible for developing and enhancing the LEO data and supporting resources to serve the needs of researchers both inside and outside government. They are the gatekeepers to the LEO data and provide the conduit through which we can access data to derive new variables and create shareable code, improve documentation, and liaise with contacts elsewhere within DfE (e.g. to determine the best approach to creating and sharing synthetic data) and in other government departments (e.g. ONS, to explore how we can share code with researchers inside the SRS). |
| Impact | So far the partnership has resulted in co-delivery of various training and capacity building events detailed elsewhere, including the ADRUK public engagement discussion around LEO in September 2022; the grant launch event in October 2022; the ADRUK pre-conference workshop in November 2023; and the LEO training courses in March 2024 and January 2025. It has also produced the updated LEO gov.uk webpages and user guide. Ongoing work includes the creation and sharing of derived variables, the creation and sharing of a low fidelity synthetic version of the LEO data, and the development of a new linked data resource which will bring together family income in childhood and adult earnings with a view to being able to estimate intergenerational income mobility. The team are also a partner on the further funding we have secured to extend our training and capacity building activities (detailed elsewhere). This collaboration is with a non-academic partner whose expertise centres around project management and delivery, so is multi-disciplinary. |
| Start Year | 2022 |
| Description | ADRUK Cambridge workshop |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Policymakers/politicians |
| Results and Impact | I gave a talk as part of an ADRUK Workshop on Administrative Data for Public Policy Research organised by the University of Cambridge. The purpose was to raise awareness of different types of education administrative data and how they could be used to address policy-relevant research questions. Audiences were engaged and asked a number of questions about the data. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.educ.cam.ac.uk/events/workshops/adruk/ |
| Description | ADRUK blog on value of LEO data |
| Form Of Engagement Activity | Engagement focused website, blog or social media channel |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Postgraduate students |
| Results and Impact | Authored a blog, published on the ADRUK website, highlighting the innovative nature of the LEO data with a view to increasing its use for research in the public benefit. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.adruk.org/news-publications/news-blogs/how-the-longitudinal-education-outcomes-data-is-b... |
| Description | ADRUK pre-conference workshop 2023 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Policymakers/politicians |
| Results and Impact | We successfully applied to host a pre-conference workshop at the ADRUK conference in November 2023. This also acted as a second LEO user group meeting. The purpose was to bring together the LEO user community again, share developments and receive feedback on future plans. The main outcome was to raise awareness of the new iteration of LEO data that had recently been made available to external researchers. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://virtual.oxfordabstracts.com/#/event/4218/program?session=79378&s=269 |
| Description | ADRUK public engagement discussion on LEO |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Third sector organisations |
| Results and Impact | ADRUK convened a group of third sector organisations whose work could usefully be informed by LEO data/research to understand their perceptions of its value, and any risks they identified in it being used for research purposes. A report summarising the discussion was subsequently published (URL below). |
| Year(s) Of Engagement Activity | 2022 |
| URL | https://www.adruk.org/fileadmin/uploads/adruk/Documents/PE_reports_and_documents/LEO_report_key_mess... |
| Description | ADRUK training and capacity building workshop |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Participated in a workshop organised by ADRUK with a view to sharing information about training and capacity building resources and discussing ways in which to enhance the coverage and reach of these activities in future. |
| Year(s) Of Engagement Activity | 2024 |
| Description | Blog to accompany LEO I2 release |
| Form Of Engagement Activity | Engagement focused website, blog or social media channel |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Media (as a channel to the public) |
| Results and Impact | I wrote a blog for the ADRUK website to accompany the launch of the second iteration of LEO data. The aim was to promote the data, particularly the new datasets that had been linked in, drawing attention to its potential to address policy-relevant research questions for the public good. |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://www.adruk.org/news-publications/news-blogs/new-longitudinal-education-outcomes-data-made-ava... |
| Description | DfE HE team workshop |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Policymakers/politicians |
| Results and Impact | Participated in a workshop hosted by the Department for Education's higher education team to share exemplar research being undertaken as part of our grant which is relevant to their portfolio, including on university admissions and the 'spillover' effects of tertiary education. |
| Year(s) Of Engagement Activity | 2025 |
| Description | Dialogue with FFT re. synthetic data |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Third sector organisations |
| Results and Impact | We met with representatives from FFT Education Datalab several times to discuss plans for the creation of NPD and LEO synthetic data. This ongoing dialogue resulted in a partnership on a new funding application (detailed in the relevant section) which we successfully submitted to an invited call to UKRI's Digital Research Infrastructure programme to co-create further training and capacity building activities, including new high fidelity synthetic data subsets. |
| Year(s) Of Engagement Activity | 2024,2025 |
| Description | Dialogue with UCAS re. richer offers data |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Third sector organisations |
| Results and Impact | I worked with colleagues in the Department for Education to put forward a business case to the Universities and Colleges Admissions Service to extend the data currently available for linkage in LEO. This dialogue has continued intermittently, but has not yet resulted in any new data being shared. We are continuing to pursue these discussions. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Dialogue with Wage and Employment Dynamics team re. synthetic data |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | The purpose of the dialogue was to share information about the creation and sharing of low fidelity synthetic data, to identify best practice. The discussion was used to inform the creation of a low fidelity synthetic version of the LEO data. |
| Year(s) Of Engagement Activity | 2024 |
| Description | Expert coding group |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Third sector organisations |
| Results and Impact | We convened a group of existing LEO users to understand the extent of existing resources (such as code and derived variables) that could potentially be shared with the external research community, and to identify gaps that our grant could most usefully fill. This discussion resulted in a number of individuals sharing code to add to a code repository, and also helped to shape the focus of our efforts to create new exemplar code. |
| Year(s) Of Engagement Activity | 2023 |
| Description | First LEO user group meeting/grant launch event |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Third sector organisations |
| Results and Impact | We held an event to launch the grant, which also doubled as an introductory LEO user group meeting. The purpose was to bring together individuals from a range of organisations interested in LEO for the purposes of undertaking, commissioning or using research, to offer greater insight into the LEO data, and to share information regarding planned future developments and obtain feedback on these plans, including around prospective data linkages and training and capacity building activities. |
| Year(s) Of Engagement Activity | 2022 |
| Description | NCRM DTRN webinar |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Postgraduate students |
| Results and Impact | Presented at a Data Resources Training Network webinar, 'Exploring Educational Outcomes through National Datasets', to improve understanding of the LEO data amongst the research community, particularly postgraduate students, with a view to increasing use of the data for research in the public benefit. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.ncrm.ac.uk/resources/video/?id=4977 |
| Description | Participation in LEO cross-government steering group meetings |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Policymakers/politicians |
| Results and Impact | We participated in LEO cross-government steering group meetings in May 2024 and November 2024. The purpose of the group is to discuss the development of LEO data in England (and other UK nations) with a range of interested parties, including other government departments/bodies who contribute data to LEO (e.g. DWP, HMRC, Jisc). We shared information regarding our plans to develop a low fidelity synthetic version of the LEO data and sought permission from all data owners to undertake this endeavour, which we secured. This enabled us to move ahead with creating and sharing the synthetic data. |
| Year(s) Of Engagement Activity | 2024 |
| Description | Presentation at Higher Education Access and Funding conference |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Policymakers/politicians |
| Results and Impact | We presented findings from an exemplar research project using the LEO data. The purpose was to inform public debate and policy decision-making in relation to higher education access and funding. The event highlighted commonalities and differences in the challenges faced by very different HE funding systems (e.g. England vs. US), and sparked ideas regarding how funding and access challenges could best be addressed. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://global.georgetown.edu/events/higher-education-access-and-funding-challenges-and-policy-optio... |
| Description | Project advisory board |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Third sector organisations |
| Results and Impact | We convened an advisory board to inform the work undertaken on the grant. The group consisted of representatives from third sector organisations (e.g. Edge Foundation, Education Policy Institute, FFT Education Datalab, National Foundation for Educational Research, Resolution Foundation, Sutton Trust), government departments (e.g. HM Treasury, the Office for National Statistics) and non-governmental organisations (e.g. Office for Students). These individuals represented organisations with expertise and/or interest in the LEO data, who could provide insight into the needs of data users, to ensure the grant delivered outputs of greatest value to the external research community. The group has met four times to date (in October 2022, May 2023, January 2024 and January 2025) and has informed the types of training and capacity building activities we will deliver, as well as future data linkages. |
| Year(s) Of Engagement Activity | 2022,2023,2024,2025 |
| Description | RES DTP data workshop |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Postgraduate students |
| Results and Impact | Presented at a Royal Economic Society Doctoral Training Partnership event, 'Databases for Research Economists', to improve understanding of the LEO data amongst the research community, particularly postgraduate students, with a view to increasing use of the data for research in the public benefit. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://res.org.uk/committees/education-training-committee/res-doctoral-training-programme/expert-wo... |
| Description | RES conference special session 2023 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Policymakers/politicians |
| Results and Impact | We co-organised a special session at the Royal Economic Society conference in Glasgow in 2023, one of the aims of which was to showcase the power of LEO data to a wide range of audiences. We presented work using LEO and also hosted a policy panel discussion, with members including Osama Rahman, former Chief Analyst and Chief Scientific Advisor at the Department for Education. The purpose of the event was to raise awareness of the LEO data and to showcase the type of policy-relevant research questions it can address. |
| Year(s) Of Engagement Activity | 2023 |
| Description | RES conference special session 2024 |
| Form Of Engagement Activity | Participation in an activity, workshop or similar |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Policymakers/politicians |
| Results and Impact | We organised and ran a special session on Diversity and Productivity at the Royal Economic Society conference in Belfast in March 2024, with Osama Rahman, Director of ONS's Data Science Campus and Head of Diversity for the Government Economic Service. This showcased the value of LEO data for addressing policy-relevant questions and sparked interest among audience members about the data and how it could be used for research in future. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://virtual.oxfordabstracts.com/event/4880/program |
| Description | Updated gov.uk webpages for LEO |
| Form Of Engagement Activity | Engagement focused website, blog or social media channel |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Public/other audiences |
| Results and Impact | We worked with the LEO programme team at the Department for Education to update and reorganise the information shared regarding the LEO data via the gov.uk webpages. We provided summary information about what the data is, in addition to how to apply to access it, and made the information easier to find and navigate. The purpose was to increase the use of LEO data for research in the public benefit. |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://www.gov.uk/government/publications/longitudinal-education-outcomes-leo-dataset/longitudinal-... |
