Creating harmonised and scalable methods and tools for constructing households in large diverse administrative and health research datasets
Lead Research Organisation:
Swansea University
Department Name: Institute of Life Science Medical School
Abstract
The COVID pandemic has shown how important household circumstances such as overcrowding are for physical health, mental health and education. It has also highlighted the benefits of using information collected routinely by public services for research.
At present it is difficult to use this routine information for research into household circumstances such as overcrowding, because we can't easily identify households and join information on housing to information on health and education. We need better tools and routine datasets to carry out research on households, especially on households with the most pressing health, social and housing problems.
One way to do this is to use Unique Property Reference Numbers (UPRNs). Every property in the UK has a UPRN, a number of up to 12 digits which, unlike a postcode, is unique to an individual property, whether it is a care home, flat, house, shop or school. We can add UPRNs to the addresses patients provide when they register with a general practitioner. The UPRNs can then be coded or 'encrypted' to protect privacy and allow research to be carried out safely.
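The coding step can be illustrated with a minimal Python sketch. It assumes a keyed hash (HMAC-SHA256) and a hypothetical secret key held only by the data provider. The method actually used by the project is not described here. A keyed hash is deterministic, so the same UPRN always gives the same token and records can still be joined on it. Without the key, the token cannot be traced back to an address.

```python
import hashlib
import hmac


def pseudonymise_uprn(uprn: str, secret_key: bytes) -> str:
    """Return an HMAC-SHA256 token for a UPRN (illustrative sketch only).

    Assumption: UPRNs may arrive with or without leading-zero padding,
    so they are normalised before hashing so that both forms match.
    """
    normalised = uprn.strip().lstrip("0")
    # A UPRN is a number of up to 12 digits; reject anything else.
    if not normalised.isdigit() or len(normalised) > 12:
        raise ValueError(f"not a valid UPRN: {uprn!r}")
    return hmac.new(secret_key, normalised.encode("ascii"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    key = b"hypothetical-secret-key"  # illustrative only; keep real keys out of code
    # Padded and unpadded forms of the same UPRN give the same token.
    print(pseudonymise_uprn("10001234", key) == pseudonymise_uprn("000010001234", key))
```

Each holder of address data would need to use the same key, so that the tokens it produces match.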
UPRNs act as a key that helps us link information about people who share the same household. They can also be used to link households, and the health data of their members, to housing information (for example, the number of rooms or floor space recorded routinely for council tax assessments) so that we can identify overcrowded households.
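The linkage described above can be outlined as two steps. First, patients who share a pseudonymised UPRN are grouped into a household. Second, each household is joined to a room count and its persons-per-room ratio is calculated. The minimal Python sketch below assumes hypothetical record shapes. It also assumes a threshold of more than one person per room, which is one common measure of overcrowding and not necessarily the definition used by the project. The sketch does not handle communal establishments such as care homes, which a real method would need to treat separately.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Patient:
    patient_id: str
    uprn_token: str  # pseudonymised UPRN (hypothetical field name)


def build_households(patients):
    """Group patient IDs by shared pseudonymised UPRN."""
    households = defaultdict(list)
    for p in patients:
        households[p.uprn_token].append(p.patient_id)
    return dict(households)


def flag_overcrowding(households, rooms_by_uprn, threshold=1.0):
    """Return {uprn_token: (persons_per_room, overcrowded)}.

    Households with no linked housing record (or zero rooms) are skipped,
    because their overcrowding cannot be assessed.
    Assumption: overcrowded means persons per room > threshold.
    """
    results = {}
    for token, members in households.items():
        rooms = rooms_by_uprn.get(token)
        if not rooms:
            continue
        persons_per_room = len(members) / rooms
        results[token] = (persons_per_room, persons_per_room > threshold)
    return results
```

For example, three patients registered at a two-room property would give 1.5 persons per room and be flagged. One patient in a three-room property would not be flagged.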
We are a group of researchers from Scotland, Wales and London who have worked on UPRNs together. We now plan to create a new dataset and tools, which are not currently available, that researchers can use to identify households and their social, economic, environmental, educational and health circumstances. Our project sets out to help researchers use UPRNs safely and answer some important questions about households.
We will develop examples to show how useful this approach is, by looking at how living in households with more health, social and housing problems affects children's physical and mental health, their school attendance and achievement, and their need for social care. We will also use it to understand where the most vulnerable households are in our communities and how this might affect their access to, and use of, the public services they need.
We will work with our local communities to explain UPRNs and their benefits more clearly, to understand how they want us to use UPRNs for public benefit, and to hear about and address any concerns they may have about their safe use, including concerns about privacy.
Publications
Firman N
(2023)
Is obesity more likely among children sharing a household with an older child with obesity?
in International Journal of Population Data Science
Griffiths LJ
(2024)
Children and young people's body mass index measures derived from routine data sources: A national data linkage study in Wales.
in PLOS ONE
Harper G
(2024)
Determining households from patient addresses and unique property reference numbers in general practitioner electronic health records
in International Journal of Population Data Science
MacRae C
(2025)
Impact of household size and co-resident multimorbidity on unplanned hospitalisation and transition to care home.
in Nature Communications
Wilk M
(2025)
Inequalities in household overcrowding in an ethnically diverse urban population: a cross-sectional study using linked health and housing records
in International Journal of Population Data Science
| Description | Welsh Government Welsh Index of Multiple Deprivation (WIMD) 2025 Physical Environment Domain scientific advisory group |
| Geographic Reach | National |
| Policy Influence Type | Participation in a guidance/advisory committee |
| URL | https://www.gov.wales/proposed-indicators-welsh-index-multiple-deprivation-wimd-2025-html#157843 |
| Description | Child and adolescent Health Impacts of Learning Indoor environments under net zero: The CHILI Hub. |
| Amount | £5,460,988 (GBP) |
| Funding ID | MR/Z50645X/1 |
| Organisation | Medical Research Council (MRC) |
| Sector | Public |
| Country | United Kingdom |
| Start | 02/2025 |
| End | 02/2030 |
| Description | Maternal And preGnancy hEalth aNd elevaTed heAt (MAGENTA): novel data-linkages to understand how temperature impacts pregnancy outcomes for people living in deprived communities in Wales and London |
| Amount | £2,202,473 (GBP) |
| Funding ID | 228009/Z/23/Z |
| Organisation | Wellcome Trust |
| Sector | Charity/Non Profit |
| Country | United Kingdom |
| Start | 01/2024 |
| End | 12/2026 |
| Title | Implementation of ASSIGN address matching software into the SAIL databank |
| Description | Using the QMUL ASSIGN Algorithm in the SAIL infrastructure to improve address matching and cleaning. |
| Type Of Material | Improvements to research infrastructure |
| Year Produced | 2024 |
| Provided To Others? | Yes |
| Impact | Improvements to the SAIL and SeRP infrastructure which support address matching and household analyses. |
| URL | https://github.com/SwanseaUniversityMedical/ASSIGN |
| Title | East London Database |
| Description | The East London Database is an annual snapshot of data that contains basic health and demographic information for all 2.4 million people registered with a GP practice in North East London. Deidentified data is extracted from GP electronic health records from across the North East London NHS region on 1 April each year. The database has been produced annually since 2010 but in February 2025 was updated to include a household flag using methods developed through the Healthy Households project. The database is made available to accredited local authority public health analysts across north east London to enable secure access to deidentified data for public health purposes including Joint Strategic Needs Assessments. |
| Type Of Material | Database/Collection of data |
| Year Produced | 2025 |
| Provided To Others? | Yes |
| Impact | Too early to assess |
| URL | https://www.qmul.ac.uk/ceg/data-resources/east-london-database/ |
| Title | ASSIGN |
| Description | Address cleaning and UPRN matching algorithm |
| Type Of Technology | Software |
| Year Produced | 2023 |
| Open Source License? | Yes |
| Impact | Improved address matching in Wales and Scotland TREs |
| URL | https://github.com/endeavourhealth-discovery/ASSIGN |
| Description | Administrative Data Research UK |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Talk on the use of linked geospatial data in SAIL |
| Year(s) Of Engagement Activity | 2023 |
| Description | Advisory role to Scottish Government on anonymised spatial data linkages. |
| Form Of Engagement Activity | A formal working group, expert panel or dialogue |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Scottish government approached me to act in an advisory capacity on the spatial data linkages we have developed in SAIL to understand how they can be replicated in Scotland. |
| Year(s) Of Engagement Activity | 2024, 2025 |
| Description | Conference presentation: David Clark, International Population Data Linkage Network (IPDLN) Conference (September, 2024 Chicago): CURL Extensions - Weaving a Unique Property Reference Number into Address Histories in the Scottish Community Health Index |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | David Clark presented a paper: CURL Extensions - Weaving a Unique Property Reference Number into Address Histories in the Scottish Community Health Index |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://ipdln.org/2024-conference/ |
| Description | Conference presentation: Marta Wilk, ADRUK (Nov 2023), Validation of a dynamic method of measuring households and populations from primary care Electronic Health Records: Cross-sectional comparison with Office for National Statistics Census 2021 estimates |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Other audiences |
| Results and Impact | Marta Wilk, conference presentation at ADRUK conference (Nov, 2023, Birmingham): Validation of a dynamic method of measuring households and populations from primary care Electronic Health Records: Cross-sectional comparison with Office for National Statistics Census 2021 estimates (Authors: Marta Wilk, Gill Harper, Nicola Firman, Chris Dibben, Rich Fry, Carol Dezateux) |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://ijpds.org/article/view/2189 |
| Description | Conference presentation: Marta Wilk, International Population Data Linkage Network (IPDLN) Conference (September, 2024 Chicago): Associations between household overcrowding and adult mental illness in an ethnically diverse urban population: cross-sectional study using linked primary care and housing records |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Marta Wilk presented a paper: Associations between household overcrowding and adult mental illness in an ethnically diverse urban population: cross-sectional study using linked primary care and housing records |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://ipdln.org/2024-conference/ |
| Description | Conference presentation: Marta Wilk, SSM, 2023, Household associations between child and adult weight status in an ethnically diverse urban population: cross-sectional study using linked primary care and National Child Measurement Programme records |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Other audiences |
| Results and Impact | Marta Wilk gave a talk entitled "Household associations between child and adult weight status in an ethnically diverse urban population: cross-sectional study using linked primary care and National Child Measurement Programme records" to academics and researchers at the Society for Social Medicine Conference (Newcastle, September 2023). Co-authors: Gill Harper, Nicola Firman, Silvia Liverani, Carol Dezateux |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://jech.bmj.com/content/77/Suppl_1/A129.2 |
| Description | Conference presentation: Nicola Firman, ADRUK Nov 2023, Is obesity more likely among children sharing a household with an older child with obesity? |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Other audiences |
| Results and Impact | Nicola Firman presentation at ADRUK Conference (Nov, 2023), on behalf of research team Nicola Firman, Marta Wilk, Milena Marszalek, Lucy Griffiths, Gill Harper, Carol Dezateux: Is obesity more likely among children sharing a household with an older child with obesity? |
| Year(s) Of Engagement Activity | 2023 |
| URL | https://ijpds.org/article/view/2203 |
| Description | Conference presentation: Nicola Firman, International Population Data Linkage Network (IPDLN) Conference (September, 2024 Chicago): Residential mobility and receipt of measles, mumps and rubella vaccination: analysis of linked primary care electronic health records in a disadvantaged London region |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Other audiences |
| Results and Impact | Nicola Firman gave a presentation: Residential mobility and receipt of measles, mumps and rubella vaccination: analysis of linked primary care electronic health records in a disadvantaged London region |
| Year(s) Of Engagement Activity | 2024 |
| URL | https://ipdln.org/2024-conference/ |
| Description | Conference presentation: Nicola Firman, SSM, 2024, Residential mobility and receipt of measles, mumps and rubella vaccination: analysis of linked primary care electronic health records in a disadvantaged London region |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Nicola Firman presented a paper: Residential mobility and receipt of measles, mumps and rubella vaccination: analysis of linked primary care electronic health records in a disadvantaged London region |
| Year(s) Of Engagement Activity | 2024 |
| Description | Invited talk on the methodological challenges of linking environmental data to linked health data |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | International |
| Primary Audience | Professional Practitioners |
| Results and Impact | Invited to give a talk at an international meeting in Munich, Germany, on the challenges of linking environmental data to health data, as part of the HDR UK Helmholtz collaboration. |
| Year(s) Of Engagement Activity | 2025 |
| Description | Scottish Administrative Data Research |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | A talk on the use of linked environmental data and geospatial methods in SAIL. |
| Year(s) Of Engagement Activity | 2023 |
| Description | Society for Social Medicine (September 2024, Glasgow). (Wilk) Associations between household overcrowding and adult mental illness in an ethnically diverse urban population: cross-sectional study using linked primary care and housing records |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | National |
| Primary Audience | Professional Practitioners |
| Results and Impact | Marta Wilk presented a paper: Associations between household overcrowding and adult mental illness in an ethnically diverse urban population: cross-sectional study using linked primary care and housing records |
| Year(s) Of Engagement Activity | 2024 |
| Description | Swansea University Medical School Seminar Series |
| Form Of Engagement Activity | A talk or presentation |
| Part Of Official Scheme? | No |
| Geographic Reach | Local |
| Primary Audience | Professional Practitioners |
| Results and Impact | A seminar series on the use of linked environmental and health data. |
| Year(s) Of Engagement Activity | 2024 |
