1. Main points

  • We have produced admin-based population estimates (ABPEs) for 2016 to 2020 using our two most recent ABPE methods (v2.0 and v3.0). Our analysis shows that both methods have strengths and limitations, but both produce results for the population of England and Wales that are broadly in line with official mid-year estimates (MYEs).

  • ABPE v2.0 is consistently higher than the mid-year estimates over 2016 to 2020 (up to 2% higher in 2020) and appears to inflate the population over time.

  • ABPE v3.0 remains lower than the MYEs (up to 2.4% lower in 2018). Although ABPE v3.0 does not show this inflation, this comes at the expense of underestimating the population in some groups.

  • Both ABPE versions showed coverage patterns within local authorities (LAs) that were relatively stable over time in comparison with the MYEs. However, coverage patterns differed considerably across LAs. Our analysis explores what might be contributing to this variation, for example, student moves to and from university.

  • Linking consecutive years of the ABPE time series enables us to look more closely at records that appear in and drop off our datasets between years. Generally, fewer people leave ABPE v2.0 between years, suggesting this method does not remove enough people when they leave England and Wales.

  • We see spikes in drop-offs and appearances around life transition ages (such as leaving school or reaching retirement) in both versions, but these are more pronounced in ABPE v3.0, suggesting this method is very sensitive to changes in activity.

  • Future ABPE versions will build on the strengths of these methods, and we will explore new approaches and data sources to address their limitations.

2. Overview

We are transforming the way we produce population and migration statistics to better meet the needs of our users. We aim to produce the best statistics from the best available data at any given point in time. Our research update provides background to all our latest research. The objective of our admin-based population estimate (ABPE) research is to approximate the usually resident population (see Note 1) down to small areas using administrative data. This article describes the progress we have recently made in understanding our ABPEs in more depth.

For the first time, we have produced a time series of our two current ABPE methods, referred to as ABPE v2.0 and ABPE v3.0. Each method uses a different set of rules to decide whether to include an individual in the usually resident population.

The different rules for each ABPE give them different properties, and we have previously published several articles describing their development. By analysing estimates from both methods and comparing them with the 2011 Census and official mid-year estimates (MYEs), we can better understand how well they relate to the target population.

In this article we explain how the time series has enabled us to extend our understanding of the stability and behaviour of the ABPEs over time.

This article:

  • recaps previous research and methods to produce the ABPEs

  • describes the findings observed in the time series

  • examines how moves within England and Wales in the ABPEs help us understand more about quality

  • explains the areas where we intend to focus our development next

Notes for: Overview

  1. We are currently adopting the UN definition of "usually resident" – that is, the place at which a person has lived continuously for at least 12 months, not including temporary absences for holidays or work assignments, or intends to live for at least 12 months (United Nations, 2008).

3. National findings from the ABPE time series

As expected from our earlier research, both admin-based population estimate (ABPE) v2.0 and ABPE v3.0 are broadly in line with the mid-year estimates over the 2016 to 2020 time series for England and Wales as a whole.

ABPE v2.0 is consistently higher than the mid-year estimates (MYEs) throughout the series (up to 2% higher in 2020), and ABPE v3.0 remains lower than the MYEs (up to 2.4% lower in 2018). However, these national totals hide patterns of over- and undercoverage that appear in results by age, sex and local authority. Looking at the estimates over time reveals clear evidence of some specific issues with each method.
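The article does not spell out how these percentage differences are calculated. A common definition, assumed here, takes the MYE as the base: (ABPE - MYE) / MYE × 100, so positive values mean the ABPE is higher. The sketch below applies it to hypothetical counts by single year of age; the numbers are invented for illustration and are not ONS data.

```python
def percentage_difference(abpe: float, mye: float) -> float:
    """Percentage difference of an ABPE count from the corresponding MYE.

    Positive values mean the ABPE is higher than the MYE; negative values
    mean it is lower. Using the MYE as the base is an assumption.
    """
    if mye == 0:
        raise ValueError("MYE count must be non-zero")
    return 100.0 * (abpe - mye) / mye


# Hypothetical counts by single year of age (illustrative only, not ONS data).
abpe_by_age = {30: 812_000, 31: 798_000, 32: 790_000}
mye_by_age = {30: 800_000, 31: 800_000, 32: 775_000}

diffs = {
    age: round(percentage_difference(abpe_by_age[age], mye_by_age[age]), 2)
    for age in abpe_by_age
}
print(diffs)  # {30: 1.5, 31: -0.25, 32: 1.94}
```

A national figure such as the 2% difference in 2020 is the same calculation applied to totals, which is why it can mask offsetting over- and undercoverage at different ages.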

ABPE v2.0

Our previous research showed that ABPE v2.0 typically overestimates the population of England and Wales as a whole compared with official estimates. This was particularly visible in males aged 30 to 54 years. This remains the case across the 2016 to 2020 time series.

To build a method that is fit for both countries, it is important to understand coverage in England and Wales separately. The coverage differences of ABPE v2.0 for England are similar to those for England and Wales as a whole. For Wales, there is greater undercoverage of 18- to 30-year-olds, which means that overall ABPE v2.0 slightly underestimates the population there.

Figures 1 and 2 show the percentage difference between ABPE v2.0 and the MYEs for 2016 to 2020 by age. They show how overcoverage in ABPE v2.0 increases over the time series for 25- to 50-year-olds, particularly for England.

Figure 3 demonstrates the effect of this on national coverage patterns by cohort.

When we compared ABPE v2.0 with the MYEs by year of birth (Figure 3), we found that ABPE v2.0 counted more people in each birth cohort every year. While we would expect some net migration in the younger working-age population (aged 20 to 40 years), the increases in ABPE v2.0 were much larger than those in the MYEs, suggesting that the method does not remove everyone who leaves England and Wales.

This may occur because the ABPE v2.0 method only requires people to appear on two data sources and does not check whether they have had any contact with those sources that would confirm their presence in the UK. Our analysis of flows into and out of the ABPEs found considerably lower outflows from ABPE v2.0 than from ABPE v3.0, particularly for males.
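The cohort comparison by year of birth can be sketched by re-indexing counts by birth year and tracking each cohort from one reference year to the next. This is a minimal illustration under stated assumptions: the input layout and function names are hypothetical, birth year is approximated as reference year minus age, and the counts are invented. A cohort that keeps growing in the ABPE while staying roughly flat in the MYE is the pattern attributed above to people not being removed when they leave.

```python
from collections import defaultdict


def counts_by_birth_year(counts_by_year_age):
    """Re-index {(reference_year, age): count} as {birth_year: {reference_year: count}}.

    Birth year is approximated as reference_year - age. This is a
    simplification: at a mid-year reference date, one single year of
    age spans two calendar years of birth.
    """
    cohorts = defaultdict(dict)
    for (ref_year, age), count in counts_by_year_age.items():
        cohorts[ref_year - age][ref_year] = count
    return dict(cohorts)


def year_on_year_change(cohort):
    """Change in a cohort's count between consecutive reference years."""
    years = sorted(cohort)
    return {
        year: cohort[year] - cohort[prev]
        for prev, year in zip(years, years[1:])
        if year == prev + 1
    }


# Hypothetical counts for the cohort aged 31 in 2016 (illustrative only).
abpe = {(2016, 31): 700_000, (2017, 32): 704_000, (2018, 33): 709_000}
mye = {(2016, 31): 700_000, (2017, 32): 701_000, (2018, 33): 701_500}

print(year_on_year_change(counts_by_birth_year(abpe)[1985]))  # {2017: 4000, 2018: 5000}
print(year_on_year_change(counts_by_birth_year(mye)[1985]))   # {2017: 1000, 2018: 500}
```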

ABPE v3.0

Figures 4 and 5 show that ABPE v3.0 typically underestimates the populations of England and Wales separately compared with the MYEs, a consequence of its stricter rules for including people than ABPE v2.0. The undercoverage in Wales that was present in ABPE v2.0 for 18- to 30-year-olds is also visible in ABPE v3.0. We suspect that a lack of interaction with admin data sources drives these undercoverage patterns, and we need further research to understand and address this effectively. Across the time series, there is notable undercoverage of working-age populations up to age 65 years. The improved coverage after this age is likely to result from interactions with the State Pension system.

While some overcoverage remains in ABPE v3.0, Figure 6 shows that the number of people born in each year remains fairly stable over time. This suggests the activity-based rules are more likely to remove people who have left England and Wales, which our flows analysis further highlights. We need to explore how using recent activity data may keep people in the population after they have left England and Wales, and whether this is offset by delays in including people when they first arrive.
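To illustrate why an activity-based rule removes emigrants more readily, and how it can still retain recent leavers, here is a deliberately simplified sketch. It is not the ABPE v3.0 rule set, which applies a hierarchy of data sources chosen by age group; the 12-month activity window, the record layout and the function name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SourceRecord:
    source: str          # hypothetical source label
    last_activity: date  # most recent interaction recorded on that source


def include_if_recently_active(records, reference_date, window_days=365):
    """Hypothetical activity rule: include a person if any linked source
    shows activity within `window_days` before the reference date."""
    cutoff = reference_date - timedelta(days=window_days)
    return any(cutoff <= r.last_activity <= reference_date for r in records)


ref = date(2020, 6, 30)

# Left England and Wales in mid-2018: no recent activity, so not included.
long_gone = [SourceRecord("PDS", date(2018, 5, 1)), SourceRecord("HESA", date(2018, 7, 1))]

# Left in spring 2020: last activity is still inside the window, so retained.
recent_leaver = [SourceRecord("PDS", date(2020, 3, 31))]

print(include_if_recently_active(long_gone, ref))      # False
print(include_if_recently_active(recent_leaver, ref))  # True
```

Under a rule of this kind, someone who has emigrated keeps satisfying it until their last recorded activity falls outside the window, which is the retention effect described above.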

Differences between the ABPEs and MYEs are a net effect of both:

  • overcoverage and undercoverage of different population groups in the ABPE

  • greater uncertainty in the MYEs as we get further away from the 2011 Census

The next sections look at the differential coverage patterns for males and females within each cohort and specific local populations.

4. National coverage differences between sexes in the ABPEs

A limitation of comparisons with the mid-year estimates (MYEs) is that the 2016 to 2020 period is now some distance from the 2011 Census, and uncertainty around the MYEs increases the further they are from the census base. We have therefore analysed sex ratios to further understand the plausibility of the admin-based population estimates (ABPEs).

Sex ratios are quite a sensitive measure. For ages most affected by international migration they can indicate where the ABPEs imply different net migration or coverage patterns between males and females. More information on how we use sex ratios appears in the Glossary.

Figures 7 and 8 show the sex ratio for each ABPE version and MYE by year of birth. Both ABPEs appear to underestimate males relative to females in cohorts born between 1985 and 2000, suggesting males of this age are less likely to appear on or interact with our administrative sources.

There are also important differences in the coverage of the ABPEs between males and females born between 1955 and 1980, suggesting underlying sex biases in the ABPE methods, which we need to address as we develop the ABPE rules.

We can see how overestimating working-age populations in ABPE v2.0 affects the sex ratio relative to the MYEs over the time series. For most working-age cohorts, the sex ratio shifts towards males over the time series, suggesting that the year-on-year population increases in those cohorts are driven mostly by males. This points to differences between the sexes in presence on admin data sources, and we find that ABPE v2.0 is better at removing females than males who have left England and Wales.

The activity-based approach of ABPE v3.0 reduces the overcoverage seen in ABPE v2.0 and brings the sex ratios for working ages more in line with the MYEs. We also see the net effect of the underestimation of both males and females in ABPE v3.0 on the sex ratios for younger working ages: the downward skew suggests that underestimation of males in the ABPEs is driving the lower sex ratios.

In 2020, there is a notable exception to these broadly similar patterns with the MYEs. The 2020 ABPE v3.0 was produced using limited data for tax credits and housing benefit. Because more females than males claim these benefits, the missing data skew the sex ratio towards males.

5. Local authority-level analysis

Our analysis has also enabled us to extend earlier published work to further understand the quality of our admin-based population estimates (ABPEs) for local authority (LA) populations over time.

At LA level, the ABPEs in many areas show coverage patterns similar to those identified in the national analysis, such as overcoverage for working-age males in ABPE v2.0 and undercoverage across many age groups in ABPE v3.0. However, our previous analysis has shown considerable variation at LA level. This arises as people move from one area to another and there are lags in updating their addresses in the administrative sources. The different coverage patterns in our ABPEs reflect the combined effect of overall coverage and these address lags. Figure 9 shows our ABPE counts for both versions by sex and age for each local authority in England and Wales.

Figure 9: The coverage of our admin-based population estimates shows some variation at local authority level

Admin-based population estimates for both versions by Local Authority, age and sex against official mid-year estimates

Both ABPE versions showed relatively stable coverage patterns within LAs over the time series in comparison with the MYEs. However, there were considerable differences in the coverage patterns seen across different LAs, and where LAs showed extreme differences these tended to be seen across the whole time series.

This suggests that the population groups and areas identified in earlier research as the most challenging to measure remain relevant across the whole time series. This will help us focus our research on the most difficult to measure populations and areas. Over the last year, we have begun to explore some of the population groups that pose the biggest challenges.

Students

Students are typically a young, mobile population group, which can considerably influence local populations during term-time.

Figure 10 shows the contribution that student-age individuals make to the population of Ceredigion. The mid-year estimates (MYEs) show a larger population of people in their late twenties than the ABPEs, a pattern observed in many LAs with a student presence. Further research is needed to confirm whether the methods used to adjust internal migration for student leavers are causing this, or whether the ABPEs are affected by lags in recent graduates updating their addresses in admin data after leaving their student locations. We have targeted research to ensure we accurately identify and measure students' activity in admin data, to inform our ABPEs.

Figure 10: Our admin-based population estimates show variable coverage patterns in Ceredigion around the student and younger working ages

Admin-based population estimates v2.0 and v3.0 against mid-year estimates, for Ceredigion, Wales

Communal establishments

LAs that contain communal establishments, where accommodation is under part- or full-time supervision, such as halls of residence or military bases, are also areas where it is consistently challenging to measure population change accurately.

Population groups that live in communal establishments often interact with admin data in distinctive ways, which can make it difficult to identify these residents and place them correctly in the ABPEs.

Figure 11 shows how inaccurately capturing the population of the military bases in Richmondshire in ABPE v2.0 has affected the LA-level estimate. ABPE v2.0 underestimates the high number of males aged from their late teens to early 40s present in the official MYEs. Armed forces personnel are not covered by the NHS Patient Register data, so these individuals are very unlikely to appear on the two data sources required for inclusion in ABPE v2.0. By comparison, ABPE v3.0 requires interaction with just a single source and has much better coverage of Richmondshire.

Figure 11: Admin-based population estimate v2.0 underestimates the large presence of males aged late-teen to early-40s in Richmondshire

Admin-based population estimates v2.0 and v3.0 against mid-year estimates for Richmondshire, England

Population churn

Coverage of LAs characterised by high population churn, such as some London boroughs, can be complicated by changing demographics and lags in updating administrative sources with a new address. Figure 12 shows the population pyramid for Haringey, a typical LA in London with high levels of population churn. We can see evidence of these lags in the overcoverage of adults aged 20 to 40 years in both ABPE versions.

Figure 12: Both admin-based population estimate versions show overcoverage of some age groups in Haringey, particularly younger females

Admin-based population estimates v2.0 and v3.0 against mid-year estimates, for Haringey, England

6. Estimating flows from ABPE v2.0 and v3.0

Having a consistent admin-based population estimate (ABPE) time series means that we can link consecutive years to look at moves into (appearances) and out of (drop-offs) the ABPE between years.
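Linking two consecutive years amounts to comparing the sets of linked record identifiers present in each year. The sketch below assumes each year's ABPE can be reduced to a collection of pseudonymised IDs (the IDs and function names are hypothetical) and expresses appearances and drop-offs as percentages. Using the later year's count as the base for appearances and the earlier year's count for drop-offs is our assumption, not a documented convention.

```python
def appearances_and_dropoffs(ids_year1, ids_year2):
    """Compare pseudonymised IDs in two consecutive ABPE years.

    Appearances: present in year 2 but not year 1.
    Drop-offs:   present in year 1 but not year 2.
    """
    year1, year2 = set(ids_year1), set(ids_year2)
    return year2 - year1, year1 - year2


def as_percentage(count, base):
    return 100.0 * count / base


# Hypothetical linked IDs for two reference years.
year1 = {"a", "b", "c", "d"}
year2 = {"b", "c", "d", "e", "f"}

appeared, dropped = appearances_and_dropoffs(year1, year2)
print(sorted(appeared), sorted(dropped))          # ['e', 'f'] ['a']
print(as_percentage(len(appeared), len(year2)))  # 40.0
print(as_percentage(len(dropped), len(year1)))   # 25.0
```

If the same person received different IDs in the two years, they would show up as one spurious drop-off plus one spurious appearance, which is why linkage error is listed among the possible causes.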

Figures 13 and 14 show that appearances and drop-offs of individuals in both versions between pairs of years are highest around the most mobile age groups.

Appearances and drop-offs from the ABPEs could be the combined effect of:

  • our methods capturing genuine international migration and cross-border movement

  • inconsistencies in our inclusion rules

  • a small amount of linkage error between years

We see peaks around the ages at which individuals typically enter or leave further or higher education, or start work. This suggests that for both ABPE versions, there are lags in people registering and interacting with the sources, which will need to be understood and accounted for in our methods.

There is a higher proportion of both “appearances” and “drop-offs” in ABPE v3.0. This is likely to be driven by the activity rules. We will explore these patterns for particular age groups in more detail.

This analysis gives further insight into the patterns of over- and undercoverage that we have seen earlier in this article. Our transformation research update describes how we plan to use ABPEs and information about their quality within a Bayesian Demographic Accounting framework to bring coherence to our statistics.

7. Measuring movement between local authorities in the ABPEs

Linking consecutive years of the time series also enables us to look at the movement of people remaining within the admin-based population estimate (ABPE).

We have measured "transitions" from one local authority (LA) to another over two consecutive reference years. The ABPEs do not provide information on address changes within the year, so it is not possible to directly compare them with the official estimates for internal migration within the UK, which aim to include all within-year address changes between LAs. However, comparing trends can provide useful insight to support our understanding of admin-based estimates.

The proportions of internal moves in the ABPE v2.0 and ABPE v3.0 counts show a similar age distribution to the official estimates (Figure 15), suggesting that our methods broadly capture moves at the most mobile ages. The greatest movement is at student ages (18 to 24 years), and the peaks in transitions at ages 19 and 22 years can be attributed to expected movement to and from university, respectively. We would not expect the proportions of internal migration in the ABPEs to match official estimates, because the methods differ.

As percentages of their population counts, ABPE v3.0 produces more transitions than ABPE v2.0 for most ages and both sexes. This is not surprising: ABPE v3.0 includes records on the basis of activity, one indicator of which is an address change, and it produces lower counts than ABPE v2.0. However, further exploration of moves within the ABPEs is needed to fully understand the patterns observed.
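A transition, as defined in this section, is a change of local authority between two consecutive reference years for someone present in both. The sketch below counts these by age and expresses them as a percentage of the linked population at each age. The input dictionaries, the age-in-the-later-year convention and the LA labels are hypothetical; because only two annual snapshots are compared, someone who moves away and back within the year registers no transition.

```python
from collections import Counter


def transition_percentages(la_year1, la_year2, age_year2):
    """Percentage of people present in both years whose LA changed, by age.

    la_year1, la_year2: {person_id: la_code}
    age_year2:          {person_id: age in the later year}
    """
    linked = la_year1.keys() & la_year2.keys()
    totals, movers = Counter(), Counter()
    for pid in linked:
        age = age_year2[pid]
        totals[age] += 1
        if la_year1[pid] != la_year2[pid]:
            movers[age] += 1
    return {age: 100.0 * movers[age] / totals[age] for age in sorted(totals)}


# Hypothetical records: p5 drops off and p6 appears, so neither is counted.
la_2019 = {"p1": "LA_A", "p2": "LA_A", "p3": "LA_B", "p4": "LA_B", "p5": "LA_C"}
la_2020 = {"p1": "LA_B", "p2": "LA_A", "p3": "LA_B", "p4": "LA_B", "p6": "LA_A"}
age_2020 = {"p1": 19, "p2": 19, "p3": 45, "p4": 45, "p6": 22}

print(transition_percentages(la_2019, la_2020, age_2020))  # {19: 50.0, 45: 0.0}
```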

2021 admin-based population estimate and the impact of the pandemic on admin data

So far, we have compared the ABPE time series with mid-year estimates (MYEs) from the final five years of the intercensal period. Toward the end of this period, uncertainty in the MYEs increases. We will compare our 2021 ABPEs with estimates from the 2021 Census once both are available. We also plan to link the 2021 Census to administrative data sources in a sample of areas, so that we can explore in more detail the coverage errors that remain after we have improved our methods.

To understand possible impacts of the coronavirus (COVID-19) pandemic on our ABPEs, we have begun researching how admin data over the pandemic period behave compared with pre-coronavirus trends.

Currently, the Personal Demographic Service (PDS) is the only source for which we have enough data to analyse. We have linked records in the PDS month by month to identify trends in internal moves and in monthly rates of interaction (measured by general practitioner (GP) registrations) over the pandemic period. This linkage shows a considerable decrease in moves and GP registrations, to below pre-coronavirus levels, from April 2020 at the start of the pandemic. Net inflows and outflows fall for all regions except for flows into Wales. We expect movement to have been restricted over 2020 by national lockdowns and the closure of the housing market for a period.

However, internal moves and GP registrations both rise above pre-coronavirus levels in early 2021. This may be linked to the roll-out of the vaccination programme.
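The month-by-month linkage of PDS records described above can be sketched as comparing consecutive monthly snapshots, counting linked records whose area changed, and indexing each month against a pre-pandemic baseline (100 = baseline level). The snapshot structure, area labels, choice of baseline months and the index itself are assumptions for illustration; the article does not describe how its monthly rates are normalised.

```python
def monthly_move_counts(snapshots):
    """snapshots: list of (month_label, {person_id: area_code}) in date order.

    Returns {month_label: number of records present in both this and the
    previous month whose area changed}.
    """
    counts = {}
    for (_, prev), (label, curr) in zip(snapshots, snapshots[1:]):
        linked = prev.keys() & curr.keys()
        counts[label] = sum(1 for pid in linked if prev[pid] != curr[pid])
    return counts


def index_against_baseline(counts, baseline_months):
    """Express each month's count relative to the baseline mean (= 100)."""
    baseline = sum(counts[m] for m in baseline_months) / len(baseline_months)
    return {month: 100.0 * count / baseline for month, count in counts.items()}


# Hypothetical monthly snapshots (illustrative only).
snapshots = [
    ("2020-01", {"p1": "A", "p2": "B", "p3": "C"}),
    ("2020-02", {"p1": "B", "p2": "B", "p3": "C"}),  # p1 moves
    ("2020-03", {"p1": "B", "p2": "C", "p3": "A"}),  # p2 and p3 move
    ("2020-04", {"p1": "B", "p2": "C", "p3": "A"}),  # nobody moves
]

counts = monthly_move_counts(snapshots)
print(counts)  # {'2020-02': 1, '2020-03': 2, '2020-04': 0}
print(index_against_baseline(counts, ["2020-02", "2020-03"]))
```

In practice the counts would be divided by the number of linked records in each month to give rates; the raw counts are kept here only to keep the sketch short.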

These findings indicate we might expect some volatility in our activity-based ABPEs over the pandemic period. We plan to extend this research, once later months of PDS data and additional data sources become available. We expect that this insight will allow us to better understand our 2021 ABPEs and how to interpret any changes we see since 2020.

8. Glossary

Administrative data

Collections of data maintained for administrative reasons, for example, registrations, transactions, or record-keeping. They are used for operational purposes and their statistical use is secondary. These sources are typically managed by other government bodies.

Usually resident population

We are currently adopting the UN definition of "usually resident" – that is, the place at which a person has lived continuously for at least 12 months, not including temporary absences for holidays or work assignments, or intends to live for at least 12 months (United Nations, 2008).

Sex ratio

The sex ratio (male:female) tells us how many males are present for each female in the population count, derived from actual counts of each sex. For example, there are 406,000 males and 403,000 females aged 36 years in our 2016 ABPE v2.0 count. This gives a sex ratio of 1.01.
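The worked example above is a single division. The helper below reproduces it; the function name is ours, and applying it to each birth cohort in turn would give the kind of series plotted by year of birth in Figures 7 and 8.

```python
def sex_ratio(males: int, females: int) -> float:
    """Number of males per female, derived from counts of each sex."""
    if females == 0:
        raise ValueError("female count must be non-zero")
    return males / females


# Aged 36 in the 2016 ABPE v2.0 count, as given in the example above.
print(round(sex_ratio(406_000, 403_000), 2))  # 1.01
```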

RAPID

Registration and Population Interaction Database (RAPID) is a database created by the Department for Work and Pensions. It provides a single coherent view of interactions across the breadth of benefits and earnings datasets for anyone with a National Insurance number (NINo).

Personal Demographic Service (PDS)

This is the national electronic database of NHS patient details such as name, address, date of birth and NHS number (known as demographic information).

9. Data sources and quality

We have previously produced research-based population estimates using administrative data by linking pseudonymised records between multiple datasets and applying a simple set of rules in an attempt to replicate the usually resident population. Our previous method for producing admin-based population estimates (ABPEs), ABPE v2.0 (previously Statistical Population Dataset (SPD) V2.0), used four data sources:

  • the NHS Patient Register (PR)

  • the Department for Work and Pensions (DWP) Customer Information System (CIS)

  • data from the Higher Education Statistics Agency (HESA)

  • School Census data

Records found on two or more of these four data sources were included in the population.
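The ABPE v2.0 inclusion rule can be expressed as a simple count over linked sources. In this minimal sketch the source labels and function name are our shorthand, and each person is assumed to be represented by the set of sources on which a linked record was found. The second example mirrors the Richmondshire case: armed forces personnel are not covered by the NHS Patient Register, so a hypothetical service member found on only one other source fails the two-source test.

```python
# Shorthand labels for the four ABPE v2.0 sources listed above.
V2_SOURCES = frozenset({"PR", "DWP_CIS", "HESA", "SCHOOL_CENSUS"})


def include_in_abpe_v2(sources_found, min_sources=2):
    """Return True if a linked record appears on at least `min_sources`
    of the four ABPE v2.0 sources. No check on recent activity is made."""
    return len(set(sources_found) & V2_SOURCES) >= min_sources


# A student registered with a GP and enrolled at university.
print(include_in_abpe_v2({"PR", "HESA"}))  # True

# A hypothetical service member not on the Patient Register.
print(include_in_abpe_v2({"DWP_CIS"}))     # False
```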

For the development of ABPE v3.0, we have taken a data-driven approach to building our rules, identifying the data sources that provide the best coverage for each age group. Further information on the ABPE v3.0 data sources, and the hierarchy in which they are applied, is also available.

10. Future developments

We will be developing a new version of the admin-based population estimate (ABPE), using the strengths of ABPE v2.0 and ABPE v3.0 as well as introducing new methods and data sources. Alongside the analysis outlined in this article, this will be informed by our deeper understanding of internal moves, special populations and research into how life transitions are captured in admin data as well as how the coronavirus (COVID-19) pandemic has impacted this.

We will link the 2021 Census with administrative data sources to help us explore the quality of the administrative data, allow us to define the coverage adjustment problem that we need to solve, and allow us to further refine our ABPE rules.

We will also continue to develop our understanding of new data sources and how they might improve our ABPEs and address the coverage patterns outlined in this article.

Contact details for this Article

Ann Blake
pop.info@ons.gov.uk
Telephone: +44 1329 444661