Table of contents
- Main points
- Overview
- National findings from the ABPE time series
- National coverage differences between sexes in the ABPEs
- Local authority-level analysis
- Estimating flows from ABPE v2.0 and v3.0
- Measuring movement between local authorities in the ABPEs
- Glossary
- Data sources and quality
- Future developments
- Related links
1. Main points
We have produced admin-based population estimates (ABPEs) for 2016 to 2020, based on our two most recent ABPE methods (v2.0 and v3.0). Our analysis shows that both have strengths and limitations but produce results for the population of England and Wales that are broadly in line with official mid-year estimates (MYEs).
ABPE v2.0 is consistently higher than the mid-year estimates over 2016 to 2020 (up to 2% higher in 2020) and appears to inflate the population over time.
ABPE v3.0 remains lower than the MYEs (up to 2.4% lower in 2018). Although inflation is not seen in ABPE v3.0, this is at the expense of underestimating the population in some groups.
Both ABPE versions showed coverage patterns within local authorities (LAs) that were relatively stable over time in comparison with the MYEs. However, there were considerable differences in the coverage patterns across different LAs. Our analysis explores what might be contributing to the variation in coverage patterns, for example, student moves to and from university.
Linking consecutive years of the ABPE time series enables us to look more closely at records that appear in and drop off our datasets between years. Fewer people generally leave ABPE v2.0 between years, suggesting our method does not remove enough people when they leave England and Wales.
We see spikes in drop-offs and appearances around life transition ages (such as leaving school or reaching retirement) in both versions, but this is more pronounced in ABPE v3.0, suggesting our method is very sensitive to activity changes.
Future ABPE versions will build on the strengths of these methods, and we will explore new approaches and data sources to address their limitations.
2. Overview
We are transforming the way we produce population and migration statistics to better meet the needs of our users. We aim to produce the best statistics from the best available data at any given point in time. Our research update provides background to all our latest research. The objective of our admin-based population estimate (ABPE) research is to approximate the usually resident population (defined in the notes at the end of this section) down to small areas with admin data. This article describes the progress we have recently made in understanding our ABPEs in more depth.
For the first time, we have produced a time series of our two current ABPE methods, referred to as ABPE v2.0 and ABPE v3.0. Each method uses a different set of rules to decide whether to include an individual in the usually resident population.
The different rules for each ABPE mean that they have different properties and we have previously published several articles describing their development. We have been able to analyse estimates from both methods and make comparisons with 2011 Census and official mid-year estimates (MYEs) to help us understand how well they relate to the target population.
In this article we explain how the time series has enabled us to extend our understanding of the stability and behaviour of the ABPEs over time.
This article:
- recaps previous research and methods to produce the ABPEs
- describes the findings observed in the time series
- examines how moves within England and Wales in the ABPEs help us understand more about quality
- explains the areas where we intend to focus our development next
Notes for: Overview
- We are currently adopting the UN definition of "usually resident" – that is, the place at which a person has lived continuously for at least 12 months, not including temporary absences for holidays or work assignments, or intends to live for at least 12 months (United Nations, 2008).
3. National findings from the ABPE time series
As expected from our earlier research, both admin-based population estimate (ABPE) v2.0 and ABPE v3.0 are broadly in line with the mid-year estimates over the 2016 to 2020 time series for England and Wales as a whole.
ABPE v2.0 is consistently higher than the mid-year estimates throughout the series (up to 2% higher in 2020) and ABPE v3.0 remains lower than the mid-year estimates (MYEs) (up to 2.4% lower in 2018). However, this hides patterns of over- and undercoverage shown in results by age, sex and local authority. When we look at the estimates over time there is clear evidence of some specific issues with each method.
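The percentage differences quoted in this section can be read as the gap between an ABPE and the corresponding MYE, expressed as a share of the MYE. The following is a minimal sketch of that calculation, assuming this is how "percentage difference relative to mid-year estimates" is defined. The function name and the counts are hypothetical and for illustration only; they are not published figures.

```python
def percentage_difference(abpe: float, mye: float) -> float:
    """Return the ABPE's difference from the MYE as a percentage of the MYE.

    Positive values indicate the ABPE is higher than the MYE (overcoverage
    relative to the MYE); negative values indicate it is lower.
    """
    if mye <= 0:
        raise ValueError("mye must be positive")
    return (abpe - mye) / mye * 100.0


# Illustrative counts only (assumed, not taken from the published series).
example_abpe = 102_000
example_mye = 100_000
print(round(percentage_difference(example_abpe, example_mye), 1))
```

Applying the same function by single year of age, sex or local authority would show the over- and undercoverage patterns that the national total hides.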
ABPE v2.0
Our previous research showed that ABPE v2.0 typically overestimates the population of England and Wales as a whole compared with official estimates. This was particularly visible in males aged 30 to 54 years. This remains the case across the 2016 to 2020 time series.
It is important to recognise the differences in coverage between England and Wales separately to build a method that is fit for both. The coverage differences of ABPE v2.0 for England are similar to those for England and Wales as a whole. For Wales there is greater undercoverage for 18- to 30-year-olds, which means that overall ABPE v2.0 slightly underestimates the population there.
Figures 1 and 2 show the percentage difference between ABPE v2.0 and the MYEs for 2016 to 2020 by age. They show how overcoverage in ABPE v2.0 increases over the time series for 25- to 50-year-olds, particularly for England.
Figure 1: Admin-based population estimates v2.0 typically overestimate the population of England, particularly pronounced in older working ages; this percentage increases year-on-year
Percentage difference of admin-based population estimates v2.0 relative to mid-year estimates, England
Source: Office for National Statistics
Figure 2: Admin-based population estimates v2.0 for Wales contain greater undercoverage than England for 18- to 30-year-olds
Percentage difference of admin-based population estimates v2.0 relative to mid-year estimates, 2016 to 2020, Wales
Source: Office for National Statistics
Figure 3 demonstrates the effect of this on national coverage patterns by cohort.
When we compared ABPE v2.0 with MYEs by year of birth (Figure 3) we identified that each year ABPE v2.0 introduced more people born in the same year. While we would expect some net migration in the younger working age population (aged 20 to 40 years), the levels in ABPE v2.0 were much higher than in the MYEs and suggest that the method does not remove everyone who leaves England and Wales.
This may occur because the ABPE v2.0 method only requires people to appear on two data sources and does not check if they have any contact with those sources that would confirm their presence in the UK. Our analysis of flows into and out of our ABPEs found a considerably lower level of flows out of ABPE v2.0 than ABPE v3.0, particularly for males.
Figure 3: The percentage difference of admin-based population estimates v2.0 populations by year of birth to the mid-year estimates increases in each year of the time series
Percentage difference of admin-based population estimates v2.0 relative to the mid-year estimates, 2016 to 2020 for England and Wales
Source: Office for National Statistics
ABPE v3.0
Figures 4 and 5 show how ABPE v3.0 typically underestimates the populations of England and Wales separately, compared with the MYEs, a consequence of stricter rules for including people than in ABPE v2.0. The undercoverage in Wales that was present in ABPE v2.0 for 18- to 30-year-olds is also visible in ABPE v3.0. We suspect a lack of interaction with admin data sources drives undercoverage patterns, and we need to do further research to understand and find a way to address this effectively. Across the time series, there is notable undercoverage of working age populations up to age 65 years. The improved coverage patterns after this point are likely to be a result of interactions with the State Pension system.
Figure 4: Admin-based population estimates v3.0 typically underestimate the population of England
Percentage difference of admin-based population estimates v3.0 relative to mid-year estimates, England
Source: Office for National Statistics
Figure 5: Admin-based population estimates v3.0 typically underestimate the population of Wales, particularly in those aged 18 to 30 years
Percentage difference of admin-based population estimates v3.0 relative to the mid-year estimates, 2016 to 2020, Wales
Source: Office for National Statistics
While some overcoverage remains in ABPE v3.0, Figure 6 shows that the numbers born in each year remain fairly stable over time. This suggests the activity-based rules are more likely to remove people who have left England and Wales, which is further highlighted in our flows analysis. We need to explore how using recent activity data may keep people in the population after they have left England and Wales, and whether this is offset by delays in including people when they first arrive.
Figure 6: The populations of cohorts by year of birth in admin-based population estimates v3.0 remain fairly stable over the time series
Percentage difference of admin-based population estimates v3.0 relative to mid-year estimates, England and Wales by year of birth
Source: Office for National Statistics
Differences between the ABPEs and MYEs are a net effect of both:
- overcoverage and undercoverage of different population groups in the ABPE
- greater uncertainty in the MYEs as we get further away from the 2011 Census
The next sections look at the differential coverage patterns for males and females within each cohort and specific local populations.
4. National coverage differences between sexes in the ABPEs
A limitation of comparisons with the mid-year estimates (MYEs) is that this period is now some distance from the 2011 Census, and uncertainty around the MYEs increases the further we move from the census base. We have therefore analysed sex ratios to further understand the plausibility of the admin-based population estimates (ABPEs).
Sex ratios are quite a sensitive measure. For ages most affected by international migration they can indicate where the ABPEs imply different net migration or coverage patterns between males and females. More information on how we use sex ratios appears in the Glossary.
Figures 7 and 8 show the sex ratio for each ABPE version and MYE by year of birth. Both ABPEs appear to underestimate males relative to females in cohorts born between 1985 and 2000, suggesting males of this age are less likely to appear on or interact with our administrative sources.
There are also important differences in the coverage of the ABPEs between males and females born between 1955 and 1980, suggesting underlying sex biases in the ABPE methods, which we need to address as we develop the ABPE rules.
Figure 7: There are differences in the coverage of males and females in admin-based population estimate v2.0
The sex ratios for admin-based population estimate v2.0 and the mid-year estimates, England and Wales
Source: Office for National Statistics
We can see how overestimating working-age populations in ABPE v2.0 affects the sex ratio against the MYEs over the time series. For most working-age cohorts, the sex ratio increases towards males over the time series, suggesting the year-on-year population increases in those cohorts are driven mostly by males. So, there may be differences between the sexes in their presence on admin data sources, and we find ABPE v2.0 is better at removing females who have left England and Wales than males.
Figure 8: There are differences in the coverage of males and females in admin-based population estimate v3.0
The sex ratios for admin-based population estimate v3.0 and the mid-year estimates, England and Wales
Source: Office for National Statistics
We see the impact the activity-based approach of ABPE v3.0 has on reducing overcoverage in ABPE v2.0 and bringing the sex ratios more in line with the MYEs for working ages. We also see the net effect of the underestimation of both males and females in ABPE v3.0 on the sex ratios for the younger working ages. The downward skew suggests an underestimation of males in the ABPEs is driving the lower sex ratios.
In 2020, there is a notable exception to the broadly similar patterns with MYEs. The 2020 ABPE v3.0 was produced using limited data for tax credits and housing benefit. More females than males claim these benefits and therefore we see the impact of the missing data in the male skew of the sex ratio.
6. Estimating flows from ABPE v2.0 and v3.0
Having a consistent admin-based population estimate (ABPE) time series means that we can link consecutive years to look at moves into (appearances) and out of (drop-offs) the ABPE between years.
Figures 13 and 14 show that appearances and drop-offs of individuals in both versions between pairs of years are highest around the most mobile age groups.
Appearances and drop-offs from the ABPEs could be the combined effect of:
- our methods capturing genuine international migration and cross-border movement
- inconsistencies in our inclusion rules
- a small amount of linkage error between years
We see peaks around the ages at which individuals typically join or leave further or higher education or start work. This suggests that for both ABPE versions, there are lags in people registering and interacting with the sources, which will need to be understood and accounted for in our methods.
There is a higher proportion of both “appearances” and “drop-offs” in ABPE v3.0. This is likely to be driven by the activity rules. We will explore these patterns for particular age groups in more detail.
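Linking two consecutive years and counting appearances and drop-offs can be sketched as a comparison of two sets of linked record identifiers. This is an illustrative sketch only, not the production method: the function name and the identifiers are hypothetical, and it assumes drop-offs are expressed as a percentage of the earlier year's records and appearances as a percentage of the later year's records.

```python
def flow_rates(earlier_ids: set, later_ids: set) -> dict:
    """Compare linked record identifiers from two consecutive ABPE years.

    Drop-offs are records present in the earlier year only; appearances are
    records present in the later year only. The denominators used here are
    an assumption made for illustration.
    """
    drop_offs = earlier_ids - later_ids      # in the earlier year only
    appearances = later_ids - earlier_ids    # in the later year only
    return {
        "drop_off_pct": 100.0 * len(drop_offs) / len(earlier_ids) if earlier_ids else 0.0,
        "appearance_pct": 100.0 * len(appearances) / len(later_ids) if later_ids else 0.0,
    }


# Hypothetical pseudonymised identifiers for two consecutive years.
year_one = {"a", "b", "c", "d"}
year_two = {"b", "c", "d", "e", "f"}
print(flow_rates(year_one, year_two))
```

In practice the same comparison would be run separately by age and sex, so that peaks around life transitions become visible. Any linkage error between years would appear here as a spurious drop-off paired with a spurious appearance.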
Figure 13: Drop-offs of individuals in both versions of admin-based population estimates are highest around the typically most mobile ages and life transitions
Percentage of records of the admin-based population estimates present in the earlier of two years only, v2.0 and v3.0, England and Wales
Source: Office for National Statistics
Figure 14: Appearances of individuals in both versions of admin-based population estimates are highest around the typically most mobile ages and life transitions
Percentage of records of the admin-based population estimates present in the later of two years only, v2.0 and v3.0, England and Wales
Source: Office for National Statistics
This analysis gives further insight into the patterns of over- and undercoverage that we have seen earlier in this article. Our transformation research update introduces how we plan to use ABPEs and information about their quality within a Bayesian Demographic Accounting framework to bring coherence to our statistics.
8. Glossary
Administrative data
Collections of data maintained for administrative reasons, for example, registrations, transactions, or record-keeping. They are used for operational purposes and their statistical use is secondary. These sources are typically managed by other government bodies.
Usually resident population
We are currently adopting the UN definition of "usually resident" -- that is, the place at which a person has lived continuously for at least 12 months, not including temporary absences for holidays or work assignments, or intends to live for at least 12 months (United Nations, 2008).
Sex ratio
The sex ratio (male:female) tells us how many males are present for each female in the population count, derived from actual counts of each sex. For example, there are 406,000 males and 403,000 females aged 36 years in our 2016 ABPE v2.0 count. This gives a sex ratio of 1.01.
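The worked example above can be reproduced with a short calculation. This is a minimal sketch; the function name is hypothetical, and the counts are the rounded figures quoted in the definition.

```python
def sex_ratio(males: int, females: int) -> float:
    """Return the number of males per female in a population count."""
    if females <= 0:
        raise ValueError("females must be positive")
    return males / females


# Rounded counts for people aged 36 years in the 2016 ABPE v2.0, as quoted above.
print(round(sex_ratio(406_000, 403_000), 2))
```

A value above 1 indicates more males than females in the count, and a value below 1 indicates more females than males.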
RAPID
Registration and Population Interaction Database (RAPID) is a database created by the Department for Work and Pensions. It provides a single coherent view of interactions across the breadth of benefits and earnings datasets for anyone with a National Insurance number (NINo).
Personal Demographic Service (PDS)
This is the national electronic database of NHS patient details such as name, address, date of birth and NHS number (known as demographic information).
9. Data sources and quality
We have previously produced research-based population estimates using administrative data by linking pseudonymised records between multiple datasets and applying a simple set of rules in an attempt to replicate the usually resident population.
For the previous method to produce admin-based population estimates (ABPE) (previously Statistical Population Dataset (SPD) V2.0), four data sources were used:
- the NHS Patient Register (PR)
- the Department for Work and Pensions (DWP) Customer Information System (CIS)
- data from the Higher Education Statistics Agency (HESA)
- School Census data
Records found on two or more of these four data sources were included in the population.
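The inclusion rule described above (presence on two or more of the four sources) can be sketched as a simple count. This is an illustration under stated assumptions rather than the production code: the source labels, function name and record structure are hypothetical, and the sketch ignores the record linkage that has to happen before the rule can be applied.

```python
# Hypothetical short labels for the four sources listed above.
SOURCES = ("PR", "CIS", "HESA", "SCHOOL_CENSUS")


def is_included(presence: dict, minimum_sources: int = 2) -> bool:
    """Return True if a linked record appears on enough of the four sources.

    `presence` maps a source label to True when the linked record was found
    on that source. Labels missing from the mapping count as not found.
    """
    found = sum(1 for source in SOURCES if presence.get(source, False))
    return found >= minimum_sources


# A record found on the Patient Register and the CIS is included.
print(is_included({"PR": True, "CIS": True}))
# A record found on the Patient Register only is not.
print(is_included({"PR": True}))
```

Because the rule checks presence only, a record stays in the population for as long as it remains on two sources, whether or not there is recent activity on either of them.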
For the development of ABPE v3.0 we have taken a data-driven approach to building our rules, identifying the data sources that provide the best coverage for a given age group. Also available is further information on the ABPE v3.0 data sources and the hierarchy with which they are applied.
10. Future developments
We will be developing a new version of the admin-based population estimate (ABPE), using the strengths of ABPE v2.0 and ABPE v3.0 as well as introducing new methods and data sources. Alongside the analysis outlined in this article, this will be informed by our deeper understanding of internal moves and special populations, by research into how life transitions are captured in admin data, and by how the coronavirus (COVID-19) pandemic has affected this.
We will link the 2021 Census with administrative data sources to help us explore the quality of the administrative data, allow us to define the coverage adjustment problem that we need to solve, and allow us to further refine our ABPE rules.
We will also continue to develop our understanding of new data sources and how they might improve our ABPEs and address the coverage patterns outlined in this article.