1. Main points

  • We have produced a new set of admin-based ethnicity statistics for 2016 (version 2) as part of our feasibility research.

  • Through combining ethnicity data from five administrative data sources (previously three), we were able to establish an ethnicity for 74.7% of individuals in the 2016 admin-based population base, up from 70.2% in version 1.

  • Including the 2011 Census as a data source further increases the proportion of the admin-based population with a stated ethnicity, to 84.7%.

  • There have been improvements in the admin-based ethnicity statistics for the Other and Chinese ethnic groups.


These research outputs are not official statistics on the population by ethnic group, nor are they used in the underlying methods or assumptions in the production of official statistics. Rather, they are published as outputs from research into a methodology, which is different to what is currently used in the production of ethnicity statistics. These outputs should not be used for policy or decision-making.

Back to table of contents

2. About our transformation research

The Office for National Statistics (ONS) does not currently produce annual statistics by local authority on the population by ethnic group, and the last official statistics available were from the 2011 Census. Our experimental statistics that primarily used Annual Population Survey (APS) data were published in December 2021, and Census 2021 estimates will be released later this year.

In August 2021, we published findings from our initial feasibility research on producing statistics on the population by ethnic group for England from administrative data. The research was based on linking ethnicity data from Hospital Episode Statistics (HES), English School Census (ESC) and Improving Access to Psychological Therapies (IAPT) to a 2016 admin-based population base, and implementing a set of rules to deal with multiple recorded ethnicities for an individual. A set of admin-based ethnicity statistics was produced for 2016 based on the proportion of people in each ethnic group; we refer to these as version 1. Full details of the previous method can be found in our methods article published alongside the research outputs.

Our initial admin-based ethnicity statistics showed early promise, but had the following issues:

  • the proportion of people with a stated ethnicity varied greatly by age, sex and local authority

  • the proportion of people in the Other ethnic group was much higher in the admin-based ethnicity statistics than was expected based on comparisons with the 2011 Census

  • there was under-representation of the Asian, Black and Mixed ethnic groups in those aged 20 to 24 years, those aged 25 to 29 years, and those aged 30 to 34 years

Since August 2021, we have been working to improve the 2016 admin-based ethnicity statistics by:

  • incorporating Higher Education Statistics Agency and Birth Notifications data, and linking on additional ethnicity information from Hospital Episode Statistics and the English School Census

  • trialling alternative approaches for dealing with multiple recorded ethnicities for an individual, and implementing an additional step in the process for those with a most recently recorded ethnicity of Any other ethnic group; this affected 1.0% of people with a stated ethnicity in 2016

  • refining our use of date information and the approach for combining data sources, to capture the most recently recorded ethnicity more accurately

Full details of the changes are available in our accompanying methodology changes article.

In this article, we present a new set of admin-based ethnicity statistics for 2016 (version 2) and compare them with those produced in version 1. We have also produced a time series of admin-based ethnicity statistics covering 2016 to 2020; these are presented in an accompanying article.

The research has been conducted for England only while we continue to work with the Welsh Government to acquire additional data for Wales. Scotland and Northern Ireland have devolved responsibility for producing ethnicity statistics, so are not covered by this research. However, we will proactively engage with colleagues in the devolved administrations who are also researching this topic.

This research forms part of our population and social statistics transformation programme, which aims to provide the best insights on population, migration and society using a range of data sources. The findings will form part of the evidence base for the 2023 National Statistician's Recommendation on the future of population and social statistics.


3. Population coverage

In version 2, we were able to establish an ethnicity for 74.7% of individuals in the admin-based population estimates (ABPE) v3.0 for 2016, up from 70.2% in our previous publication (version 1).

By age, the biggest improvements in coverage were for those aged under one year and those aged 17 to 26 years. For those aged under one year, this was on account of incorporating Birth Notifications data and Hospital Episode Statistics data covering 1 April to 30 June 2016. For those aged 17 to 26 years, this was because of the inclusion of Higher Education Statistics Agency (HESA) data and amendments to the linkage methodology that enabled us to link on ethnicity data from older English School Census records.

Coverage remains significantly better for females than males. Coverage in version 2 is lowest for males aged 28 to 49 years, at less than 60% for each single year of age in this range.
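As an illustration only, coverage for a subgroup can be thought of as the percentage of people in the population base whose final ethnicity is stated. The sketch below uses made-up records and hypothetical field names and status labels; it is not the production method and the figures are not real data.

```python
from collections import defaultdict

# Hypothetical person-level records: (sex, age, ethnicity_status).
# Status labels loosely mirror the glossary terms and are assumptions.
people = [
    ("F", 30, "stated"),
    ("F", 31, "stated"),
    ("F", 45, "unknown"),
    ("M", 30, "stated"),
    ("M", 31, "not_linked"),
    ("M", 45, "refused"),
    ("M", 46, "stated"),
]

def coverage_by(records, key):
    """Percentage of people with a stated ethnicity within each group."""
    totals = defaultdict(int)
    stated = defaultdict(int)
    for record in records:
        group = key(record)
        totals[group] += 1
        if record[2] == "stated":
            stated[group] += 1
    return {g: round(100 * stated[g] / totals[g], 1) for g in totals}

# Group by sex; passing key=lambda r: (r[0], r[1]) would give
# coverage by sex and single year of age instead.
coverage_by_sex = coverage_by(people, key=lambda r: r[0])
```

The same function can be reused for any grouping, such as local authority, by changing the key.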

The proportion of individuals in the ABPE with a stated ethnicity has increased in all local authorities. Cities with large student populations saw the biggest increases on account of incorporating HESA data: the proportion rose most in Oxford and Cambridge, by 13.1 and 15.3 percentage points respectively. The City of London still has the lowest proportion of individuals with a stated ethnicity, but this has increased from 37.2% to 45.6%.

Figure 2: All local authorities have seen improvements in the proportion of people with a stated ethnicity

Proportion of people in the 2016 ABPE with a stated ethnicity in version 1 and version 2 of the admin-based ethnicity statistics, by local authority, England


Notes:
  1. "Stated" refers to those with a stated ethnicity and no refusal on their most recent administrative data record.

  2. Version 1 refers to the admin-based ethnicity statistics produced using HES, IAPT and ESC data and published in August 2021.

  3. Version 2 refers to the admin-based ethnicity statistics produced using HES, IAPT, ESC, HESA and Birth Notifications data and with the new ethnicity selection rules.

  4. Local authority boundaries are as of 2021.



4. Ethnic breakdown

As there are no official statistics on the population by ethnic group for 2016, we have used the 2011 Census estimates as a comparator for the admin-based ethnicity statistics. We will use 2021 Census data when available.

When comparing the admin-based ethnicity statistics with the 2011 Census, it is important to bear in mind that differences could be caused by any of the following:

  • population change between 2011 and 2016

  • differences in reporting, recording, and mode of collection

  • lack of representativeness of the administrative data used to assign an ethnicity

  • differences in response options, particularly for the Gypsy, Roma, Irish Traveller and Arab ethnic groups

  • bias in the admin-based population estimates (ABPE)

Through implementing the new ethnicity selection rules (as detailed in our methodology changes article), the proportion of people in the five-category Other ethnic group in the admin-based ethnicity statistics decreased from 2.1% in version 1 to 1.2% in version 2. This is much closer to the 2011 Census proportion of 1.0%. The proportion of people in the Asian, Black and Mixed ethnic groups increased, particularly the Asian Other and Mixed Other ethnic groups.

In the admin-based ethnicity statistics version 1, the Asian ethnic group was under-represented in those aged 20 to 24 years, those aged 25 to 29 years, and those aged 30 to 34 years. This has improved in version 2 but there still appears to be some under-representation of the Indian ethnic group.

The proportion of people in the Chinese ethnic group in the admin-based ethnicity statistics has increased from 0.4% in version 1 to 0.7% in version 2, bringing it closer to the 2016 Annual Population Survey (APS) and 2011 Census proportions (Table 1). For those aged 20 to 34 years, the admin-based ethnicity data used previously were not capturing a large number of people of Chinese ethnicity. This has greatly improved through adding in the Higher Education Statistics Agency (HESA) data (Figure 4).

HESA data do not include a full breakdown of the White ethnic group (see our accompanying article for more information). Those recorded as being of White ethnicity in HESA were re-coded as White not specified and their ethnicity taken from another data source if available. However, 1.5% of people with a stated ethnicity ended up with a final ethnicity of White not specified. This means that the admin-based ethnicity statistics version 2 can only be used at the five-category level for the White ethnic group.

From August 2022, the ethnicity categories used in the HESA data collection are changing to align with the latest Census data collections in each country of the UK. In future, it will therefore be possible to incorporate data for students and still provide a breakdown of the White ethnic group.

The admin-based ethnicity statistics version 2 are an improvement on version 1 for the Other and Chinese ethnic groups and look promising overall. However, there are still some ethnic groups that appear to be over-represented in the admin data, such as Mixed Other, or under-represented, such as Irish.


5. Incorporating 2011 Census

In addition to producing the admin-based ethnicity statistics using administrative data only, as described in our accompanying article, we have produced a set of figures based on incorporating 2011 Census as an additional data source. This is because we want to make the best use of all available data sources and the 2011 Census is the most complete source of ethnicity data as at Census Day. It also demonstrates what may be possible in future using Census 2021 data to ensure we maximise the utility of this rich data source.

Incorporating the 2011 Census increased the proportion of people in the 2016 admin-based population estimates (ABPE) v3.0 with a stated ethnicity from 74.7% to 84.7%. There are two main drivers for this. One is a decrease in the number of people with an unknown ethnicity from 2.1 million to 470,000. The second is a decrease in the number of people not linked to an ethnicity data source from 6.7 million to 3.0 million.

Incorporating 2011 Census data improved coverage for males more than females. By age, the proportion with a stated ethnicity increased the most for those in their 40s, 50s and 60s.

Looking by local authority, after including 2011 Census data, the proportion of people in the 2016 ABPE with a stated ethnicity ranged from 53.9% in the City of London to 94.2% in North East Lincolnshire. The biggest increase was in South Cambridgeshire, where the proportion of people with a stated ethnicity increased by 16.7 percentage points, from 69.5% to 86.1%.

Figure 6: Incorporating 2011 Census data has improved coverage levels by local authority

Proportion of people in the 2016 ABPE with a stated ethnicity in the admin-based ethnicity statistics with and without 2011 Census data, by local authority, England


Notes:
  1. "Stated" refers to those with a stated ethnicity and no refusal on their most recent administrative data record
  2. Local authority boundaries are as of 2021


These findings demonstrate the value of supplementing administrative data with Census data in terms of obtaining ethnicity data for more people. However, even after including 2011 Census data, we were still unable to assign an ethnicity to 15.3% of people. This shows the need to continue to expand the admin data sources used and explore methods to adjust for missingness.

Another limitation of supplementing administrative data with Census data, which applies to admin data too, is that changes in ethnic identity are not reflected until an individual has an updated admin data record. However, research by Simpson et al. found that 96% of people chose the same ethnic group in the 2011 Census as in the 2001 Census. The research also estimated that conscious change in identity only accounted for 5 to 10% of the changes. This suggests that this issue should be minimal. However, there may have been greater changes in ethnic identity in more recent years than between 2001 and 2011; this can be explored further once Census 2021 data are available.

Ethnic breakdown

Incorporating 2011 Census data had the biggest impact on the admin-based ethnicity statistics for the White British, White Irish, White not specified and Any other ethnic groups (Table 2). Although the change in the proportion of people in the White Irish group in Table 2 looks small, the number of people in the ABPE recorded as White Irish increased by 25.5%. The overall increase in the number of people with a stated ethnicity was 13.4%, so the White Irish ethnic group was disproportionately affected by the incorporation of 2011 Census data. The number of people recorded as White British increased by 16.1%.
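The relationship between these percentages can be checked with simple arithmetic. Because the population base is the same with and without 2011 Census data, the relative growth in the number of people with a stated ethnicity equals the relative growth in coverage (74.7% to 84.7%). A group whose count grows faster than this overall rate gains share of the stated population. The function names below are illustrative, not part of any ONS code.

```python
def relative_increase_pct(before: float, after: float) -> float:
    """Relative growth from `before` to `after`, as a percentage."""
    return (after - before) / before * 100

def share_multiplier(group_growth_pct: float, overall_growth_pct: float) -> float:
    """Factor by which a group's share of the stated population changes."""
    return (1 + group_growth_pct / 100) / (1 + overall_growth_pct / 100)

# Coverage figures quoted in this article (population base held fixed).
overall_growth = round(relative_increase_pct(74.7, 84.7), 1)

# Growth in counts quoted above for two groups.
white_irish_multiplier = share_multiplier(25.5, overall_growth)
white_british_multiplier = share_multiplier(16.1, overall_growth)

print(f"Overall growth in stated ethnicity: {overall_growth}%")
print(f"White Irish share multiplier: {white_irish_multiplier:.3f}")
print(f"White British share multiplier: {white_british_multiplier:.3f}")
```

A multiplier above 1 means the group's proportion among people with a stated ethnicity rises; the further above 1, the more the group was affected by the additional data source.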

The changes in the White not specified and Any other ethnic groups were mainly because of people in these ethnic groups being assigned their 2011 Census ethnicity rather than the ethnicity recorded in the admin data. This was part of the additional steps in our ethnicity selection methodology and affected 0.7% of people in the ABPE. Those in the Any other ethnic group before including 2011 Census data were most commonly assigned to the White British (26.2%), White Other (23.9%) and Asian Other (20.1%) ethnic groups. Those in White not specified before including 2011 Census data were most commonly assigned to the White British (92.0%), White Other (6.5%) and White Irish (1.1%) ethnic groups.
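The additional step described here can be sketched as a simple fallback rule. This is a minimal illustration under stated assumptions: the function name and data representation are hypothetical, and the treatment of people without a linked Census record (kept on their admin-based ethnicity) is our assumption rather than a documented rule.

```python
from typing import Optional

# Groups for which the 2011 Census ethnicity replaces the admin-based one.
FALLBACK_GROUPS = {"White not specified", "Any other ethnic group"}

def apply_census_fallback(admin_ethnicity: str,
                          census_ethnicity: Optional[str]) -> str:
    """Return the Census ethnicity for people in a fallback group.

    Assumption: if no Census ethnicity is available (None), the
    admin-based ethnicity is kept unchanged.
    """
    if admin_ethnicity in FALLBACK_GROUPS and census_ethnicity is not None:
        return census_ethnicity
    return admin_ethnicity

# Illustrative cases only, not real records.
examples = [
    ("White not specified", "White British"),
    ("Any other ethnic group", "Asian Other"),
    ("Any other ethnic group", None),
    ("Chinese", "Asian Other"),
]
reassigned = [apply_census_fallback(admin, census) for admin, census in examples]
```

In this sketch, a specific admin-based ethnicity such as Chinese is never overwritten by the fallback; only the two less specific groups are.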


6. Developing admin-based ethnicity statistics for England: 2016 data

Developing admin-based ethnicity statistics for England
Dataset | Released 23 May 2022
Data on population coverage and ethnic breakdowns, comparing different versions of the feasibility research on producing statistics on the population by ethnic group for England from administrative data.


7. Glossary

Ethnic group

The self-reported ethnic group of the individual, according to their own perceived ethnic group and cultural background.

Ethnicity refused

In the English School Census (ESC), ethnicity is recorded as "refused" if a parent, guardian or pupil has declined to provide ethnicity data. In Hospital Episode Statistics (HES), Birth Notifications and Improving Access to Psychological Therapies (IAPT), where a patient chooses not to state their ethnicity, the code "Z - Not Stated" is recorded. In the Higher Education Statistics Agency (HESA) data, the code "98 Information Refused" is recorded.

Ethnicity stated

Ethnicity stated refers to the ethnicity being recorded as a specific ethnic group and not refused or unknown.

Ethnicity unknown

In the ESC, where the ethnicity has not yet been collected, this is recorded as "NOBT" (information not yet obtained). In HES, IAPT, and Birth Notifications, the default code "99 Not known" is used where the person's ethnicity is unknown. All blank and null ethnicity values in Birth Notifications were also treated as unknown. In HESA, "90 Not known" is used.

In this article, the unknown category also includes individuals with multiple recorded ethnicities where the rules did not lead to a final ethnicity being selected. These have been termed "ethnicity unresolved".

Ethnicity unresolved

Where multiple ethnicities were recorded on the latest date, these have been coded as "unresolved" and grouped into the "unknown" category for the analysis in this article.
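A simplified sketch of how "most recently recorded" and "unresolved" interact is given below. It only covers the two ideas described in this glossary (take the ethnicity from the latest date; mark as unresolved if the latest date carries more than one ethnicity). The full selection rules include further steps described in the methodology changes article, which are not reproduced here, and the function name and labels are assumptions.

```python
from datetime import date

def select_latest_ethnicity(records):
    """Pick one ethnicity from (record_date, ethnicity) pairs for a person.

    Simplified assumption: only the latest date is considered, and
    conflicting values on that date are left unresolved.
    """
    if not records:
        return "not linked"
    latest_date = max(record_date for record_date, _ in records)
    latest_values = {eth for record_date, eth in records if record_date == latest_date}
    if len(latest_values) > 1:
        return "unresolved"
    return latest_values.pop()

# Illustrative records only.
changed_over_time = [(date(2014, 1, 1), "Indian"), (date(2016, 3, 1), "Asian Other")]
conflict_same_day = [(date(2016, 3, 1), "Indian"), (date(2016, 3, 1), "Asian Other")]

result_changed = select_latest_ethnicity(changed_over_time)
result_conflict = select_latest_ethnicity(conflict_same_day)
result_empty = select_latest_ethnicity([])
```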

Not linked

This refers to individuals who are in the admin-based population estimates (ABPE) v3.0 but have not been linked to any sources of ethnicity data.

Usually resident

As defined in our latest ABPE publication, we are currently adopting the UN definition of "usually resident". This is the place at which a person has lived continuously for at least 12 months, not including temporary absences for holidays or work assignments, or intends to live for at least 12 months (United Nations, 2008).

Version 1

Version 1 refers to the admin-based ethnicity statistics produced using HES, IAPT and ESC data and published in August 2021.

Version 2

Version 2 refers to the admin-based ethnicity statistics produced using HES, IAPT, ESC, HESA and Birth Notifications data and with the new ethnicity selection rules.


8. Data sources and quality

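The admin-based ethnicity statistics were produced using five administrative data sources. These are:

  • Hospital Episode Statistics (HES)

  • English School Census (ESC)

  • Improving Access to Psychological Therapies (IAPT)

  • Higher Education Statistics Agency (HESA) data

  • Birth Notifications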

Ethnicity records from these admin data sources were linked to the 2016 admin-based population estimates (ABPE) v3.0 dataset based on a unique identifier. Records that did not link to the ABPE were dropped. See our accompanying methodology changes article for more information. Of those in the 2016 ABPE who could be linked to at least one of the ethnicity data sources, 77.7% of individuals had the same ethnicity on all records in the data and 13.4% had multiple recorded ethnicities within and across datasets. The remaining 8.9% of individuals only had "Unknown" or "Refused" on all ethnicity records. A method to select a final ethnicity per person was implemented, as described in the accompanying article.
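The linkage and the three-way breakdown described above can be illustrated with a small sketch. The identifiers, records and category labels are made up. One point is an explicit assumption: the article does not say how a person with one stated ethnicity plus an "Unknown" or "Refused" record was counted, so this sketch ignores non-stated records when comparing values.

```python
from collections import defaultdict

NON_STATED = {"Unknown", "Refused"}

# Hypothetical population base identifiers and ethnicity records.
abpe_ids = {"p1", "p2", "p3", "p4"}
ethnicity_records = [  # (person_id, source, ethnicity)
    ("p1", "HES", "Indian"),
    ("p1", "ESC", "Indian"),
    ("p2", "HES", "Chinese"),
    ("p2", "HESA", "Asian Other"),
    ("p3", "IAPT", "Unknown"),
    ("p9", "HES", "White British"),  # not in the population base: dropped
]

def link_and_classify(population_ids, records):
    """Link records to the population base and classify each person."""
    linked = defaultdict(list)
    for person_id, _source, ethnicity in records:
        if person_id in population_ids:  # records that do not link are dropped
            linked[person_id].append(ethnicity)
    result = {}
    for person_id in population_ids:
        values = linked.get(person_id)
        if not values:
            result[person_id] = "not linked"
            continue
        # Assumption: Unknown/Refused records are ignored when comparing.
        stated = {v for v in values if v not in NON_STATED}
        if not stated:
            result[person_id] = "unknown or refused only"
        elif len(stated) == 1:
            result[person_id] = "same ethnicity on all records"
        else:
            result[person_id] = "multiple recorded ethnicities"
    return result

classification = link_and_classify(abpe_ids, ethnicity_records)
```

Only people in the "multiple recorded ethnicities" category need the selection rules; the others either have a single candidate ethnicity or none.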

Records where the final ethnicity was unknown or refused have been excluded when calculating the proportion of people in each ethnic group.
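As a minimal sketch of this calculation, the proportions are taken over people with a stated final ethnicity only, so they sum to 100% across ethnic groups regardless of coverage. The labels and values below are illustrative, not real data.

```python
from collections import Counter

EXCLUDED = {"Unknown", "Refused"}

def ethnic_group_proportions(final_ethnicities):
    """Percentage in each ethnic group, excluding unknown and refused."""
    stated = [eth for eth in final_ethnicities if eth not in EXCLUDED]
    if not stated:
        return {}
    counts = Counter(stated)
    return {group: round(100 * n / len(stated), 1) for group, n in counts.items()}

# Eight hypothetical people, two of whom have no stated ethnicity.
final = ["White", "White", "White", "Asian", "Unknown", "Refused", "Black", "Asian"]
proportions = ethnic_group_proportions(final)
```

Here the denominator is six rather than eight, which is why low or uneven coverage can bias the resulting proportions if the people without a stated ethnicity differ from those with one.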

Population base

The 2016 ABPE v3.0 was used as the population base for the admin-based ethnicity statistics. It aims to approximate the usually resident population as at 30 June 2016. The quality of the population base will have an impact on the quality of the admin-based ethnicity statistics. More information about the coverage of the population base can be found in a previous report.

Annual Population Survey (APS)

The 2016 APS is a continuous household survey, comprising the Labour Force Survey (LFS) supplemented by sample boosts in England, Wales and Scotland to ensure small areas are sufficiently sampled. The APS does not include most people living in communal establishments (such as care homes or prisons) or anyone else living outside private households. Information on some students living in halls of residence is collected where the students' parents live in a sampled household.

Further information on the methods and data sources can be found in our accompanying article and in the previous publication.


9. Future developments

The research presented in this and our accompanying article continues to show promise for the ability to produce ethnicity statistics down to local authority level from administrative data. This would be an improvement on using survey data, where estimates can be unreliable at lower geographic levels because of small sample sizes. We will continue to explore how we can further improve upon the admin-based ethnicity statistics through:

  • incorporating additional data sources to improve the population coverage for England and expand coverage to Wales

  • exploring the potential to produce multivariate statistics on ethnicity by other characteristics

  • exploring methods to adjust for missingness in the admin data

  • exploring the potential to produce admin-based ethnicity statistics for smaller geographic areas

  • engaging with data suppliers to better understand and improve data collection practices

  • combining the administrative data with survey data using the Generalised Structure Preserving Estimator (GSPREE), building on previous work using this method

  • conducting public acceptability testing on ethnicity selection methods

  • using Census 2021 data to further assess the quality of the admin-based ethnicity statistics

Feedback

We welcome feedback on the admin-based ethnicity statistics and the planned future developments. Please email your feedback to Admin.Based.Characteristics@ons.gov.uk.


Contact details for this Article

Alison Morgan
Admin.Based.Characteristics@ons.gov.uk
Telephone: +44 1329 447187