These data source overviews are intended to give a high-level view of new data sources included in our Administrative Data Research Outputs. The emphasis is on the statistical quality of the source and how this affects the scope of its use in producing research outputs, rather than the operational quality of the data source. It is anticipated that this overview will be updated in future years as our understanding and use of the data progresses.
1. Overview
A feasibility version of the All Education Dataset for England (AEDE) was created and supplied by the Department for Education (DfE) to enable the Office for National Statistics (ONS) to investigate the potential of these administrative data to provide information on educational qualifications. This information requirement is currently met through statistics from the census, social surveys and statistics produced by the DfE.
Data on qualifications are used to:
inform government policy on education, for example, evidence-based policy making in relation to disadvantaged population groups
allocate government resources
help target employment and training schemes, for example, by directing educational interventions to areas with low skill levels
identify groups that lack the skills necessary to join the workforce
improve the quality of occupation coding
The feasibility AEDE supplied to the ONS has been created from three main sources: the national pupil database (NPD), Individualised Learner Record (ILR) data and higher education data collected by the Higher Education Statistics Agency (HESA).
The national pupil database
The national pupil database (NPD) is an administrative datastore that is held by the DfE and includes school census and attainment information from the Young Person's Matched Administrative Dataset (YPMAD). Students' socio-demographic characteristics are obtained from the termly school census, pupil referral unit and alternative provision censuses. These are linked to attainment data recorded by awarding bodies.
Individualised Learner Record
Individualised Learner Record (ILR) data are collected by the DfE. These data are primarily used to underpin funding and commissioning decisions; all providers of further education in England must return ILR data for learners who receive funding from the government. ILR data include socio-demographic characteristics of, and attainment information for, individuals in further education and work-based learning in England.
HESA higher education data
Higher education data are collected by the Higher Education Statistics Agency (HESA). All government-funded higher education institutes in the UK, as well as further education institutes where higher education is delivered, are required to send data to HESA. Information on the socio-demographic characteristics of students and any qualifications obtained is recorded. Higher education data included in the feasibility AEDE are for Great Britain only.
The feasibility AEDE supplied includes multiple entries for the same student since it includes a record for every academic year of study; these entries have been linked by the DfE. The matching exercises undertaken achieved very high linkage rates, which varied between the different data sources and cohorts and were around 95% on average.
In line with the approach set out in Beyond 2011: Matching Anonymous Data (PDF, 319KB), all personal identifiers in the feasibility AEDE held by the ONS are anonymised (made non-identifiable) to ensure confidentiality; the method used ensures identifiable information is not revealed but can be used for data linkage.
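The general idea of replacing identifiers with values that cannot be read but can still be matched can be illustrated with a keyed hash. The following is a minimal sketch only and is not the method described in Beyond 2011: Matching Anonymous Data; the key, the field names, the example records and the choice of HMAC-SHA256 are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice a key would be generated and held securely.
KEY = b"example-key-not-for-real-use"


def normalise(value: str) -> str:
    """Upper-case a field and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.upper().split())


def link_key(record: dict, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest built from the (assumed) name and date of birth fields."""
    message = "|".join(normalise(record[field]) for field in ("name", "dob"))
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Two invented records for the same person, held in different sources.
school_record = {"name": "Jane  Smith", "dob": "1999-04-12"}
college_record = {"name": "jane smith ", "dob": "1999-04-12"}

# The digests match, so the records can be linked without comparing the raw identifiers.
same_person = link_key(school_record, KEY) == link_key(college_record, KEY)
```

In this sketch the same key must be used for every source, otherwise records for the same person would produce different digests and could not be linked.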
2. Data sharing arrangements
The feasibility All Education Dataset for England (AEDE) supplied to the Office for National Statistics (ONS) was delivered via a secure transfer facility in November 2017. Access to the data is only given to ONS analysts who meet a set of security standards. We publish information on how the ONS looks after and uses data for public benefit.
The legal gateways for sharing these data from the Department for Education (DfE) to ONS were:
Section 537A of the Education Act 1996 and Regulation 6(d) of the Education (Individual Pupil Information) (Prescribed Persons) (England) Regulations 2009
Section 47 of the Statistics and Registration Service Act 2007 and the Statistics and Registration Service Act 2007 (Disclosure of Pupil Information) (England) Regulations 2009
Sections 87 and 89 of the Education and Skills Act 2008
A feasibility AEDE was provided for the purposes of supporting research and outputs relating to:
the production of population statistics under Section 20 of the Statistics and Registration Service Act 2007
the making of arrangements for a census under Section 2 of the Census Act 1920
the assessment of the census returns
From the end of November 2018, it was agreed between the DfE and ONS that the feasibility AEDE held by the ONS could be used for wider research beyond that agreed initially. The legal gateway for sharing became Section 45A of the Statistics and Registration Service Act 2007 (as inserted by the Digital Economy Act 2017).
The Digital Economy Act 2017 amended the Statistics and Registration Service Act 2007 to provide the ONS with greater and easier access to a range of data sources held within the public and private sectors, improving the quality and usability of official statistics and National Statistics. The Act creates a legal gateway for data owners to provide access to data they hold for us to fulfil our statistical functions. The amended legislation also established the statutory conditions to enable us to work in partnership with data holders to identify and address the main security, privacy and resource implications of our access to data.
In addition to setting out strict limitations on the use of data provided in this way, the legislation also reinforced sanctions for the misuse of data and the main protections set out in the Data Protection Act 1998. These safeguards collectively ensure that data holders and the public can be confident that data will be used in a proportionate and accountable fashion to support the production of statistics and statistical research for the public good.
3. Content
The feasibility All Education Dataset for England (AEDE) held by the Office for National Statistics (ONS) links individuals across data sources. All data that enable personal identification, such as name, postcode of residence, gender and date of birth, are obscured and replaced with anonymous identifiers to ensure confidentiality.
The national pupil database (NPD) provides further characteristics such as the ethnicity and first language of the student, cumulative attainment by academic year (number and level of qualifications obtained), and information on whether the student was on an apprenticeship.
Individualised Learner Record (ILR) data provide further characteristics such as the ethnicity of the student, type of qualification being studied, learning outcome and outcome grade.
Higher Education Statistics Agency (HESA) data provide further characteristics such as ethnicity; location; mode of study (full-time or part-time); term-time accommodation; whether exchange student; highest qualification on entry; qualification aim; whether qualification awarded; classification of degree; and the subject studied.
4. Coverage
The feasibility All Education Dataset for England (AEDE) held by the Office for National Statistics (ONS) contains the national pupil database (NPD) for the academic year ending 2002 to the academic year ending 2015; Individualised Learner Record (ILR) and Higher Education Statistics Agency (HESA) data are included for the academic year ending 2003 to the academic year ending 2015.
The dataset includes individuals aged between 14 and 29 years on 31 August 2015. For older learners, data for their school activity are not held in a way that enables consistent linkage with the other datasets; older learners are recorded in HESA and ILR data but cannot be linked back to their school record.
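The age criterion can be expressed as a simple check of age in completed years on the reference date. This is a sketch under assumptions: it treats "between 14 and 29 years" as inclusive at both ends, and the function names are hypothetical rather than taken from any ONS or DfE system.

```python
from datetime import date

# Reference date stated for the feasibility AEDE.
REFERENCE_DATE = date(2015, 8, 31)


def age_on(reference: date, date_of_birth: date) -> int:
    """Return age in completed years on the reference date."""
    years = reference.year - date_of_birth.year
    # Subtract one year if the birthday has not yet occurred in the reference year.
    if (reference.month, reference.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years


def in_age_range(date_of_birth: date, minimum: int = 14, maximum: int = 29) -> bool:
    """Assumed inclusive check: aged 14 to 29 years on 31 August 2015."""
    return minimum <= age_on(REFERENCE_DATE, date_of_birth) <= maximum
```

For example, under these assumptions someone born on 31 August 2001 is aged 14 on the reference date and falls within range, whereas someone born one day later is aged 13 and does not.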
NPD coverage in feasibility AEDE
The NPD data included cover pupils in schools maintained by local authorities, academies and free schools in England in an academic year. Pupils who have never interacted with government-funded education will not be included, for example, those who have only ever attended independent schools or been electively home-educated.
ILR coverage in feasibility AEDE
All providers of further education in England must return ILR data for learners who receive funding from the government, even if they have only attended one episode of learning (see Notes for ILR coverage in feasibility AEDE).
Further education colleges must send data for all learners, including those that are not funded by the government. Consequently, some learners who are not funded by government are included in ILR data, such as those undertaking learning subcontracted-in to the college by a local authority or on behalf of another training provider, for example, adult education programmes that help people gain sustainable employment and courses that form part of an apprenticeship.
All higher education institutes must return ILR data for learners funded through Advanced Learner Loans. These are adults who receive loans to cover tuition fees for a range of courses including A levels, general and vocational qualifications, and access to Diplomas of Higher Education.
HESA coverage in feasibility AEDE
All government-funded higher education institutes, as well as further education institutes where higher education is delivered, are required to send data to HESA. The data cover students who were actively following a course in Great Britain at some time during each HESA reporting period, which starts in August and finishes in July of the following calendar year. Students awarded a qualification in an academic year when they are not actively following a course (they may have studied in the previous academic year) will also be present in HESA data for the academic year in which their qualification was awarded.
Students studying outside of the UK for the whole of their course are not recorded.
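Because the reporting period spans two calendar years, a date of activity has to be mapped to the academic year in which it falls. The sketch below assumes the period runs from 1 August to 31 July (the text states only the months) and uses the "academic year ending" convention from this overview; the function names are hypothetical.

```python
from datetime import date


def academic_year_ending(activity_date: date) -> int:
    """Return the calendar year in which the academic year containing the date ends.

    Assumes the academic year runs from 1 August to 31 July.
    """
    if activity_date.month >= 8:
        return activity_date.year + 1
    return activity_date.year


def in_hesa_ilr_coverage(activity_date: date) -> bool:
    """Check a date against the stated ILR and HESA coverage: academic years ending 2003 to 2015."""
    return 2003 <= academic_year_ending(activity_date) <= 2015
```

Under these assumptions, activity on 1 August 2014 and on 31 July 2015 both fall in the academic year ending 2015, while activity on 1 August 2015 falls in the academic year ending 2016 and is outside the stated coverage.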
5. Statistical use in Administrative Data Research Outputs
The feasibility All Education Dataset for England (AEDE) held by the Office for National Statistics (ONS) is being used to explore the potential of administrative data to provide information on educational qualifications. We are exploring its use as a replacement for collecting such information in censuses and surveys. Admin-based qualification statistics, feasibility research: England outlines research comparing information on educational qualifications, specifically highest level of qualification, from administrative data, the 2011 Census and the Annual Population Survey.
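Since the dataset holds one record per student per academic year of study, deriving a highest level of qualification means collapsing several records into one value per person. The following is a minimal sketch of that step, not the method used in the feasibility research; the record layout, the identifiers and the numeric levels are invented for illustration.

```python
# Invented records: (anonymised person identifier, academic year ending, qualification level).
# A level of None stands for a year of study in which no qualification was recorded.
records = [
    ("id_a", 2012, 2),
    ("id_a", 2014, 3),
    ("id_a", 2015, 3),
    ("id_b", 2013, 1),
    ("id_b", 2015, None),
]


def highest_level_by_person(rows):
    """Return the highest recorded level per person, or None if no level was ever recorded."""
    highest = {}
    for person_id, _year, level in rows:
        if level is None:
            # Keep the person in the result without overwriting an existing level.
            highest.setdefault(person_id, None)
            continue
        current = highest.get(person_id)
        if current is None or level > current:
            highest[person_id] = level
    return highest


summary = highest_level_by_person(records)
```

Treating levels as comparable integers is a simplification; real qualification frameworks would need an agreed mapping from each qualification to a level before a comparison like this is meaningful.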
6. Next steps
We continue to work closely with the Department for Education (DfE), as the data supplier, to understand the potential for ongoing supply of these data and how we can use an All Education Dataset for England (AEDE) to improve our statistics. The Office for National Statistics (ONS) needs to demonstrate that the feasibility AEDE can provide statistical information on educational attainment to the required quality for the development of administrative data-based outputs for the characteristics of the population. This is part of our ambitious programme of work to use administrative data and non-survey sources to better understand society and to ensure we are ready to make recommendations to government on the future of the census and population statistics in 2023.
Notes for ILR coverage in feasibility AEDE:
- If a learner withdraws without completing one episode of learning, for example, without attending the first class, then they will not be included in the ILR.