1. Overview

Official statistics have traditionally been based on national surveys, censuses or, more recently, the operational data collected by government in the administration of its services such as welfare, tax records, GP patient register etc.

With the aim of reducing data collection costs, government is seeking to increase the role of administrative data, including that found in the private sector or on the internet. One of the data sources of interest to producers of official statistics is that generated from mobile phones as their ownership and use are relatively ubiquitous across the UK population.

When mobile phones are switched on they passively generate digital information such as geo-location, timestamp and other call details, which are transmitted to the mobile network operator (MNO) to which the mobile is subscribed. As mobile phones tend to be carried around by their users, there is great research interest in the potential of using mobile phone data (MPD) to inform on a variety of population-based characteristics. The advantages of MPD over survey data are that it is potentially available in close to real time and for small areas.

There are many challenges to accessing MPD for research as it is regarded as personal data and subject to a high level of data protection. The legislation governing this varies from country to country, and for the UK, there are additional restrictions borne by EU data privacy regulations.

Favourably, there are precedents in making MPD available for research in a variety of non-disclosive formats, each raising varied emphasis on issues around data protection, processing capability and ethics.

This paper summarises some of the international research using MPD with a focus on its relevance to official statistics. The potential use of new data sources such as MPD is not restricted to replacing existing statistics. This data may be used to create new indicators, or to improve validation, timeliness or calibration for existing ones. It might offer new insights into population behaviour in time and space that lead to improved official statistics. Used in combination with traditional surveys, this data may provide the opportunity to reduce sample sizes and result in savings in cost and respondent burden.

Recent comprehensive reviews of MPD research include a study by Blondel et al [1] covering recent advances in the study of mobile phone datasets and a study commissioned by Eurostat [28] on the potential of using mobile phone positioning data within tourism statistics. Both these studies discuss the strengths and weaknesses of MPD in detail and, although these may vary according to each specific application, they can be summarised as follows:

Strengths

Passive data collection

MPD is a by-product of normal mobile phone use and using it within the production of estimates incurs no burden for the user in contrast to responding to a traditional survey.

Superior coverage of the population

Ofcom statistics estimate that at the beginning of 2015, 93% of the adult population owned a personal mobile phone.

Such a high coverage of the population gives almost total insight into some population mobile-phone-use behaviours and enables observation of behaviours that may be missed with traditional surveys.

Timeliness

Statistics derived using MPD could be more timely than those based on a survey or census. This might lead to quick indicators or nowcasting approaches for existing statistics.

Accuracy of data collection

Location and time measurement can be more accurate than that collected through a travel survey (or similar).

Small geographies

Due to high population coverage, MPD is capable of generating insights into population behaviour at much smaller geographies than traditional surveys. Research literature suggests reliable call or population density MPD estimates at around MSOA level (note 1), possibly as small as LSOA level (note 2) in areas with densely distributed cell towers (note 3).

Multiple applications

MPD lends itself to many applications, from observing networks based on social and spatial interactions, to population densities and mobility. Uses include urban and transport planning, tourism statistics and epidemiological research.

Consistent

Tourism statistics generated from MPD have been found to be consistent with official estimates over time (Eurostat report [28]).

Weaknesses

Data access

Without sufficient data access legislation, the complexity of access to MPD and uncertainty around continued access is the overriding concern amongst National Statistical Institutes in the EU (Eurostat report [28]).

Issues include:

  • personal data protection

  • data security

  • ethical concerns

  • commercial sensitivities

  • cost

  • technical infrastructure required for processing and warehousing

Inference

MPD is observational and inference is needed to convert it into a defined concept, as required for official statistics. For example, when observing that a mobile phone tends to make the same journey every workday, it may be inferred that this is a “commuting” journey. However, this is not known for certain and so some error in the inference will be present.

Bias

MPD is subject to under-coverage and over-coverage because mobile phone behaviours differ widely between users. For example, some users text multiple times in an hour whilst others do not for days on end.

Uncertain quality of estimates

It is difficult to assess the quality of MPD statistics as there may not be comparator data. For example, MPD has been used to derive population estimates by time of day. Weighting of the MPD estimates typically involves modelling the relationship between MPD night-time densities with official data on the residential population. Daytime or seasonal MPD population estimates are then produced using the same MPD densities or population relationships but cannot easily be cross-referenced with other data.

The Eurostat report [28] on using MPD to develop tourism estimates indicates that they cannot currently be fully compliant with the principles and procedures stipulated in the Code of Practice for Official Statistics relating to any EU country. The report concludes that MPD can currently only supplement official statistics on tourism.

This paper starts with a background to the type of data being used in MPD research before summarising the main findings from the review, followed by a brief conclusion and next steps. There follows an appendix that is organised in sections that relate to broad topic areas identified as having the most relevance to official statistics. Given the large number of research papers and the wide scope for official statistics, the references represent only a selection of what is publicly available.

In conclusion to this research it is observed that, without legislative data access or funding to purchase this data, partnering with MNOs is the only feasible way forward to test the quality of MPD statistical products. Eurostat research [28] across other National Statistical Institutes (NSIs) recommends that this might start with small projects, in order for the NSI to demonstrate value to the MNO and to build up trust that may lead to bigger collaborations.

It is recommended that the focus for research should be in areas that are of high interest to both official statistics and to the MNO. There are 2 suitable applications, internationally acknowledged across statistical organisations: using MPD to derive population estimates and population flows. In the UK, the main MNOs are already producing MPD population estimates by time of day and are also modelling transport flows for transport planning projects.

Census data produces information on the commuting patterns of workers. Specifically, it produces flows of workers between their area of residence and workplace. This flow data is called the travel-to-work origin-destination data and is available by main mode of transport along with other demographic breakdowns. One of the main limitations of census data is that it is only produced every 10 years and it is our priority to assess the potential of using MPD-derived travel-to-work flows so that more frequent and timely estimates might be made available.

So as a first project, we will seek counts of worker flows derived from MPD to compare with equivalent flows from census. It is proposed that we will use this research to help form recommendations on how to improve the modelling of worker flows.

We will also seek to collaborate with other government bodies including the Government Statistical Service (GSS), MNOs and other organisations who may wish to acquire and utilise such data.

Notes for Overview

  1. Middle Super Output Area
  2. Lower Super Output Area
  3. Research literature indicates that MPD cannot currently provide sufficient reliability at spatial scales obtained in the population census (that is, output area (OA) level). This is primarily due to difficulties in mapping cell areas to such small geographies. Another factor is that there are multiple MNOs, each having only a share of the total mobile phone market. These shares vary by area, and for small geographies such as OA, any MNO might have too few subscribers to produce reliable statistical estimates.

2. Background

The UK’s Statistics and Registration Service Act 2007 defines “official statistics” as all those statistical outputs produced by the UK Statistics Authority's executive office (the Office for National Statistics - ONS), by central government departments and agencies, by the devolved administrations in Northern Ireland, Scotland and Wales, and by other Crown bodies (over 200 bodies in total). Secondary legislation may allow for more statistical outputs to be eligible for official statistics status.

Traditionally, official statistics have been based on data collected through censuses, surveys and, more recently, the operational data collected by government in the administration of its services such as welfare, tax records, GP patient register etc.

There is a current ambition within government to increase the role of this so-called ‘admin’ data within the production of official statistics as it is believed that there will be significant cost savings by reducing the size or need for expensive surveys. Aligned with this is the desire to use more novel sources of data, typically found in the commercial sector or on the internet. The data generated from mobile phones are of high interest to producers of official statistics.

Statistics collected by the UK communications regulator Ofcom show that at the beginning of 2015 there were around 90 million mobile phone subscriptions in the UK. Ofcom further estimate that 93% of adults own a personal mobile and that ownership by children is also high, especially in the older teens. More generally, mobile phone penetration in the developed world is estimated to be 128% of the population (as some people have more than 1 mobile) but it is also high in developing countries, with 90% penetration estimated (note 1).

When mobile phones are switched on they generate digital information which is transmitted to the mobile network operator (MNO) to which the mobile is subscribed. This information includes call data records (CDRs) that detail active events such as calls or texts made or received. Another form of data is generated by passive location updates, either by a periodic transmission of location back to the MNO or by a mobile switching the cell tower it is primarily connected to, which may signify movement. Also available within the MNO itself is contractual information that may include age, gender and address information as well as any profile settings such as language. It is also possible to identify mobiles that are subscribed to foreign networks. Data on the content of messages and phone calls is not recorded by the MNO, except in phone-tapping circumstances permitted under strict legislation for applications such as crime detection.

As it is the norm for mobiles to be carried around by their users, there is great research interest in the potential of using MPD to inform on a variety of population-based characteristics. Some attractions of MPD are that it is potentially available in close to real time and for small areas. Another feature of MPD is that it represents the actual observation of the communication or location behaviour of millions of mobile phone users, rather than subjective accounts based on the self-reported information of a much smaller population (as in a traditional survey). These characteristics of MPD might potentially give rise to a much richer understanding of population dynamics than is currently possible with traditional data sources used within official statistics.

There are many challenges to accessing MPD for research as it is regarded as personal data and subject to a high level of data protection. The legislation governing this varies from country to country, and for the UK, there are additional restrictions borne by EU data privacy regulations although adherence to these is now uncertain given that the UK is to leave the EU.

Favourably, there are precedents in making CDR data available for research in a variety of formats ranging from simple counts within an area to individual-level CDRs. Suitable applications for research vary according to the format of the data. For example, counts of calls made in an area have been used within population density applications but cannot reveal underlying mobility patterns. The format of the data also raises varied emphasis on issues around data protection, processing capability and ethics.

Some academic studies have managed to source CDR data from individual MNOs. In recent years there have also been initiatives to allow access to research data to help developing world countries: the global telecommunications operator Orange has made millions of anonymised CDRs from Senegal and Ivory Coast available for research, under its Data for Development initiative. The charity Flowminder also sources anonymous individual-level MPD from low- and middle-income countries, which, in combination with satellite imagery and household surveys, helps to map the distributions and characteristics of vulnerable populations. Another data-releasing initiative to encourage research into the uses of MPD includes the Telecom Italia Big Data Challenges run in 2014 and 2015.

The Eurostat paper [28] conducted a review of activity across various EU statistical organisations and concluded that access to data requires trust building cooperation between all of the parties involved, to allow projects to grow from small-scale pilot projects to wider collaborations.

Analysis using CDR is generally held to be challenging due to the widely differing behaviours exhibited by users: for example, some users text or make calls many times a day whilst others do not make calls for days on end. For research applications related to mobility, there is greater interest in the use of the location data generated from passive updates.

This data is not usually shared with a third party, predominantly due to the huge technical challenges in processing the data, but also because this data raises the highest level of concern around privacy and ethics. However, there are cases of MNOs forming collaborations to explore the potential of this type of data. NTT Docomo, the largest MNO in Japan, has collaborated with academia to produce population densities using this data [6]. Within the UK, the 3 main MNOs are in various stages of using this data to develop statistical products such as population densities and transport flows as they see there is a market to sell this information. Some UK MNOs have formed collaborations with analytical companies to provide the processing capability and have also sought industry expertise.

Notes for Background

  1. The world in 2014: ICT facts and figures. International Telecommunication Union.

3. Main findings of the review

Mobile Phone Data (MPD) is leading to many discoveries about human behaviour that have application within official statistics. These applications benefit from the high population coverage and tendency to carry mobiles on the person, leading to the generation of a rich source of geolocation and time information that can inform on population densities and mobility.

MPD are personal data and therefore subject to a high degree of data protection, with different countries having different legislative arrangements. However, there is a large body of research, primarily within academia, using anonymised samples of call data record (CDR) data and in some cases collaborations have been formed to provide access to passive location updates. Mobile Network Operators (MNOs) are increasingly making their data available for use by researchers within developing world applications and for epidemiological situations.

The review has considered applications within 6 main topic areas and the findings from these topics are summarised as follows.

Population estimates

As mobile phones are owned by such a large proportion of the population, counting the number of connections to a single cell tower could theoretically have a relationship to the true number of people in the cell (note 1) associated with that cell tower. Research papers in this section investigate this relationship with official estimates of the resident population and consider the impact of contributing factors such as land use, an area’s socio-economic level and an MNO’s market share.

Using data from an individual MNO, counts of call and text volumes originating in an area for a given time are shown to be proportional to counts of mobile phone subscribers in Ma et al [5]. However, also within Ma et al [5] and Douglass et al [3], it is found that there is no constant proportional relationship between mobile phone subscribers and actual population as subscription rates vary from area to area.

However, a number of papers have been produced in recent years to investigate methods to estimate population density from call or subscriber volumes using a variety of data formats from aggregated and individual level CDR to passive update data. It is a main assumption in Deville et al [2] and Sterly et al [4] that night-time population has the most direct relationship with residential population, the population basis for most official estimates. However, Douglass et al [3] find a better relationship with call out volumes at different time periods.

All these studies require a training set of small area estimates of population density to identify the relationship with call volumes at a given date and time and the spatial scales investigated can be as small as 100 metres x 100 metres. Generally, once a reliable relationship is found, it is used to produce population estimates directly from call volumes. This may be for different times of the day or year and in different areas, although validation for these more novel population estimates is difficult due to the absence of official data for comparison. Deville et al (2014) [2] use daily counts of CDRs in such a way to downscale a known national census population to smaller areas (100 metres x 100 metres).
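The calibrate-then-apply workflow described above can be illustrated with a minimal sketch. It assumes a simple linear relationship between night-time call counts and census population, which is a simplification for illustration and not the specification used in any of the cited papers. The area counts and the function names `fit_line` and `estimate_population` are hypothetical.

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_population(call_counts, intercept, slope):
    """Apply the fitted relationship to call counts from another time period."""
    return [intercept + slope * c for c in call_counts]


# Hypothetical training set: night-time call counts and census residents per area.
night_calls = [120, 340, 560, 80, 900]
census_population = [1300, 3500, 5700, 900, 9100]

intercept, slope = fit_line(night_calls, census_population)

# Hypothetical daytime call counts for the same five areas.
day_calls = [400, 150, 700, 60, 1200]
day_population = estimate_population(day_calls, intercept, slope)
print(day_population)
```

As the text notes, the daytime figures produced in this way cannot easily be validated, because there is no official daytime population to compare them with.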

The main factors that affect the call volume relationship with population counts include land use and topography of an area. For this reason Deville et al [2] and Douglass et al [3] suggest that optimal results are found when merging MPD with satellite imagery. As mobile phone market penetration and usage patterns change over time, it is proposed in Douglass et al [3] that this needs to be regularly monitored so that adjustments may be made to any baseline linear relationship between official population data and call volumes.

Of particular interest are the published papers by NTT Docomo, the predominant mobile phone operator in Japan. They have conducted research projects jointly with Japanese universities, giving access to passive location measurements from individual mobiles, as well as age and gender information. Their research includes the development of population statistics, which they call mobile spatial statistics (MSS). The methods employed to develop these MSS are discussed in Terada et al [6], including the correction methods for age, gender and regional bias as well as the problems encountered if mobile phone users switch their phones off. Also discussed is the difficulty in mapping cell areas to alternative geographies. This is examined further in Oyabu et al [7] by comparing MSS to official estimates of residential population at different spatial scales. It is found that MSS are reliable comparators at around a 10 x 10 kilometre (km) grid but reliability holds for 1 x 1 km areas only in heavily populated areas where cell towers are tightly packed. Oyabu et al [7] also discuss the differences at mountainous and coastal areas with the suggestion that this might be attributed to definitional differences between the residential population and the population that is based there (for example, tourists, other visitors etc).

In conclusion, the research literature on population densities indicates that there is clear potential for the use of MPD to produce estimates for different population bases.

Urban planning (Land use)

These studies mainly consider the relative densities of CDRs, either across different areas or across time (trend) – but do not develop population estimates.

As well as having application directly within the production of land use statistics, intelligence on where and at what locations people are to be found in greater numbers may be of use within operational processes such as conducting survey fieldwork. By extension, the use of this type of information in close to real time can provide valuable intelligence for other operational uses such as the optimisation of crowd control procedures and other services such as site location for ambulances etc.

Rubio et al [9] investigate an algorithm to detect abnormally dense population areas or “hotspots” whilst Louail et al [11] find a sublinear relationship between the number of “activity centres” and the population size in urban centres. Reads et al [10] and Ratti et al [13] examine the patterns of call density across time and propose that these patterns can reveal different types of urban activity such as business, residential, retail or nightlife areas.
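The idea that the shape of call density over the day can indicate the type of urban activity can be sketched as follows. This is a deliberately simple nearest-template comparison using cosine similarity, swapped in for illustration; it is not the method used in the cited papers. The template profiles, the four time blocks and the function names are all assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two profiles; 1.0 means identical shape."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(profile, templates):
    """Return the name of the template whose shape is closest to the profile."""
    return max(templates, key=lambda name: cosine_similarity(profile, templates[name]))


# Hypothetical templates: relative call volume in four blocks of the day
# (night, morning, afternoon, evening).
templates = {
    "business": [1, 8, 9, 2],
    "residential": [3, 4, 3, 8],
    "nightlife": [7, 1, 2, 9],
}

# Hypothetical call counts for one area over the same four blocks.
area_profile = [40, 310, 350, 100]
print(classify(area_profile, templates))
```

Cosine similarity ignores the overall call volume, so a busy area and a quiet area with the same daily rhythm receive the same label.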

Whilst Odawara et al [12] also examine the different profiles of density over time in central, residential and rural areas, their research benefits from access to individual-level data that can identify the areas from which visitors to a city centre originate. This research has broader application within understanding mobility and provides a useful start to the development of various statistics on population flow such as those for commuting and other transport flows.

Mobility

Understanding how and where people move to can help to target services to areas of greatest need. For this reason mobility analysis has been used extensively in urban planning and transport applications. The studies presented in this section mainly consider the relative densities of CDRs, either across different areas or across time (trend) although several projects deal with the analysis of human mobility and MPD using anonymised individual-level data.

Researchers have found that individuals’ daily routines are highly predictable. Gonzalez et al [16], Song et al [17] and Lu [18] confirm that human mobility is highly dependent on historical behaviour. Even with data at aggregate cell-tower level to ensure anonymity, the number of calls between 2 locations and their distance appear to be a good predictor of the frequency of travel between them as demonstrated in Palchykov et al [20]. Good results are achieved at 2 levels of coarse-graining: between tower locations in a major city and between cities.
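To illustrate how call volume and distance between 2 locations might be combined into a predictor of travel, the sketch below assumes a gravity-type form in which predicted travel rises with the number of calls and falls with distance raised to a power. This functional form, the parameters `scale` and `beta`, and the example figures are assumptions for illustration and are not taken from the cited paper.

```python
def predicted_travel(calls, distance_km, scale=1.0, beta=1.0):
    """Assumed form: travel is proportional to calls / distance ** beta."""
    if distance_km <= 0:
        raise ValueError("distance_km must be positive")
    return scale * calls / distance_km ** beta


# Hypothetical aggregate data: (number of calls, distance in km) per location pair.
pairs = {
    ("A", "B"): (1000, 10),
    ("A", "C"): (1000, 50),
    ("B", "C"): (200, 5),
}

scores = {pair: predicted_travel(calls, dist) for pair, (calls, dist) in pairs.items()}

# Rank location pairs from highest to lowest predicted travel.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In practice `scale` and `beta` would need to be estimated against observed travel data before any such predictor could be used.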

In Louail et al [11] a single “urban rhythm” is shown to be common to several cities, and in Trasarti et al [23] an analytical process aimed at extracting interconnections between different areas is proposed. In the context of traffic analysis, researchers in Jarv et al [27] use MPD to gain new insights about the composition of traffic flows, and to reveal how and to what extent suburbanites' travelling affects rush hour traffic. Researchers in Martino et al [21] define a measure to identify groups of people that behave similarly from a mobility point of view.

Research using a time series of passive mobile phone positioning data at individual level can identify locations that are repeatedly visited by mobile phone users. Ahas et al [14] and Csaji et al [19] propose that meaningful locations such as “home” and “work” can be determined in such analyses.
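A minimal sketch of how “home” and “work” might be derived from repeated visits is given below. It assumes a simple rule: home is the cell seen most often at night and work is the cell seen most often during office hours. The hour ranges, cell identifiers and function names are hypothetical, and the cited studies use more elaborate methods.

```python
from collections import Counter

# Assumed time windows; real studies would tune these and separate weekdays.
NIGHT_HOURS = {22, 23, 0, 1, 2, 3, 4, 5}
WORK_HOURS = set(range(9, 17))


def most_visited_cell(observations, hours):
    """Return the cell observed most often within the given hours, or None."""
    counts = Counter(cell for hour, cell in observations if hour in hours)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def infer_home_and_work(observations):
    """Infer (home_cell, work_cell) for one phone from (hour, cell) observations."""
    home = most_visited_cell(observations, NIGHT_HOURS)
    work = most_visited_cell(observations, WORK_HOURS)
    return home, work


# Hypothetical passive location updates for one anonymised phone.
observations = [
    (23, "cell_12"), (2, "cell_12"), (4, "cell_12"), (22, "cell_12"),
    (8, "cell_30"), (10, "cell_47"), (11, "cell_47"), (14, "cell_47"),
    (15, "cell_30"),
]

print(infer_home_and_work(observations))
```

Such labels remain inferences: a night-shift worker, for example, would be misclassified by this rule.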

By aggregating individual-level data for which home and work have previously been derived, it is possible to produce home-work origin-destination flows. Lenormand et al [15] compare such flows, at different spatial scales, with Spanish census commuting data and find the MPD-derived flows to be reliable at “municipal” level. Results for smaller geographies are less robust, although they work well where population densities are high (for areas as small as 100 metres x 100 metres in dense urban centres). Similar results are found in comparisons between census data and mobile-phone-derived commuting flows in Csaji et al [19].
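The aggregation and comparison steps can be sketched as follows, assuming each phone has already been assigned a home area and a work area. The area codes, the census counts and the choice of Pearson correlation as the comparison measure are illustrative assumptions, not the approach of the cited papers.

```python
import math
from collections import Counter


def od_flows(anchors):
    """Count phones for each (home_area, work_area) pair."""
    return Counter(anchors)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical (home_area, work_area) pairs, one per anonymised phone.
phone_anchors = [
    ("A", "B"), ("A", "B"), ("A", "C"),
    ("B", "C"), ("A", "B"), ("B", "C"),
]
mpd_flows = od_flows(phone_anchors)

# Hypothetical census travel-to-work counts for the same area pairs.
census_flows = {("A", "B"): 310, ("A", "C"): 95, ("B", "C"): 190}

pairs = sorted(census_flows)
r = pearson([mpd_flows[p] for p in pairs], [census_flows[p] for p in pairs])
print(dict(mpd_flows), round(r, 3))
```

A high correlation only shows that the pattern of flows agrees; converting phone counts into worker counts would still require weighting, for example for the MNO's market share in each area.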

A number of studies present powerful mining engines that allow direct interaction with MPD as proposed in Martino et al [21], Calabrese et al [22] and Angelakis et al [25]. Furthermore, Calabrese et al [22] visualise mobility patterns in real time.

Finally, IBM researchers in Berlingerio et al [24] demonstrate a system that integrates such a mobility analysis engine with an optimisation module and an interactive user interface. The application is tested using data from the “Data for Development Senegal” (D4D) challenge where 4 new bus routes were proposed by the optimisation module, with an expectation that there would be a reduction of 10% in city-wide travel times.

In the UK, the use of MPD for transport applications is recognised as a priority area for MNOs as there is a large commercial opportunity for statistical products on transport flows and travel-to-work type statistics. Customers for this data range from government and large public transport bodies down to transport planners within all local authorities. These customers have a strong need for this data, as procuring it offers substantial cost savings over traditional data collection via roadside surveys. However, they still have concerns about the validity of the data they have purchased and seek guidance on its fitness for purpose. The Department for Transport (DfT) is the government department that has the greatest role in providing this guidance, although we also have an interest as travel-to-work statistics are produced through the population census.

Tourism

Mobility analysis also has an application within tourism statistics. A detailed study by Eurostat [28] assesses the feasibility of using mobile positioning data within inbound, outbound and domestic tourism statistics from several aspects including access, cost, trust, and the technological and methodological challenges. The study concludes that MPD, at present, can only supplement official tourism indicators.

Tourism indicators include: number of trips or visits; number of nights spent; number of days spent; and number of unique visitors. These indicators may be broken down further by: country of residence; aggregations by time (day, week, month); aggregations by geography; duration of trip or stay (same-day or overnight trip); main destination; secondary destination; transit pass-through; collective movement patterns; and repeat visits.

Mobile phone roaming data in Estonia was used by Ahas [29, 30] to identify foreign phones leading to an understanding of tourist destinations, their seasonal variability and the dominant nationality of their visitors. Correlations with conventional accommodation statistics were very high in the most commonly visited tourist regions.
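A minimal sketch of how two of the indicators listed earlier (unique visitors and days spent) might be counted from roaming events is shown below. The event layout of phone identifier, home-network country and date, the sample values and the function name are hypothetical; the cited studies do not prescribe this structure.

```python
from collections import defaultdict


def tourism_indicators(events):
    """Count unique visitors and visitor-days per home-network country."""
    visitors = defaultdict(set)
    visitor_days = defaultdict(set)
    for phone_id, country, day in events:
        visitors[country].add(phone_id)
        # A phone seen several times on one day still counts as one day spent.
        visitor_days[country].add((phone_id, day))
    return {
        country: {
            "unique_visitors": len(visitors[country]),
            "days_spent": len(visitor_days[country]),
        }
        for country in visitors
    }


# Hypothetical anonymised roaming events: (phone_id, home_country, date).
events = [
    ("p1", "FI", "2015-07-01"),
    ("p1", "FI", "2015-07-01"),
    ("p1", "FI", "2015-07-02"),
    ("p2", "FI", "2015-07-01"),
    ("p3", "LV", "2015-07-03"),
]

print(tourism_indicators(events))
```

The country of the home network is only a proxy for country of residence, which is one source of the inference error discussed in the Overview.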

Bajardi et al [31] investigate the international country codes of every call or SMS made and received by mobile phone users in Milan, Italy, between November and December 2013, with a spatial resolution of about 200 metres. The researchers show that the observed spatial distribution of international codes well matches the distribution of international communities reported by official statistics. They also investigate robust clustering patterns that can be used to identify the touristic hotspots.

Ethnicity and community

There are a number of studies examining the formation of communities or regions that may have association with ethnicity, language or other social differences.

Some of this research has been generated by examination of the spatial networks present when considering features such as call frequency and duration made by individuals and between areas. In such analysis, Blondel et al [33] show that Belgium can be split into 17 self-contained areas comprising adjacent municipalities. They extend their analysis to detect north and south regions, between which there is minimal communication. These regions are shown to resemble the predominantly Flemish-speaking and French-speaking populations respectively.
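One common way to score how well a proposed split of a call network separates areas that mostly talk among themselves is modularity. The sketch below computes it for a toy network; the text does not state which measure the cited study used, so this is offered only as an illustration. The area names, call weights and function name are hypothetical, and the network is assumed to be undirected with no self-links.

```python
def modularity(weights, community):
    """Modularity of a partition of an undirected weighted network.

    weights: {(area_i, area_j): call_volume}, each pair listed once.
    community: {area: community_label}.
    """
    total = sum(weights.values())
    strength = {}
    for (i, j), w in weights.items():
        strength[i] = strength.get(i, 0.0) + w
        strength[j] = strength.get(j, 0.0) + w

    # Share of all call volume that stays inside a community.
    within = sum(
        w for (i, j), w in weights.items() if community[i] == community[j]
    ) / total

    # Share expected if calls were placed at random given each area's volume.
    community_strength = {}
    for area, s in strength.items():
        label = community[area]
        community_strength[label] = community_strength.get(label, 0.0) + s
    expected = sum((s / (2 * total)) ** 2 for s in community_strength.values())

    return within - expected


# Hypothetical call volumes between four areas.
weights = {("N1", "N2"): 10, ("S1", "S2"): 10, ("N1", "S1"): 1}

geographic = {"N1": "north", "N2": "north", "S1": "south", "S2": "south"}
mixed = {"N1": "x", "S1": "x", "N2": "y", "S2": "y"}

print(modularity(weights, geographic), modularity(weights, mixed))
```

The geographic split scores higher because almost all call volume stays within each region, which mirrors the minimal communication between regions described above.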

Similar network studies to identify communities and regions through telecommunications data are found in Expert et al [34], Blumenstock [36] and Ratti [35].

An approach to assigning ethnicity is to use the language setting on the mobile phone as a proxy indicator, as in research in Estonia by Silm et al [32] and in a south Asian country by Blumenstock et al [36]. Both these studies declare a high correlation is found between the language setting and underlying ethnicity. The studies then go on to examine the mobility behaviour of ethnic minority groups and find differences in segregation to the majority ethnic group and also to census data that is based on residential population rather than day-time population.

If shown to be capable of being replicated in the UK, these research approaches might be used in similar official statistics applications such as in the identification of some ethnic minority populations and how integrated these populations are with the population as a whole. However, inferring measures of identity such as nationality, religion and ethnicity is highly contentious and raises strong ethical issues.

Socio-economic status and economic levels

The papers in this section focus on using CDRs to identify the socio-economic status and economic levels in a population. Although unlikely to produce estimates of the standard required for official statistics, they might be useful as co-variate information for use in small-area modelling within official statistics of income. Most of the papers use data from developing countries where reliable census or survey data is sparse.

Soto et al [37] compare MPD-derived variables to socio-economic levels produced by that country’s National Statistical Institute. These variables included the number of calls made and received, the average distance covered whilst making calls and the distance calls are made to or received from. This research proposes that such variables can indicate an area’s socio-economic level with good accuracy.

Similar research by Mao et al [39] lacks reliable data with which to compare socio-economic levels so instead relies on informal knowledge of the country to inform its models. Mao et al [39] make the point that MPD could provide near real-time and cheaper information about a country’s development in the absence of reliable information from expensive surveys.

Eagle et al [38] and Smith-Clarke et al [40] use mobile phone network analysis to make the observation that more insular communities are likely to be poorer, a relationship that appears to hold in the developed country of England and the developing country of the Ivory Coast. Mao et al [39] find that richer areas are more likely to communicate with each other often.

Infectious diseases, monitoring and prevention

All authors in this section use MPD to track patterns of movement across countries to model disease spread. Although less relevant to official statistics, some of these studies compare mobile phone derived mobility patterns with standard approaches using official data to highlight the strengths and limitations of each.

Wesolowski et al [41] and [44] note that using MPD could provide fine-grained details of human movement, particularly in developing countries such as Kenya where the traditional method has been to rely on average travel times between towns and cross-sectional surveys. Wesolowski et al [44] use the same data to examine seasonal travel patterns to understand the spread of diseases with a seasonal flux, such as rubella.

Tizzoni et al [42] use MPD in the same way as Wesolowski et al [41] and [44], but in developed countries in Europe. They compare the accuracy of MPD with census data in modelling disease spread and find that MPD are likely to be more accurate in rural settings, but are likely to overestimate the speed of disease spread in cities.

Lu et al [43] examined population movements after the 2010 Haiti earthquake and found that the movements of individuals were surprisingly predictable, which could help target aid after such a catastrophic event.

Understanding how the population moves in emergency situations does not have a direct application to current official statistics, although insights into these patterns of mobility may improve guidance on how to use official data on population mobility when planning a response to these situations.

Notes for Main findings of the review

  1. The area covered by a cell tower varies greatly depending on the call volumes expected at that tower’s location. The average range is around 500 metres, but it can vary from around 100 metres in densely populated urban areas to several kilometres in rural areas.

4. Conclusions

Mobile Phone Data (MPD) are subject to a high degree of data protection and different countries have different legislative access arrangements. However, there is a large body of research on the use of MPD, primarily using anonymised samples of CDR data, and in some cases collaborations have formed to provide access to passive location updates. MNOs are increasingly making their data available for use within applications in the developing world and for epidemiological situations.

The strengths of MPD include its passive and timely collection and, in the UK, its almost complete coverage of the adult population. This high coverage might lead to estimates for sub-national geographies. MPD variables denoting mobility, such as geolocation and timestamps, are conceptually more accurate than those that might be returned in a survey and, more generally, research applications are varied. On the other hand, the main weaknesses of MPD include the complexity and uncertainty of future data access, bias related to over- and under-coverage issues, difficulty in assessing the quality and accuracy of obtained statistics, and a lack of information on what an observed behaviour truly represents.

Social network analysis has shown that gender and other socio-economic features might be derived for individuals and for an area. Studies have used the geolocation data to produce population estimates and to detect call density “hotspots” that can contribute to urban planning and the efficient targeting of services. Geographical partitioning is also of research interest in understanding the connectivity between different areas. Changes in call density in an area over time could reveal land use features such as business or residential areas or the activities occurring there.

Another large body of research relates to mobility, where it is shown in various papers that patterns of mobility are quite stable over time with most people following the same trajectories irrespective of the distance they tend to cover. This has application to the detection of home or workplace and commuting patterns through the observation of repeated journeys. The mobility patterns of different groups could be used to optimise transport facilities and also detect segregation. Deviations from normal mobility behaviour could theoretically inform on visitor or tourism activity.

In conclusion, without either legislative access or funding to purchase this data, partnering with MNOs is the only feasible way forward. It is recommended that research should focus on areas that are of high interest both to official statistics and to the MNOs. Eurostat research [28] also suggests that partnering opportunities should start with small initial projects, to build up trust that may lead to bigger collaborations.

MNOs in the UK are already using their vast data holdings to develop statistical products for commercial use and have developed capacity in producing population densities by time of day and transport flows. However, there are currently no standard processes for validating these outputs; MNOs and their customers perform their own validation using comparisons with any data they might have access to.

The Office for National Statistics (ONS) is the UK’s national statistical institute and is independent of government. We have a role in providing objective comment on the quality or fitness for use of data. As such, we might best demonstrate value to any collaboration with the MNOs by assessing the quality of the statistical products derived using MPD and helping to improve these outputs if possible.

The topic of most relevance for us is population densities, as it is a statutory requirement for ONS to produce this information and it was the first application in production within MNOs. The academic research shows that population density estimates might be modelled from aggregated MPD and would therefore be less complex to process and raise fewer ethical issues. Population densities also underpin the more complex derivation of transport flows, which are possibly of even higher importance within MNOs due to the large commercial opportunities in this type of data. This is also relevant for us as the population census produces origin-destination commuting flows that could be replaced with MPD. However, transport flows clearly have greater application within official transport statistics as produced within the Department for Transport.

It should also be observed that MNOs develop their methods during the implementation of each project contracted, and paid for, by their customers. For any collaboration, these customers would therefore also need to be approached and persuaded to allow their data to be made available for assessment. With careful negotiation, it is hoped that the differing concerns of the different stakeholders might be addressed, with the shared benefit being the assurance of improved MPD-derived data leading to its greater adoption for varied applications.

5. Next steps

We are taking forward research into the potential of using mobile phone data (MPD) within official statistics with an initial focus on obtaining access to MPD-derived commuting flows. We want to understand the quality and methodological issues and to develop recommendations on the use of these data for statistical and research purposes. This is likely to require collaboration with transport bodies that have already purchased modelled MPD. Only aggregated and non-disclosive data will be sought.

We will also seek to influence the cross-government approach to using MPD for statistical purposes. This will involve extensive engagement and encouragement of collaboration with government bodies, including the Government Statistical Service (GSS), mobile network operators (MNOs) and other organisations who may wish to acquire such data.

6. Appendix (references)

1. Blondel V, Decuyper A and Krings G (2015) ‘A survey of results on mobile phone datasets analysis’ EPJ Data Science 4:10 DOI 10.1140/epjds/s13688-015-0046-0

Population estimates

2. Deville P, Linard C, Martin S, Gilbert M, Stevens FR, Gaughan AE, Blondel VD and Tatem AJ (2014) ‘Dynamic population mapping using mobile phone data’

3. Douglass RW, Meyer DA, Ram M, Rideout D and Song D (2015) ‘High resolution population estimates from telecommunications data’

4. Sterly H, Hennig B and Dongo K (2013) ‘Calling Abidjan – Improving Population Estimations with Mobile Communication Data’

5. Ma X and Wu L (2012) ‘Towards Estimating Urban Population Distributions from Mobile Call Data’

6. Terada M, Nagata T and Kobayashi M (2013) ‘Population Estimation Technology for Mobile Spatial Statistics’ NTT DOCOMO Technical Journal

7. Oyabu Y, Terada M, Yamaguchi T, Iwasawa S, Hagiwara J and Koizumi D (2013) ‘Evaluating Reliability of Mobile Spatial Statistics’ NTT DOCOMO Technical Journal

8. Ricciato F, Widhalm P, Craglia M and Pantisano F (2015) ‘Estimating population density distribution from network-based mobile phone data’

Urban planning (land use)

9. Rubio A, Sanchez A and Frias-Martinez E (2013) ‘Adaptive non-parametric identification of dense areas using cell phone records for urban analysis’

10. Reades J, Calabrese F, Sevtsuk A and Ratti C (2007) ‘Cellular Census: Explorations in Urban Data Collection’

11. Louail T, Lenormand M, Cantu Ros OG, Picornell M, Herranz R, Frias-Martinez E, Ramasco JJ and Barthelemy M (2014) ‘From mobile phone data to the spatial structure of cities’

12. Odawara T and Kawakami H (2013) ‘Using Mobile Spatial Statistics in Field of Urban Planning’ NTT DOCOMO Research Paper

13. Ratti C, Pulselli RM, Williams S, Frenchman D (2006) ‘Mobile Landscapes: Using Location Data from Cell Phones for Urban Analysis’ Environ Plann B Plann Des 33(5):727–748

Mobility

14. Ahas R, Silm S, Järv O, Saluveer E and Tiru M (2010) ‘Using Mobile Positioning Data to Model Locations Meaningful to Users of Mobile Phones’ Journal of Urban Technology 17(1): 3–27

15. Lenormand M, Picornell M, Cantú-Ros OG, Tugores A, Louail T, Herranz R, Barthelemy M, Frías-Martínez E, Ramasco JJ (2014) ‘Cross-checking different sources of mobility information’ PLoS One 9(8):e105184

16. Gonzalez MC, Hidalgo CA, Barabasi A-L (2008) ‘Understanding individual human mobility patterns’ Nature 453(7196):779–782

17. Song C, Qu Z, Blumm N, Barabasi A-L (2010) ‘Limits of predictability in human mobility’ Science 327(5968):1018–1021

18. Lu X, Wetter E, Bharti N, Tatem AJ, Bengtsson L (2013) ‘Approaching the limit of predictability in human mobility’ Sci Rep 3:2923

19. Csaji BC, Browet A, Traag VA, Delvenne JC (2013) ‘Exploring the mobility of mobile phone users’ Physica 392(6):1459–1473

20. Palchykov V, Mitrović M, Jo HH, Saramäki J, Pan RK (2014) ‘Inferring human mobility using communication patterns’ Sci Rep 4:6174

21. Martino M, Calabrese F, Di Lorenzo G, Andris C, Liu L, Ratti C (2009) ‘Ocean of information: fusing aggregate & individual dynamics for metropolitan analysis’ IUI - International Conference on Intelligent User Interfaces

22. Calabrese F, Colonna M, Lovisolo P, Parata D, Ratti C (2011) ‘Real-time urban monitoring using cell phones: A case study in Rome’ IEEE Transactions on Intelligent Transportation Systems - TITS 12(1):141–151

23. Trasarti R, Olteanu-Raimond AM, Nanni M, Couronné T, Furletti B, Giannotti F, Smoreda Z, Ziemlicki C (2014) ‘Discovering urban and country dynamics from mobile phone data with spatial correlation patterns’ Telecommunications Policy 39(3-4): 347–362

24. Berlingerio M, Calabrese F, Di Lorenzo G, Nair R, Pinelli F, Sbodio ML (2013) ‘A system for exploring urban mobility and optimizing public transport using cellphone data’ European Conference, ECML PKDD

25. Angelakis V, Gundlegård D, Rydergren C, Rajna B, Vrotsou K, Carlsson R, Forgeat J, Hu TH, Liu EL, Moritz S, Zhao S, Zheng Y (2013) ‘Mobility modeling for transport efficiency - analysis of travel characteristics based on mobile phone data’ Netmob 2013 Third International Conference on the Analysis of Mobile Phone Datasets

26. Wang D, Pedreschi D, Song C, Giannotti F, Barabási AL (2011) ‘Human mobility, social ties, and link prediction’ KDD

27. Jarv O, Ahas R, Saluveer E, Derudder B, Witlox F (2012) ‘Mobile phones in a traffic flow: A geographical perspective to evening rush hour traffic analysis using call detail records’ PLoS One 7(11):e49171

Tourism

28. Eurostat (2014) ‘Feasibility study of the use of mobile positioning data for tourism statistics’ Consolidated Report Eurostat Contract No 30501.2012.001–2012.452

29. Ahas R, Aasa A, Mark Ü, Pae T and Kull A (2007) ‘Seasonal tourism spaces in Estonia: Case study with mobile positioning data’ Tour Manage 28(3):898–910

30. Ahas R, Aasa A, Roose A, Mark Ü and Silm S (2008) ‘Evaluating passive mobile positioning data for tourism surveys: An Estonian case study’ Tour Manage 29(3):469–486

31. Bajardi P, Delfino M, Panisson A, Petri G and Tizzoni M (2015) ‘Unveiling patterns of international communities in a global city using mobile phone data’

Ethnicity and community partitioning

32. Silm S and Ahas R (2014) ‘The temporal variation of ethnic segregation in a city: evidence from a mobile phone use dataset’ doi:10.1016/j.ssresearch.2014.03.011

33. Blondel V, Krings G and Thomas I (2010) ‘Regions and borders of mobile telephony in Belgium and in the Brussels metropolitan zone’ Brussels Studies 42 ISSN 2031-0293

34. Expert P, Evans T, Blondel V and Lambiotte R (2011) ‘Uncovering space-independent communities in spatial networks’ PNAS 108(19):7663–7668

35. Ratti C, Sobolevsky S, Calabrese F, Andris C, Reades J and Martino M (2010) ‘Redrawing the map of Great Britain from a network of human interaction’ PLoS ONE 5(12): e14248. doi:10.1371/journal.pone.0014248

36. Blumenstock J and Fratamico L (2013) ‘Social and spatial ethnic segregation: a framework for analyzing segregation with large-scale spatial network data’

Socio-economic status and economic levels

37. Soto V, Frias-Martinez V, Virseda J and Frias-Martinez E (2011) ‘Prediction of Socioeconomic Levels using Cell Phone Records’ DOI: 10.1007/978-3-642-22362-4_35

38. Eagle N, Macy M and Claxton R (2010) ‘Network diversity and economic development’ Science 328:1029

39. Mao H, Shuai X, Ahn Y and Bollen J ‘Mobile communications reveal the regional economy in Cote d'Ivoire’

40. Smith-Clarke C, Mashhadi A and Capra L ‘Poverty on the cheap: Estimating poverty maps using aggregated mobile communication networks’

Infectious diseases, monitoring and prevention

41. Wesolowski A, Eagle N, Tatem AJ, Smith DL, Noor AM, Snow RW and Buckee CO (2012) ‘Quantifying the impact of human mobility on malaria’ Science 338(6104):267–70

42. Tizzoni M, Bajardi P, Decuyper A, King GKK, Schneider CM, Blondel V, Smoreda Z, González MC and Colizza V ‘On the use of human mobility proxies for modelling epidemics’

43. Lu X, Bengtsson L and Holme P (2012) ‘Predictability of population displacement after the 2010 Haiti earthquake’ PNAS vol. 109 no. 29

44. Wesolowski A, Metcalf CJE, Eagle N, Kombich J, Grenfell BT, Bjørnstad ON, Lessler J, Tatem AJ, and Buckee CO (2015) ‘Quantifying seasonal population fluxes driving rubella transmission dynamics using mobile phone data’ PNAS vol. 112 no. 35
