Graduationreport_JKeij.

Jasper Keij
March 2014
Smart phone counting
Location-Based Applications using mobile phone location data
Graduation report
Jasper Keij
28 March, 2014
Student number: 1504436
Mail: jasper@mezuro.com/jasperkeij@hotmail.com
Graduation work, Master Civil Engineering.
Track Transport & Planning, TU Delft
Graduation committee:
Prof. dr. ir. Serge Hoogendoorn, Chairman (TU Delft, Civil Engineering, Department of Transport & Planning)
Dr. ir. Winnie Daamen, Daily supervisor TU Delft (TU Delft, Civil Engineering, Department of Transport & Planning)
Ron Beute, Daily supervisor Mezuro (Mezuro B.V., Weesp)
Dr. ir. Ben Gorte, External member (TU Delft, Civil Engineering, Department Remote Sensing & Geoscience)
Ir. Paul Wiggenraad, Graduation coordinator (TU Delft, Civil Engineering, Department of Transport & Planning)
PDF edition
Source of front-page picture: nieuwsblad.be
A request for the appendices and/or a high-resolution version of the report can be sent to the author.
Preface
This thesis is presented in partial fulfilment of the requirements for the degree of MSc. in Civil Engineering,
track Transport & Planning, and has been completed at the Delft University of Technology and at the
company Mezuro. This report covers the development of the application ’Travel time information’ with
mobile phone location data.
First, I would like to thank my graduation committee. Foremost, I would like to thank Winnie Daamen for her assistance and detailed feedback. I would also like to thank the rest of the committee, Serge Hoogendoorn,
Paul Wiggenraad, Ben Gorte and Ron Beute, for their guidance and comments.
The opportunity to work at Mezuro and with the dataset of Mezuro was, is and will be a great pleasure.
I would like to extend my gratitude to everyone at Mezuro for their assistance and hospitality. At the
university, special thanks go to the room-mates of the Afstudeerhok for the great conversations about big
data and the motivation in the last phase of the graduation. Finally, I would like to thank my family.
Their feedback on the report and their different views on the study were very helpful.
Mobile phone location data raise many new puzzles, which have led to this study. As Reades et al.
[2007] stated:
"Without encouragement, far more detailed data sets held by the networks will never see the light of day"
Jasper Keij
Delft, March 2014
Summary
Data from mobile phones form a data source that makes it possible to discover people's movements for
transportation research. Mobile phone location data are collected at the network side of the mobile
phone system. The main components collecting mobile phone location data at the network side are the
device itself and the cells. Cells are the transmitters and receivers of the signals that connect the mobile
phone network with the device. The connection is established through servers in the mobile phone
network, where, for example, billing is also handled. The dataset available for this research, consisting
of more than five million users and 25 thousand cells, comes from one of the large telecom operators in
the Netherlands and is analysed by Mezuro, a Dutch company working in the field of measuring mobility.
The aim of this research is to test the usability of mobile phone location data for Location-Based Applications. This thesis is divided into two parts: the first investigates the usability of mobile phone data
in Location-Based Applications, and the second develops one of these applications. In the first part, a
classification is proposed to test whether mobile phone location data are useful for different applications.
In the second part, a framework is defined to develop the application ’travel time information’, and the
first part of the framework is developed using mobile phone location data. The aim of the application is
to inform travellers of the estimated travel time.
The mobile phone location data of Mezuro are event-driven data, created when a user makes or receives
a call, sends or receives an SMS and when using data. Network-driven data are created at a periodic
interval or, for example, when the user moves to other cells. The network-driven technique therefore
generates data more regularly, while the event-driven technique only generates data when the user is
using the network. A variety of localization technologies exists to locate mobile phones in the network.
Localization technologies are used, for example, to trace a person after approval by a judge. The
localization technologies result in different accuracies, and some need extra equipment in the network.
The position can be determined using triangulation, but it can also be determined with the cell's
identification number. In that case, only the location of the cell site is known.
Many studies using mobile phone location data have been found, describing the use of mobile phone
data in a large variety of applications and for many different categories of mobile phone data. Literature
shows mixed results for Location-Based Applications built on mobile phone data. The mobile phone
location data of Mezuro cannot be compared directly with results from literature, because the types of
mobile phone data and localization techniques vary and, therefore, so do the accuracies and actualities.
Because no uniform classification was found in literature, a classification is developed to test the
usability of mobile phone location data. With this classification, different datasets can be compared.
The characteristics of the classification are the location accuracy and the time update frequency.
The dataset of Mezuro available for this research is an extract from Call Detail Records (CDRs). Call
Detail Records are event-driven data; during the extraction, the telephone number is anonymized by
creating a one-way Hashed ID. Each day, approximately 115 million events are generated. An advantage
of the available data is that the Hashed ID only changes after thirty days. Privacy is guaranteed by
obligatory grouping of events: with the dataset of Mezuro, the system can only extract results that are
based on at least 15 observations. However, the system is able to perform computations on individual
users before the results are grouped. The dataset contains only the location of the cell site.
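The anonymization and grouping rules above can be illustrated with a minimal Python sketch. It is not the operator's or Mezuro's actual procedure, which is not described here: the keyed hash (HMAC-SHA256), the secret key, the fixed thirty-day periods counted from the calendar epoch and all function names are assumptions. The sketch shows how a one-way Hashed ID can stay constant within a thirty-day period while changing between periods, and how grouped results based on fewer than 15 observations can be suppressed.

```python
import hashlib
import hmac
from collections import Counter
from datetime import date

ROTATION_DAYS = 30    # the Hashed ID changes after thirty days (from the text)
MIN_GROUP_SIZE = 15   # results must rest on at least 15 observations (from the text)


def period_index(day: date) -> int:
    """Index of the (assumed fixed) thirty-day period that contains `day`."""
    return day.toordinal() // ROTATION_DAYS


def pseudonymize(phone_number: str, day: date, secret: bytes) -> str:
    """One-way Hashed ID: identical within a period, different across periods.

    Assumption: HMAC-SHA256 over the phone number and the period index.
    Using a secret key prevents recomputing IDs from a list of phone numbers.
    """
    message = f"{phone_number}|{period_index(day)}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def grouped_counts(events, min_size: int = MIN_GROUP_SIZE) -> dict:
    """Count (group_key, hashed_id) events per group and drop groups that are too small."""
    counts = Counter(group_key for group_key, _hashed_id in events)
    return {key: n for key, n in counts.items() if n >= min_size}


if __name__ == "__main__":
    key = b"hypothetical-secret"
    first = pseudonymize("0612345678", date(2013, 10, 1), key)
    print(first == pseudonymize("0612345678", date(2013, 10, 1), key))  # True
    events = [("cell_A", f"id{i}") for i in range(20)] + [("cell_B", "id0")] * 3
    print(grouped_counts(events))  # {'cell_A': 20}
```

A keyed hash is used here rather than a plain hash because the space of phone numbers is small enough that unkeyed hashes could be reversed by trying all numbers.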
The dataset of Mezuro was classified based on the results of the literature review and a basic analysis of
the data of Mezuro. To respect privacy, all applications have to work with grouped data. The first
analysis shows that the mobile phone data are suitable for group-based applications, provided that the
application can accept a localization accuracy of 1000 meters or more and an update interval of one hour or more.
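The classification result can be expressed as a simple feasibility check. In this hypothetical sketch, an application is described by the coarsest localization error and the longest update interval it can tolerate, and it counts as usable with the dataset of Mezuro if it works on grouped data and tolerates roughly 1000 meters and one hour, the values from the text. The class and function names, and the two example applications, are illustrative assumptions.

```python
from dataclasses import dataclass

# Values for the dataset of Mezuro as stated in the text.
DATASET_ACCURACY_M = 1000.0         # about 1000 m localization accuracy
DATASET_UPDATE_INTERVAL_S = 3600.0  # about one update per hour


@dataclass
class Requirement:
    """Hypothetical description of what an application can tolerate."""
    name: str
    accuracy_m: float          # coarsest acceptable localization error (m)
    update_interval_s: float   # longest acceptable time between updates (s)
    grouped: bool = True       # works on grouped data only (needed for privacy)


def is_usable(req: Requirement,
              accuracy_m: float = DATASET_ACCURACY_M,
              update_interval_s: float = DATASET_UPDATE_INTERVAL_S) -> bool:
    """True if the application tolerates the dataset's accuracy and update interval."""
    return (req.grouped
            and req.accuracy_m >= accuracy_m
            and req.update_interval_s >= update_interval_s)


if __name__ == "__main__":
    regional_flows = Requirement("hourly regional flows", 2000.0, 3600.0)
    turn_by_turn = Requirement("turn-by-turn navigation", 10.0, 1.0)
    print(is_usable(regional_flows))  # True
    print(is_usable(turn_by_turn))    # False
```

Keeping the dataset values as default parameters makes it easy to rerun the same check for another dataset, which is the comparison the classification is meant to support.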
Based on the classification, the continuation of the research is defined: an application is developed that
is on the edge of what is possible with the mobile phone location data of Mezuro according to the
classification. Therefore, the application ’travel time information’ is developed. This application informs
travellers about estimated travel times over long distances. A framework is created to define the steps to
build the application with mobile phone location data, and the first steps of the framework are developed
with the dataset of Mezuro. In a case-study, mobile phone data were generated by sending text messages
and making calls. The corridor Amsterdam - Eindhoven was chosen as the test route, and two common
long-distance modes, train and car, were used. An SMS was sent every 30 seconds and a call was made
every four minutes. In total, more than 4000 events were generated. In addition to the events, GPS
devices recorded the location every second. After the data collection, the mobile phone events are linked
to the GPS points and the properties of the network. With the GPS points, the positioning of users with
mobile phone data can be evaluated. It can also be assessed whether the mode can be detected with the dataset of Mezuro.
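Linking each mobile phone event to the GPS track, as described above, essentially means finding the GPS fix closest in time to each event. A minimal sketch, assuming both sources share a clock and use timestamps in seconds; the function names and the 5-second tolerance are assumptions, the latter chosen because the GPS devices logged a position every second.

```python
from bisect import bisect_left


def nearest_index(times, t):
    """Index of the value in the sorted list `times` that is closest to `t`."""
    i = bisect_left(times, t)
    if i == 0:
        return 0
    if i == len(times):
        return len(times) - 1
    # Choose the later fix only if it is strictly closer than the earlier one.
    return i if times[i] - t < t - times[i - 1] else i - 1


def link_events(events, gps_times, gps_points, max_gap_s=5.0):
    """Attach the nearest GPS fix to each (event_time, cell_id) event.

    gps_times must be sorted ascending; gps_points[k] belongs to gps_times[k].
    Events without a GPS fix within max_gap_s seconds are dropped.
    """
    if not gps_times:
        return []
    linked = []
    for event_time, cell_id in events:
        k = nearest_index(gps_times, event_time)
        if abs(gps_times[k] - event_time) <= max_gap_s:
            linked.append((event_time, cell_id, gps_points[k]))
    return linked


if __name__ == "__main__":
    gps_times = [0, 1, 2, 3]
    gps_points = [(52.37, 4.90), (52.36, 4.91), (52.35, 4.92), (52.34, 4.93)]
    events = [(1.4, "cell_12"), (2.6, "cell_13"), (100.0, "cell_14")]
    for row in link_events(events, gps_times, gps_points):
        print(row)
```

Binary search keeps each lookup logarithmic in the number of GPS fixes, which matters when a one-second GPS log is matched against thousands of events.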
Analyses were done to assess whether the practical network characteristics match the theoretical network
characteristics. With these analyses, the result of the analysis done in Part I can be updated. The spatial
elements used in the development of the application are also tested. One of the researched elements
is the distance from the GPS position to the cell site, the location of the transmitter. This distance is
related to the network characteristics. The average distance from the GPS positions to the sites is
approximately 930 meters. In the train, the distance is smaller than in the car. Furthermore, the distance
is in most cases smaller than half of the radius of the cell. The distance to the site differs significantly
between urban and rural areas, which can be explained by the smaller cell radii in urban areas; outside
the urban areas the cells are larger. The functioning of the mobile phone network and its properties are
not completely clear, resulting in GPS points outside the theoretical cell area and an unknown, smaller
practical cell area. This reduces the accuracy of user positioning, and therefore the application will give
a larger error in the estimated travel time.
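The distance from a GPS position to the cell site can be computed with the haversine (great-circle) formula on WGS84 latitude and longitude. This sketch assumes a spherical Earth with a mean radius of 6371 km, which is amply accurate at distances of around a kilometre; the function names, including the half-radius check mirroring the comparison in the text, are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def mean_distance_to_site(pairs):
    """Average distance over ((gps_lat, gps_lon), (site_lat, site_lon)) pairs."""
    distances = [haversine_m(*gps, *site) for gps, site in pairs]
    if not distances:
        raise ValueError("no GPS/site pairs given")
    return sum(distances) / len(distances)


def within_half_radius(gps, site, cell_radius_m):
    """True if the GPS position lies closer to the site than half the cell radius."""
    return haversine_m(*gps, *site) < 0.5 * cell_radius_m


if __name__ == "__main__":
    print(round(haversine_m(52.0, 5.0, 53.0, 5.0)))  # 111195, one degree of latitude
```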
A Voronoi diagram and the Best Serving Cell Map (BSCM) are used for the development of the application; both show a decrease in the positioning error calculated with the GPS positions. Both elements are
used because they are often applied in literature and both were available. A Voronoi diagram divides the
area so that every point in a Voronoi area is closer to its own cell site than to any other site.
Both spatial elements show a decrease in the average distance, the Voronoi diagram more than the BSCM.
Another technique used is mapping the possible user location onto the road and rail track, using the
spatial elements site, Voronoi diagram and BSCM. For both the Voronoi diagram and the BSCM, the
average distance from the GPS position to the mapping point for each event then decreases to 500
meters, compared with the distance to the mapping point created with the cell site.
Both spatial elements are used to determine the mode and map the positions.
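The two spatial operations above, assigning a position to a Voronoi area and mapping a position onto a road or rail track, can be sketched in a few lines. Membership of a Voronoi area is equivalent to finding the nearest cell site, so no explicit diagram is needed for a single lookup. The mapping step is shown as an orthogonal projection onto the nearest segment of a polyline, returning the distance along the track (chainage). This is a generic technique, not necessarily the mapping rule of the thesis, which combines the site, Voronoi diagram and BSCM; coordinates are assumed to be planar in meters (for example a projected national grid), and all names are hypothetical.

```python
import math


def nearest_site(point, sites):
    """Index of the Voronoi area containing `point`, i.e. the closest cell site."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))


def project_onto_segment(p, a, b):
    """Closest point to p on segment a-b, and its fraction t along the segment."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0:
        return a, 0.0
    t = ((px - ax) * dx + (py - ay) * dy) / length2
    t = max(0.0, min(1.0, t))  # stay within the segment
    return (ax + t * dx, ay + t * dy), t


def map_to_track(p, track):
    """Map p onto polyline `track`; return (mapped_point, chainage_m)."""
    if len(track) < 2:
        raise ValueError("a track needs at least two vertices")
    best_dist, best_point, best_chainage = math.inf, None, 0.0
    chainage = 0.0  # distance along the track to the start of the current segment
    for a, b in zip(track, track[1:]):
        q, t = project_onto_segment(p, a, b)
        segment_length = math.dist(a, b)
        d = math.dist(p, q)
        if d < best_dist:
            best_dist, best_point, best_chainage = d, q, chainage + t * segment_length
        chainage += segment_length
    return best_point, best_chainage


if __name__ == "__main__":
    sites = [(0.0, 0.0), (1000.0, 0.0)]
    track = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0)]
    print(nearest_site((400.0, 10.0), sites))   # 0
    print(map_to_track((500.0, 200.0), track))  # ((500.0, 0.0), 500.0)
```

The point passed to `map_to_track` could be a cell site or, as a further assumption, a representative point of a Voronoi or BSCM area.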
To start developing the application ’travel time information’, first an algorithm is created to detect
the mode for one event of a user, using the mentioned spatial elements. The mode has to be detected to
know whether a traveller uses the train or the car; with that knowledge, the travel time of a user can be
determined. The algorithm assigns the correct mode in 90% of the cases. When combining cells, and
therefore multiple mode estimates, the correctness increases. A Monte-Carlo simulation is created to
test the combinations. The correctness increases to 97% at three events and reaches 100% at six events.
To compute the travel time on the route, the mapping algorithm then creates mapping points on the
track of the mode computed by the mode algorithm. The mapping algorithm uses a combination of the
spatial elements. To measure the result, the difference in speed between the GPS points and the mapping
points is calculated. Every combination of events of one trip from the data collection is tested. A median
speed difference of 1.5 km/h is found. The 95% cumulative frequency of the empirical data gives a speed
difference of 16.5 km/h. The average deviation of the speed is higher at higher speeds.
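The effect of combining several single-event mode estimates can be explored with a simple majority vote. The sketch below is an idealized assumption, not the simulation of the thesis: each event is taken to be classified correctly with probability 0.9, independently of the others, and ties are broken by a fair coin. It computes the probability of a correct trip-level mode both exactly (binomial sum) and by Monte-Carlo simulation. Under these assumptions three events give 0.972, close to the reported 97%, whereas six events give about 0.991 rather than 100%, so the reported results must come from a setup that differs from this idealized one.

```python
import random
from math import comb


def majority_correct_exact(n: int, p: float) -> float:
    """P(majority vote over n independent events gives the right mode); ties -> coin flip."""
    total = 0.0
    for k in range(n + 1):  # k = number of correctly classified events
        prob = comb(n, k) * p**k * (1 - p) ** (n - k)
        if 2 * k > n:
            total += prob
        elif 2 * k == n:
            total += 0.5 * prob
    return total


def majority_correct_mc(n: int, p: float, trials: int = 100_000, seed: int = 1) -> float:
    """Monte-Carlo estimate of the same probability."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        k = sum(rng.random() < p for _ in range(n))
        if 2 * k > n or (2 * k == n and rng.random() < 0.5):
            correct += 1
    return correct / trials


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, round(majority_correct_exact(n, 0.9), 4), majority_correct_mc(n, 0.9, 20_000))
```

The exact sum serves as a check on the simulation; the simulation becomes the useful tool once the independence assumption is dropped, for example by resampling events from observed trips.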
Both algorithms have been executed on the dataset of Mezuro with all users. When a user travels more
than 20 kilometres within a certain time limit on the route Amsterdam-Eindhoven, the events of the user
are used. The algorithm estimating the mode for multiple events detects more train users than car users.
On average, call and data events are more frequent in the car, while SMS events are more frequent in the
train. On average, the travelled distance on the route is 40 kilometres. The mapping algorithm results in
an average speed of 73 km/h. For car users, the morning and evening peaks are visible as a lower average
speed. The mapping algorithm has an average travelled distance of 34 kilometres. For both algorithms,
the number of users with the detected mode train is higher than the number of users with the detected
mode car. This could be because train passengers use their mobile phones more actively than motorists.
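The speed computation, the trip selection and the speed-difference statistics described above can be sketched as follows. The sketch assumes that each mapped event has a chainage along the route (the distance from the start of the route in meters) and a timestamp in seconds; the function names are hypothetical, and only the 20 km selection threshold and the median and 95% statistics come from the text.

```python
import statistics


def speed_kmh(chainage_a_m, time_a_s, chainage_b_m, time_b_s):
    """Speed along the route between two mapped events, in km/h."""
    dt = time_b_s - time_a_s
    if dt <= 0:
        raise ValueError("events must be in chronological order")
    return abs(chainage_b_m - chainage_a_m) / dt * 3.6  # m/s -> km/h


def keep_trip(chainages_m, min_distance_m=20_000.0):
    """Keep a user's events only if they span more than 20 km of the route."""
    return max(chainages_m) - min(chainages_m) > min_distance_m


def summarize_differences(mapped_speeds, gps_speeds):
    """Median and 95th percentile of the absolute speed differences (km/h)."""
    diffs = [abs(m - g) for m, g in zip(mapped_speeds, gps_speeds)]
    median = statistics.median(diffs)
    p95 = statistics.quantiles(diffs, n=20, method="inclusive")[-1]
    return median, p95


if __name__ == "__main__":
    print(speed_kmh(0.0, 0.0, 10_000.0, 360.0))  # 10 km in 6 minutes
    print(keep_trip([0.0, 12_000.0, 41_000.0]))  # True
    print(summarize_differences([100, 80, 120, 60, 90], [101, 82, 117, 60, 95]))
```

`statistics.quantiles` with `n=20` returns 19 cut points, so the last one is the 95th percentile; the "inclusive" method treats the sample as the full population of tested event pairs.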
To conclude, the application needs to be completed with the proposed framework to reach a final
conclusion on the usability of the dataset for constructing the application. The main contributions of
this study are the development of the classification, the proposed framework to develop the application
and two algorithms to create the first part of the application. The analyses of Part II confirm the
classification of the dataset of Mezuro. In the future, the classification for determining the usability
for Location-Based Applications should be applied to other datasets, so that the dataset of Mezuro can
be compared with them. The proposed framework demonstrates the steps to inform travellers about the
estimated travel time. As a recommendation, the framework can be extended to inform about the
predicted travel time. The two algorithms, detecting the mode and mapping the position, are the first
steps of the framework, and they can be applied in the application given the good validation results. The
algorithms are also suitable for other applications, for example the application ’transportation planning’.
Whether the dataset of Mezuro is definitively usable for developing the application cannot be said. The
complete application should be built to answer that question, but because predicted travel times cannot
be computed, other applications might be better candidates for the developed algorithms, where possible.
This report shows that mobile phone location data give many opportunities to develop Location-Based
Applications. Also, new developments are increasing the accuracy and the number of available events,
leading to more possibilities for creating applications using mobile phone data.
Table of Contents
Preface
Summary
List of abbreviations
1 Introduction
1.1 Research problem statement, questions and approach
1.2 Structure of the report
Part I, mobile phone location data and the usability in Location-Based Applications
2 Localization of mobile phones
2.1 Cells
2.2 Localization technologies
2.3 Mobile phone location data
2.4 Advanced location methods
2.5 Conclusion
3 Usability mobile phone location data
3.1 Literature review
3.2 Mobile phone location data for applications usability
3.3 Conclusion
4 Mobile phone location data of Mezuro
4.1 Characteristics of the data of Mezuro
4.2 Mobile phone location data of Mezuro usability
4.3 Conclusion
5 Conclusions and recommendations Part I
5.1 Answer main research question
5.2 Conclusions
5.3 Recommendations for further research
Part II, estimation of travel times using the dataset of Mezuro
6 Setup of the application
6.1 Introduction
6.2 Travel time types
6.3 Applications’ framework
6.4 Conclusion
7 Experimental setup
7.1 Research approach
7.2 Case-study
7.3 Elaboration case-study
7.4 Data sources description
7.5 Conclusion
8 Data analyses
8.1 Data preparation
8.2 General results of mobile phone data of Mezuro
8.3 Conclusion
9 Development application
9.1 Mode algorithm - event-level
9.2 Mode algorithm - trip level
9.3 Mapping algorithm
9.4 Conclusion
10 Results implementation application between Amsterdam and Eindhoven
10.1 Method
10.2 Results
10.3 Conclusion
11 Conclusions and recommendations Part II
11.1 Answer main research question
11.2 Findings
11.3 Conclusions
11.4 Recommendations for further research
Conclusions
Main findings
Overall conclusions
Bibliography
List of abbreviations
1G/2G/3G/4G = first/second/third/fourth generation of mobile communication technology standards
AoA = Angle of Arrival
BSC = Base Station Controller
BSCM = Best Serving Cell Map
BSS = Base Station Subsystem
BTS = Base Transceiver Station
CDMA = Code Division Multiple Access
CDR = Call Detail Record
Cell ID = Cell Identification
CGI = Cell Global Identity
CN = Core Network
CSD = Circuit Switched Domain
GGSN = Gateway GPRS Support Node
GIS = Geographic Information System
GMSC = Gateway Mobile Switching Centre
GPS = Global Positioning System
GSM = Global System for Mobile Communications
HLR = Home Location Register
LAC = Location Area Code
LAI = Location Area Identity
LAU = Location Area Update
LBA = Location-Based Applications
LMU = Location Measurement Unit
LoB = Line of Bearing
LoS = Line of Sight
LTE = Long Term Evolution
MCC = Mobile Country Code
MNC = Mobile Network Code
MS = Mobile Station
MSC = Mobile Switching Centre
OSM = OpenStreetMap
PDN = Public Data Network
PSD = Packet Switched Domain
PSTN = Public Switched Telephone Network
RNS = Radio Network Subsystem
RSS = Received Signal Strength
RTT = Round Trip Time
SGSN = Serving GPRS Support Node
SIM = Subscriber Identity Module
SMS = Short Message Service
SRNC = Serving Radio Network Controller
TA = Timing Advance
TDoA = Time Difference of Arrival
ToA = Time of Arrival
TSS = Transmitted Signal Strength
UL-ToA = UpLink Time of Arrival
UMTS = Universal Mobile Telecommunications System
1 Introduction
The mobile phone is rapidly replacing not only the fixed telephone but nowadays also the computer.
Because the usage and development of the mobile phone have increased rapidly, data from mobile phones
might be valuable for the transportation sector, for example to reconstruct individuals’ movements. These
location-based data can be used to reproduce activity patterns and to perform mobility research. For traffic
management purposes, mobile phones function as in-car devices that can collect, for example, travel
time information. However, mobile phone data have to be handled carefully, because privacy needs to
be guaranteed and in some cases there is scepticism about whether the data are anonymous [Zang and Bolot, 2011].
Today, 98% of the population in the Netherlands has a mobile phone, and 92% of the world population
has a mobile phone¹. In the Netherlands, 67% of the population has a smartphone². A smartphone can run apps and makes it easier to use the internet. Therefore, people use the smartphone
for mailing, searching for travel information or looking at social media.
Mobile phone location data are passive location data, based on data recorded automatically by the
mobile phone telecommunications network. Passive location data have the advantage that a large
population is covered. Active location data are collected by the users themselves, for example GPS traces
collected by a mobile device or navigation system. A travel survey also yields active location data.
Mobile-based localization is active localization, while passive localization is network-based. In most
circumstances, active data are more accurate than passive data. This research describes and investigates
network-based location. The same distinction can be made for applications using location data:
Location-Based Services are based on active location data, while Location-Based Applications are based
on passive location data. An example of a Location-Based Service is the mobile phone application Waze,
which uses the GPS position of the telephone to give travel time and route information to users. Note
that, apart from this distinction, Services and Applications are defined in the same way, namely as
location-based products.
This research examines applications based on data collected using mobile phones, specifically data
extracted from billing data. When someone makes a call, the mobile phone connects to a GSM or UMTS
cell. A cell is usually attached to a cell tower or placed on a building, called the cell site, and provides
signal reception in the cell area. A cell can have a range of up to 35 kilometres, and in the Netherlands
the cell areas overlap with many other cell areas. In this research, data from Call Detail Records (CDRs)
of a large telecom provider are used. Every time a check has to be made to verify that the credit of the
user is sufficient, an event is created in the CDR. The system performs this check when a call is received
or dialled, when a text message is sent or received and when a certain, yet unknown, amount of data is
used. The data used in this research are referred to as the data of Mezuro. Mezuro is the company where
this research was performed, and Mezuro is able to analyse the data.
1 http://www.marketingfacts.nl/berichten/mobile-first-over-second-screen. Accessed 5 August, 2013
2 http://tweakers.net/nieuws/93167/gfk-meer-smartphonegebruikers-dan-pc-bezitters-in-nederland.html. Accessed 11 February, 2014
Potential applications for the use of the given data will be identified. Mobility management applications have been researched with mobile phone datasets (Lin et al. [2001], Fiadino et al. [2012] and Iovan
et al. [2013]). Another type of application is traffic management. Much research has been done to transform
mobile phone data into a basis for traffic management and control applications (Astarita and Florian
[2001], White et al. [2004] and Steenbruggen et al. [2011]).
Mobile phone data are ’Big Data’ [Laurila et al., 2012]. Many definitions of the term Big Data exist;
Zikopoulos and Eaton [2011] use the definition that "Big Data applies to information that cannot be
processed or analysed using traditional processes or tools". Zikopoulos and Eaton [2011] also point out
that organisations have access to a lot of information but do not know what to achieve with the data or
how to process it. This also holds for mobile phone providers. An example is the challenge issued by
Orange, a French telecom provider [Blondel et al., 2012]. The challenge resulted in many different
applications using mobile phone data, researched by scientists and companies. With this knowledge of
the field of applications, providers could use the data for the different researched applications.
1.1 Research problem statement, questions and approach
This section first states the problem of this research. Then the research questions are defined for the
two parts of the report. Finally, the research approach is explained.
1.1.1 Problem statement
The mobile phone data of Mezuro contain the cell locations of anonymized mobile phone users. As
location information, the data contain only the identification number of a cell. Because a cell can have
a range of up to 35 kilometres, the position of the mobile phone cannot be determined very accurately
from the mobile phone data. Also, the number of events per user is low. Whether, and to what extent,
the mobile phone data add value for Location-Based Applications is unknown. Up to now, the real value
of the mobile phone data of Mezuro has not been determined.
1.1.2 Research questions
The research is split into two parts. The first part investigates the usability of the dataset of Mezuro
for different applications. In the second part, one of the applications is chosen and developed using
mobile phone data.
The main research question for Part I is:
To what extent are mobile phone location data of Mezuro useful for Location-Based Applications?
The main research question for Part I is supported by the following sub-questions:
- Which types of localization technologies and data types exist and to which type do the data of Mezuro
belong?
- Which methods exist to increase the accuracy of the derived location of the mobile phone?
- Is it possible to classify the usability of mobile phone location data for applications?
- Which Location-Based Applications could be developed with the currently available mobile phone data?
The main research question for Part II is:
To what extent is the mobile phone dataset of Mezuro able to create the application ’Travel time information’?
The following sub-questions of Part II support the main research question:
- How can the application be created using the dataset of Mezuro?
- Do the practical network characteristics meet the theoretical, given, network characteristics?
- How do the network characteristics affect the distance between the GPS position and the cell site?
- Which advanced location technique has the smallest location error when it is used to map the location of
the user on a road or train track?
- How accurately can the mode be detected for one event, or for the number of events that corresponds to the
average frequency of calling, texting and data use?
- What is the accuracy of the determined mapping positions when mapping the positions onto roads or rail
tracks?
1.1.3 Research approach
The first part takes an exploratory view of the usability of mobile phone location data for applications.
A literature study clarifies mobile phone location data and techniques to improve these locations. The
trends in the usability of mobile phone data are identified. Classes of applications are created to assess
the usability of mobile phone data. With this classification, the usability of the specific dataset of Mezuro
is tested.
Based on the findings from the literature study and the test of the usability of the dataset of Mezuro,
an application is developed. A framework is defined to develop the chosen application. Most of the
algorithms are taken from literature and are combined in this framework. A part of this framework is
created with mobile phone location data. A case-study is used as a tool for this preliminary, exploratory
stage of the development of the application.
1.2 Structure of the report
The structure of the report is shown in figure 1.1. In the first part, an overview is given of mobile phone
data and their usability for Location-Based Applications. Chapter 2 first gives a classification of mobile
phone data. Then the architecture of the mobile network is described. Localization technologies and
advanced location methods found in literature are introduced. Next, Chapter 3 describes the usability of
mobile phone location data. It includes a literature review of the data and methods used in literature
and the corresponding results. Classes are created to test the usability of mobile phone data for different
application requirements. Chapter 4 explains the mobile phone data of Mezuro and tests their usability
for each class. The last chapter of Part I, Chapter 5, gives conclusions and recommendations for Part I.
In the second part, travel time is examined with the mobile phone data.
In the second part, the data of Mezuro are used to test whether the data are of sufficient quality for one of the applications described in Part I. First, in Chapter 6, the setup of the application is
described and a framework is constructed to develop the application using mobile phone location data
of Mezuro. Then, Chapter 7 gives the experimental setup and explains the data collection and the
case-study. Because many data sources need to be combined, the data sources are also explained in this
chapter. Chapter 8 describes the preparation of the data and the first analyses of the mobile phone data
of Mezuro. In Chapter 9, the algorithms used to combine the data are explained together with their
results. In the last analysis chapter, Chapter 10, the algorithms are applied to the data of Mezuro.
Then the conclusions for Part II are given, and after Part II the conclusions and recommendations for
the total research are described.
Figure 1.1: Research design
Part I
Mobile phone data and the usability in Location-Based Applications
2 Localization of mobile phones
This chapter describes the basics of mobile phone location data. A cell is the key element in determining
the location of a Mobile Station (MS). This MS is often a mobile phone, but it could be anything with a
SIM (Subscriber Identity Module) card. The first section explains the concept of a cell and its
corresponding components. Then the different localization technologies, methods to find the location of
the mobile phone, are described, from the least accurate (Cell ID) to the most accurate localization
technology (UL-ToA). A classification is made for ordering the different types of mobile phone location
data. Next, advanced location methods are summarized. These methods, found in literature, improve the
location accuracy, for example by decreasing the cell area using a Voronoi tessellation. The chapter ends
with a conclusion.
2.1 Cells
The key element in determining the location of an MS is the cell. A cell is the general name for an antenna providing mobile phone coverage. A cell consists of a cell site, the location of the antenna, and the cell area, the coverage area of the antenna. All components are shown in figure 2.2. Usually three cells are attached to a site, see figure 2.1. Cells have different ranges, up to 35 kilometres. With three cells per site, each cell covers a sector of roughly 120°, so one cell could cover an area of approximately π · 35²/3 ≈ 1300 km².
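The approximate cell area above can be reproduced with a short calculation. This is a sketch under the idealised assumption that each of the three cells at a site covers a 120° sector of a circle with a 35 km radius; real cell areas are irregular.

```python
import math

def sector_area_km2(radius_km: float, sector_deg: float) -> float:
    """Area of an idealised cell area: a circular sector of the given angle."""
    return math.pi * radius_km ** 2 * sector_deg / 360.0

# Three cells per site, so each cell serves roughly a 120 degree sector.
area = sector_area_km2(35.0, 120.0)
print(f"{area:.0f} km2")  # 1283 km2, i.e. roughly 1300 km2
```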
Figure 2.1: Cells in the Netherlands, source: www.open.ou.nl, www.grenswetenschap.nl and own picture
Figure 2.2: Components of a cell
Different cells exist for the different generations, which are explained in the next section, but cells of different generations can be co-located at one site. The telecommunications architecture is described in Appendix A. Figure 2.3 shows the coverage of all cells of one provider in the Netherlands.
Figure 2.3: Coverage of all cells in the Netherlands
2.1.1 Generations and frequencies
The antennas of cells differ in generation and frequency. The first generation (1G) used analogue telecommunications (Borgaonkar and Redon [2011]). The second generation (2G), GSM in Europe and CDMA in the USA, replaced 1G because it has an extra functionality, namely text messaging: Short Message Service (SMS). Also, 2G uses digital instead of analogue telecommunications and, in Europe, the so-called GSM 900 and GSM 1800 frequencies: the radio signals of these cells have frequencies near 900 MHz and 1800 MHz, respectively. With a higher frequency, fewer disturbances occur during a call. However, the range of a cell with a higher frequency is smaller, at most 10 kilometres. The third generation (3G), UMTS, uses frequencies near 1900 MHz. With the third generation, data traffic can be sent faster. Because UMTS uses a higher frequency, it has a smaller range than GSM. The cell areas are therefore smaller, which increases the localization accuracy. Nowadays a new generation with higher speeds is being deployed. This new generation (3.9G), named Long Term Evolution (LTE), uses six different frequencies. Different providers in the Netherlands bought different frequencies, namely 800, 900, 1800, 2100 and 2600 MHz¹.
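The trade-off between frequency and range can be illustrated with the standard free-space path loss formula, FSPL(dB) = 20·log10(d in km) + 20·log10(f in MHz) + 32.44. This is a sketch only: free space is an optimistic idealisation (real cells also lose signal to terrain, buildings and fading), and the 130 dB loss budget is a hypothetical value chosen just to compare frequencies.

```python
import math

def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB (distance in km, frequency in MHz)."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44

def max_range_km(loss_budget_db: float, freq_mhz: float) -> float:
    """Distance at which the free-space loss uses up the whole loss budget."""
    return 10 ** ((loss_budget_db - 32.44 - 20 * math.log10(freq_mhz)) / 20)

BUDGET_DB = 130.0  # hypothetical maximum allowed loss, for comparison only
for f in (900, 1800, 2100):
    print(f"{f} MHz: {max_range_km(BUDGET_DB, f):.1f} km (free space)")
```

Doubling the frequency adds about 6 dB of loss, which halves the free-space range for the same budget. Real propagation losses grow faster with distance than in free space, so absolute ranges are much smaller than these idealised values.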
2.1.2 Allocation
In urban and suburban areas, any given location is usually covered by multiple cells. Cells overlap because of the different types of connection (e.g. a data connection or a call), the chance that one of the components in the network architecture fails, and the maximum capacity of a cell (e.g. at festivals, where demand increases). The availability of a cell, the distance to its site and the type of connection (e.g. data or call) determine the allocation to a cell. In most cases, however, the allocation is determined by the strength of the signal between the site and the MS. The signal strength decreases as the distance between the MS and the cell increases, and therefore the MS is usually connected to the cell closest to it. However, due to the propagation of the radio waves, the closest cell can have a lower signal strength than cells further away; a clear line of sight with the cell improves the connection.
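The dominant allocation rule, the strongest received signal rather than the shortest distance, can be sketched as follows. The cell names and dBm values are hypothetical, and the sketch ignores the other factors named above (cell availability, load and connection type).

```python
def serving_cell(measurements: dict) -> str:
    """Return the cell with the highest received signal strength (dBm)."""
    return max(measurements, key=measurements.get)

# Hypothetical measurements: cell A is the nearest but is blocked by
# buildings, so the more distant cell B is received more strongly.
rss_dbm = {"A": -97.0, "B": -84.0, "C": -101.0}
print(serving_cell(rss_dbm))  # B
```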
As an example, figure 2.4 shows the cells covering the TU Delft area. The signals of approximately 75 cell towers reach the TU Delft area; even cells in Leiden, Spijkenisse and on the Maasvlakte have a sufficient range. According to users of one provider, the signal strength on the North Sea, for example on a boat, is sufficient to make a connection, whereas at other spots in the Netherlands, like the area near Driebergen-Zeist station, the signal strength is poor. Multiple cells theoretically reach this area, but a proper connection cannot be made. One of the causes is the blocking of the radio waves by, for example, buildings.
1 http://tweakers.net/nieuws/86118/kpn-vodafone-t-mobile-en-tele2-kopen-4g-frequenties.html. Accessed 5 September, 2013
Figure 2.4: Cell sites of cells covering TU Delft area (white circle)
2.1.3 Femto cell and micro cell
A femto cell and a micro cell have the same function as a normal cell and are an addition to the network. They have a smaller range, and therefore the location determination is more accurate than with a normal cell (Borgaonkar and Redon [2011]). Femto cells and micro cells differ in range. A femto cell provides capacity for five to twenty MSs for home or business use, and its range is only several tens of metres. Micro cells provide the same extra capacity in public spaces: they are placed in train stations, in tunnels or on squares, providing extra capacity and signal strength. The range of a micro cell is on average 2 kilometres.
Micro cells can be used in dense areas such as city centres, stations and festivals. Schiphol railway station and the metro stations in Rotterdam and Amsterdam have micro cells for underground coverage. Figure 2.5 shows micro cells in the metro of Rotterdam. The circles show the theoretical coverage of the cells; because the cells are located in the tunnel, the practical coverage is limited to the tunnel.
Figure 2.5: Micro cells in metro stations in Rotterdam with their theoretical radius (on average 500
meters)
2.2 Localization technologies
Different localization technologies exist, applied at the network side as well as at the MS side. The technologies are available for different digital cellular technologies (UMTS, GSM, CDMA and LTE). In this section only localization technologies for UMTS and GSM are researched, because these technologies are used in the Netherlands. CDMA is mainly used in the USA. Furthermore, in early 2014 LTE is still in its starting phase and not yet completely rolled out; therefore localization technologies for LTE are not described.
Table 2.1 shows the different methods described in this section. For each method, the table states whether it is available for UMTS and/or GSM and whether extra equipment is needed: for some of the localization technologies, software and/or hardware needs to be installed. Furthermore, the accuracy of each technique is classified. The accuracy differs for each location, and weather conditions and time also influence the accuracy.
Table 2.1: Overview localization technologies, based on Costabile [2010].
Technique | Based on | Available for | Extra equipment | Accuracy
Cell ID | Network-based | UMTS and GSM | - | 0.3–20 km
AoA | Network-based | UMTS and GSM | Hardware and software modifications | 100–200 m
TA | Network-based | GSM | Software modifications | 0.2–11 km
RTT | Network-based | UMTS | Software modifications | 0.2–11 km
UL-ToA | Network-based | UMTS and GSM | Additional hardware and software (LMU) | <50 m
RSS | Network-based | UMTS and GSM | Hardware modifications | 0.25–12 km
Fingerprinting | Network-based | UMTS and GSM | Additional hardware and software | 10 m
Cell identification
Cell identification (Cell ID) is the simplest method to derive the location of an MS. Cell ID uses the current CGI (Cell Global Identity) of the MS. Figure 2.6 shows the MS, in this case a mobile phone, in the area of a site. No calculations are needed for this method. The main limitation of this method is its poor accuracy: cells can have a range of up to 35 kilometres. In practice the location error is smaller, because the average range is smaller and the cells are located close together. The accuracy of Cell ID has been researched in various papers; the location accuracy studies are summarized in table 2.2, together with the related confidence intervals. Averaging all error distances with a confidence interval of 50% gives a median error distance of approximately 350 metres in urban areas, 750 metres in suburban areas and 1650 metres in rural areas. The studies use different types of data and different types of detection. Moreover, the analyses were done in several countries, so the definition of an urban, a suburban and a rural area differs. Only one study analysed the accuracy in the Netherlands [Witteman, 2007].
Table 2.2: Location accuracy with precision found in literature
Source | Accuracy (m) | Confidence interval
Murray [2002] | Urban: 320; Suburban: 640 | 67%
Lin and Juang [2005] | Route 1: 141.7, 194.1, 353.5; Route 2: 224.4, 274.7, 419.5; Route 3: 197.6, 256.4, 476.0 (urban) | 50%, 67%, 95%
Chen et al. [2006] | Urban: 232, 574; Suburban: 760, 2479 | 50%, 90%
Witteman [2007] | 1100 (suburban) | 50%
Trevisani and Vitaletti [2004] (Italy) | Urban: 480; Suburban: 750; Highway: 1000 | 50%
Trevisani and Vitaletti [2004] (USA) | Urban: 790; Suburban: 490; Highway: 2910 | 50%
Paek and Kim [2010] | Route 1: 534, 887; Route 2: 167, 384; Route 3: 341, 715; Route 4: 347, 632 (suburban) | 50%, 80%
Isaacman et al. [2011] | Home: 853, 1448, 2060, 6212; Work: 998, 1336, 3701, 34166 | 25%, 50%, 75%, 95%
Lakmali and Dias [2008] | Urban: 247, 258, 526; Suburban: 1062, 1217, 1870; Rural: 1042, 1045, 1746 | 50%, 67%, 95%
Curran and Hubrich [2009] | 230 (urban) | 50%
Yadav et al. [2012] | 210, 410 (urban) | 50%, 80%
Figure 2.6: Method: Cell ID. Only the location of the cell is known. The area of the location of the MS is
the whole cell area; not to scale.
Angle of Arrival
Angle of Arrival (AoA) is a method that uses the angle at which the signal arrives at the cell to determine the location. An AoA measurement leads to a Line of Bearing (LoB) with a certain angle α and a certain error ε, see figure 2.7. With one cell, the MS is located somewhere on the Line of Bearing; with two cells, a position (a quadrangle, because of the errors) can be determined, and the location accuracy improves when more cells provide an AoA measurement. Antennas are needed which send and receive ‘direction-dependent’ signals [Koppanyi et al., 2012]. Also, the MS needs to be in Line of Sight (LoS) of the cell, because reflections lead to an incorrect location estimate [Raja et al., 2004]. In rural areas LoS is easier to establish, and therefore the method performs better there; because reflections make the estimate inaccurate elsewhere, the technique is not often used. The number of measurements depends on the number of cells covering the MS, but the MS takes bearings with at most five cells.
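The two-cell case can be sketched as the intersection of two lines of bearing. This minimal sketch assumes a local metric x/y coordinate system, angles measured counter-clockwise from the x-axis and error-free bearings; with the error ε each bearing becomes a wedge, and the sketch returns only the centre of the resulting quadrangle.

```python
import math

def aoa_position(site1, angle1_deg, site2, angle2_deg):
    """Intersect two lines of bearing. Sites are (x, y) in metres; angles
    are counter-clockwise from the x-axis. Returns the (x, y) intersection."""
    x1, y1 = site1
    x2, y2 = site2
    d1 = (math.cos(math.radians(angle1_deg)), math.sin(math.radians(angle1_deg)))
    d2 = (math.cos(math.radians(angle2_deg)), math.sin(math.radians(angle2_deg)))
    # Solve site1 + t1*d1 = site2 + t2*d2 for t1 with Cramer's rule.
    det = d1[0] * (-d2[1]) - d1[1] * (-d2[0])
    if abs(det) < 1e-12:
        raise ValueError("Bearings are parallel; no unique intersection.")
    bx, by = x2 - x1, y2 - y1
    t1 = (bx * (-d2[1]) - by * (-d2[0])) / det
    return (x1 + t1 * d1[0], y1 + t1 * d1[1])

# Hypothetical example: two sites 1 km apart, bearings of 45 and 135 degrees.
print(aoa_position((0.0, 0.0), 45.0, (1000.0, 0.0), 135.0))  # approx. (500, 500)
```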
Figure 2.7: Method: Angle of Arrival. After measuring the angles (α) with a certain error (ε) of the two
cells, an area can be found where the MS is located; not to scale
Time of Arrival
Time of Arrival uses timing information between the cell and the MS. UMTS and GSM use different technologies: for GSM, Timing Advance (TA) is available and for UMTS, Round Trip Time (RTT). Both technologies use the allocated time slots for a transmission. With the known time slots and the speed of the radio waves (the speed of light), the distance between the MS and the cell is calculated. The cell area is thus divided into different zones, one for each time slot. For TA, each zone has a fixed width of approximately 554 metres [Samiei et al., 2010]. For RTT, the time unit is called a chip. Because chips have a finer time resolution, the zone width is approximately 39 metres [Kwan et al., 2012], and therefore the location accuracy is higher. An illustrated explanation is given in figure 2.8.
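The zone widths follow from the timing resolution: one timing step measures a round-trip delay, so the zone width is c·T/2. The sketch below assumes the standard GSM bit period of 48/13 µs (about 3.69 µs) as the TA step and the UMTS chip rate of 3.84 Mchip/s for RTT.

```python
C = 299_792_458.0  # speed of light in m/s

def zone_width_m(step_s: float) -> float:
    """Distance resolution of one timing step. The step measures a round
    trip, so the one-way distance is half of c times the step."""
    return C * step_s / 2

gsm_bit_period = 48e-6 / 13    # one TA step, about 3.69 microseconds
umts_chip_period = 1 / 3.84e6  # one chip, about 0.26 microseconds

print(round(zone_width_m(gsm_bit_period)))    # 553, about 554 m as quoted
print(round(zone_width_m(umts_chip_period)))  # 39
```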
Figure 2.8: Method: Timing Advance/Round Trip Time. From two cells the time slot is determined
(between the two thick lines) and an area can be found where the MS is located; not to scale
Time of Arrival can also be used to perform a triangulation, which requires Location Measurement Units (LMUs) to be installed at the cells. The MS does not need any modifications. These LMUs are additional hardware that perform the triangulation. With UpLink Time of Arrival (UL-ToA), the triangulation can be performed in a GSM or UMTS network. The network forces the mobile phone to perform an asynchronous handover, during which the phone transmits so-called ‘access bursts’; the LMUs measure the arrival times of these bursts. With the known arrival times, a triangulation can be performed with a certain error ε, see figure 2.9.
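The geometric core of this step, finding a position from distances to three cells, can be sketched as a trilateration. This is a simplified sketch under stated assumptions: the one-way distances are already known (e.g. converted from arrival times with d = c·t), they are noise-free, and the three sites are not collinear. Operational UL-ToA systems typically work with time differences between LMUs, because the exact transmission time is unknown, and use more sites with a least-squares fit.

```python
def trilaterate(sites, distances):
    """Position (x, y) from distances to three sites (local metric system).
    Subtracting the first circle equation from the other two yields a
    2x2 linear system in x and y."""
    (x1, y1), (x2, y2), (x3, y3) = sites
    r1, r2, r3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1 ** 2 - r2 ** 2 + x2 ** 2 - x1 ** 2 + y2 ** 2 - y1 ** 2
    b2 = r1 ** 2 - r3 ** 2 + x3 ** 2 - x1 ** 2 + y3 ** 2 - y1 ** 2
    det = a11 * a22 - a12 * a21
    if abs(det) < 1e-9:
        raise ValueError("Sites are collinear; position is not unique.")
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)

# Hypothetical example: MS at (300, 400), three sites 1 km apart.
sites = [(0.0, 0.0), (1000.0, 0.0), (0.0, 1000.0)]
dists = [500.0, (700 ** 2 + 400 ** 2) ** 0.5, (300 ** 2 + 600 ** 2) ** 0.5]
print(trilaterate(sites, dists))
```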
Figure 2.9: Method: UL-ToA. From two cells the time between the cell and MS is determined (between
the two thick lines) and an area can be found where the MS is located; not to scale
The accuracy of the above methods depends on the Line of Sight path between the sites and the MS. If no LoS exists between the MS and the site, the calculated position can differ from the actual location. Again, this problem is bigger in urban areas than in rural areas, because of the deviation of the radio waves, for example reflections caused by buildings.
Received Signal Strength
RSS (Received Signal Strength) is a method that uses the signal strength of the Base Transceiver Station (BTS), i.e. the cell. The RSS is the strength of the signal at the MS and the TSS (Transmitted Signal Strength) is the strength of the signal at the cell. The signal strengths of the surrounding cells are collected to enable a smooth handover; therefore the strength of up to six cells is measured. The difference between the TSS and the RSS is called the signal loss. From the signal loss, a distance between the cell and the MS can be derived. Figure 2.10 shows the working of RSS.
With a propagation model, like the model of Hata [1980], the location is calculated from the RSS values. Each RSS value yields a distance, i.e. a circle with the site as centre; with three of these distances the location can be found. However, errors can cause an incorrect position. Three sources of errors exist, namely: the propagation model used, the truncation and calibration of RSS values according to the specifications, and disturbances due to fading. Fading is the reduction of the signal, often caused by obstacles.
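The conversion from signal loss to distance can be sketched with the Okumura-Hata model for urban areas (small or medium city variant). In this model the loss is linear in log10(d), so it can be inverted directly. The transmitted power, received power and antenna heights below are hypothetical, and the sketch ignores antenna gains, cable losses and fading; three such distances can then be combined into a position as described above.

```python
import math

def hata_urban_loss_db(d_km, f_mhz, hb_m, hm_m):
    """Okumura-Hata median path loss (dB), urban area, small/medium city.
    Intended range: f 150-1500 MHz, hb 30-200 m, hm 1-10 m, d 1-20 km."""
    log_f = math.log10(f_mhz)
    a_hm = (1.1 * log_f - 0.7) * hm_m - (1.56 * log_f - 0.8)  # handset height correction
    return (69.55 + 26.16 * log_f - 13.82 * math.log10(hb_m) - a_hm
            + (44.9 - 6.55 * math.log10(hb_m)) * math.log10(d_km))

def hata_distance_km(loss_db, f_mhz, hb_m, hm_m):
    """Invert the model: the loss is linear in log10(d), so solve for d."""
    intercept = hata_urban_loss_db(1.0, f_mhz, hb_m, hm_m)  # loss at d = 1 km
    slope = 44.9 - 6.55 * math.log10(hb_m)                  # dB per decade of distance
    return 10 ** ((loss_db - intercept) / slope)

# Hypothetical values: 43 dBm transmitted (TSS), -85 dBm received (RSS),
# 900 MHz, 30 m antenna height at the site, 1.5 m handset height.
tss_dbm, rss_dbm = 43.0, -85.0
d = hata_distance_km(tss_dbm - rss_dbm, 900.0, 30.0, 1.5)
print(f"estimated distance: {d:.2f} km")
```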
The Database Correlation Method also uses parameters of the mobile phone network, namely location-dependent properties that can be matched with data from the network. These location-dependent properties are called fingerprints (Lakmali and Dias [2008] and Laoudias [2012]). First, the location-dependent properties are measured in a pilot; in most cases the RSS values are measured for each accessible location. The measured properties are saved in a database, and the signal strengths reported by a mobile phone are matched against this database. The Pilot Correlation Method is a version of the Database Correlation Method, see Borkowski and Lempiäinen [2005]. This method is also used for determining the location of handovers [Bar-Gera, 2007].
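Database correlation can be sketched as a nearest-neighbour search in signal space. The fingerprint database, the Euclidean distance on dBm values, the -110 dBm fill value for cells not heard at a location and the choice of k are all assumptions for illustration; published methods differ in their matching metrics.

```python
import math

# Hypothetical fingerprint database from a pilot measurement:
# location (x, y in metres) -> RSS in dBm per cell.
FINGERPRINTS = {
    (0, 0): {"A": -60, "B": -85, "C": -90},
    (100, 0): {"A": -70, "B": -75, "C": -92},
    (0, 100): {"A": -72, "B": -88, "C": -78},
    (100, 100): {"A": -80, "B": -77, "C": -76},
}
MISSING_DBM = -110.0  # assumed value for a cell that is not heard

def signal_distance(a, b):
    """Euclidean distance between two RSS vectors (dBm per cell)."""
    cells = set(a) | set(b)
    return math.sqrt(sum((a.get(c, MISSING_DBM) - b.get(c, MISSING_DBM)) ** 2 for c in cells))

def locate(measurement, k=2):
    """Average location of the k fingerprints that best match the measurement."""
    ranked = sorted(FINGERPRINTS, key=lambda loc: signal_distance(measurement, FINGERPRINTS[loc]))
    best = ranked[:k]
    return (sum(x for x, _ in best) / k, sum(y for _, y in best) / k)

print(locate({"A": -61, "B": -84, "C": -91}, k=1))
```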
Figure 2.10: Method: RSS. From two cells the signal loss between the cell and MS is calculated (between
the two thick lines) and an area can be found where the MS is located; not to scale
2.3 Mobile phone location data
Many types of mobile phone location data exist. Besides the positioning technologies described above, the data can be extracted at different levels of the telecommunications network. One of the most commonly used extractions is the Call Detail Record (CDR), which is described in the next section together with all other types in the classification of mobile phone data. One of the main issues of mobile phone location data is the privacy of the users, which is connected to the type of data.
2.3.1 Classification of mobile phone location data
Mobile phone data never have exactly the same properties, because there are multiple types of aggregation and localization techniques. Figure 2.11 shows an overview of the different types of mobile phone data. The first level (yellow) divides the data into cell-driven, event-driven and network-driven data; these levels are explained below. Then the mobile phone data are divided into aggregation levels (the blue column). The data on cell level are aggregated by cells, while the event-driven and network-driven levels are aggregated by Mobile Stations (MSs). Finally, the green level gives the localization technology, described in section 2.2.
Cell-driven data
The first level consists of data aggregated at cell towers. The cell tower statistics give information about the number of calls, text messages and data use of one cell tower. With the cell tower statistics, density information can be determined. The aggregation can be done using Erlang or by counting the number of calls or text messages (SMS). The Erlang is a unit of traffic intensity that measures the occupancy of the network or the call intensity.
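The traffic intensity in Erlang is the total time the channels of a cell are occupied divided by the length of the observation period. A minimal sketch with hypothetical call durations:

```python
def erlang(call_durations_s, period_s=3600.0):
    """Traffic intensity in Erlang: total occupied time / observation period."""
    return sum(call_durations_s) / period_s

# Hypothetical busy hour on one cell: 30 calls of 3 minutes each.
print(erlang([180.0] * 30))  # 1.5
```

One Erlang corresponds to one channel that is continuously occupied during the period, so 1.5 Erlang means that on average one and a half calls were in progress.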
Event-driven data
Aggregated event-driven data with cell location information, in most cases extracted from a Call Detail Record, give information about the details of calls, data use and text messages per cell. Because the CDR is aggregated, no individual information can be found.
The individual Call Detail Record (CDR) gives information on the individual level. It provides information about the calls, text messages and data use of an individual user, including the connections with other persons. However, no spatial information is given, so no applications that need location information can be derived from these data. A CDR consists of different attributes recorded during an event. An event occurs when a user receives or makes a call, sends or receives a text message or uses data. When a user makes a phone call, attributes like the duration of the call, the time of calling, the number that was called and the location are stored. Individual CDR with cell location information also provides information about the location. Like the category ‘Individual CDR’, it can contain information about the called user and the duration of the call; however, most of the mobile phone data in this category found in the literature do not contain this type of information.
Network-driven data
Individual network-driven data are generated at a periodic interval, during a handover or when the MS changes Location Area. The Cell ID is saved at each such event. The periodic updates generated by the network produce additional events. With a periodic interval, any other localization technology could be used as well.
The advantages and disadvantages of the different levels are shown in table 2.3. Most of the categories are derived from figure 2.11. The category ‘individual CDR with cell location information’ uses the localization technology ‘Cell ID’ or one measurement of the localization technology ‘TA/RTT’, and ‘individual CDR with triangulated location’ uses the localization technologies ‘AoA’, ‘RSS’, ‘UL-ToA’ or multiple measurements of ‘TA/RTT’.
Figure 2.11: Classification tree for mobile phone data
Table 2.3: Mobile phone location data, derived and adjusted from Calabrese [2011].
Category | Advantages | Disadvantages
Aggregated cell statistics | Easy to manage, possible in real time | No information on users’ mobility
Aggregated CDR with cell location information | Easy to manage | No individual interaction information
Individual CDR | Individual communication patterns | Large dataset, mostly not real-time
Individual CDR with cell location information | Individual communication patterns and mobility patterns | Large dataset, mostly not real-time
Individual CDR with triangulated location | Individual communication patterns, possible in real time | Large dataset, possibly need for special hardware to access data
Individual network-driven data | Individual communication patterns, possible in real time | Large dataset, possibly need for special hardware to access data
2.3.2 Privacy
Privacy is the main reason that mobile phone location data are not available on a large scale. The government and the telecom providers have strict regulations in order to protect the users’ privacy and to comply with the law. Making the data anonymous is the first step to preserve privacy. Furthermore, the (anonymous) identification number can be changed after a predetermined time period, which makes a user more difficult to trace. Another possibility is to force grouping of the data: when the data are grouped, a single user cannot be traced.
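The two measures above can be sketched as follows: a pseudonym that changes every period, and grouping with suppression of small groups. The HMAC-SHA256 construction, the secret and the minimum group size of 15 are hypothetical choices for illustration, not the procedure of any provider or of Mezuro.

```python
import hashlib
import hmac
from collections import Counter

def pseudonym(subscriber_id: str, period: int, secret: bytes) -> str:
    """Anonymous identifier that changes every period: an HMAC-SHA256 of the
    subscriber ID, keyed with a secret combined with the period number."""
    key = secret + period.to_bytes(4, "big")
    return hmac.new(key, subscriber_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def grouped_counts(records, min_group_size=15):
    """Count records per group (e.g. (area, hour)) and suppress groups that
    are smaller than min_group_size, so that no small group can be singled out."""
    counts = Counter(records)
    return {group: n for group, n in counts.items() if n >= min_group_size}

SECRET = b"hypothetical-secret"  # would be known to the provider only
print(pseudonym("subscriber-1", 1, SECRET), pseudonym("subscriber-1", 2, SECRET))
```

Because the key changes with the period, the same subscriber obtains unrelated identifiers in different periods, so trips cannot be linked across periods without the secret.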
2.4 Advanced location methods
This section discusses advanced location methods, which use different types of analysis techniques to increase the location accuracy. Whereas the localization technologies described in section 2.2 rely on the network’s software and hardware, the methods in this section are performed after the mobile phone location data have been collected.
In table 2.4 the methods are divided into two categories: one event and multiple events. One-event methods have no knowledge about the sequence of events, whereas in multiple-events methods this knowledge is available. One method, overlapping cells, is already used for the mobile phone data of Mezuro; the Top Serving Cells Map (TSCM) is being developed by Mezuro; the other methods have been found in the literature. For each method, table 2.4 rates the calculation time, the amount of extra information needed and the accuracy improvement, and lists the available literature. The methods are described below the table.
Table 2.4: Overview advanced location methods (+ = good, -/+ = average and - = bad)
Methods
Available
literature
1
2
3
+
+
+
-/+
+
+
-/+
-/+
-
+
+
-
+
+
+
Chen et al. [2006], Phithakkitnukoon et al. [2012]
Fontaine et al. [2007]
Candia et al. [2008], Ahas et al. [2008], Doyle et al. [2011],
Frias-Martinez et al. [2010], Huang et al. [2010]
Baert and Seme [2004], Portela and Alencar [2006]
Traag et al. [2011]
Girardin et al. [2009]
-
-
+
-
-/+
-/+
-
Reades et al. [2007], Novak et al. [2013],
Isaacman et al. [2011]
Multiple events
Map matching
-
-/+
-/+
Overlapping cells
Random walk
Markovian
Kalman filtering
+
+
-/+
-/+
-/+
-/+
-/+
-/+
+
-/+
+
Trajectory modelling
-
-/+
+
Gao and Liu [2013],
Fontaine et al. [2007]
Gonzalez et al. [2008]
Eagle et al. [2009], Milani et al. [2009]
Sohn and Kim [2008], Zhang [2012], Olama et al. [2008],
Ramm and Schwieger [2007], Qiu and Ran [2008]
Anderson and Muller [2006], Chen et al. [2006],
Cheng et al. [2006]
One event
Centroid
Zonal sampling
Voronoi
Weighted Voronoi
Smooth Voronoi
Best Serving
Cell Map
Top Serving
Cells Map
Clustering cells
1
2
3
Calculation time
Information needed
Accuracy improvement
One event
Centroid
Centroids are the geometric centres of the cell areas, and the centroid is used as the location of the MS. By using centroids instead of the location of the site, the estimated locations lie within the cell area and not at its edge. However, in most cases the MS connects to a cell whose site is close by, so the MS is often closer to the cell site than to the centroid of the cell area.
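If a cell area is available as a polygon, its centroid can be computed with the shoelace formula. In this sketch the polygon is a hypothetical triangular sector with the site at one vertex, in a local metric coordinate system; it shows that the centroid lies inside the cell area, away from the site at its edge.

```python
def polygon_centroid(vertices):
    """Centroid of a simple (non-self-intersecting) polygon given as a list
    of (x, y) vertices, using the shoelace formula."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("Degenerate polygon")
    return cx / (3 * area2), cy / (3 * area2)

# Hypothetical sector-shaped cell area with the site at (0, 0).
print(polygon_centroid([(0, 0), (3000, -1000), (3000, 1000)]))  # (2000.0, 0.0)
```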
Zonal sampling
Zonal sampling divides the area into hexagonal zones. Because a cell area can overlap different zones, each cell needs to be assigned to one zone. Because all zones have the same shape, regardless of the infrastructure, attention is needed to avoid transportation lines on the border of a hexagonal zone.
Voronoi
A Voronoi diagram divides the area into polygons. It uses the assumption that an MS in a cell is always closer to that cell’s site than to other sites nearby: every point of a Voronoi region is closer to its own site than to any other site, and every point on the border between two Voronoi regions is at the same distance from the two sites. First, a Voronoi region is connected with each site; thereafter, the Voronoi region of a site is divided into regions for each of its cells, using the cell areas as input.
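Assigning a location to a Voronoi region does not require constructing the polygons: the region containing a point is the region of the nearest site. The second function sketches the division of a site’s region among its cells under the assumption that each cell is a sector with a known antenna azimuth; the sites and azimuths are hypothetical, and the text uses the cell areas for this step instead.

```python
import math

def voronoi_site(point, sites):
    """Site whose Voronoi region contains the point, i.e. the nearest site
    (ties are broken by the order of `sites`)."""
    return min(sites, key=lambda s: math.dist(point, sites[s]))

def voronoi_cell(point, sites, cell_azimuths):
    """Cell of the nearest site whose azimuth is closest to the bearing
    from the site to the point (compass bearing: 0 = north, clockwise)."""
    site = voronoi_site(point, sites)
    sx, sy = sites[site]
    bearing = math.degrees(math.atan2(point[0] - sx, point[1] - sy)) % 360

    def angular_diff(cell):
        d = abs(bearing - cell_azimuths[site][cell]) % 360
        return min(d, 360 - d)

    return min(cell_azimuths[site], key=angular_diff)

SITES = {"S1": (0.0, 0.0), "S2": (2000.0, 0.0), "S3": (1000.0, 1500.0)}
AZIMUTHS = {s: {f"{s}-a": 0.0, f"{s}-b": 120.0, f"{s}-c": 240.0} for s in SITES}
print(voronoi_site((1200.0, 200.0), SITES), voronoi_cell((1200.0, 200.0), SITES, AZIMUTHS))
```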
Weighted Voronoi
Because cells have different ranges, a normal Voronoi diagram can be improved by taking these ranges into account. A weighted Voronoi diagram uses the different ranges to create regions according to the range. Baert and Seme [2004] studied a generalized form with straight lines between the intersections, while Portela and Alencar [2006] distinguished between the ranges of the cell areas at a cell tower and used circles between two intersections.
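One common form of weighting divides the distance to each site by the range of its cell, so that cells with a larger range obtain larger regions. This multiplicatively weighted assignment is a sketch for illustration and not necessarily the exact construction of Baert and Seme [2004] or Portela and Alencar [2006]; the cells and ranges are hypothetical.

```python
import math

def weighted_voronoi_cell(point, cells):
    """cells: id -> ((x, y) of the site, range in metres).
    The point is assigned to the cell with the smallest distance / range."""
    return min(cells, key=lambda c: math.dist(point, cells[c][0]) / cells[c][1])

CELLS = {"small": ((0.0, 0.0), 1000.0), "large": ((3000.0, 0.0), 5000.0)}
p = (1200.0, 0.0)
print(weighted_voronoi_cell(p, CELLS))  # "large", although "small" is nearer
```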
Smooth Voronoi
A smooth Voronoi diagram is based on a probabilistic modelling approach. A normal Voronoi diagram has a sharp border between two cell areas, which is not realistic: at a given location, an MS can connect to multiple cells. A smooth Voronoi diagram has been used to allow multiple cells to be connected at the same location.
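A soft assignment can be sketched by giving every site a probability that decreases with distance. The softmax over negative distances and the 500 m scale are hypothetical modelling choices, not the model of Traag et al. [2011]; as the scale goes to zero, the result approaches the hard Voronoi assignment.

```python
import math

def smooth_voronoi(point, sites, scale_m=500.0):
    """Probability that an MS at `point` is served by each site, using a
    softmax over negative distances (hypothetical kernel)."""
    scores = {s: -math.dist(point, xy) / scale_m for s, xy in sites.items()}
    top = max(scores.values())
    weights = {s: math.exp(v - top) for s, v in scores.items()}  # subtract max for stability
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}

print(smooth_voronoi((400.0, 0.0), {"A": (0.0, 0.0), "B": (2000.0, 0.0)}))
```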
Best Serving Cell Map
A Best Serving Cell Map (BSCM) uses radio propagation models or measurements by the provider to calculate the best serving cell at each location. A Best Serving Cell Map can be made more detailed when the propagation models take heights, e.g. of buildings, into account.
Top Serving Cells Map
A Top Serving Cells Map (TSCM) uses the same source as the BSCM. The difference between the maps is that a TSCM contains multiple serving cells for each location, together with the likelihood that each cell is used at that location. The BSCM gives the cell with the highest likelihood in the TSCM.
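The relation between the two maps can be sketched with a small dictionary structure. The grid squares, cells and likelihoods are hypothetical, and the second function assumes that every location is a priori equally likely when it turns an observed cell into a location distribution.

```python
# Hypothetical Top Serving Cells Map: grid square -> {cell: likelihood}.
TSCM = {
    "sq_001": {"A": 0.6, "B": 0.3, "C": 0.1},
    "sq_002": {"B": 0.5, "C": 0.4, "A": 0.1},
}

def best_serving_cell_map(tscm):
    """Derive the BSCM by keeping, per location, the most likely cell."""
    return {loc: max(cells, key=cells.get) for loc, cells in tscm.items()}

def candidate_locations(tscm, cell):
    """Locations where `cell` may serve, with normalised weights
    (assumes a uniform prior over locations)."""
    weights = {loc: cells[cell] for loc, cells in tscm.items() if cell in cells}
    total = sum(weights.values())
    return {loc: w / total for loc, w in weights.items()}

print(best_serving_cell_map(TSCM))
print(candidate_locations(TSCM, "B"))
```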
Clustering cells
Clustering has been used to identify places people visit. Because people can be connected to different cells at the same location, clustering combines these cells, whether located inside or outside the area, into one area. One algorithm used for clustering cells is K-means clustering [Reades et al., 2007]. K-means clustering groups the cells such that every cell in a cluster is as similar as possible to the other cells of that cluster and as different as possible from the cells of any other cluster. Clusters can be identified based on different criteria, depending on the aim. A related algorithm, Hartigan’s leader algorithm, is used by Isaacman et al. [2011]. The intramax method is a hierarchical clustering algorithm. It maximizes the proportion of the total interaction that takes place within the aggregated areas [Novak et al., 2013], using the relative strength of the interactions between areas. More about clustering can be found in Jain et al. [1999].
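A minimal K-means (Lloyd’s algorithm) sketch on hypothetical cell site coordinates is shown below. In practice, the feature vectors depend on the aim of the study (for example locations or usage profiles over time), and the initial centres are usually chosen randomly or with a dedicated initialisation scheme.

```python
import math

def kmeans(points, initial_centres, iterations=20):
    """Plain K-means (Lloyd's algorithm) on 2D points. Returns the final
    centres and, for each point, the index of its cluster."""
    centres = list(initial_centres)
    for _ in range(iterations):
        clusters = [[] for _ in centres]
        for p in points:
            nearest = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        new_centres = []
        for i, members in enumerate(clusters):
            if members:
                new_centres.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centres.append(centres[i])  # keep the centre of an empty cluster
        if new_centres == centres:
            break  # converged: assignments no longer change the centres
        centres = new_centres
    labels = [min(range(len(centres)), key=lambda i: math.dist(p, centres[i])) for p in points]
    return centres, labels

# Hypothetical cells around a home area and a work area (metres).
cells = [(0, 0), (100, 50), (50, 120), (5000, 5000), (5100, 4950), (4900, 5050)]
centres, labels = kmeans(cells, [cells[0], cells[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```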
Multiple events
The advanced location techniques with multiple events use events from the same user within a certain time period. With multiple events, there is more certainty about whether a user is moving or staying and, for a moving user, about the track or direction the user follows.
Map matching
Map matching is useful for dynamic events. When a user is on the move, the user can be assigned to a road, railway or cycle path in the specific cell. Furthermore, if multiple events indicate that the user is travelling at such a speed that the user has to be on a highway or in a train, for example, the user can be assigned to one of these two types of infrastructure on the map. Most studies use map matching implicitly, e.g. to extract the flow on a highway, but Fontaine et al. [2007] and Gao and Liu [2013] describe it explicitly. Both studies match events to a highway; when matching events to local roads, multiple roads may be situated in the cell area, because the cell area is large.
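The simplest form of map matching snaps an estimated location to the nearest segment of the network. The road network below is hypothetical; a speed check between successive events, as described above, could be added to restrict the candidate infrastructure before snapping.

```python
import math

def project_on_segment(p, a, b):
    """Closest point to p on segment a-b, and its distance to p."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    q = (ax + t * dx, ay + t * dy)
    return q, math.dist(p, q)

def map_match(p, network):
    """network: name -> list of (x, y) vertices. Returns (name, matched point)."""
    best = None
    for name, line in network.items():
        for a, b in zip(line, line[1:]):
            q, d = project_on_segment(p, a, b)
            if best is None or d < best[2]:
                best = (name, q, d)
    return best[0], best[1]

NETWORK = {"A13": [(0, 0), (5000, 0)], "railway": [(0, 800), (5000, 800)]}
print(map_match((2000, 300), NETWORK))
```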
Overlapping cells
When cells overlap, an MS at a fixed location can change cells, so it is not certain whether the MS is moving. However, when the events occur during the night, it is more likely that the person is not moving. Assuming no movement, the MS must be located where the observed cells overlap, so the location of the MS becomes more accurate as more overlapping cells are observed. No literature has been found that uses the overlapping cells method and corroborates the method and its accuracy improvement. Mezuro uses this technique to determine home and work locations.
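The overlapping cells idea can be sketched with cell areas represented as sets of grid squares. The grid, the cell areas and the rule of keeping the squares covered by the most observed cells (rather than a strict intersection, which a single outlier could empty) are assumptions for illustration, not Mezuro’s implementation.

```python
from collections import Counter

# Hypothetical cell areas as sets of grid squares (column, row).
CELL_AREAS = {
    "A": {(0, 0), (0, 1), (1, 0), (1, 1)},
    "B": {(1, 0), (1, 1), (2, 0), (2, 1)},
    "C": {(1, 1), (2, 1), (1, 2), (2, 2)},
}

def overlap_location(night_cells, cell_areas):
    """Grid squares covered by the largest number of distinct cells observed
    at night, assuming that the MS did not move."""
    counts = Counter(sq for c in set(night_cells) for sq in cell_areas[c])
    top = max(counts.values())
    return {sq for sq, n in counts.items() if n == top}

print(overlap_location(["A", "B", "C", "A"], CELL_AREAS))  # {(1, 1)}
```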
Random walk
The random walk model is used for modelling movements from mobile phone data. The model assumes that the movements of the user are random, so each neighbouring cell is visited with equal probability. The model is easy to create, because no information about the state of the neighbouring cells is needed. However, no movements are fully random, and therefore this method is questionable.
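A random walk over cells only needs the neighbour relations between cells: at each step, every neighbour is chosen with equal probability. The neighbour graph is hypothetical.

```python
import random

# Hypothetical neighbour relations between cells.
NEIGHBOURS = {"A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B", "D"], "D": ["B", "C"]}

def random_walk(start, steps, rng):
    """Each step moves to a neighbouring cell chosen with equal probability."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(NEIGHBOURS[path[-1]]))
    return path

print(random_walk("A", 5, random.Random(42)))
```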
Markovian
The Markovian model assigns different probabilities to visiting each of the neighbouring cells. These probabilities depend on the user’s history. Furthermore, the direction of movement makes it more likely that the user continues in the same direction, towards the corresponding cell, than towards another neighbouring cell.
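Transition probabilities can be estimated from a user’s observed cell sequences. Conditioning on the previous two cells (a second-order model) is one way to let the direction of movement count; the order and the example history are assumptions for illustration.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences, order=2):
    """Estimate P(next cell | previous `order` cells) from observed cell
    sequences of a user."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            counts[tuple(seq[i - order:i])][seq[i]] += 1
    probs = {}
    for context, nxt in counts.items():
        total = sum(nxt.values())
        probs[context] = {cell: n / total for cell, n in nxt.items()}
    return probs

# Hypothetical history: a user who usually travels A -> B -> C -> D.
history = [["A", "B", "C", "D"], ["A", "B", "C", "D"], ["D", "C", "B", "A"], ["A", "B", "C", "B"]]
P = fit_transitions(history)
print(P[("B", "C")])  # after moving B -> C, continuing to D is most likely
```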
Kalman filtering
Kalman filtering uses weighted averages to filter out the noise of a dataset. Because mobile phone data are raw measurements, the location accuracy of a moving MS can be improved by estimating the motion of the MS and filtering out the noisy events. The filter can be used in real time, because it works recursively: it predicts the next state and improves the prediction with a weighted average as soon as a new measurement arrives. Kalman filtering is used with network-driven data.
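The predict-and-correct cycle can be sketched with a constant-velocity Kalman filter for the position of an MS along a route (for example after map matching). The one-dimensional state, the noise variances and the example observations are hypothetical choices, not the formulation of the cited studies.

```python
def kalman_1d(measurements, dt, meas_var, accel_var, x0, v0=0.0, speed_var=100.0):
    """Constant-velocity Kalman filter for the position of an MS along a route.
    State: position x (m) and speed v (m/s); only the position is measured."""
    x, v = x0, v0
    p00, p01, p11 = meas_var, 0.0, speed_var  # symmetric covariance [[p00, p01], [p01, p11]]
    filtered = []
    for i, z in enumerate(measurements):
        if i > 0:
            # Predict: x <- x + v*dt, P <- F P F^T + Q (white-noise acceleration)
            x = x + v * dt
            p00, p01, p11 = (
                p00 + 2 * dt * p01 + dt ** 2 * p11 + accel_var * dt ** 4 / 4,
                p01 + dt * p11 + accel_var * dt ** 3 / 2,
                p11 + accel_var * dt ** 2,
            )
        # Update with the measured position z: a weighted average of prediction and z
        s = p00 + meas_var          # innovation variance
        k0, k1 = p00 / s, p01 / s   # Kalman gain
        innovation = z - x
        x, v = x + k0 * innovation, v + k1 * innovation
        p00, p01, p11 = (1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01
        filtered.append(x)
    return filtered

# Hypothetical noisy cell-based positions (m), one every 60 s, of a car at about 25 m/s.
observed = [0, 1900, 2700, 4800, 5600, 7900]
print([round(p) for p in kalman_1d(observed, 60.0, 500.0 ** 2, 0.01, x0=0.0)])
```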
Trajectory modelling
A Hidden Markov Model (HMM) models a trajectory as a series of hidden states and uses dynamic programming to find the most likely series of hidden states. Another model is the particle filter, which also uses a series of hidden states, but in a continuous state space. Unlike the Hidden Markov Model, it cannot be solved exactly; therefore an approximation of the solution is made with the help of Monte Carlo simulation.
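The dynamic programming step of an HMM can be sketched with the Viterbi algorithm, here with road segments as hidden states and observed cells as observations. The segments, cells and probabilities are hypothetical; the particle filter alternative is not shown.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely sequence of hidden states (e.g. road segments) given the
    observed sequence of cells, computed with dynamic programming in log space."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[s] = (log-probability of the best path ending in s, that path)
    best = {s: (log(start_p.get(s, 0.0)) + log(emit_p[s].get(observations[0], 0.0)), [s])
            for s in states}
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            score, path = max(
                ((best[prev][0] + log(trans_p[prev].get(s, 0.0)), best[prev][1]) for prev in states),
                key=lambda item: item[0],
            )
            new_best[s] = (score + log(emit_p[s].get(obs, 0.0)), path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]

STATES = ["seg1", "seg2", "seg3"]
START = {"seg1": 1.0}
TRANS = {"seg1": {"seg1": 0.5, "seg2": 0.5}, "seg2": {"seg2": 0.5, "seg3": 0.5}, "seg3": {"seg3": 1.0}}
EMIT = {"seg1": {"A": 0.8, "B": 0.2}, "seg2": {"A": 0.3, "B": 0.6, "C": 0.1}, "seg3": {"B": 0.3, "C": 0.7}}
print(viterbi(["A", "B", "C"], STATES, START, TRANS, EMIT))  # ['seg1', 'seg2', 'seg3']
```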
2.5 Conclusion
This chapter gave an overview of the basics of mobile phone data, the network architecture and localization technologies. Many different types of mobile phone data exist. For each cell, many types of data can be extracted at different quality levels. Furthermore, data can be extracted at different moments, for example when changing cells, making calls, sending SMS or using data. Besides the localization technologies, multiple advanced location methods were found. Methods like Voronoi diagrams, overlapping cells and the Best Serving Cell Map are the best methods for improving the localization of a user. Which advanced location method provides better accuracy depends on the type of data and the desired application.
3 Usability mobile phone location data
In the near future, mobile phone data could be an important data source in the field of traffic and transport. Many studies have been done in recent years to determine whether the data can be used for different applications. Mobile phone location data exist in different forms, and the accuracy differs per location technology and network topology.
In this chapter an extensive literature overview is given that compares the studies on the basis of their results, for example numeric errors. The literature review will be used in Part II to select advanced location methods that gave reasonable results for the combination of researched parameters and type of mobile phone data. In the second part of this chapter, a classification is created to categorize the usability of mobile phone data for applications.
3.1 Literature review
The first research using mobile phone data for a traffic application began in 1994 [University of Maryland Transportation Studies Center, 2004]. The CAPITAL project tried to estimate traffic speeds in Washington, D.C., the capital of the USA. When someone made a call, the position of the detected vehicle was determined via triangulation and saved, and the speed was then calculated from these positions. However, the speed could be calculated for only 20% of the detected vehicles, and therefore the speeds could not be used for traffic monitoring. After this project, more field tests were performed throughout the globe. One more is highlighted here, because it is the only study performed in the Netherlands. In 2004 Vodafone and LogicaCMG jointly tried to visualize traffic flows and congestion on the road network of Noord-Brabant [Steenbruggen et al., 2010]. This travel time information was given in real time. The correlation between the travel times from the system and those from the reference system, loop detectors, was high.
Table 3.1 on the next pages gives the literature review. Besides the description of the study or the investigated parameter(s) and the result or numeric error, the type of mobile phone data is named; the types of mobile phone data are explained in table 2.3. Furthermore, the advanced location method, if used, is stated; explanations of the methods are given in Section 2.4. Finally, the type of study is mentioned: a case-study or a simulation. The simulation studies use a dataset created by simulation.
Two datasets are relevant for several studies in table 3.1 (see column ‘Type mobile phone data’): AirSage data and D4D data. AirSage data use mobile phone data from users of different providers in the United States. The company AirSage uses TA and RTT for locating users. The data are saved in a database and processed in order to sell them to different customers. The type of the mobile phone data of AirSage is individual CDR with triangulated location information. The location accuracy is around 220-320 metres ([Wang et al., 2013] and [Calabrese et al., 2013]). Orange distributed mobile telephone data of Ivory Coast, see Blondel et al. [2012]. Under the name Data for Development (D4D), researchers have investigated different subjects with these mobile phone data. Because the data, including the locations of the cells, were publicly available, many topics were researched: besides transportation and planning topics, topics related to healthcare, language and poverty were investigated. Such large datasets are usually not publicly available. With 5 million customers, the number of users is one third of the total population of Ivory Coast. The dataset contains five months of anonymized CDR: 2.5 billion events, SMS and calls, were provided. D4D data belong to the category ‘Individual CDR with cell tower location information’.
3.1.1 Results studies from the literature review
The following topics summarize the results of table 3.1. Because of the different types of mobile phone data, the results are difficult to compare. The studies were also done in different countries, so the network topology differs as well, which makes a comparison between studies difficult too. Nevertheless, a global idea of the results can be given.
The location accuracy varies over the different types of data. One study with network-driven mobile phone data found a location accuracy ranging from 150 up to 1500 metres, while another study found a range of 10 to 40 metres. The difference is caused by the difference in localization technologies: the first study used Timing Advance, while the second study used handovers, a method that uses the Received Signal Strength. For triangulated mobile phone data, location accuracies of 168 and 350 metres were found, and the studies using AirSage data found a similar accuracy range. The literature review contains no location accuracies for mobile phone data with location information. Network-driven data appear to have a better location accuracy than the other types of mobile phone data.
Activities
Special events and tourism activities can be detected with mobile phone data, as multiple studies show, for example Ahas et al. [2007]. The most popular activities are highly predictable [Quercia et al., 2010]. Home and work locations can also be detected, as Maestre et al. [2013] did. However, one study determining the home location found an error of 35% [Frias-Martinez et al., 2010]. Most of the studies researching this type of topic used cell tower location information. Many studies use Voronoi tessellation as advanced location method, but clustering is used as well. Only one study gives a numerical result using individual CDR with location information: Frias-Martinez et al. [2010] detected 65% of the home locations correctly.
Planning
Looking at origin-destination relations, the results were positive, although one study found that OD estimation with the data is inferior to the existing technique [Sohn and Kim, 2008]. Candia et al. [2008] found an average travelling distance of 6 km in 0.5 hour. Zones, grid systems and Voronoi tessellation are used as advanced location methods.
Transport
After routes have been determined from mobile phone data, mode choice, flow, travel time and speed can be examined. Here the results are not all positive and significant. Swaans et al. [2006] and Alger et al. [2004], researching travel time and speed respectively, found that they had a lack of data. For the flow, the numeric error fluctuates between 3% and almost 27% in six studies; for the travel time it fluctuates between 11% and 15%. Individual CDR with cell location information is used in one study: Doyle et al. [2011] detected the mode choice with a numeric error of 10%, although only on one route. Here too, network-driven data give better results than the other types of data, although the differences between studies with network-driven data are large.
Description / investigated parameter
Results and/or numeric error
Type mobile phone data
Simulation/case-study
Advanced location method used
Ahas et al. [2007]
Seasonal tourism activity
can be detected
Improvement with accommodations can be found
Lack of data
-
case-study Estonia
Voronoi
case-study Germany
Best serving cell map
Asakura and Hato [2004]
Bar-Gera [2007]
Anchor points (e.g. home
location)
Speed, travel time
168 m (90%)
case-study Japan
Move or stay identification
Foot-printing
Berlingerio et al. [2013]
Caceres et al. [2007]
OD-matrix, traffic flow
OD-matrix, traffic flow
OD-matrix suffices to use in
transport models
Flow: 3%
Individual CDR with cell
tower location information
Individual CDR with cell
tower location information
Individual network-driven
data(handover)
Individual CDR triangulated
location
Individual network-driven
data(handover)
D4D data
case-study Estonia
Alger et al. [2004]
Seasonal tourism activity
Tourism compared with
used accommodations
Speed
Caceres et al. [2011]
OD-matrix, traffic flow
Flow: 6-8%
Calabrese et al. [2010]
Calabrese et al. [2011a]
Calabrese et al. [2011b]
Home and event location
Density
Positive correlation events home
Location: 150 - 1500 meters
OD-matrix, density
Calabrese et al. [2013]
Intra-urban mobility
Ahas et al. [2008]
Candia et al. [2008]
Human dynamics
Doyle et al. [2011]
Train/car between two
cities
Route detection
Fiadino et al. [2012]
Speed 20%. Travel time: 11%
5 trips during weekdays and
4.5 trips during weekend
days
Model explains 49% variation individual mobility and
56% vehicular mobility
Average travelling distance:
6 km in 0.5 hour
10 % error in mode choice
Dataset contains a low number of high mobility users
Individual network-driven
data (LAU)
Individual network-driven
data (handover and LAU)
AirSage data
case-study Israel
case-study Ivory Coast
simulation
Voronoi
-
case-study
-
case-study USA
-
Individual network-driven
data (periodic; every 480 s)
AirSage data
case-study Rome
Voronoi
case-study USA
-
AirSage data
case-study USA
Grid system
-
case-study
Voronoi
Individual CDR with cell
tower location information
Individual network-driven
data (LAU)
case-study Ireland
Voronoi
case-study Austria
Sequence
Reference
Table 3.1: Literature overview of studies using mobile phone location data
Fontaine et al. [2007]
Speed at highways
Zonal sampling
Home location
Individual network-driven
data (LAU)
Individual CDR with cell
tower location information
Individual network-driven
data (LAU)
Individual CDR with cell
tower location information
Individual CDR with cell
tower location information
Individual network-driven
data(handover)
simulation
Frias-Martinez et al. [2010]
Friedrich et al. [2010]
Significant results, difference in speed: 0-5 mph
Error(per month): 35%. Coverage: 47%
Level Of Service: 3%
case-study
Voronoi
case-study Germany
-
simulation
Clustering
case-study USA
Propagation model
case-study Sweden
Foot-printing
AirSage data
case-study USA
Individual CDR triangulated
location
Individual CDR triangulated
location
Individual CDR with cell
tower location information
case-study China
Interpolation
method
Voronoi
case-study USA
-
case-study USA
Hartigan’s leader algorithm
Individual CDR triangulated
location
Individual network-driven
data (LAU/handover)
Aggregated cell tower statistics(Erlang)
Individual network-driven
data (LAU)
D4D data
case-study Estonia
-
case-study Austria
-
simulation
case-study Antwerp
Kernel-Density Estimation (KDE)
-
case-study
Coast
Kernel-Density Estimation (KDE)
Gao and Liu [2013]
OD-matrix, traffic flow,
LOS
Demand simulation
Girardin et al. [2009]
Point of interest
Gundlegård and Karlsson [2009]
One road piece speed/travel time
Hoteit et al. [2013]
Real human trajectories
Huang et al. [2010]
Anchor points
Correct estimation: 10% for
location error of 50 m
Point of interest change during the years
Handover location error,
GSM: 40 m. UMTS: 10
m. Travel time: 5 seconds,
speed: 3 km/h
Interpolation best approach
to detect mobility class
Anchor points: 10%
Huayong et al. [2010]
Differences in travel time
Location: 350 meters
Isaacman et al. [2011]
Important places and home/work are recognizable
Jaerv et al. [2012]
Traffic flow
Hartigan’s leader(50%): 800
metres. Without: 1600 metres. 10% error in important
places determination
Flow: 25%
Janecek et al. [2012]
Travel time
Number of jams: 1%
Krisp [2010]
Evacuation
Maerivoet and Logghe [2007]
Maestre et al. [2013]
Travel time
100% precision/ few false
positives
Travel time: 15%
Commuting patterns
Home and work location can
be detected
Ivory
Milani et al. [2009]
Car accident detection
Markov model
Locations during the day
Individual network-driven
data (handover)
Cell location information
simulation
Montoliu et al. [2013]
Precision: 25%, direction:
35%, start time: 15%
63 % of the day
Stay points
Nanni et al. [2013]
10 % decrease system travel
time
-
D4D data
Novak et al. [2013]
Optimizing public transport, OD-matrix
OD-matrix
case-study Switzerland
case-study Ivory Coast
case-study Estonia
Pei et al. [2013]
Land use classification
case-study Singapore
Clustering
Phithakkitnukoon et al. [2012]
Social network mobility
case-study Portugal
Centroid
Traffic information services
OD-matrix
Not adequate, 60% detection
rate
80% of people’s travel scope
are within 20 km of nearest
social ties
Technically possible, increase needed in accuracy
-
case-study Finland
-
case-study China
Zones
Quercia et al. [2010]
Recommending nearby
events
Most popular events are
highly predictable
Individual network-driven
data(LAU)
Individual network-driven
data (LAU)
AirSage data
case-study USA
Ratti et al. [2006]
Density
-
case-study Italy
Reades et al. [2007]
Event location
case-study
k-means clustering
Sagl et al. [2012]
Event location/ mobility
time steps
Travel time
Clustering promising, human activities can be found
Events are detectable
case-study Italy
-
-
case-study Germany
-
Route choice, evaluation Variable Message Signs (VMS)
OD-matrix, traffic flow
Not statistically significant.
30% uses a diversion (VMS on)
Flow: 10%
Aggregated cell tower statistics (Erlang)
Aggregated cell tower statistics(Erlang)
Individual network-driven
data (handover)
Individual CDR with cell
tower location information
Individual network-driven
data (LAU/handover)
k-nearest, term frequency-inverse document frequency
Centroid/interpolation
case-study Germany
see Schlaich et al. [2010]
Individual network-driven
data (LAU/handover)
case-study Germany
Sequence
Poolsawat et al. [2008]
Qi et al. [2013]
Schafer et al. [2009]
Schlaich [2010]
Schlaich et al. [2010]
Individual CDR with cell
tower location information
Aggregated cell tower statistics(Erlang)
Individual CDR with cell
tower location information
Intramax method
Shen and Ma [2008]
Social data
Individual CDR with cell
tower location information
case-study USA
-
Sohn et al. [2006]
Individual CDR triangulated
location
Individual network-driven
data (LAU)
case-study
-
Sohn and Kim [2008]
Mobility modes(walk,
stationary, driving)
OD-matrix
Large number of errors
and uncertainties can
be resolved by machine
learning methods
15% in predicted mode
simulation
Song et al. [2010]
Predictability of mobility
Kalman filter and Generalized Least Squares method
Voronoi
Soto et al. [2011]
Social-economic levels
(SEL)
Travel time(highway +
main roads)
Swaans et al. [2006]
Tettamanti et al. [2012]
Traag et al. [2011]
Route detection
Trestian et al. [2009]
Internet use
Wang et al. [2013]
Xie et al. [2013]
O-D traffic, traffic flow
Urban activity analyses
Yadav et al. [2012]
Cell Broadcast messages
Zhang [2012]
OD-matrix, traffic flow
Zilske and Nagel [2013]
Travel time/mode choice
Special events recognizable
OD-estimation inferior with
existing technique
93% predictability, lack of
variability in predictability
20% wrong estimation of the
SEL
Not enough data on main
roads + off-peak
Individual CDR with cell tower location information
case-study
Individual CDR with cell tower location information
case-study Latin America
Individual CDR with cell tower location information (including TA)
case-study Netherlands
Individual network-driven data
simulation
Events are recognizable by using friends and callee information
Individual CDR with cell tower location information
case-study
Users' locations and mobility heavily impact application access behaviour
Individual CDR with cell tower location information with internet usage information
case-study
Flow: 17.6%
AirSage data
case-study USA
Trends in activities can be detected
Individual CDR with cell tower location information
case-study China
CBS: 58% 600 m
Individual network-driven data
case-study India
Flow: 26.63%
Individual network-driven data (handover)
case-study China
Traffic patterns can be obtained from the data
D4D data
case-study Ivory Coast
Voronoi
Map matching
Voronoi, sequence
Smooth Voronoi
k-clustering
Clustering methods
Time weighted algorithm
zones
Voronoi
3.2 Mobile phone location data for applications usability
From the literature review above, it can be concluded that the real value of mobile phone location data is not explained in the literature. Only Caceres et al. [2008] give an overview of traffic data combined with the required type of mobile phone data. However, such an overview can differ for each mobile phone location dataset, because the number of updates and the telecommunications network topology differ per country and per dataset. Therefore, the usability of mobile phone data is categorized in this section and tested in the next chapter for one particular mobile phone dataset. For each class, requirements are determined for the time and location accuracy, and one example application is listed. Based on the literature review, the update frequencies and localization accuracies used in the classes imply that not all Location-Based Applications can be created.
3.2.1 Classes
When testing the ability to use mobile phone data for Location-Based Applications, categories are made for time and location accuracies. The location accuracy is divided into three classes. The finest class has an accuracy between 10 and 100 metres and the coarsest class an accuracy of 1000 metres or more; in between, a class with an accuracy between 100 and 1000 metres is created. No class is created for an accuracy finer than 10 metres, because all applications work with grouped data, for which such an accuracy is unnecessary.
Besides the location accuracy, time classes divide the group of applications by update frequency. Table 3.2 gives a number for each class per update frequency category and accuracy category. Not every combination is assigned a number, because the accuracy and the update frequency are related. Applications with a short update interval (1 - 15 minutes) need a fine location accuracy; otherwise the precision of the frequent updates is annulled by the large location error. The same applies the other way round: a fine accuracy of 10 - 100 metres adds little when updates are more than 60 minutes apart.
Table 3.2: Classes for applications using grouped data
Accuracy (metres) | Update frequency 1 - 15 minutes | 15 - 60 minutes | > 60 minutes
10 m - 100 m | 1 | 2 | -
100 m - 1000 m | 3 | 4 | 5
> 1000 m | - | 6 | 7

3.2.2 Example applications
Table 3.3 summarizes a list of example applications. Each application is followed by a short description explaining the use of mobile phone data in the application. The applications are assigned to the groups based on the author's own judgement, from the perspective of the end user and their requirements for the update frequency and localization accuracy. The table is based on applications found in the literature and applications used by Mezuro. One example application is given for each group; other applications can be assigned to the groups according to one's own judgement. Because each group of applications will be checked for whether it can be served with mobile phone data, a new application can in the future be assigned to a group to assess its feasibility.
Table 3.3: Example application per class
Group 1 - 10 - 100 meters - 1 - 15 minutes
Crowd control - activity of controlling a crowd
Group 2 - 10 - 100 meters - 15 - 60 minutes
Effect of Dynamic Traffic Management measures - collect traffic information at the measure
to compare before and after the implementation
Group 3 - 100 - 1000 meters - 1 - 15 minutes
Incident detection - detect incidents to inform emergency services and reroute traffic
Group 4 - 100 - 1000 meters - 15 - 60 minutes
Travel time information - extract travel time information for travellers
Group 5 - 100 - 1000 meters - > 60 minutes
Public transport lines optimization - detect the demand for each stop and district
Group 6 - > 1000 meters - 15 - 60 minutes
Evacuation management - detect the amount of people in area in case of an evacuation
Group 7 - > 1000 meters - > 60 minutes
Transportation planning - create an O-D matrix to use in transportation models
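Tables 3.2 and 3.3 amount to a lookup from an application's accuracy and update-frequency requirements to a group. A minimal sketch of that lookup follows. How requirements that fall exactly on a class boundary (or below 10 m) are treated is an assumption, because Table 3.2 does not specify it, and the requirements used in the example calls are hypothetical.

```python
# Group numbers of Table 3.2, keyed by (accuracy class, update frequency class).
GROUPS = {
    ("10-100 m", "1-15 min"): 1,
    ("10-100 m", "15-60 min"): 2,
    ("100-1000 m", "1-15 min"): 3,
    ("100-1000 m", "15-60 min"): 4,
    ("100-1000 m", ">60 min"): 5,
    (">1000 m", "15-60 min"): 6,
    (">1000 m", ">60 min"): 7,
}


def accuracy_class(metres: float) -> str:
    # Assumption: requirements finer than 10 m fall in the finest class,
    # and the boundary values 100 m and 1000 m belong to the finer class.
    if metres <= 100:
        return "10-100 m"
    if metres <= 1000:
        return "100-1000 m"
    return ">1000 m"


def frequency_class(minutes: float) -> str:
    # Assumption: 15 and 60 minutes belong to the shorter-interval class.
    if minutes <= 15:
        return "1-15 min"
    if minutes <= 60:
        return "15-60 min"
    return ">60 min"


def group_for(accuracy_m: float, update_min: float):
    """Group of Table 3.2 for an application's requirements,
    or None when the combination is not assigned a group."""
    return GROUPS.get((accuracy_class(accuracy_m), frequency_class(update_min)))


# Hypothetical requirements for two applications of Table 3.3
print(group_for(500, 10))    # incident detection -> 3
print(group_for(2000, 120))  # transportation planning -> 7
```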
3.3 Conclusion
The first section gives an overview of the literature. Many studies have researched transportation and planning topics with mobile phone data, with varying success. Some of the studies suffered from a lack of data, while others claim to have good results. The studies use different types of data and methods, which makes it methodologically difficult to compare them. In Part II, when creating a part of an application, the advanced location methods of studies with reasonable results could be used. However, the type of mobile phone data differs per study, so no uniform conclusion can be drawn about the usability of mobile phone location data. Moreover, no study carried out a feasibility study or created an overview of the usability of the different types of mobile phone data. Therefore this overview is created in the second section of this chapter. A classification is proposed to categorize mobile phone location data, after which applications can be linked to each category. In the next chapter the usability will be tested for one mobile phone dataset. Applications can be assigned to a class to give a guideline for the usability of mobile phone data.
4 Mobile phone location data of Mezuro
In this chapter the mobile phone location data of Mezuro are explained and classified. Thereafter a first analysis is made of the time and location accuracy of the mobile phone data of Mezuro. Finally, given the accuracy requirements of the applications, the classes created in the previous chapter are used to determine which Location-Based Applications can be created using the mobile phone data of Mezuro.
4.1 Characteristics of the data of Mezuro
Mezuro analyses mobile phone location data of one of the three telecom providers in the Netherlands, and these data were available for this study. This telecom provider has approximately 25,000 cells. The angle of a cell ranges from 10 to 360 degrees and its radius from 150 to 35,000 metres. The telecom provider also offers private femto cells for consumers and companies, and has recently been increasing its capacity with micro cells in public spaces. In this section the localization technology, the advanced location methods and the mobile phone data classification described in Chapter 2 are explained for the mobile phone data of Mezuro.
4.1.1 Localization technology
The extraction of the Call Detail Records available for research in this thesis has only the Cell ID as location information. Cell ID has a location accuracy of 350 metres in an urban area, 650 metres in a suburban area and 1650 metres in a rural area. These median errors are averages over different studies done in different countries. In the Netherlands, a median error of 1100 metres was found in a suburban area (the city of Enschede and surrounding areas) [Witteman, 2007]. More research has to be done to determine the location accuracy of the network in the Netherlands. With Timing Advance a more accurate location can be found, which is already applied for traffic applications in the Netherlands [Rutten et al., 2006].
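The median errors above compare an estimated position with a reference position; how the cited studies measured them is not described here. The sketch below assumes a GPS reference position per event and an estimated position derived from the serving cell, and uses the haversine great-circle distance on a spherical Earth. The function names and inputs are hypothetical.

```python
import math
from statistics import median

EARTH_RADIUS_M = 6_371_000  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points (spherical approximation)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def median_location_error(pairs):
    """pairs: iterable of ((lat_est, lon_est), (lat_ref, lon_ref)), e.g. the position
    assigned to the serving cell versus a GPS reference position (hypothetical input)."""
    return median(haversine_m(*est, *ref) for est, ref in pairs)
```

The median is used rather than the mean because a few events served by very large cells would otherwise dominate the error.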
All network-based location methods are used in the Netherlands by the telecom providers. The telecom providers know the signal strength, angle and timing information, which is gathered by the cell. However, for privacy reasons these more accurate mobile phone location data are not available to Mezuro. The telecom provider uses these data to detect problems in the network, and only intelligence services can claim personal data for an investigation.
4.1.2 Type of the data
The type of the data from Mezuro is ‘Individual CDR with cell location information’, see table 2.3. The data from Mezuro are extracted from the CDR. The extraction consists of all incoming and outgoing calls, all received and sent text messages (SMS), and a check when a specific amount of data has been used; this check verifies the balance of the data bundle. The data of Mezuro contain no information about the called or texted user.
An extraction of the CDR of the telecom provider is shown in table 4.1. More information about the data
can be found in Appendix B.
Table 4.1: Example lines of the data of Mezuro
Start_DT_TM | Hashed_ID | Type | Direction | Country_Code | Cell_ID
7-1-2013 11:37 | 5271410 | S (SMS) | MO (Mobile Originating) | 204 | 4227013
7-1-2013 12:37 | 1234536 | V (Voice) | OE (Originating End) | 204 | 4227013
7-1-2013 13:25 | 7550920 | V (Voice) | TE (Terminating End) | 204 | 2028111
7-1-2013 14:06 | 7170230 | D (Data) | U (direction Unknown) | 204 | 12747674
7-1-2013 15:32 | 7327640 | V (Voice) | MO (Mobile Originating) | 204 | 5306942
7-1-2013 17:29 | 6919010 | V (Voice) | MT (Mobile Terminating) | 204 | 12134482

4.1.3 Number of events
To determine the usability of the dataset of Mezuro, statistics are calculated on the number of events during the day. Figure 4.1 shows the number of events per hour in the Netherlands of the users of the telecom provider during working days. The telecom provider has 5.7 million users in both the consumer and the business market, roughly one third of the Dutch population. The number of events is averaged over two months of data. On average there are approximately three million events per hour. During the day the number of events reaches almost 7 million per hour, while during the night it drops to about half a million. The variation is highest during the morning, where the data show a standard deviation of almost one million events.
Figure 4.1: Number of events per hour during working days
Table 4.2 shows the average number of events per hour per user for working days and for all days of the week. During the day a user has on average 1.6 events per hour. However, in the morning peak (6-9) people have fewer events than during the day and in the evening peak (15-18). In the morning peak the standard deviation is higher than during the rest of the day, which is mostly caused by the holidays: during holidays, people probably stay in bed longer and use their phone less. Monday, Tuesday and Thursday have a higher standard deviation than Wednesday. These data will be used for the update frequencies of the classes in Section 3.2. More analyses providing statistics of the data are given in Appendix B; for example, the average number of events per user for each event type and the average duration of a call are calculated there.
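The per-user statistics of Table 4.2 could be derived from records such as those in Table 4.1. The sketch below rests on assumptions that the thesis does not state: the value for a day is the number of events in an hour band divided by the number of users and the number of hours in the band, the mean and the population standard deviation are then taken over the days, and hour 0 is counted as hour 24. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Hour bands of Table 4.2; assumption: hour h covers hh:00-hh:59 and hour 0 counts as 24.
BANDS = [(1, 5), (6, 9), (10, 14), (15, 18), (19, 24)]


def band_of(ts: datetime):
    hour = ts.hour or 24
    return next((lo, hi) for lo, hi in BANDS if lo <= hour <= hi)


def events_per_user_per_hour(events, n_users):
    """events: non-empty iterable of (timestamp, hashed_id) rows, e.g. Start_DT_TM
    and Hashed_ID. Returns {band: (mean, std)} of events per user per hour over the days."""
    counts = defaultdict(int)  # (date, band) -> number of events
    for ts, _hashed_id in events:
        counts[(ts.date(), band_of(ts))] += 1
    days = sorted({day for day, _band in counts})
    stats = {}
    for lo, hi in BANDS:
        hours = hi - lo + 1
        per_day = [counts.get((day, (lo, hi)), 0) / (n_users * hours) for day in days]
        stats[f"{lo}-{hi}"] = (mean(per_day), pstdev(per_day))
    return stats
```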
Table 4.2: Events per user for working days and all days
Hours of the day | Working days: mean | Working days: std. deviation | All days: mean | All days: std. deviation
1-5 | 0.23 | 0.13 | 0.26 | 0.15
6-9 | 1.05 | 0.50 | 0.88 | 0.51
10-14 | 1.86 | 0.22 | 1.71 | 0.33
15-18 | 1.94 | 0.27 | 1.78 | 0.35
19-24 | 1.05 | 0.29 | 1.02 | 0.27

4.1.4 Privacy
Privacy is guaranteed by removing the phone numbers and replacing them with a fully random number, the so-called ‘Hashed ID’. A phone number keeps the same Hashed ID for one month; on the first of each month a new fully random number is assigned, so the Hashed ID changes. Besides the Hashed ID, the time, date and cell location are saved in the database. In addition to anonymizing the phone numbers, Mezuro is only allowed to publish data containing more than 15 Hashed IDs at a place or on a route. This avoids the publication of data from one person, which could theoretically be traced back to that individual.
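The two privacy measures can be sketched as follows: a pseudonymizer that assigns a fully random ID per phone number and redraws it when the month changes, and a check that only publishes places or routes with more than 15 distinct Hashed IDs. The class and function names are hypothetical, the sketch assumes events arrive in chronological order, and the random IDs come from `secrets.token_hex` in the Python standard library.

```python
import secrets
from collections import defaultdict


class MonthlyPseudonymizer:
    """Replaces phone numbers by a fully random ID that stays fixed within a
    calendar month and is redrawn when the month changes."""

    def __init__(self):
        self._month = None
        self._ids = {}      # phone number -> random ID for the current month
        self._used = set()  # IDs already handed out this month (keeps them unique)

    def hashed_id(self, phone_number: str, year: int, month: int) -> str:
        if (year, month) != self._month:
            # New month: discard the old mapping so IDs cannot be linked across months.
            self._month, self._ids, self._used = (year, month), {}, set()
        if phone_number not in self._ids:
            new_id = secrets.token_hex(8)  # random, not derived from the phone number
            while new_id in self._used:
                new_id = secrets.token_hex(8)
            self._ids[phone_number] = new_id
            self._used.add(new_id)
        return self._ids[phone_number]


MIN_DISTINCT_IDS = 15  # publish only when MORE than 15 Hashed IDs are involved


def publishable_counts(records):
    """records: iterable of (place_or_route, hashed_id).
    Returns {place_or_route: number of distinct IDs} for groups that may be published."""
    ids_per_group = defaultdict(set)
    for group, hashed_id in records:
        ids_per_group[group].add(hashed_id)
    return {g: len(ids) for g, ids in ids_per_group.items() if len(ids) > MIN_DISTINCT_IDS}
```

Drawing the ID at random instead of hashing the phone number means the ID cannot be reproduced from a known number once the monthly mapping has been discarded.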
4.1.5 Advanced location methods
Only the ‘overlapping cells’ method is currently used for the data of Mezuro, but other techniques can be used when creating other types of applications. The overlapping cells method already used by Mezuro provides a better location when more events are available, because each additional cell decreases the area in which the user can be. The data of Mezuro permit the use of all location techniques; however, the added value of these techniques is unknown. The BSCM and Voronoi tessellation are possibilities for improving the location. Both methods divide the cell areas into smaller areas, which removes the overlap. Map matching is needed to connect events and cell areas with roads and railway lines. With more events, the preferred methods will depend on the targets of the application.
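The effect of the overlapping cells method can be illustrated on a grid: if a user who does not move generates events in several cells, the user must be in the intersection of their coverage areas, which is smaller than any single cell. In this sketch cells are approximated as discs on a tile grid, and the antenna positions and radii are hypothetical; real cells are sectors with irregular coverage.

```python
def disc(cx: int, cy: int, radius: float) -> set:
    """Grid tiles (x, y) within `radius` tiles of an antenna at (cx, cy).
    Assumption: a cell's coverage is approximated by a disc on a tile grid."""
    r = int(radius)
    return {(x, y)
            for x in range(cx - r, cx + r + 1)
            for y in range(cy - r, cy + r + 1)
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2}


def overlap_location(coverage: dict, observed_cells) -> set:
    """Tiles that lie in every observed cell: the 'overlapping cells' estimate.
    Assumes the user did not move while the events were generated; an empty
    result signals that this assumption (or the coverage model) is violated."""
    area = None
    for cell in observed_cells:
        area = set(coverage[cell]) if area is None else area & coverage[cell]
    return area if area is not None else set()


def centroid(tiles: set) -> tuple:
    xs, ys = zip(*tiles)
    return sum(xs) / len(xs), sum(ys) / len(ys)


coverage = {"A": disc(0, 0, 3), "B": disc(4, 0, 3), "C": disc(2, 3, 3)}  # hypothetical cells
print(len(coverage["A"]), len(overlap_location(coverage, ["A", "B"])))  # 29 7
print(sorted(overlap_location(coverage, ["A", "B", "C"])))  # [(2, 0), (2, 1), (2, 2)]
```

Each extra cell shrinks the candidate area (29 tiles for one cell, 7 for two, 3 for three), which is why this method benefits from more events.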
4.1.6 Availability data and real-time data
The time delay for processing the data determines whether an application can be created. Some applications require real-time data; for others real-time data are useful but not required, while for yet others they are not necessary at all. For some policy applications, for example mobility research, data that become available after one week are sufficient. However, the application ‘travel time information’ requires live data, or at most a few seconds of delay. The time delay of the existing dataset does not in itself make the data of Mezuro unusable for applications. In the current situation the existing dataset is made available once per day, because the updates are extracted from the billing system. If in the future this billing system is able to handle (almost) real-time requests, a real-time application could be created. Besides this limitation of the database, another point of concern is privacy: with a real-time application, tracking people is more likely to happen than when data are aggregated over the day.
Furthermore, the Hashed ID is changed every month. As a result, the mobility pattern of each user is unknown at the start of each month: when making a prediction at the beginning of the month, it is not possible to connect the real-time positions with the mobility pattern of the same Hashed ID. In Part I, these difficulties are not elaborated further.
4.1.7 Conclusion
The data from Mezuro are classified as ‘Individual CDR with cell location information’, and the data use the localization technique ‘Cell ID’. Cell ID has a location accuracy of around 350 metres in an urban area. Many advanced location methods can be used for an application; ‘overlapping cells’ is currently being used at Mezuro. The statistics give the number of events per hour: during working hours on working days a user generates on average 1.5 events per hour, so the location of the user during the day can be determined. The main advantage of the data of Mezuro is the monthly changing Hashed ID: an anonymized user can be tracked for a period of one month.
4.2 Mobile phone location data of Mezuro usability
For Location-Based Services, that is, applications at the individual level that give a personal response,
the dataset of Mezuro is not usable. The privacy rules prescribe that no person is traceable, and therefore
all data are anonymized. In addition, no query on the database is allowed to return fewer than 15
results. Both rules make it impossible to use the data for individual applications. In the future, however,
these applications could be created with permission of the owner. Most individual applications require
the location to be determined every second or minute, whereas currently the average number of events
per hour is 1.5.
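As an illustration of the second privacy rule, the sketch below suppresses every aggregated group that has too few distinct users before it is released. It is a minimal sketch, not Mezuro's implementation: the record layout `(group_key, hashed_id)`, the function name `suppress_small_groups` and the reading of the rule as "at least 15 distinct Hashed IDs per result" are assumptions made for illustration.

```python
from collections import defaultdict

# Assumed reading of the privacy rule: a released result must contain at
# least 15 distinct Hashed IDs (hypothetical constant name).
MIN_DISTINCT_IDS = 15


def suppress_small_groups(records, min_ids=MIN_DISTINCT_IDS):
    """Count distinct Hashed IDs per group and drop groups below the threshold.

    records: iterable of (group_key, hashed_id) pairs, e.g. (cell_id, hashed_id).
    Returns {group_key: number_of_distinct_ids} for the groups that may be released.
    """
    ids_per_group = defaultdict(set)
    for group_key, hashed_id in records:
        ids_per_group[group_key].add(hashed_id)
    # Only groups with enough distinct users survive; the rest are suppressed.
    return {
        key: len(ids)
        for key, ids in ids_per_group.items()
        if len(ids) >= min_ids
    }
```

For example, a cell with 20 distinct Hashed IDs is released, while a cell with only 5 is dropped entirely rather than reported with a small count.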
For the Location-Based Applications, an analysis is done to check the usability of the data of Mezuro.
The analysis is made for the combinations of accuracy and update frequency, see table 3.2. The average
number of events per hour is 3 million, see figure 4.1, and the average number of events for one day is
85 million. At night a lower number of events is found. Because most of the applications are required
during the day, and because the number of events in the first hour of the morning peak is already 3
million per hour, this hourly average is also used as the available number of events for the update
frequencies. At night the number of events is too low to use the mobile phone data of Mezuro for the
applications, whereas during the day the available number of events is in reality much higher.
Table 4.3: Available number of events for different update frequencies
Update frequency | Used update frequency (whole number) | Available number of events
1 - 15 minutes | 1 minute | 0.05 million
15 - 60 minutes | 15 minutes | 0.75 million
> 60 minutes | 60 minutes | 3 million
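The numbers in Table 4.3 follow from scaling the hourly average of 3 million events to the length of the used update interval. A minimal sketch, assuming that events are spread uniformly over the hour (the function name `available_events` is hypothetical):

```python
# Average number of events per hour over the whole day (Section 4.2).
EVENTS_PER_HOUR = 3_000_000


def available_events(update_interval_min, events_per_hour=EVENTS_PER_HOUR):
    """Events available within one update interval, assuming a uniform spread over the hour."""
    return events_per_hour * update_interval_min / 60


for interval in (1, 15, 60):
    print(f"{interval:>2} min: {available_events(interval) / 1e6:.2f} million")
```

The uniform-spread assumption is what makes the table conservative for daytime use: during the day the real hourly number is higher than the daily average.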
For the different accuracies, the number of areas in the Netherlands is determined by dividing the land
surface by the area in which the MS could be located. In total, the Netherlands comprises approximately
34,000 square kilometres of land1 . This number of areas is taken as the required number of events (N),
because at least one event per area is needed to cover the whole land area, see table 4.4. Comparing N
with the available number of events per update frequency shows for each group whether enough events
are available.
1 http://www.landen.nl/wereld/europa/Nederland
Table 4.4: Required number of events in the Netherlands for different accuracies
Accuracy | Used accuracy a (whole number) | Area $A = \pi a^2$ | Required number of events $N = 34{,}000 / (\pi a^2)$
10 - 100 m | 10 m | 0.00031 km² | ≈ 100 million
100 - 1000 m | 100 m | 0.031 km² | ≈ 1 million
> 1000 m | 1000 m | 3.1 km² | ≈ 0.01 million
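The required numbers in Table 4.4 can be reproduced with a short calculation. The sketch below is hypothetical and assumes, as the table does, that one event is needed per circular area of radius a, with a converted to kilometres so that it matches the 34,000 km² land area. It prints roughly 108, 1.08 and 0.0108 million, which the table rounds to orders of magnitude.

```python
import math

LAND_AREA_KM2 = 34_000  # approximate land area of the Netherlands


def required_events(accuracy_m, land_area_km2=LAND_AREA_KM2):
    """N = land area / (pi * a^2), i.e. one event per circular area of radius a."""
    a_km = accuracy_m / 1000  # metres -> kilometres
    return land_area_km2 / (math.pi * a_km ** 2)


for a in (10, 100, 1000):
    print(f"a = {a:>4} m: N ≈ {required_events(a) / 1e6:.3g} million")
```

Passing a smaller `land_area_km2` is one way to explore the discussion point below about using only the cultivated area.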
This basic analysis has some points for discussion. First, the land area used is the total land area of the
Netherlands. However, most of the surface is used for agricultural purposes, so important areas such as
cities and transport areas together cover a much smaller area than the total land area. Only 8 %2 of the
Netherlands is cultivated. If only this area is considered, the required number of events decreases
approximately tenfold, to 10, 0.1 and 0.001 million respectively. Because more people live in urban areas,
most events will occur in these places.
According to table 4.5, three out of six groups have a sufficient number of available events to create the
application with data of Mezuro. The applications of the first two groups are impossible to create with the
data of Mezuro. For class number 4 a question mark is raised. With a deficit of 0.25 million events, the
applications cannot be created with the data. However, with location techniques and smart algorithms it
may be possible to create applications of this group. Also, as mentioned above, the numbers used refer to
the whole land area of the Netherlands and not only to the cultivated areas; considering only the
cultivated areas decreases the required number of events. This analysis provides a rough estimate of the
usability, so whether the data of Mezuro are useful depends on the application. The applications of table
3.3 can be combined with table 4.5. ’Crowd control’ is one of the applications that can definitely not be
developed with the dataset of Mezuro. The same applies to the application ’incident detection’, belonging
to class number 3. As said, for class number 4 the usability is unknown, and therefore the application
’travel time information’ will be developed in the next part. When creating the application of class
number 4, the classification will be validated and the limits of the dataset of Mezuro will be explored.
Applications for which the dataset is certainly usable are, for example, ’public transport lines
optimization’, ’evacuation management’ and ’transportation planning’. This analysis does not take into
account that the dataset of Mezuro is generated once a day; all applications that meet the accuracy and
update frequency of their class are listed.
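The comparison behind Table 4.5 can be sketched as a ratio of available to required events per class. This is a hypothetical illustration: it takes the available numbers per update frequency from Table 4.3 and the rounded required numbers from Table 4.4, and the names `coverage_ratio`, `REQUIRED`, `AVAILABLE` and `CLASSES` are invented for the example.

```python
# Rounded required numbers of events per used accuracy in metres (Table 4.4).
REQUIRED = {10: 100e6, 100: 1e6, 1000: 0.01e6}
# Available numbers of events per used update interval in minutes (Table 4.3).
AVAILABLE = {1: 0.05e6, 15: 0.75e6, 60: 3e6}
# Class number -> (used accuracy in m, used update interval in min), as in Table 4.5.
CLASSES = {
    1: (10, 1), 2: (10, 15),
    3: (100, 1), 4: (100, 15), 5: (100, 60),
    6: (1000, 15), 7: (1000, 60),
}


def coverage_ratio(class_no):
    """Available events divided by required events for one class."""
    accuracy_m, interval_min = CLASSES[class_no]
    return AVAILABLE[interval_min] / REQUIRED[accuracy_m]


for class_no in CLASSES:
    ratio = coverage_ratio(class_no)
    status = "sufficient" if ratio >= 1 else f"shortfall {1 - ratio:.0%}"
    print(f"class {class_no}: ratio {ratio:g} ({status})")
```

A ratio below 1 means there are fewer events than areas to cover. Class 4 ends at 0.75, the 25% shortfall that makes it the borderline case.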
Table 4.5: Usability applications per class
Class number | Accuracy (metres) | Update freq. (minutes) | Required number of events | Available number of events | Possibility of usage of Mezuro data
1 | 10 - 100 m | 1 - 15 min | 100 million | 0.01 million | No
2 | 10 - 100 m | 15 - 60 min | 100 million | 0.05 million | No
3 | 100 - 1000 m | 1 - 15 min | 1 million | 0.05 million | No
4 | 100 - 1000 m | 15 - 60 min | 1 million | 0.75 million | ?
5 | 100 - 1000 m | > 60 min | 1 million | 3 million | Yes
6 | > 1000 m | 15 - 60 min | 0.01 million | 0.75 million | Yes
7 | > 1000 m | > 60 min | 0.01 million | 3 million | Yes
4.3
Conclusion
Table 4.5 shows that three out of six groups have a sufficient number of available events to create
applications in the class with data of Mezuro. The applications in the first two classes are impossible to
create with the data of Mezuro because, at their update frequencies, the required number of events is
much higher than the available number of events. For class number 4 a question mark is raised. With a
deficit of 0.25 million events, the applications cannot be created with the mobile phone data of Mezuro.
However, with advanced location techniques and smart algorithms it may be possible to create
applications of this class: the 25% shortage of the group could be compensated by the advanced location
techniques. Moreover, during the day the available number of events is about twice the average of 3
million events per hour mentioned earlier.
2 http://www.cbs.nl/nl-NL/menu/themas/dossiers/nederland-regionaal/publicaties/artikelen/archief/2011/2011-3433-wm.htm
This analysis provides a rough estimate of the usability; whether the data of Mezuro are useful for an
application in class number 4 therefore depends on the application. Applications such as ’evacuation
management’ belong to the applications that could be developed with mobile phone data of Mezuro. To
test whether the data of Mezuro can be used in applications, the boundaries of what is possible have to
be pushed: applications of class number 4 should be tested to check whether mobile phone location data
can be used to develop them.
In the analysis of the usability of the dataset of Mezuro, the time delay is not taken into account.
Consequently, some of the applications can be created with the dataset, while others cannot be developed
with the mobile phone location data of Mezuro because they require data generated in real time or with a
very short delay.
5
Conclusions and recommendations
Part I
In this part, the usability of mobile phone location data for Location-Based Applications has been investigated. After a literature overview, a classification was created and tested with the mobile phone data of
Mezuro. This chapter gives the conclusions and recommendations for Part I, which will be used in the next
part, in which one of the applications is developed.
5.1
Answer main research question
To what extent are mobile phone location data useful for Location-Based Applications?
In this part, a classification is proposed to determine the usability of mobile phone location data for
applications. No dataset classified for its usability for applications was found in the literature. The
dataset of Mezuro was tested, and it can be useful for applications with a location accuracy coarser than
100 meters and update intervals of more than 15 minutes. Examples of applications with these accuracies
and update frequencies are ’travel time information’, ’public transport lines optimization’, ’evacuation
management’ and ’transportation planning’. See table 3.3 for an explanation of the applications.
5.2
Conclusions
Many types of data and location technologies exist, which causes variability in the number of events and
in the location accuracy. This is also visible in the literature review of the applications, in which all types
of data and location accuracy are mentioned. Besides the location accuracy, the network topology causes
difficulties in creating applications. At some places in the Netherlands it is theoretically possible to
connect with tens of cells, and it is unknown which cell will be used in a particular area. Also, the cell
ranges are theoretical, while the practical coverage is unknown. Advanced location methods, for example
using Voronoi diagrams, reduce the area in which a mobile phone could be, which provides a better
location accuracy. These methods can be useful when developing an application in the next part of this
thesis.
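The Voronoi idea can be illustrated with a nearest-site rule: every location belongs to the region of its closest cell site, so a phone served by a site is assumed to lie in that site's region rather than anywhere in its theoretical circular range. The sketch below is hypothetical; it assumes planar coordinates in metres (for example a projected system such as RD New), made-up site positions, and a grid approximation of the region, and it is not the method used at Mezuro.

```python
import math


def nearest_site(point, sites):
    """Return the id of the cell site closest to `point` (its Voronoi region).

    point: (x, y) in metres; sites: dict {site_id: (x, y)} in the same system.
    Ties go to the site that comes first in the dict.
    """
    return min(sites, key=lambda sid: math.dist(point, sites[sid]))


def voronoi_share(site_id, sites, radius_m, step_m=10):
    """Approximate the fraction of a site's theoretical circular range that
    lies inside its Voronoi region, using a regular grid of sample points."""
    sx, sy = sites[site_id]
    n = int(radius_m // step_m)
    inside = total = 0
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            p = (sx + i * step_m, sy + j * step_m)
            if math.dist(p, (sx, sy)) > radius_m:
                continue  # outside the theoretical cell range
            total += 1
            if nearest_site(p, sites) == site_id:
                inside += 1
    return inside / total


sites = {"A": (0.0, 0.0), "B": (1000.0, 0.0)}  # hypothetical cell sites
print(voronoi_share("A", sites, 1000.0))
```

With a neighbouring site 1 km away, only about 80% of site A's 1 km range remains in its Voronoi region, which shows how the area in which the phone could be shrinks.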
A classification is proposed to determine the usability of mobile phone location data for Location-Based
Applications, as no uniform classification was found in the literature. Six classes are created using different
update frequencies and different accuracies, and for each class an example application is given. Mobile
phone datasets need to be tested on usability for each class, so that the classification can be used to
compare different datasets. The classification cannot be made per mobile phone data type but needs to be
made per dataset: datasets of the same data type differ geographically in their telecommunications
network, and therefore the data types cannot be generalized when the data are classified.
The mobile phone dataset of Mezuro, available for this research, uses Cell ID as localization technology.
Cell ID has a poor location accuracy, and therefore some of the application classes cannot be created
with the mobile phone data of Mezuro. This is caused not only by the location accuracy, but also by the
low number of events per user. Making the mobile phone data anonymous is necessary for privacy
reasons. Privacy is a key issue nowadays, and with advanced technologies it becomes difficult to stay
undetected when data become publicly available1 . No studies using large non-anonymized mobile phone
datasets were found in the literature. Because the data of Mezuro are anonymized and the result of a
query has to contain more than 15 different Hashed IDs, it is impossible to create applications at the
individual level. The main advantage of the data of Mezuro is the long period between changes of the
identification numbers: the Hashed ID is changed only after thirty days, so mobility can be detected over
a longer period.
The classification is tested with the mobile phone data of Mezuro, which can be used for four classes. The
analysis done in this study offers no guarantee that the applications can be created; pilot studies need to
be done to verify the possibility of creating them. Finally, more detailed mobile phone datasets than the
data from Mezuro exist. These datasets might be usable for the Location-Based Applications that cannot
be created with the data of Mezuro, depending on the number of updates, the location technology and
privacy issues.
To test the usability of the mobile phone data of Mezuro, an application is chosen that is on the edge of
what is possible according to the classification. The applications in the class with an accuracy of less than
100 meters and update frequencies of events of less than 15 minutes cannot be developed with mobile
phone data of Mezuro. However, with the advanced location methods an opportunity exists that the
applications can be developed, and therefore one of the applications of this class, ’travel time
information’, is chosen to be developed in this study. In addition, one of the goals of the company Mezuro,
determining whether a user is moving and, if so, which mode the user is using, is one of the aims of the
application ’travel time information’.
5.3
Recommendations for further research
Development of an application
The application ’travel time information’ is chosen to be investigated further. The application is on the
edge of what is feasible according to the analysis done in this part, and is therefore useful to check
whether the analysis is correct. No other study was found that computes travel times with the same type
of mobile phone data. Developing the application with mobile phone data from Mezuro will be a challenge,
and the results will indicate the usability of this type of mobile phone data for the application. The
application could be researched along one route, where it can be validated easily, and then be extended in
the same way to other routes, validating the method.
Mobile phone data type
Using another type of mobile phone data will lead to an improvement in location accuracy and/or in the
average number of events per user. With such an improvement, the rejected applications will be easier to
create. More research into the network topology is needed to increase the location accuracy; the network
topology in the Netherlands has not been investigated and the practical topology is not known.
Analysis to test the usability of the dataset of Mezuro
The analysis to test the usability of the dataset of Mezuro, performed in Chapter 4, can be improved using
more detailed numbers. The analysis does not take into account the difference between rural and
populated areas. The accuracy is not substantiated with real data and could be calculated in more detail.
This could make it impossible to develop the application with the dataset of Mezuro in the next part;
however, the aim was to develop an application on the edge of what is possible.
1 http://www.telegraaf.nl/digitaal/21425187/__Privacy_verdwenen__.html
Datasets
More datasets using mobile phone location data exist in the world. To compare the different types of
mobile phone data, their usability needs to be checked according to the same classes described in
Section 3.2. The classes give an overview of the usability for applications, so that different datasets, and
therefore different mobile phone data types, can be compared.
Real-time data collection
In this study, the classification does not take into account whether a dataset is generated in real time or
once a day, as is the case for the dataset of Mezuro. In further research, the possibility to generate data in
real time, and the data available when the mobile phone data are generated in real time, have to be
examined.
Part II
Estimation of travel times using the dataset of Mezuro
6
Setup of the application
In this chapter the framework to create the application ’travel time information’ using mobile phone
data will be explained. First, a recap of the previous part and the research questions are given. Next, the
definition and purpose of the application will be explained. Then, the application framework with the
boundaries of the study is described.
6.1
Introduction
The first part described the usability of the dataset of Mezuro for the development of applications.
Applications with an accuracy of more than 100 meters and update frequencies of events of more than 15
minutes can be developed with mobile phone location data of Mezuro. To test whether the classification
is correct for the dataset of Mezuro, an application is chosen that is on the edge of what is possible with
the dataset. Therefore the application ’Travel time information’ has been chosen to be developed.
The main research question for Part II was:
To what extent is the mobile phone dataset of Mezuro able to create the application ’Travel time information’?
For Part II the following sub questions were identified, in the same order as described in the report:
- How can the application be created using the dataset of Mezuro?
- Do the practical network characteristics meet the theoretical, given, network characteristics?
- How do the network characteristics affect the distance between the GPS position and the cell site?
- Which advanced location technique has the smallest location error when it is used to map the location of the user on a road or train track?
- How accurately can the mode be detected for one event, or for the number of events at the average frequency of calling, texting and data use?
- What is the accuracy of the determined mapping positions, when mapping the positions onto roads or rail tracks?
6.2
Travel time types
The purpose of the application is to inform users about the travel time, using the measured travel times of
probe vehicles carrying a mobile phone, as described in the next subsection. Two types of travel times can
be distinguished, estimated and predicted travel times, see figure 6.1. For the estimated travel times, a
distinction can be made between instantaneous, short-term and long-term prediction. For a long-term
prediction the uncertainty in the predicted travel time grows, while the uncertainty of an instantaneous
travel time is smaller. Applications providing estimated travel times have already been developed with
other data sources. Mobile phone data offer an opportunity to improve such an application by adding the
future situation on the road, using knowledge about daily home-work trips.
Figure 6.1: Difference in travel time estimation and prediction [Lin, 2009]
6.2.1
Travel time estimation
The application ’travel time information’ will inform its users about the estimated travel time. This can be
done with mobile phone applications, but also, for example, with message signs along the road or at
stations. Estimated travel time means that the travel time over the route is computed from the known
traffic states. The actual travel time of users is likely to differ from the time shown by the information
device: while drivers are heading for their destination, the flow on the road can change, which can lead to
extra delays. Travel times can be collected in various ways. One of the Dutch methods of calculating travel
times uses loop detectors [Van Lint, 2010]. Other possibilities for determining travel times are floating car
data, for example from GPS-enabled devices, and Bluetooth devices [Haghani et al., 2010].
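An estimated (instantaneous) travel time, computed from the known traffic states, can be sketched as the sum of segment length divided by the current speed on each segment. This is a minimal illustration only: the segment data are hypothetical inputs (for example speeds derived from loop detectors), and the function name is invented.

```python
def instantaneous_travel_time(segments):
    """Estimated (instantaneous) route travel time in minutes.

    segments: list of (length_km, speed_kmh) pairs holding the traffic state
    measured at departure time. The speeds are assumed to stay unchanged while
    the traveller is on the route, which is why the actual travel time can differ.
    """
    total_min = 0.0
    for length_km, speed_kmh in segments:
        if speed_kmh <= 0:
            raise ValueError("segment speed must be positive")
        total_min += 60 * length_km / speed_kmh
    return total_min


route = [(10, 100), (5, 50), (20, 120)]  # hypothetical segment data
print(instantaneous_travel_time(route))   # 22.0
```

Because the speeds are frozen at departure time, a traffic jam that builds up downstream while the driver is en route is not reflected, which is the gap a predicted travel time tries to close.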
6.2.2
From estimation to prediction
The final purpose of the application is to inform users about the expected travel time. Figure 6.1 shows
the different predicted travel times: besides the instantaneous travel time, a short-term and a long-term
prediction can be made. Nowadays most applications already show the instantaneous travel time, so
especially future improvements will distinguish an application developed with a mobile phone dataset
from the other applications. One of these improvements is to inform users about the actual travel time,
that is, a short-term or long-term prediction. With mobile phone data, the daily travel patterns and home
locations can be determined for each user. Based on these daily repeating patterns, a prediction can be
made of the mode and departure time. With the known future number of users, the future travel time can
then be determined. When daily trips are used, changes in behaviour, for example a route change, could
lead to an incorrect flow. These changes in behaviour need to be recorded and taken into account in the
development of the application.
6.3
Applications’ framework
Figure 6.2 shows the framework for creating the application ’travel time information’ with mobile phone
data for the current purpose. The main input is the event and network information. Events are created
when a user makes or receives a call or text message or makes a data connection. The event information
can be coupled with network information, which consists of information about the location of cell sites,
the radius of cells and the cell areas. The red border is the boundary of the part of the application
developed in this study, and the red relations are the relations used for the development of that part. The
travel time is first determined at the individual level. All individual calculations can be done inside the
database, and the resulting travel times are aggregated so that each average travel time is based on more
than fifteen users.
Event-level
On event-level, all algorithms are based on one event of one individual user. First, the mode is
determined for each cell of an individual user, using the location of the cell and the location of the roads
and train tracks. After the mode is determined for a whole trip, as explained in the next paragraph, all
events can be mapped onto roads and rail tracks. With the known cell location, an estimate can be made
of where the user is positioned in the cell area.
Trip-level
Before the algorithms shown in the trip-level rectangle can be created, a trip has to be detected. The trip
detection tells whether the user is moving or not; when the user is moving, the user makes a trip. A trip
consists of a sequence of events. The trip detection is shown in light blue because the algorithm has
already been created at Mezuro but is not yet validated. With the trip detection and the mode detection of
single events, the mode of a whole trip can be determined by taking the modes determined for all events
and choosing the most likely mode based on all of them. The next step is to combine the mapping points
of each individual event with the determined mode to detect the route covered by the trips of users. The
route will consist of different roads or public transport lines.
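Choosing the trip mode from the per-event modes can be sketched as a simple majority vote. This is a hypothetical illustration: the labels `"car"` and `"train"`, the use of `None` for undetermined events, and returning `None` on a tie are assumptions, not rules stated in this report.

```python
from collections import Counter


def trip_mode(event_modes):
    """Choose the trip mode as the most frequent mode over all events of a trip.

    event_modes: per-event mode labels such as "car" or "train"; None marks
    an event whose mode could not be determined. Returns None when no event
    has a mode or when the two most frequent modes are tied.
    """
    counts = Counter(m for m in event_modes if m is not None)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # tie: leave the trip undecided
    return ranked[0][0]


print(trip_mode(["car", "train", "car", None]))  # car
```

Leaving ties undecided, rather than picking one arbitrarily, keeps ambiguous trips out of the later travel-time averages.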
Travel time
With the known routes and mapping positions, the average travel time of an individual can be calculated
for each predetermined trajectory of a road or public transport line. The route defines the trajectories
used, and the mapping points combined with the time between the points determine the travel time.
Combining events with the matching mapping points leads to an average travel time for each trajectory.
The travel times of individual users can be grouped and averaged for each trajectory. Grouping different
users gives a more reliable travel time and, for most mobile phone datasets, is obligatory to respect the
privacy of users. The grouping of users is done for each trajectory, but criteria can be defined to exclude
users for whom it is uncertain whether they travelled along that trajectory. The last oval in the figure
gives the output of the application.
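The grouping step can be sketched as follows: one travel time per user and trajectory, averaged only when more than fifteen distinct users are available. This is a minimal sketch under assumptions: the observation layout (trajectory id, Hashed ID, entry and exit times of the mapping points in seconds), the function name and the rule of keeping the last observation per user are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

MIN_USERS = 16  # "more than fifteen users" per averaged travel time


def trajectory_travel_times(observations, min_users=MIN_USERS):
    """Average individual travel times per trajectory.

    observations: iterable of (trajectory_id, hashed_id, t_entry_s, t_exit_s),
    the times of the mapping points at the start and end of the trajectory.
    Keeps one travel time per user (the last one seen) and suppresses
    trajectories with fewer than `min_users` distinct users.
    """
    per_trajectory = defaultdict(dict)
    for trajectory, user, t_in, t_out in observations:
        if t_out <= t_in:
            continue  # inconsistent observation: exclude this user for this trajectory
        per_trajectory[trajectory][user] = t_out - t_in
    return {
        trajectory: mean(times.values())
        for trajectory, times in per_trajectory.items()
        if len(times) >= min_users
    }
```

The inconsistent-time filter is one example of the exclusion criteria mentioned above; in practice further criteria, such as uncertainty about the mapped trajectory, would be added.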
Figure 6.2: Applications’ framework using mobile phone data, in red the research scope
6.3.1
Application boundaries
Not the entire application will be developed in this study. Besides being only partially developed, the
application is unable to give predicted travel times; therefore only the estimated travel times are
computed.
Partial development
The application is only partially developed; the scope is shown in figure 6.2, where the red dotted line
shows the boundary of the application. So the first part of the application will be developed: detecting the
mode and creating the mapping positions of users. Only part of the application is created because the first
steps fit within this thesis, whereas the development of the whole application is too large for it. The
relations used in this study are also coloured red. The black line from the trip detection to the mode
detection at trip-level indicates that the trip detection will not be developed. The mode detection for one
trip of an individual user is developed with some assumptions replacing the trip detection: a temporal and
a spatial assumption determine whether a user is making a trip or not. These assumptions are explained
in Chapter 10.
Estimation of travel times
The travel time computed in this report is the estimated travel time. The dataset of Mezuro is not able to
give real-time travel times because the data are generated once a day; therefore the predicted travel time
cannot be determined. Also, in the dataset of Mezuro the Hashed ID is changed every month, so in the
first weeks of the month the mobility patterns and the home and work locations are unknown and need to
be computed again. A solution to this problem has not been found yet.
6.4
Conclusion
In this chapter the application ’travel time information’ is explained and defined in a framework. The
purpose of the application is to inform travellers about the estimated travel time. Existing techniques can
already determine the estimated travel time, so the added value of an application with mobile phone
location data is uncertain. However, it could have added value if the application can predict the real travel
time, based on the weekly mobility patterns and home and work locations. Much research is needed to
develop such an application, and a prediction cannot be made because the data are generated once a day
and not in real time.
In this study the application will only be partly developed using the framework, up to mapping the
positions. This implies that at the individual level the mode will be determined and the positions will be
mapped onto the road or rail track, and that at trip-level the mode will be determined. The proposed
framework gives a good overview of how the application could be developed with mobile phone location
data, and it can be extended when travel times can be predicted. For this study a case-study is done to
develop the first steps of the application, as explained in the next chapter.
7
Experimental setup
In this chapter, the experimental setup is explained, including the case-study and the data collection. A
case-study is set up from the framework in the previous chapter. The requirements, the required data and
the factors influencing travel time will be explained. After that, the procedures of the data collection and
its most remarkable findings are presented.
7.1
Research approach
The first part of the application will be developed in three stages, as visualized in figure 6.2. The first
stage is the detection of a mode for one event, on the basis of the location of the cell site and other
advanced location methods, described in Chapter 2.2. Second, the mode of a whole trip of one individual
user is determined: the mode is determined for all events during the trip, and the number of events per
mode decides the mode of the trip. The last stage developed in this report is the mapping of the positions.
With the predetermined mode and the advanced location methods, a location on the road or public
transport line is defined for each cell where the user is assumed to be. In further research the application
can be developed completely, after which the travel times can be computed with mobile phone location
data.
7.2
Case-study
A case-study is used for the estimation of the travel time. The case-study is an in-depth examination of
one individual to test whether it is possible to create the application ’travel time information’. The
case-study seeks ’a comprehensive understanding’ of the development of the application [Fidel, 1984]. On
the other hand, the development of the application gives general results about the phenomenon of mobile
phone location data. Case-studies are not strictly planned, and this freedom is a useful attribute. In this
section, the requirements, the factors influencing the travel time and the required data are explained.
7.2.1
Requirements case-study
The requirements of the case-study are derived from the proposed research questions and the framework
described in figure 6.2. The requirements are split into three categories: route, required information and
data collection days.
Route
First of all, a long-distance route has to be chosen. A rail connection and a road connection have to
connect the cities, and both urban and rural areas have to be situated along the routes. A road and a rail
connection are chosen because these are the two common long-distance modes, so that a distinction can
be made between train passengers and car drivers. The aim of detecting the train passengers is to
distinguish the different users, so that the travel time is computed only for car users. Another
requirement for the case-study, based on the framework in figure 6.2, is that one corridor is used. Because
the route detection is not developed in this study, a combination of different corridors cannot be detected.
Required information
The next requirement in the case-study is the possibility to imitate the average frequency of calling,
texting and data usage. With this average frequency, a statement can be made about the results of the
different parts of the application. The telecommunications network information links the events to the
spatial position of the user; for example, the location of the cell site should be compared with the real
location. Derived from the research questions, a comparison has to be made between the theoretical and
practical network characteristics. So, the position of the user has to be captured to compare the real
position with the position of the cell site. Besides the determination of the average distance from the real
position to the cell site, asked in the research questions, the value of the advanced location techniques,
described in section 2.4, can be tested.
Data collection days
A test collection has been done to get an idea of the variability of the distance between the cell site and
the real location, so that the required number of events could be determined. The distance between the
cell site and the real location is taken as normative for deciding the number of trips; in the test collection
an average distance of 794 meters was found. The required sample size n follows from:
$$ n \geq \frac{Z^2}{d^2}\,\sigma^2 \qquad (7.1) $$
with Z = 1.645, σ the standard deviation and d the margin. For the distance from the cell site to the real
location, a standard deviation of 760 meters was found. The margin d is set to 200 meters, chosen such that
the margin plus the average distance from the cell site to the real location (794 + 200 = 994 meters) does
not exceed 1000 meters for each route. With these numbers the required sample size is approximately 39
events. Because calling has the lowest frequency, call events are normative. The minimum trip length is
70 minutes; with an average of one call per four minutes and two events per call, one trip yields
2 · 70/4 = 35 call events, so 39/35 ≈ 1.11 trips are needed. Rounded up, two trips give sufficient data.
Because the data collection and the processing of the collected data are time-consuming, the number of
data collection days was not increased beyond this.
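The arithmetic of equation 7.1 and the resulting number of trips can be reproduced with a few lines of Python. This is a sketch that only uses the values stated above (Z = 1.645, σ = 760 m, d = 200 m, a 70-minute trip with one two-event call every four minutes); the function names are illustrative.

```python
import math


def required_sample_size(z: float, sigma: float, margin: float) -> float:
    """Equation 7.1: n >= Z^2 / d^2 * sigma^2 (sigma and margin in metres)."""
    return (z / margin) ** 2 * sigma ** 2


def trips_needed(n: float, trip_minutes: float, call_interval_min: float,
                 events_per_call: int) -> int:
    """Trips needed when call events, the least frequent event type, are normative."""
    call_events_per_trip = events_per_call * trip_minutes / call_interval_min
    # Only whole trips can be made, so round up.
    return math.ceil(n / call_events_per_trip)


n = required_sample_size(z=1.645, sigma=760.0, margin=200.0)
print(f"required sample size: {n:.1f}")  # about 39
print("trips needed:", trips_needed(n, trip_minutes=70, call_interval_min=4, events_per_call=2))
# trips needed: 2
```

Whether n is taken as 39 or as the unrounded value of about 39.1, the outcome is two trips.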
7.2.2
Required data for the case-study
Based on the requirements, the framework and the factors influencing the travel time, the required data
are defined. Besides the events in the Mezuro dataset, the real position has to be captured, see table 7.1.
Finally, following from the factors influencing the travel time, the time of day has to be recorded.
Table 7.1: Required data
SMS events
Call events
Data events
Position
Time of the day
Telecommunications network information
7.2.3
Factors influencing travel time
Figure 7.1 shows the factors influencing the travel time on the road. Three categories are distinguished:
factors influencing the data collection, factors that do not influence the data, and factors explicitly used
for the data collection. The last category consists of one factor, fluctuations in normal traffic. Trips are
made along the predefined route at different times of the day, so fluctuations in normal traffic during the
day are taken into account. However, fluctuations between days are not taken into account, because
the number of data collection days needed is smaller than the number of working days in a week. The
factors influencing the data collection, such as the weather, are not manipulated, so that all conditions
during the case-study are the same as in reality.
Figure 7.1: Factors influencing travel time on the road [Systematics, 2005]
7.3
Elaboration case-study
Based on the requirements and the required data, this section explains the data collection. First the route
is determined, and with the known route the data collection is designed. The observations made during the
data collection are described in the last subsection.
7.3.1
Route
Two common modes, train and car, were chosen to test whether the modes can be distinguished where the
two transport networks run next to each other. The corridor Amsterdam – Eindhoven was chosen as the test
route, since an intercity train service and a highway connect both cities. The routes run through rural as
well as urban areas. The two types of area have different characteristics: in rural areas fewer cells cover
the route, because demand is lower than in urban areas. Four main cities are located along the route:
besides Amsterdam and Eindhoven, these are Utrecht and 's-Hertogenbosch. The corridor has parts where
the train track and the road are close together (Amsterdam – Utrecht) and parts where they are further
apart (Utrecht – Geldermalsen).
The train and car routes, see figure 7.2, run almost next to each other on the sections Amsterdam – Utrecht
and 's-Hertogenbosch – Eindhoven, whereas between Utrecht and 's-Hertogenbosch the two routes are further
apart. When detecting a road user and a train user, a cell area can overlap the train track as well as the
road. The larger the distance between the two routes, the easier it is to distinguish between them.
However, since the cell area of any event can overlap both the train route and the car route, the user has
to be assigned to one mode using the distance from the cell site to the train track and to the road as the
decision measure.
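The decision rule above, assigning an event to the mode whose route lies closest to the serving cell site, can be sketched as follows. The sketch assumes that the cell site and the OSM train track and road are available in a planar projection with coordinates in metres (for example the Dutch RD New system) and that each route is a polyline; the function names are hypothetical, and this is a simplified illustration rather than the mode algorithm developed later in this report.

```python
from math import hypot
from typing import List, Tuple

Point = Tuple[float, float]  # planar coordinates in metres (assumption)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment ab."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:  # degenerate segment: a and b coincide
        return hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_polyline_distance(p: Point, line: List[Point]) -> float:
    """Shortest distance from p to a polyline given as consecutive vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def assign_mode(site: Point, rail: List[Point], road: List[Point]) -> str:
    """Assign an event to the mode whose route lies closest to its cell site."""
    d_rail = point_polyline_distance(site, rail)
    d_road = point_polyline_distance(site, road)
    return "train" if d_rail < d_road else "car"
```

Ties are assigned to the car here, which is an arbitrary choice.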
Figure 7.2: The road (orange line) and train track (black-white dotted line) between Amsterdam (top) and
Eindhoven (bottom)
7.3.2
Data collection
The two most used long-distance modes, car and train, are tested. From the analysis of the mobile phone
data, the number of events per hour was determined: it is low during the morning peak and increases until
18:00. During the data collection three periods are distinguished: the morning peak, the middle of the day
and the evening peak, with a low, medium and high number of events respectively. Both the morning peak
and the evening peak have many travellers; if the number of events is related to the number of travellers,
the lower number of events in the morning peak should make it harder to obtain results than in the
evening peak.
Tools
Three tools are used to collect data: the GPS tracker, mobile phones and the database of Mezuro; they are
explained in this section. Two mobile phones are used in the data collection to create the mobile phone
location data. The main mobile phone has a subscription and is used to create SMS and call events. The
prepaid mobile phone is used to receive the SMS messages and calls from the phone with the subscription.
No data events were generated with the prepaid mobile phone. The GPS tracker was chosen because it gives
the real position with high accuracy and a high update frequency.
GPS tracker: GPS position
The GPS tracks are collected using GPS trackers. The Qstarz Travel Recorder records the GPS location every
second while the device is moving. GPS is a very accurate and common positioning system, which provides
the location with an accuracy of three meters¹. However, GPS is not always available and the position
sometimes 'drifts'². This drift is caused by changing GPS satellite constellation patterns. The problem can
be mitigated by using multiple devices; four devices were used, and by combining them an accurate track
can be obtained. Both the GPS tracker and the mobile phone receive the exact time from the satellites
together with the GPS position.
Mobile phone: create SMS and call events
The traffic of the mobile phone is collected via the Friends database. The Friends database contains
individual events, so these do not have to go through the anonymisation and aggregation process. It has
the same properties as the Mezuro database, but the data are collected before anonymisation. With the
Friends database, the position of the cell site can be compared with the GPS position. SMS and call events
are generated during the trip, with the frequencies given in table 7.2. Both the SMS messages and the calls
are sent to the prepaid mobile phone, which was also used to answer the calls. Each SMS generates two
events: one when it is sent and one when it is received on the prepaid mobile phone. Each call also
generates two events: one at the start and one at the end of the call (after 25 seconds). Data events could
not be generated periodically, because it is unknown when and how data events are generated.
Frequencies
The frequencies for calling and texting are given in table 7.2. They were chosen to give a sufficient
amount of data while remaining manageable during the data collection, during which both frequencies
were followed as closely as possible.
Table 7.2: Frequencies of calling, texting and using data
Type | Frequency | Number of phones | Number of events
SMS | 30 seconds | 1 (sending) + 1 (receiving) | 2
Call | 4 minutes (25 s call) | 1 (sending) + 1 (receiving) | 2
7.3.3
Observations during the data collection
The data were collected on four days. The planned collection days were a Tuesday for the car and a
Thursday for the train. The travel time from Amsterdam to Eindhoven (or back) is approximately
80 minutes by train and 75 minutes by car without congestion.
Not everything went as planned during the trips; the most important findings are described here. All trips
were planned on two days, which were expected to be sufficient to collect all the data. For the car trips all
data were collected, but during the train trips the GPS devices did not record all trips completely: three
out of six trips were recorded only partly or not at all, for which no explanation could be found. Therefore,
extra days were used to complete the data collection. Trip 2 was collected on 5 December and, because of
the stormy weather and the interruptions it caused, trips 5 and 6 were collected on 9 December. In total,
four data collection days were used.
¹ http://www.qstarz.com/Products/GPS%20Products/BT-Q1000.html
² https://sites.aces.edu/group/crops/precisionag/Publications/Timely%20Information/GPS-GNSS_Drift.pdf
Furthermore, at the beginning of the car collection day, Amsterdam could not be reached before the
planned starting time, so trip 1 started in Utrecht. After trip 6 an extra trip was made to collect data on the
route Amsterdam – Utrecht. As a result, the whole route is covered three times, but not every time of day
has a complete return trip. Traffic jams caused extra delay during several trips, but this did not affect the
data collection of the other trips. During two of the train trips, the connecting train at Utrecht had already
left, so these trips include a waiting time of 15 minutes. Because traffic jams and train delays reflect
real-life conditions, all events were used.
The train trips were collected on three data collection days. Neither the GPS devices nor the mobile phone
connected easily with the satellites and the cell sites. Calling the prepaid mobile phone worked differently
in the car than in the train: in the train the connection was sometimes established very late or was
terminated immediately. In some cases the connection could not be made at all and the provider played a
spoken message saying that the connection could not be established. The failed connections cannot be
explained. Finally, the prepaid mobile phone sometimes switched off suddenly during the trips, on average
once per trip. However, failed calls also occur in real life, so no changes were made to the data.
All text and call events were found in the database of Mezuro. Data events, however, were found
infrequently and at irregular intervals, and some data events appeared in the database at a different time
than they actually occurred: SMS events with the same cell had a different timestamp than some of the
data events. Therefore, no data events are used in the analysis. This is unfortunate, because in the mobile
phone data of Mezuro more data events are generated than text and call events, see Appendix B.
Text messages (SMS) were sent automatically to the prepaid phone using a mobile phone application. The
SMS events were scheduled in the app in advance, so that a predetermined number of events was generated.
Because of the delays caused by traffic jams during car trips 2 and 7, the events at the end of these trips
had to be imported manually. With the manual import function only one SMS per minute could be
scheduled, because the seconds could not be set, whereas they could be set for the predetermined events.
Therefore the number of events differs per trip, but this does not affect the rest of the research.
Table 7.3 shows the results of the data collection for the different trip numbers and modes. The delay of
the car trips is determined by comparing the duration of the actual trip with the duration given by the
route planner. Trips with manually set SMS events are marked as such.
Table 7.3: Details of the trips made
Date | Trip number | Route | Mode | Details | GPS devices
26-nov | 1 | Utrecht – Eindhoven | Car (C) | +/- 35 minutes delay | 2
26-nov | 2 | Eindhoven – Amsterdam | Car (C) | +/- 40 minutes delay, manually set SMS events | 3
26-nov | 3 | Amsterdam – Eindhoven | Car (C) | | 3
26-nov | 4 | Eindhoven – Amsterdam | Car (C) | | 3
26-nov | 5 | Amsterdam – Eindhoven | Car (C) | | 3
26-nov | 6 | Eindhoven – Amsterdam | Car (C) | +/- 20 minutes | 3
26-nov | 7 (extra) | Amsterdam – Utrecht | Car (C) | Manually set SMS events | 3
28-nov | 1 | Amsterdam – Eindhoven | Train (T) | | 2
28-nov | 2 | Eindhoven – Amsterdam | Train (T) | | 0
28-nov | 3 | Amsterdam – Eindhoven | Train (T) | | 3
28-nov | 4 | Eindhoven – Amsterdam | Train (T) | | 1
28-nov | 5 | Amsterdam – Eindhoven | Train (T) | | 0
28-nov | 6 | Eindhoven – Amsterdam | Train (T) | | 0
5-dec | 2 (substitute) | Eindhoven – Amsterdam | Train (T) | +/- 15 minutes (transfer) | 4
9-dec | 5 (substitute) | Amsterdam – Eindhoven | Train (T) | | 3
9-dec | 6 (substitute) | Eindhoven – Amsterdam | Train (T) | +/- 15 minutes (transfer) | 4
7.4
Data sources description
This section explains the data types and the method of combining them. After the data collection, all data
from the different devices were gathered and processed. The last subsection explains how the data types
are combined.
7.4.1
Explanation of the data types
Each data type is explained separately below. Friends data, cell data and Teradata originate from the
mobile phone database of Mezuro, whereas OpenStreetMap (OSM) data and Global Positioning System (GPS)
data come from other sources. Table 7.4 links the data types to the tools described in section 7.3.2.
Table 7.4: From tools to data types
Data type
OSM data
GPS data
Friends data
Network data
Teradata
Used tool (explained in section 7.3.2)
GPS tracker
Mobile phone
-
OSM data
OpenStreetMap (OSM) data provide the spatial train track and road for the route between Amsterdam and
Eindhoven. Figure 7.2 shows both the train track and the road from the OSM database. OSM data are
published under the Open Database License. Apart from the tracks between Amsterdam and Eindhoven, the
OSM data are not used. The OSM data are used not only to visualise the data but also to combine
the train track and the road with other spatial elements, such as the cell sites and GPS positions. The train
track and the road serve as the benchmark for the two routes. The data can be imported into Postgres, the
database management system used.
GPS data
The GPS devices created points with multiple properties, as described earlier in this chapter. The number
of available devices varied between trips. Smoothing methods are used to filter the data. Figure 7.3 shows
all GPS points generated with the GPS devices.
Figure 7.3: Raw GPS points from the GPS devices of all routes
Friends data
The Friends database contains individual events before they are anonymised and aggregated. Therefore, the
events of the Friends database can be compared with the GPS points and used to validate and calibrate the
algorithms. The phone number of the mobile phone used can be found in the Friends data. The mode and
trip number are added so that the different data types can be combined easily. Appendix B explains the
other variables.
Network data
The cell data contain basic information about each cell: its type, its radius and its transmission direction.
They also contain spatial data, such as the location of the cell and the cell polygon. Furthermore, where
possible, the Voronoi diagram and the BSCM (Best Serving Cell Map) are linked to each cell. The five spatial
elements used are the BSCM, the Voronoi diagram, the cell area, the centroid of the cell area and the cell
site. Almost all data are extracted from the database of Mezuro; only the Voronoi diagrams are created
separately, using QGIS, an open-source geographic information system. The BSCM is created by the
telecom provider.
Cell properties, especially the radius, change daily. Therefore the network data change every day and are
tied to the date of the data collection. The Voronoi diagram and the BSCM are the same for each data
collection date. Unfortunately, these two spatial elements are not available for every cell; Appendix D
gives their availability. The basic network data are 100 % available: all cells seen during the trips can be
found in the database. The network data will be used to increase the accuracy of locating an MS.
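The Voronoi diagram has a simple property that is useful when linking positions to cells: a location lies in the Voronoi polygon of the cell site closest to it. The sketch below uses this property instead of explicit polygons. It is only an illustration (the Voronoi diagrams in this study were created in QGIS), and the site ids and planar coordinates in metres are hypothetical.

```python
from math import hypot
from typing import Dict, Tuple

Point = Tuple[float, float]  # planar coordinates in metres (assumption)


def voronoi_cell(point: Point, sites: Dict[str, Point]) -> str:
    """Return the id of the cell site whose Voronoi polygon contains `point`.

    A Voronoi polygon contains exactly the locations that are closer to its own
    site than to any other site, so a nearest-site search gives the same answer
    as a point-in-polygon test on the Voronoi diagram (points exactly on a
    boundary are resolved arbitrarily by min()).
    """
    return min(sites, key=lambda sid: hypot(point[0] - sites[sid][0],
                                            point[1] - sites[sid][1]))


# Hypothetical cell sites.
sites = {"A": (0.0, 0.0), "B": (1000.0, 0.0), "C": (0.0, 1000.0)}
print(voronoi_cell((400.0, 100.0), sites))  # A
print(voronoi_cell((900.0, 200.0), sites))  # B
```

Cells that share one mast have the same site location, so in this simple form they would also share one Voronoi polygon.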
Data of Mezuro
The variables of the mobile phone data of Mezuro are the same as those of the Friends data, except one: the
Friend telephone number is replaced by a hashed ID. More information about the mobile phone data of
Mezuro can be found in Appendix B.
7.4.2
Combining data types
The data types are combined in Postgres, a database management system. Database management systems
can process queries quickly; Postgres was chosen because it is open source, but other systems work in the
same way. Figure 7.4 shows how the different data types are combined. In this simplified version of the
database, the cell information (the table cell_polygons) is the main table that connects all other tables. The
results of combining the data types are shown in the next chapter.
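To illustrate the role of cell_polygons as the central table, the sketch below joins events to their cell and, where available, to the GPS point at the same second. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Postgres and omits all geometry; apart from cell_polygons, all table names, column names and values are hypothetical, and the actual model is the one in figure 7.4.

```python
import sqlite3

# An in-memory SQLite database stands in for Postgres; geometry columns are omitted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cell_polygons (cell_id TEXT PRIMARY KEY, radius_m REAL, has_voronoi INTEGER);
CREATE TABLE friends_events (event_time TEXT, event_type TEXT, cell_id TEXT,
                             mode TEXT, trip INTEGER);
CREATE TABLE gps_points (gps_time TEXT, x REAL, y REAL, mode TEXT, trip INTEGER);
""")
conn.executemany("INSERT INTO cell_polygons VALUES (?, ?, ?)",
                 [("c1", 1500.0, 1), ("c2", 800.0, 0)])
conn.executemany("INSERT INTO friends_events VALUES (?, ?, ?, ?, ?)",
                 [("08:00:05", "S", "c1", "C", 1),
                  ("08:00:35", "S", "c2", "C", 1)])
conn.executemany("INSERT INTO gps_points VALUES (?, ?, ?, ?, ?)",
                 [("08:00:05", 1000.0, 2000.0, "C", 1)])

# cell_polygons acts as the hub: every event is enriched with the network data of
# its cell and, if a GPS point exists at the same second, with that position.
rows = conn.execute("""
SELECT e.event_time, e.event_type, c.cell_id, c.radius_m, g.x, g.y
FROM friends_events AS e
JOIN cell_polygons AS c ON c.cell_id = e.cell_id
LEFT JOIN gps_points AS g
       ON g.gps_time = e.event_time AND g.mode = e.mode AND g.trip = e.trip
ORDER BY e.event_time
""").fetchall()
for row in rows:
    print(row)  # the second event has no GPS point at the same second: x and y are None
```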
Figure 7.4: Simplified database model with the database relations when combining the different variables
7.5
Conclusion
In this chapter the data collection is explained and the mobile phone data are compared with the GPS
positions. Analyses are done to give a better understanding of the mobile phone data of Mezuro. Besides
being usable in further studies, the conclusions on the best spatial elements (Voronoi diagrams, BSCM and
cell areas) will be used in the next chapters.
The data collection was not always a success. The car trips went as expected; traffic jams caused delays,
but only the first trip had to be changed. In the other trips enough data were collected and the GPS
devices worked according to plan. During the train trips, however, the GPS devices failed to record the
position during some trips, so extra trips were made on other days. On these days two trips had a missed
connection in Utrecht, and some calls failed to connect to the other mobile phone. Nevertheless, enough
data were collected during the train trips; the details are explained in the next chapter.
8
Data analyses
In this chapter the GPS data are first combined. Then basic results of the combined data sources are
presented, to give a better understanding of the value of the mobile phone data of Mezuro for answering
the research questions. Analyses are also done to find the best spatial elements for the mapping method
and the mode algorithm. Because the data of Mezuro had not yet been compared with GPS points, these
general results test whether the theoretical (given) network information matches the practical network
information and can be used for further research.
8.1
Data preparation
In this section the GPS data are combined into one track per trip and the Friends data are analysed for
usefulness.
GPS data
According to the manufacturer, the GPS data have an accuracy of a few meters, but in stations and tunnels
this accuracy is not achieved. Using multiple GPS devices, the GPS points can be combined into one track.
In this combination each position gets a weighting factor, which is lowered when the speed or altitude has
an unrealistic value. In both modes the actual speed did not exceed 140 km/h; to allow an error margin, the
limit for an unrealistic speed is set to 150 km/h.
The altitude map of the Netherlands is available from the AHN Viewer¹. According to this map, the
maximum altitude is 30 meters and the minimum altitude is -7 meters. With an extra buffer of 30 meters,
the lower limit is set to -40 meters and the upper limit to 60 meters. However, the first analyses showed
many GPS points with an altitude between 60 and 100 meters, most of which show no visible deviation
from the route. Therefore the upper limit is raised to 90 meters, see equation 8.1.
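$$F_{i,t} = \begin{cases} 100 & \text{if } -40 < z_i < 90 \text{ and } \nu_i < 150 \\ 1 & \text{if } z_i < -40 \text{ or } z_i > 90 \text{ or } \nu_i > 150 \end{cases} \qquad (8.1)$$
with $F_{i,t}$ the weight of device i at time t (used in equation 8.2), $z_i$ the altitude in meters and $\nu_i$ the
speed in km/h.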
Equation 8.2 shows how the GPS devices are combined. The sensitivity analysis in Appendix C shows that it
is better not to use the unrealistic values. However, during some trips too many GPS points would then be
deleted; therefore these points are kept with a very small weight of one, compared with a weight of one
hundred for the other values.
¹ ahn.geodan.nl/ahn
$$(x, y)_t = \frac{\sum_{i=1}^{n} (x, y)_{i,t} \cdot F_{i,t}}{\sum_{i=1}^{n} F_{i,t}} \qquad (8.2)$$
with i = 1, ..., n the devices, t the time and $F_{i,t}$ as defined in equation 8.1.
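Equations 8.1 and 8.2 together form a weighted mean in which implausible fixes still count, but only with weight 1 against 100. A minimal Python sketch, assuming that each device delivers one fix per second as planar coordinates in metres together with its altitude and speed (the record layout and function names are illustrative):

```python
from typing import List, Optional, Tuple

# One fix from one device at a given second: (x_m, y_m, altitude_m, speed_kmh).
Fix = Tuple[float, float, float, float]


def device_weight(altitude_m: float, speed_kmh: float) -> int:
    """Equation 8.1: weight 100 for a plausible fix, 1 for an implausible one.

    Boundary values (exactly -40 m, 90 m or 150 km/h), which equation 8.1 leaves
    open, are treated as implausible here.
    """
    if -40 < altitude_m < 90 and speed_kmh < 150:
        return 100
    return 1


def combine_devices(fixes: List[Fix]) -> Optional[Tuple[float, float]]:
    """Equation 8.2: weighted mean position of all devices at the same second."""
    if not fixes:
        return None
    weights = [device_weight(z, v) for _, _, z, v in fixes]
    total = sum(weights)
    x = sum(w * fix[0] for w, fix in zip(weights, fixes)) / total
    y = sum(w * fix[1] for w, fix in zip(weights, fixes)) / total
    return x, y


# Two plausible fixes and one with an unrealistic altitude of 120 m.
print(combine_devices([(100.0, 200.0, 5.0, 80.0),
                       (110.0, 210.0, 4.0, 82.0),
                       (400.0, 500.0, 120.0, 85.0)]))  # about (106.5, 206.5)
```

Because an implausible fix keeps a small weight instead of being discarded, a second in which all fixes are implausible still yields a position, which matches the motivation given above for not deleting these points.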
The number of combined GPS points per trip is shown in table 8.1; the differences between the trips are
explained by the trip details in table 7.3.
Table 8.1: Number of combined GPS points for each trip
Trip number | Mode | Number of GPS points | Number of working devices | Number of combined GPS points | Trip details
1 | C | 7813 | 2 | 3913 | Utrecht – Eindhoven, delay
2 | C | 14403 | 3 | 5647 | Delay
3 | C | 12284 | 3 | 4096 |
4 | C | 11579 | 3 | 3866 |
5 | C | 12678 | 3 | 4232 |
6 | C | 15534 | 3 | 5197 | Delay
7 | C | 3468 | 3 | 1157 | Amsterdam – Utrecht
1 | T | 7847 | 2 | 4156 |
2 | T | 18053 | 4 | 5217 | Delay
3 | T | 12443 | 3 | 4395 |
4 | T | 4442 | 2 | 4032 |
5 | T | 16025 | 4 | 4411 |
6 | T | 21329 | 4 | 5359 | Delay
Smoothing GPS data
Smoothing GPS data
After the GPS devices have been combined, the position at time $t_n$ is smoothed using the GPS points from
$t_{n-2}$ to $t_{n+2}$, see equations 8.3 and 8.4. The time difference between a neighbouring point and $t_n$
determines the weight of that point. Appendix C shows the sensitivity analysis for different weighting
factors; the results are similar for the different time factors.
$$a = \frac{1}{t_n - t_{n-2}}, \quad b = \frac{1}{t_n - t_{n-1}}, \quad c = \frac{1}{t_{n+1} - t_n}, \quad d = \frac{1}{t_{n+2} - t_n} \qquad (8.3)$$
$$(x, y)_n = \frac{(x, y)_{n-2} \cdot a + (x, y)_{n-1} \cdot b + (x, y)_n + (x, y)_{n+1} \cdot c + (x, y)_{n+2} \cdot d}{a + b + 1 + c + d} \qquad (8.4)$$
After the average speed between each GPS point and the next has been determined, GPS points with an
unrealistic speed are deleted. The threshold for an unrealistic speed is set to 162 km/h. A threshold of
180 km/h was also tested, but the sensitivity analysis in Appendix C shows that 162 km/h performs better.
This filter removes all values that are physically impossible, see equation 8.5. The filter is applied five
times, because after one iteration the remaining data can again contain unrealistic speeds between points
that have become neighbours; after five iterations no unrealistic speeds between two points were found.
In total only 0.22 % of the GPS points were deleted, and the deleted values were visibly incorrect.
$$\frac{\left|(x, y)_{n+1} - (x, y)_n\right|}{t_{n+1} - t_n} < 162 \text{ km/h} \qquad (8.5)$$
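The smoothing of equations 8.3 and 8.4 and the repeated speed filter of equation 8.5 can be sketched in Python as below. Assumptions not stated in the report: timestamps are in seconds and strictly increasing, coordinates are planar in metres, the first and last two points of a track are not smoothed, and when a pair exceeds the limit the later point of the pair is removed. Function names are illustrative.

```python
from math import hypot
from typing import List, Tuple

# A GPS fix: (t in seconds, x in metres, y in metres); times strictly increasing.
Fix = Tuple[float, float, float]


def smooth(track: List[Fix]) -> List[Fix]:
    """Equations 8.3-8.4: weighted average over the points n-2 .. n+2.

    Each neighbour is weighted by 1 / (time difference to t_n); the central point
    has weight 1. The first and last two points have no full window and are left
    unchanged (an assumption; the report does not specify the track ends).
    """
    smoothed = list(track)
    for n in range(2, len(track) - 2):
        window = track[n - 2:n + 3]
        t = [p[0] for p in window]
        tn = t[2]
        weights = (1 / (tn - t[0]),  # a
                   1 / (tn - t[1]),  # b
                   1.0,              # central point
                   1 / (t[3] - tn),  # c
                   1 / (t[4] - tn))  # d
        total = sum(weights)
        x = sum(w * p[1] for w, p in zip(weights, window)) / total
        y = sum(w * p[2] for w, p in zip(weights, window)) / total
        smoothed[n] = (tn, x, y)
    return smoothed


def _speed_ms(p: Fix, q: Fix) -> float:
    """Average speed in m/s between two consecutive fixes."""
    return hypot(q[1] - p[1], q[2] - p[2]) / (q[0] - p[0])


def speed_filter(track: List[Fix], limit_kmh: float = 162.0,
                 max_iterations: int = 5) -> List[Fix]:
    """Equation 8.5, applied repeatedly.

    In each pass a point is dropped when it is reached from its predecessor at
    `limit_kmh` or faster. The point after a dropped one is kept unchecked and is
    compared with its new predecessor in the next pass, because removing a point
    creates a new pair of neighbours that may again have an unrealistic speed.
    """
    limit_ms = limit_kmh / 3.6
    for _ in range(max_iterations):
        kept = track[:1]
        previous_dropped = False
        for prev, point in zip(track, track[1:]):
            if not previous_dropped and _speed_ms(prev, point) >= limit_ms:
                previous_dropped = True  # drop `point`
            else:
                kept.append(point)
                previous_dropped = False
        if len(kept) == len(track):  # no unrealistic speeds left
            break
        track = kept
    return track
```

Removing the later point and leaving its successor to the next pass keeps a single spike from also removing the valid point that follows it.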
During the car trips the GPS device does not record data when the speed is lower than 5 km/h, whereas
during the train trips GPS data were recorded at all speeds. As a result, no GPS traces are recorded in
traffic jams and at train stations, although events were created there. The missing GPS points are
therefore interpolated from the surrounding GPS points, under two conditions. First, the distance between
the GPS points before and after the event is at most 20 meters. Second, the speed at both GPS points is
30 km/h or lower. If both conditions are fulfilled, the position is interpolated between the two GPS points.
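The gap filling described above can be sketched as a function that returns an interpolated position only when both conditions hold. The linear interpolation in time, the record layout (time in seconds, planar coordinates in metres, speed in km/h) and the function name are assumptions for illustration.

```python
from math import hypot
from typing import Optional, Tuple

# A GPS fix: (t in seconds, x in metres, y in metres, speed in km/h).
Fix = Tuple[float, float, float, float]


def interpolate_at(event_t: float, before: Fix, after: Fix,
                   max_gap_m: float = 20.0,
                   max_speed_kmh: float = 30.0) -> Optional[Tuple[float, float]]:
    """Estimate the position at event_t between two GPS fixes.

    Returns None unless both conditions hold: the fixes before and after the
    event are at most 20 m apart, and both speeds are 30 km/h or lower. The
    position is interpolated linearly in time (an assumption).
    """
    t0, x0, y0, v0 = before
    t1, x1, y1, v1 = after
    if not t0 <= event_t <= t1:
        return None  # the event does not lie between the two fixes
    if hypot(x1 - x0, y1 - y0) > max_gap_m:
        return None  # condition 1 fails: fixes too far apart
    if v0 > max_speed_kmh or v1 > max_speed_kmh:
        return None  # condition 2 fails: vehicle moving too fast
    fraction = 0.0 if t1 == t0 else (event_t - t0) / (t1 - t0)
    return x0 + fraction * (x1 - x0), y0 + fraction * (y1 - y0)
```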
After these smoothing steps the GPS points were considered final. Figure 8.1 compares the raw and final GPS
traces at Utrecht Central Station (train) and the Leidsche Rijntunnel (car), both near Utrecht. At the station
all GPS traces were smoothed to the correct location, and for the car most of the GPS points were already
close to the correct position.
Figure 8.1: Raw GPS traces (orange) and smoothed GPS traces (blue, overlying) in Utrecht. On the left
GPS traces of the car are plotted and on the right side GPS traces of the train are plotted.
Friends data
The number of events is shown in table 8.2. Only the SMS events and call events are used: the data events
were not evenly distributed, and some of them appeared in the Mezuro database with a delay. The delay is
probably caused by the large number of call and SMS events, but the actual cause could not be identified.
The desired number of SMS events per minute is four, and the desired number of call events per minute
is 0.5.
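The desired rates follow directly from table 7.2: an SMS every 30 seconds and a call every 4 minutes, each producing two events. The per-minute values in table 8.2 appear to be the event counts divided by the trip lengths in tables 8.4 and 8.5 (for example 298 / 75 ≈ 3.97 for car trip 1). A small sketch of both calculations, with illustrative function names:

```python
def desired_rate(interval_s: float, events_per_action: int) -> float:
    """Events per minute when one action (SMS or call) every interval_s seconds
    produces events_per_action events."""
    return 60.0 / interval_s * events_per_action


def observed_rate(n_events: int, trip_minutes: float) -> float:
    """Average number of events per minute over a whole trip."""
    return n_events / trip_minutes


print(desired_rate(30, 2))               # SMS: 4.0
print(desired_rate(4 * 60, 2))           # calls: 0.5
print(round(observed_rate(298, 75), 2))  # car trip 1, SMS: 3.97
```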
The number of events differs between the trips of each mode. For the car, the travel time varies with the
traffic conditions; as mentioned, traffic jams occurred at different places during car trip 2. For the train
the course should be the same for every trip, since the travel times are equal, but some trips had a delay of
15 minutes. Apart from trip 7, the number of SMS events is always higher than 200 and the number of call
events always higher than 30. The differences in the number of events do not mean that some events are
unnecessary or have to be removed: all events can be used as long as the connected GPS position is available.
Table 8.2: Number of events for each trip
Trip number | Mode | SMS events | SMS events per minute | Call events | Call events per minute | Trip details
1 | C | 298 | 3.97 | 42 | 0.56 | Only Utrecht – Eindhoven, delay
2 | C | 386 | 3.51 | 62 | 0.56 | Delay
3 | C | 279 | 3.99 | 36 | 0.51 |
4 | C | 276 | 3.68 | 36 | 0.48 |
5 | C | 286 | 3.97 | 33 | 0.46 |
6 | C | 356 | 4.00 | 46 | 0.52 | Delay
7 | C | 58 | 2.76 | 16 | 0.76 | Only Amsterdam – Utrecht
1 | T | 312 | 3.94 | 51 | 0.65 |
2 | T | 344 | 3.51 | 42 | 0.43 | Delay
3 | T | 329 | 3.92 | 48 | 0.57 |
4 | T | 320 | 3.76 | 44 | 0.52 |
5 | T | 330 | 3.88 | 50 | 0.59 |
6 | T | 212 | 2.28 | 40 | 0.43 | Delay
8.2
General results of mobile phone data of Mezuro
This section describes the performance of the mobile phone location data collected in the case-study. A
distinction is made between analyses of the theoretical versus the practical network performance and
analyses done for the development of the application. All analyses and their types are listed in table 8.3;
they are derived from the research questions and the framework described in the previous chapter.
First, the lost minutes are explained, which describe the performance and usefulness of the data collection.
Second, GPS points outside the cell area are discussed, comparing the theoretical with the practical network
performance. Then the distance from the GPS position to the cell site is calculated; this distance indicates
the accuracy of positioning a user with mobile phone data and thus characterises the practical network
performance. Next, different characteristics related to the distance from the GPS position to the cell site
are described. Subsequently, the distance from the GPS position to other spatial elements, such as the BSCM
and the Voronoi diagram, is determined; these spatial elements are used for positioning users, and this
analysis will be used for the application. Finally, the number of distinct cells and the number of events per
cell are determined, showing whether a user connects to the same cell when travelling the same route
multiple times. The last subsection gives a conclusion of the analyses.
Table 8.3: Explanation of the purpose of each analysis
Title section
Lost minutes
GPS outside cell area
Distance GPS position to cell site
related to radius
related to urban areas
related to event type
related to speed
Distance GPS position to different spatial elements
Distinct cells
related to the same cell
related to the trip number
related to segments of the routes
Purpose of analysis
Development of the application
Theoretical versus practical network performance
Theoretical versus practical network performance
Theoretical versus practical network performance
Development of the application
Development of the application
Development of the application
Development of the application
Theoretical versus practical network performance
Development of the application
Development of the application
8.2.1
Lost minutes
The availability of the different tools is expressed as the percentage of lost minutes. During the data
collection, multiple events and GPS positions should have been created every minute. Unfortunately, the
GPS tracker and mobile phone failed to make connections at some points. The availability is split into GPS
positions and events; no information is available from the network to calculate its availability or lost
minutes. The lost minutes are determined for every trip and mode. The availability of the devices is shown
in table 8.4 and table 8.5.
Table 8.4: Lost minutes for the mode car for each trip
Trip number | Mode | Trip length (minutes) | GPS availability (minutes, %) | Event availability (minutes, %) | Event and GPS availability (minutes, %)
1 | C | 75 | 69 (92%) | 75 (100%) | 68 (91%)
2 | C | 110 | 110 (100%) | 108 (98%) | 107 (97%)
3 | C | 70 | 70 (100%) | 70 (100%) | 69 (99%)
4 | C | 75 | 66 (88%) | 70 (93%) | 66 (88%)
5 | C | 72 | 72 (100%) | 72 (100%) | 72 (100%)
6 | C | 89 | 89 (100%) | 89 (100%) | 88 (99%)
7 | C | 21 | 21 (100%) | 21 (100%) | 20 (95%)
All | C | 512 | 497 (97%) | 505 (99%) | 490 (96%)
Table 8.5: Lost minutes for the mode train for each trip
Trip number | Mode | Trip length (minutes) | GPS availability (minutes, %) | Event availability (minutes, %) | Event and GPS availability (minutes, %)
1 | T | 79 | 76 (96%) | 79 (100%) | 75 (95%)
2 | T | 98 | 92 (94%) | 87 (89%) | 87 (89%)
3 | T | 84 | 82 (98%) | 82 (98%) | 82 (98%)
4 | T | 85 | 72 (85%) | 80 (94%) | 71 (84%)
5 | T | 85 | 75 (88%) | 84 (99%) | 74 (87%)
6 | T | 93 | 91 (98%) | 63 (68%) | 63 (68%)
All | T | 524 | 488 (93%) | 475 (91%) | 452 (86%)
Both the GPS points and the Friends events have a lower availability in the train. Train trip 6 has a low
availability because the software was set up incorrectly; more explanation is given in Chapter 7.
The last column combines the GPS data and the Friends events. GPS positions and events are joined only
on identical seconds; events without a GPS position at the same second are not used. The lost minutes are
calculated after this join, so a minute counts as lost when none of its events can be joined on second level.
Therefore some lost minutes occur neither in the GPS data nor in the Friends events but only in the join;
see for example car trip 2.
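The join on seconds and the resulting lost minutes can be sketched as follows. The sketch assumes integer timestamps in seconds and counts a minute as available when it contains at least one timestamp of the relevant kind; the number of lost minutes is then the trip length minus the available minutes. The function names and example values are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def covered_minutes(seconds: Iterable[int], start: int, trip_minutes: int) -> Set[int]:
    """Minute indices (0 .. trip_minutes-1, relative to the trip start) containing a timestamp."""
    minutes = {(s - start) // 60 for s in seconds}
    return {m for m in minutes if 0 <= m < trip_minutes}


def availability(trip_minutes: int, start: int, event_secs: Iterable[int],
                 gps_secs: Iterable[int]) -> Dict[str, Tuple[int, float]]:
    """Available minutes and percentage for GPS, events and their join on seconds."""
    event_secs, gps = list(event_secs), set(gps_secs)
    joined = [s for s in event_secs if s in gps]  # events with a GPS fix at the same second
    result = {}
    for name, secs in (("gps", gps), ("events", event_secs), ("joint", joined)):
        minutes = len(covered_minutes(secs, start, trip_minutes))
        result[name] = (minutes, 100.0 * minutes / trip_minutes)
    return result


# A 3-minute trip: minute 1 has an event and a GPS fix, but not at the same second.
print(availability(3, 0, event_secs=[5, 70, 130], gps_secs=[5, 6, 71, 130]))
```

In the example, minute 1 is available for both GPS and events but is lost in the join, which is the situation described above for car trip 2.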
8.2.2
GPS points outside cell area
When analysing the data, one of the problems encountered is that GPS points lie outside the cell area. While
travelling, multiple cells are passed, and at high speed the connection with a cell may be established inside
its cell area while the actual transmission takes place outside it. It is also possible that an event is saved in
the Friends database slightly later than it actually occurred. The timestamps of the events and GPS positions
were not changed after these findings, but this must be kept in mind in the rest of the research. The same
process applies to a normal user in the database of Mezuro, so in reality an MS can be outside the
theoretical cell area in the same way as during the data collection.
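Whether a GPS point lies inside the cell area is a point-in-polygon test. The sketch below uses the standard ray-casting algorithm in plain Python, assuming planar coordinates and a simple (non-self-intersecting) cell polygon; in this study the spatial data are handled in Postgres, and the function names here are illustrative.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # planar coordinates (assumption)


def inside_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Ray casting: p is inside if a ray to the right crosses the boundary an odd number of times."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def percent_outside(points: List[Point], polygon: Sequence[Point]) -> float:
    """Share (in %) of GPS points that fall outside the cell polygon."""
    outside = sum(1 for p in points if not inside_polygon(p, polygon))
    return 100.0 * outside / len(points)
```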
Figure 8.2: Cells and the GPS points recorded at the same time as the events connected to that cell
Figure 8.2 shows four examples of GPS positions of events outside the cell area. The upper left panel shows
train events connected to a cell whose cell area does not overlap the rail track at all. The bottom left panel
shows events just outside the cell area, which do not affect the research further. The bottom right panel
shows points just outside the cell area, but also one event far outside it, more than three kilometres from
the cell area. A closer look shows that this event should have been created near the other points on the rail
track, but was created with a delay of two minutes: the event, the start of a call, has exactly the same
timestamp as the end of the call. No other events with a wrong GPS position caused by delays in the
telecommunications network were found. The delay is possibly caused by the large number of generated
events: at least two events have to be processed every minute, which the telecommunications network may
not have been able to handle, whereas a normal user generates fewer events. This event is not deleted from
the database, because the same could happen to another user in unusual cases.
The number of events outside the cell for each mode and trip number is listed in table 8.7. In total, 8%
of all events have a connected GPS point outside the cell area. There is no relation between the mode or trip number
and the events outside the cell. The percentage is higher when counting cells instead of events: in total,
35% of all cells have connected events with GPS positions outside the cell. The number of events outside
the cell could be related to the event type. An SMS happens at a single moment. A call has one event at
the start and one at the end. It lasts longer and therefore requires more operations in the
telecommunications network. When the mobile phone is moving, the start and end events of a call are
associated with different cells in the database. The percentage of events outside the cell is indeed
significantly higher for calls (13%) than for SMS (8%), although the absolute difference is small, see table 8.6.
Besides lying outside the cell, an event's GPS point can also be farther from the site than the cell radius.
Such events are outside the cell and outside the radius. They are included in the errors described above,
but lie farther away than most of the GPS points visible in figure 8.2. In total 0.27% of the events (11 events)
have a distance from the GPS position to the site that is larger than the radius of the cell. The reason is an
error in the radius in the network data: not all cells have the correct radius in the database.
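The comparison between the GPS-to-site distance and the cell radius can be sketched as follows. This minimal sketch assumes WGS84 latitude/longitude coordinates and a spherical Earth, using the haversine formula. The thesis does not state which distance formula was used. The helper names `haversine_m` and `beyond_radius` are hypothetical.

```python
import math

# Assumption: spherical Earth with mean radius in metres (not stated in the thesis).
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Straight-line (great-circle) distance in metres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def beyond_radius(gps, site, radius_m):
    """True if the GPS fix (lat, lon) is farther from the site (lat, lon) than the stated radius."""
    return haversine_m(gps[0], gps[1], site[0], site[1]) > radius_m
```

Counting the events for which `beyond_radius` is true gives a check of the kind behind the 11 events reported above.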
Table 8.6: Percentage events and cells with events outside the cell area for each event type
Event type | Events outside cell | Events total | % events outside cell | Cells with events outside cell | Cells total | % cells with events outside cell
S (SMS: SMS event) | 276 | 3558 | 8% | 160 | 525 | 30%
V (Voice: call event) | 65 | 519 | 13% | 59 | 260 | 23%
Table 8.7: Percentage events and cells with events outside the cell area for each trip and mode
Trip number | Mode | Events outside cell | Events total | % events outside cell | Cells with events outside cell | Cells total | % cells with events outside cell
1 | C | 18 | 306 | 6% | 12 | 78 | 15%
2 | C | 31 | 444 | 7% | 18 | 104 | 17%
3 | C | 28 | 306 | 9% | 22 | 101 | 22%
4 | C | 30 | 291 | 10% | 18 | 101 | 18%
5 | C | 21 | 317 | 7% | 17 | 108 | 16%
6 | C | 23 | 397 | 6% | 20 | 126 | 16%
7 | C | 4 | 72 | 6% | 4 | 23 | 17%
1 | T | 29 | 322 | 9% | 23 | 134 | 17%
2 | T | 40 | 380 | 11% | 30 | 131 | 23%
3 | T | 37 | 344 | 11% | 27 | 144 | 19%
4 | T | 26 | 313 | 8% | 21 | 120 | 18%
5 | T | 31 | 334 | 9% | 21 | 120 | 18%
6 | T | 23 | 251 | 9% | 15 | 104 | 14%
All | Both | 341 | 4077 | 8% | 190 | 538 | 35%
8.2.3
Distance GPS position - site
With the GPS position and the site position known at the moment of an event, the distance between the
site and the GPS position can be determined. The distance is measured in a straight line. The average
distance for each trip and mode is shown in table 8.8. The average distance is larger for the mode car
than for the mode train: 981 meters in the car against 855 meters in the train. The median distance is
777 meters in the car and 592 meters in the train. Because the median is clearly below the mean, most
events lie relatively close to the site, while a few extreme values are high. This skew is much stronger for
train passengers than for motorists. The difference between the two modes was analysed with a t-test.
The distances from the GPS position to the site differ significantly between the modes, so the position of
an MS in a train can be determined more accurately than the position of an MS in a car.
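The reasoning about mean and median can be made concrete with a small helper that groups event distances by mode. This is a sketch only. The `(mode, distance_m)` record layout and the name `summarise_by_mode` are assumptions, not the data model used in the thesis.

```python
import statistics
from collections import defaultdict


def summarise_by_mode(records):
    """records: iterable of (mode, distance_m), e.g. ("C", 777.0) or ("T", 592.0).

    Returns {mode: (mean, median)}. A mean clearly above the median points to a
    right-skewed distribution: most events are close to the site, while a few
    extreme values pull the mean up.
    """
    groups = defaultdict(list)
    for mode, dist in records:
        groups[mode].append(dist)
    return {m: (statistics.mean(d), statistics.median(d)) for m, d in groups.items()}
```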
Table 8.8: Average distance and standard deviation of the distance from the GPS position to the site for
each trip and mode
Trip number | Average distance C (m) | Average distance T (m) | Standard deviation C (m) | Standard deviation T (m) | Standard deviation C (m) | Standard deviation T (m)
1 | 1027 | 871 | 933 | 778 | 821 | 669
2 | 952 | 810 | 861 | 796 | 707 | 522
3 | 1039 | 822 | 892 | 835 | 795 | 571
4 | 1038 | 895 | 793 | 888 | 796 | 623
5 | 999 | 936 | 790 | 927 | 774 | 673
6 | 944 | 865 | 660 | 955 | 814 | 533
7 | 863 | - | 569 | - | 684 | -
All | 990 | 865 | 815 | 860 | 777 | 596
During the day the average distance in the train grows; the cause is unknown. The highest average
distance in the train occurs during trip 5, and the largest difference between trips is more than 100 meters.
One explanation for the lower average distance in trips 2 and 6 is the missed transfer at Utrecht Central
station. For 15 minutes the average distance was lower, because more cells are located in and near the
station. For the mode car the average distance decreases during the day. In the last trip only the stretch
from Amsterdam to Utrecht was travelled. On that part the cells are positioned closer together, especially
in the cities of Utrecht and Amsterdam. However, in the rural area between Amsterdam and Utrecht the
cells are not positioned closer together than in the rural areas between Utrecht and Eindhoven. The
average distance of trip 2 is low, while the other trips have a higher average distance between the GPS
position and the site. Here too, an explanation could be the lower speed during the trip due to traffic jams.
This relation is examined in section 8.2.4. The standard deviations of both modes follow the same pattern
as the averages described above. The standard deviation for the mode car in the morning is about 1.6
times that of the last trip of the day, decreasing from 933 meters to 569 meters.
8.2.4
Relationships with the distance GPS position - site
The distance between the GPS position and the cell site is related to different properties of the network
and to variables of the trip, for example the radius of the cell, but also the speed of the user. The
characteristics of the network are both network-based and spatial. Network-based characteristics are the
radius of the cell, i.e. the theoretical reach of the transmitter, and the event type: a call, text message or
data event. Spatial characteristics relate to the position of the cell, inside or outside the city, and to the
road and train track segments connected to the cells. All relations were chosen based on earlier
experiments during a test data collection.
First, the distance from the GPS position to the site is related to the radius. Second, the cells are divided into
cells inside and cells outside the urban areas. After that, the distance is related to the event type and to the
speed of the user. Finally, a short conclusion gives the most relevant findings on the relations with the
distance from the GPS position to the site.
Distance GPS position - site related to the radius
Figure 8.3 shows the cumulative percentage of both the radius of the cells and the distance between the
GPS position and the cell site for all events. The two cumulative curves have different shapes. In most
cases the radius is much larger than the distance to the site, so the radius could be decreased for most
cells. The largest distance is 6350 meters, observed in the train during trip 5. Figure 8.4 shows the GPS
position, site and cell for this largest distance. The real cause of this large distance is unknown. A possible
explanation is that the area around the GPS position is a stretch of woods. There, the currently connected
cell offers the best connection because of a direct line of sight.
Figure 8.3: Cumulative percentage for the radius and distance between the GPS position and site
Figure 8.4: GPS position, site and cell for the largest distance between GPS position and site
Figure 8.5 visualizes the relation between the distance from the GPS position to the site and the radius of
the associated cell. The vertical lines in the graph represent the different radii of the cells. The radii take
only discrete values, and above 10 kilometres only a few different radii occur. The densest cloud of points,
dark blue in the figure, lies in the area with a radius smaller than 5 kilometres and a distance smaller than
1000 meters. A linear regression line shows that the distance from the GPS position to the site grows as the
radius increases. The regression is given by y = 0.08x + 405, with radius x and distance from the GPS
position to the site y, both in meters. The R-squared value of 0.155 means that the radius explains only
about 16% of the variance in the distance, so the linear regression does not describe the data well. This is
caused by the large variance in the data.
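The regression line and its R-squared value can be reproduced with ordinary least squares. This sketch assumes an unweighted least-squares fit; the thesis does not name the tool it used. `linear_fit` is a hypothetical name. R-squared is 1 minus the ratio of the residual sum of squares to the total sum of squares. A value of 0.155 therefore means that about 15.5% of the variance in the distance is explained by the radius.

```python
def linear_fit(xs, ys):
    """Ordinary least squares y = a*x + b. Returns (a, b, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx              # slope
    b = my - a * mx            # intercept
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    ss_tot = sum((y - my) ** 2 for y in ys)                        # total variation
    return a, b, 1 - ss_res / ss_tot
```

Call it as `linear_fit(radii, distances)`, with the radius as x and the distance as y. It returns the slope, intercept and R-squared in the same form as the formula above.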
The analysis above shows that most distances to the site are much smaller than the radius. To map the
positions of a user more accurately, the radius could therefore be decreased. Figure 8.6 shows the
cumulative histogram of the ratio between the distance from the GPS position to the site and the radius,
for different radii. The median ratio over all radii is 0.15. So in half of the cases the event is created within
15% of the radius from the site. For radii larger than 10 km the median is 0.09. For radii smaller than 5 km
the ratio is higher, with a median of 0.2. The 95th percentile of the ratio over all radii is 0.5: in 95% of all
events, the event is created within half of the radius. The theoretical radius could therefore be reduced to
a practical radius to increase the accuracy of positioning the user.
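The practical radius follows directly from the distribution of the distance-to-radius ratio. This minimal sketch assumes per-event lists of distances and radii in metres. `ratio_quantiles` and `practical_radius` are hypothetical names. Python's `statistics.quantiles` (default 'exclusive' method) is only one of several percentile definitions, so its results may differ slightly from the tool used in the thesis.

```python
import statistics


def ratio_quantiles(distances_m, radii_m):
    """Median and 95th percentile of distance/radius over all events."""
    ratios = [d / r for d, r in zip(distances_m, radii_m)]
    median = statistics.median(ratios)
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile
    p95 = statistics.quantiles(ratios, n=20)[-1]
    return median, p95


def practical_radius(radius_m, ratio_p95):
    """Shrink a theoretical radius to the distance within which ~95% of events fall."""
    return radius_m * ratio_p95
```

With the 95th-percentile ratio of 0.5 reported above, a cell with a theoretical radius of 4000 meters would get a practical radius of 2000 meters.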
Figure 8.5: Graph of the distance GPS position - site related to the radius of the cell
Figure 8.6: Ratio distance from GPS position to site - radius of the cell for different radii
Distance GPS position - site related to urban and rural areas
In cities more cells are installed, because demand is higher due to the larger number of inhabitants.
Because there are more cells, the distance between cells is smaller, and the radius of cells in urban areas is
on average smaller as well. The smaller distance between cells could mean that the distance from the GPS
position to the site is also smaller. Therefore the route is split into parts outside and inside the cities. The
road track passes around the city centres rather than through the inner cities, but those cells are still
classified as inside the city. The train track runs through the city centres, and the stations are transport
hubs with many passenger movements. Figure 8.7 shows the cells inside and outside the cities.
Figure 8.7: Sites in rural areas (blue) and sites in urban areas (orange) for all connected cells
The differences in the distance from the GPS position to the site for the different parts of the route are
shown in figure 8.8. A sawtooth pattern is visible for both modes. Only the cells in Eindhoven have a larger
distance to the site than the average in the other urban areas. There, the distance from the GPS position to
the site is equal to that on the stretch between Den Bosch and Eindhoven. On the other parts of the route
the average distance is 600 meters in the cities and 1100 meters outside the cities, almost a doubling. The
distance in rural areas and urban areas differs significantly.
This phenomenon can be explained by the radii of the cells in rural and urban areas. Figure 8.8 also shows
the average radius for the different parts of the route, and here too a sawtooth pattern can be seen. The
radii outside the cities differ significantly from the radii of the cells in the cities. As stated earlier, the
distance from the GPS position to the site increases with the radius of a cell. This explains why the distance
is larger in rural areas than in urban areas. The cells in Eindhoven also have a larger average radius than in
the other cities, which matches their larger distance from the GPS position to the site.
The ratio between the distance from the GPS position to the site and the radius differs less between rural
and urban areas than the radius and the distance themselves. In urban areas the ratio is on average 0.19,
while in rural areas it is on average 0.17. A t-test shows that this difference is significant. So in urban
areas the average radius is smaller, and therefore the distance from the GPS position to the cell site is
smaller as well.
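The thesis reports t-tests without stating the variant. This sketch assumes Welch's unequal-variance t-test, which suits groups of different size and spread such as urban and rural events. It approximates the p-value with a normal distribution, which is reasonable only for large samples like the thousands of events here. The function names are hypothetical. If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` computes the same statistic with a p-value from the t-distribution.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    ma, mb = statistics.mean(sample_a), statistics.mean(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    na, nb = len(sample_a), len(sample_b)
    se2_a, se2_b = va / na, vb / nb  # squared standard errors of the means
    t = (ma - mb) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


def two_sided_p_normal(t):
    """Two-sided p-value from a normal approximation (only sensible for large samples)."""
    return 2 * (1 - statistics.NormalDist().cdf(abs(t)))
```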
Figure 8.8: Average radius of the cells and average distance from GPS position to site for the different
modes on different parts of the route
Distance GPS position - site in relation to the event type
Table 8.9 shows the average distance from the GPS position to the site for SMS events and call events. The
distance could differ between the two types because a call needs sufficient signal strength for its whole
duration, while an SMS only has to be sent once. The average distance of call events is more than 100
meters higher than that of SMS events (1059 against 911 meters). A t-test shows that the difference is
significant. The difference is not relevant for the further development of the application. A better
statement can be made if future studies also relate data events to the distance to the site. Such a
statement would contribute to a better understanding of the telecommunications network.
Table 8.9: Average distance from GPS position to site for SMS (S) and calls (V)
Event type | Average distance from GPS position to site (m) | Number of events
S | 911 | 3558
V | 1059 | 519
Both | 930 | 4077
Distance GPS position - site in relation to the speed
Finally, the speed is related to the distance from the GPS position to the site. This relation says something
about the accuracy of positioning when moving and when stationary. Can the location be determined more
accurately in cities, shopping malls or homes than when moving fast, or is it the other way around? For the
travel time application it also matters that the distance to the site does not increase much when a user
travels faster. Otherwise, when the user is mapped onto a road or train track, the mapping errors increase
and the travel times become less accurate. The speed is recorded by the GPS devices and linked to the
distance. The average distance from the GPS position to the site is calculated for different speed groups,
see figure 8.9. The average distance fluctuates around 1000 meters. Because of the low number of events
in the speed groups around 40 km/h, the average distance fluctuates strongly there. More precisely, the
group between 35 and 40 kilometres per hour contains only twelve car events, two of which have a distance
to the site of 30 kilometres. So when a speed group contains outliers and only a few events, its average
distance can fluctuate. For the mode train a steady increase of the average distance can be seen as the
speed increases. At the lowest speed level the average distance is 300 meters, while at the highest speed
levels it reaches 1200 meters. One cause is that the speed is low at stations. There the radii of the cells are
smaller, and therefore the distance to the site is smaller.
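The speed groups in figure 8.9 can be formed by binning events on their GPS speed. This sketch assumes 5 km/h bins, matching the 35 to 40 km/h group mentioned above, and `(speed_kmh, distance_m)` records. The bin width and the function name are assumptions. Returning the event count next to each mean makes sparse, outlier-prone groups easy to spot.

```python
from collections import defaultdict


def mean_distance_by_speed(records, bin_kmh=5.0):
    """records: iterable of (speed_kmh, distance_m).

    Returns {bin_lower_bound_kmh: (mean_distance_m, n_events)}, sorted by speed.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for speed, dist in records:
        lower = int(speed // bin_kmh) * bin_kmh  # e.g. 36 km/h falls in the 35-40 bin
        sums[lower] += dist
        counts[lower] += 1
    return {b: (sums[b] / counts[b], counts[b]) for b in sorted(sums)}
```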
Figure 8.9: The average distance from the GPS position to the site for each speed group, with mode
train (dark blue) and car (dark green) and number of users with mode train (light blue) and car (light green)
Figure 8.10 plots the distance from the GPS position to the site against the speed. Up to a speed of 80
kilometres per hour most points have a distance lower than 200 meters, with a few outliers. Above 80
kilometres per hour two point clouds can be observed. The first cloud, mostly car events, has a high
density near 100 kilometres per hour and distances up to 3000 meters, although most points lie below
1500 meters. The second cloud, mostly train events, is more evenly distributed, with speeds from 120 to
140 kilometres per hour and distances up to 4000 meters. So, over all events, the speed is not related to
the distance from the GPS position to the site.
Figure 8.10: The distance from the GPS position to the site and the speed at the GPS position
Conclusion
The relations between the distance from the GPS position to the site and several variables have been
examined, and some of them are statistically significant. First, the distance was related to cells outside and
inside the cities. For both modes the distance from the GPS position to the site is higher outside the cities
than inside them, and the average values follow the average radius on the particular part of the route.
Next, the event types were related to the distance, and this relation also turned out to be significant.
Furthermore, the data suggest that the distance is larger when the radius of the cell is larger, but the
regression explains only a small part of the variance. No relation between the speed and the distance to
the site could be established over all events, although for the mode train the average distance increases
with speed. These conclusions can be used in other analyses. For example, the radius can be used to
indicate whether the positioning of the user is reliable.
8.2.5
Distance from GPS position to different spatial elements
The distances to different spatial elements (the BSCM, the centroid of the cell area, the cell site and the
Voronoi diagram) are visualized in figure 8.11 and listed in table 8.10. First, the distance to the site is
calculated. On average it is 990 meters in the car and 865 meters in the train, as also given in table 8.8.
The second column gives the average distance from the GPS position to the centroid of the BSCM. This
distance is on average higher in the car than in the train. For both modes, its course during the day is the
same as that of the distance to the site.
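The centroid of a cell area, BSCM polygon or Voronoi polygon can be computed with the standard shoelace-based formula for the centroid of a simple polygon. In PostGIS the corresponding function is `ST_Centroid`. The Python below is only an illustrative sketch with a hypothetical name, assuming planar coordinates in metres. For a multipolygon such as the BSCM, the centroids of the parts would have to be combined with area weights, which this sketch does not do.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple (non-self-intersecting) polygon.

    vertices: list of (x, y) in a projected, metre-based system (assumption);
    the ring may be given with or without repeating the first vertex at the end.
    """
    if vertices[0] == vertices[-1]:
        vertices = vertices[:-1]
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        cross = x1 * y2 - x2 * y1
        area2 += cross
        cx += (x1 + x2) * cross
        cy += (y1 + y2) * cross
    # Cx = sum / (6 * area) and 6 * area == 3 * area2
    return cx / (3 * area2), cy / (3 * area2)
```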
The distance to the centroid of the Voronoi diagram is on average more than 100 meters shorter than the
distance to the site, and also shorter than the distance to the centroid of the BSCM. The centroid of the
Voronoi diagram is thus closest to the GPS position. Comparing the shapes of the different spatial
elements shows that the position is near the cell site, and most of the time near the closest cell site, which
is the defining property of a Voronoi diagram. Finally, the distance from the GPS position to the centroid
of the cell is determined. This average distance is very high, around 2500 meters for both modes.
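The link between 'closest cell site' and the Voronoi diagram can be shown without constructing any polygons. A point lies in the Voronoi cell of a site exactly when that site is the nearest one. The sketch below uses hypothetical names and assumes planar coordinates. It looks up the nearest site and measures how often the connected site is also the nearest one.

```python
import math


def nearest_site(point, sites):
    """Id of the site closest to point, i.e. the site whose Voronoi cell contains point.

    sites: {site_id: (x, y)} in planar metres (assumption).
    """
    return min(sites, key=lambda sid: math.dist(point, sites[sid]))


def share_connected_to_nearest(events, sites):
    """events: list of (gps_xy, connected_site_id).

    Fraction of events whose connected site is also the nearest site.
    """
    if not events:
        return 0.0
    hits = sum(1 for gps, sid in events if nearest_site(gps, sites) == sid)
    return hits / len(events)
```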
Figure 8.11: Different spatial elements and the GPS position
Table 8.10: Average distance in meters from the GPS position to all centroids of different spatial elements,
the site, cell, BSCM and Voronoi diagram
Trip number | Mode | Distance GPS position - site | Distance GPS pos. - centroid BSCM | Distance GPS pos. - centroid Voronoi | Distance GPS pos. - centroid cell
1 | C | 1027 | 1156 | 1012 | 2820
2 | C | 952 | 987 | 790 | 2597
3 | C | 1039 | 1032 | 868 | 2802
4 | C | 1038 | 1065 | 857 | 2955
5 | C | 999 | 961 | 825 | 2753
6 | C | 944 | 933 | 742 | 2695
7 | C | 863 | 942 | 725 | 2881
All | C | 990 | 1014 | 836 | 2758
1 | T | 871 | 827 | 720 | 2610
2 | T | 810 | 780 | 690 | 2265
3 | T | 822 | 728 | 668 | 2275
4 | T | 895 | 848 | 737 | 2391
5 | T | 936 | 820 | 769 | 2488
6 | T | 865 | 848 | 749 | 2158
All | T | 865 | 805 | 720 | 2369
Next, the polygons of the spatial elements are used to 'map' the different elements onto the road or train
track. This mapping is done with the Postgres spatial functions. The mapping point is defined by the
shortest distance between the polygon and the train track or road. If the polygon overlaps the train track or
road, this shortest distance is zero. In that case the centroid of the line piece that lies within the polygon is
used. Otherwise, the mapping point is the point on the train track or road where the distance to the
polygon is shortest.
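The mapping rule above can be approximated without a spatial database. The thesis uses Postgres spatial functions for this step; the stdlib sketch below is a hypothetical substitute. It samples the track every `step` meters and treats the samples inside the polygon as the overlapping line piece. The middle sample approximates the centroid of that piece. If there is no overlap, it takes the sample closest to the polygon. Planar coordinates in metres and a single contiguous overlap are assumed. In PostGIS, functions such as `ST_Intersection`, `ST_Centroid` and `ST_ClosestPoint` would express the same rule exactly.

```python
import math

# Hypothetical sketch: planar (x, y) coordinates in metres are assumed throughout.


def _inside(p, poly):
    """Ray-casting point-in-polygon test."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside


def _seg_dist(p, a, b):
    """Distance from point p to the segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def dist_to_polygon(p, poly):
    """0 inside the polygon, otherwise the distance to the nearest edge."""
    if _inside(p, poly):
        return 0.0
    n = len(poly)
    return min(_seg_dist(p, poly[i], poly[(i + 1) % n]) for i in range(n))


def densify(line, step):
    """Sample a polyline roughly every `step` metres (end point included)."""
    pts = []
    for a, b in zip(line, line[1:]):
        k = max(1, math.ceil(math.dist(a, b) / step))
        for i in range(k):
            t = i / k
            pts.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    pts.append(tuple(line[-1]))
    return pts


def map_polygon_to_line(poly, line, step=10.0):
    """Approximate mapping point of a polygon on a road or rail polyline."""
    samples = densify(line, step)
    dists = [dist_to_polygon(p, poly) for p in samples]
    overlap = [p for p, d in zip(samples, dists) if d == 0.0]
    if overlap:
        # Middle sample of the overlapping stretch ~ centroid of that line piece
        return overlap[len(overlap) // 2]
    # No overlap: the sample closest to the polygon
    return samples[dists.index(min(dists))]
```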
The first column of table 8.11 shows the results for the mapping point of the BSCM. The average distance
from the GPS position to this mapping point is about 500 meters. The mapping point of the Voronoi
diagram also achieves an average distance of about 500 meters. For both spatial elements the average
distance is smaller for the train track than for the road. The last train trip has high values for the average
distance to the mapping points, in particular for the Voronoi diagram (681 meters). The last column
presents the average distance to the mapping point of the site, i.e. the closest point on the track from the
site. These values are on average about 200 meters higher, which is remarkably high for both the train
and the car. Over all trips, the average distance differs by about 200 meters between car and train (811
against 630 meters). As said before, stations have more cells with smaller radii because of the higher
demand in a small area. The cells next to roads have a larger average radius, because demand is lower
there and a larger radius lets more users benefit from the cell. A train waits a few minutes at each station.
During two trips the connecting train was missed, so these trips had a waiting time of 10 minutes.
Table 8.11: Average distance from GPS position to mapping points of different spatial elements
Trip number | Mode | Average distance GPS pos. - mapping BSCM | Average distance GPS pos. - mapping Voronoi diagram | Average distance GPS pos. - mapping site point
1 | C | 566 | 548 | 795
2 | C | 535 | 539 | 772
3 | C | 547 | 547 | 838
4 | C | 603 | 593 | 859
5 | C | 551 | 575 | 866
6 | C | 509 | 490 | 790
7 | C | 504 | 420 | 755
All | C | 545 | 530 | 811
1 | T | 415 | 441 | 609
2 | T | 478 | 406 | 588
3 | T | 426 | 415 | 596
4 | T | 465 | 495 | 681
5 | T | 493 | 456 | 653
6 | T | 509 | 681 | 652
All | T | 464 | 482 | 630
8.2.6
Distinct cells
In this subsection events with the same cell are related to each other. In the previous subsections all
events were analysed separately, while in this subsection the cells are analysed separately. The Postgres
DISTINCT function is used to perform the same analyses per cell. First, the number of events per cell is
calculated. Second, cells seen in different trips are identified to detect differences or similarities between
the trips with respect to the cells. Finally, the tracks are divided into segments, and for each segment the
relation between trips and the same cell is examined. This shows whether connecting with the same cell
corresponds to a fixed location.
Distinct cells related to the number of events
For each cell the number of events over all trips and modes is summed; the result is shown in figure 8.12.
In total 557 cells have been seen in one or more events. Most cells, 85, have only two events. As the
number of events per cell increases, the number of cells decreases. Above 20 events per cell, there are at
most five cells for each number of events. The average number of events per cell is five. The cell with the
most events, 57 in total, is located next to the highway near Den Bosch. The cell has a radius of almost 20
kilometres. Its high number of events is probably due to a clear line of sight.
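Counting events per cell and the resulting distribution in figure 8.12 is a simple group-by. In the thesis this was done with Postgres. The Python sketch below is only an illustration with hypothetical names, and takes one cell identifier per event.

```python
from collections import Counter


def events_per_cell(cell_ids):
    """cell_ids: one cell identifier per event.

    Returns (events per cell, number of cells per event count, mean events per cell).
    """
    per_cell = Counter(cell_ids)
    histogram = Counter(per_cell.values())  # how many cells have exactly k events
    mean = len(cell_ids) / len(per_cell)
    return per_cell, histogram, mean
```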
Figure 8.12: Number of events per cell
Distinct cells related to the trip number
When the same track is travelled multiple times, there is a chance of connecting with the same cell on every
trip. Also, the train track and roads run side by side at some points of the route, especially between
Amsterdam and Utrecht. A cell can therefore be seen both in the car and in the train. Table 8.12 shows
the number of cells with an event in the mode train and in the mode car. In total 59 cells have been seen in
both modes, which is 11% of the total number of cells. As expected, these cells are located where the train
track is close to the road, see figure 8.13. Such cells are found between Amsterdam and Utrecht, between
Geldermalsen and Den Bosch, and between Den Bosch and Eindhoven. Geldermalsen lies halfway between
Utrecht and Den Bosch. The table also shows that this percentage is higher for the car than for the train,
because more cells were seen in the train. Finally, the number of distinct cells seen in another trip, with the
same or a different mode, is calculated. In total 69% of the cells are seen in at least two different trips.
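The overlap between modes and between trips in table 8.12 can be computed with set operations. This sketch assumes `(cell_id, mode, trip_number)` records with mode 'C' or 'T'. Trip numbers restart per mode, so a trip is identified by the pair (mode, trip_number). The names are hypothetical.

```python
from collections import defaultdict


def cell_overlap(events):
    """events: iterable of (cell_id, mode, trip_number), mode 'C' (car) or 'T' (train).

    Returns (cells seen in both modes, cells seen in at least two different trips).
    """
    cells_by_mode = defaultdict(set)
    trips_by_cell = defaultdict(set)
    for cell, mode, trip in events:
        cells_by_mode[mode].add(cell)
        trips_by_cell[cell].add((mode, trip))  # trip numbers restart per mode
    both = cells_by_mode["C"] & cells_by_mode["T"]
    multi_trip = {c for c, trips in trips_by_cell.items() if len(trips) >= 2}
    return both, multi_trip
```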
Table 8.12: Number of cells connected in the train and car for both modes
Mode | Number of cells | Number of cells seen in train and car | % cells seen in train and car | Number of cells seen in another trip with the same mode | % cells seen in another trip with the same mode
Car | 241 | 59 | 24 | 187 | 78
Train | 356 | 59 | 17 | 241 | 68
Both | 538 | 59 | 11 | 370 | 69
The number of events is determined for the same cells in different trips and for the same cells in different
modes, see table F.1 in Appendix F. The distance between the GPS points of events with the same cell is also
calculated, see Appendix E. The main conclusion from both tables is that the average distance between
GPS points is lower when driving in the same direction than when driving in different directions.
Figure 8.13: Cell areas seen both in the train and car
Distinct cells related to segments of the routes
To find out whether a user connects with the same cell at the same place, both tracks are divided into
segments of 50 meters and each GPS position is assigned to one of these segments. The cell of each event is
coupled to the event and thus to the segment. The route is divided into 24394 segments. The number of
different trips and the number of different cells per segment are combined in table 8.13 and table 8.14.
Table 8.13 is for the mode car and shows that most segments are passed in 5 or 6 trips. Some segments
match a GPS position in only one to four trips. At the start of a trip the GPS connection was established too
late, resulting in no match with the segment for that trip. At some points during the trips, e.g. in tunnels,
the GPS connection failed, so those segments have no match with the GPS positions. Furthermore, it turns
out that events within a segment are connected to different cells. The same cell is seen in different trips in
a segment, but the number of different cells is about half the number of different trips. Multiple events can
take place in one segment. If these events have different cells, the number of different cells can be larger
than the number of different trips.
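Assigning GPS positions to 50-meter segments is a form of linear referencing. Each position is projected onto the route polyline and its distance along the route (chainage) is computed. The segment index is the chainage divided by the segment length. The sketch below assumes planar coordinates in metres and `(gps_xy, trip, cell)` records. All names are hypothetical, and the thesis does not describe how the assignment was implemented.

```python
import math
from collections import defaultdict


def chainage(point, route):
    """Distance along the route polyline (metres) of the point's projection onto it."""
    best_d, best_s = math.inf, 0.0
    start = 0.0
    for a, b in zip(route, route[1:]):
        dx, dy = b[0] - a[0], b[1] - a[1]
        seg_len = math.hypot(dx, dy)
        if seg_len == 0:
            continue
        t = ((point[0] - a[0]) * dx + (point[1] - a[1]) * dy) / (seg_len ** 2)
        t = max(0.0, min(1.0, t))
        d = math.dist(point, (a[0] + t * dx, a[1] + t * dy))
        if d < best_d:
            best_d, best_s = d, start + t * seg_len
        start += seg_len
    return best_s


def segment_summary(events, route, seg_len=50.0):
    """events: iterable of (gps_xy, trip_id, cell_id).

    Returns {segment_index: (n_distinct_trips, n_distinct_cells)}.
    """
    trips = defaultdict(set)
    cells = defaultdict(set)
    for gps, trip, cell in events:
        seg = int(chainage(gps, route) // seg_len)
        trips[seg].add(trip)
        cells[seg].add(cell)
    return {s: (len(trips[s]), len(cells[s])) for s in sorted(trips)}
```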
Table 8.13: Amount of segments ordered by the number of different trips and different cells for 2634
segments
Number of different trips
1
2
3
4
5
6
1
14
1
4
3
5
Number of different cells
2
3
4
5
6
20
21
23
100
313
18
28
158
608
18
126
690
16
232
41
Table 8.14 is for the mode train. Compared with the table for the mode car, it has some segments with
more than six different cells. These segments are all situated at one station, Utrecht. The train waits at the
station for a longer time, and two trips had a waiting time of 10 minutes at Utrecht. Many events were
therefore generated there, and many different cells were seen at the station. In the train more different
cells have been seen, and more segments have a match with six different trips. The rail tracks in both
directions lie closer to each other, so the matches are spread more evenly over the segments. The two
directions of the road lie further apart, so there the matches are less evenly distributed. The train route is
divided into 2569 segments.
Table 8.14: Amount of segments ordered by the number of trips and different cells for 2778 segments
Number of different trips
1
2
3
4
5
6
1
13
8
2
Number of different cells
3
4
5
6
7
8
30
36
19
35
18
57
102
179
219
1
3
73
323
616
1
164
491
2
170
5
3
9
1
Conclusion
Multiple events are created for each cell during the trips. In total 557 cells have been seen in one or more
events, and most cells are seen in different trips with the same mode. In total 69% of all cells are seen in
different trips, while only 11% of all cells have an event in both the train and the car. The train and car
tracks run next to each other at some places, which causes this overlap in cells. In the 50-meter segments,
most segments have events from more than five trips and on average four different cells. It can therefore
be concluded that, within the same segment, the mobile phone does not connect with the same cell every
time. A cell is, however, connected with events of multiple trips, at different segments.
8.3
Conclusion
The data collection contains some lost minutes, mainly at stations and in tunnels, caused by signal loss in
the mobile phones and GPS devices. Some GPS points lie outside the cell area. Possible explanations are
delays in the network architecture or a difference between the practical and theoretical cell polygons.
Several statistically significant relations were found. The distance between the GPS position and the cell
site differs significantly between cells outside and inside the cities (Utrecht, Amsterdam, Den Bosch and
Eindhoven), and between event types. Furthermore, no fixed cells exist within a segment of the tracks;
connections are made with various cells. Relating the radius to the distance shows that the distance from
the GPS position to the site increases as the radius increases. In addition, the 95th percentile of the
distance to the site is half of the radius. To conclude, the theoretical network characteristics do not
match the practical network characteristics.
The distance from the GPS position to the site is on average 930 meters, with a median of 700 meters. In
Part I, the literature gives a median distance of 750 meters in suburban areas. The results are similar,
because the route passes both urban and rural areas. The analysis in the last chapter of Part I, which
determines the usability of mobile phone data, can be improved using the values of this section.
The distance from the GPS position to the different spatial elements has been determined. The best
performing spatial elements are used in the algorithms to detect the mode and to map the positions. The
Voronoi diagram and the BSCM give better results than the site when measuring the distance between the
GPS position and the spatial element, but the gain in accuracy is small. However, when the position of a
user is mapped onto the train track or road, the average distance to the GPS position is approximately 500
meters with either the BSCM or the Voronoi diagram. When mapping with the site, an average distance of
750 meters is found. So the Voronoi diagram and the BSCM increase the accuracy of positioning a user. They
can be used in the algorithms described in the next chapter.
The main result for the application is that using the Voronoi diagram and the BSCM increases the accuracy of
the travel times, because users are positioned better. Furthermore, travel times can be computed better in
cities than in rural areas, because the positioning of users is better there as well. Determining the mode is
difficult where the train line and road run next to each other, so confusion between the modes can lead to
wrong travel time estimates. Where the train line and road are further apart, the mode can be determined
better.
9
Development application
With the results of the data collection, algorithms can be created to detect the mode and map the mobile
phone users. For each cell the mode and mapping point will be determined with the help of network data
and the Friends data. The GPS points will be used to compare the mapping point with the actual position.
Both algorithms are calibrated and then validated. The algorithms can be used for the development of
the application ’travel time information’ but can also be used for other applications, like determining
O-D relations.
In this chapter three algorithms are developed and explained. First, the mode algorithm for a single event is explained in the next section. Thereafter, the events seen during a trip of a user are combined in a mode algorithm for multiple events, which builds on the single-event algorithm to determine the mode of a trip. The two mode algorithms are followed by the mapping algorithm, which creates, for each cell, the point on the road or train track where the user is most likely located. The mapping algorithm is used to detect the route and to calculate the travel time when developing the whole application. The algorithms use the spatial elements to calculate distances between points and polygons. Table 9.1 shows the spatial elements and their spatial types, and figure 9.1 visualizes them.
Table 9.1: Spatial elements used in the algorithms
Abbreviation   Element                  Spatial type
t              Train track              Line
r              Road track               Line
c              Cell                     Polygon
s              Cell site                Point
b              Best Serving Cell Map    Multipolygon
v              Voronoi diagram          Polygon
Figure 9.1: The spatial elements used in the algorithms, shown for a single cell: c = cell area, s = cell site, v = Voronoi diagram, b = BSCM, r = road, t = rail track
9.1
Mode algorithm - event-level
The first algorithm determines the possible mode of a person. Each event has a connected cell, and the connected cell and its spatial elements can determine the mode geographically. Because the cell area can overlap both public transport lines and roads, a decision is needed to assign the cell to one mode. This decision is based on the distance from the site to the public transport lines and roads. Besides the site, the BSCM is used to detect the mode, because it increases the accuracy, as demonstrated in the previous chapter. The Voronoi diagram is not used in the mode algorithm, because the BSCM showed slightly better results in the previous chapter. The algorithm is tested and calibrated for the route between Eindhoven and Amsterdam. The two modes considered are train and car. The algorithm uses the tracks visualized in figure 7.2.
9.1.1
Algorithm
Figure 9.2 shows the decision model for the mode algorithm. The Best Serving Cell Map (BSCM) is used whenever it is available for the cell; if no BSCM exists for a cell, only the distance to the cell site is used. The figure presents the algorithm schematically, and all factors are explained after the model. The first checks in the model test whether the cell area overlaps only the train tracks or only the road. A third possibility is that the cell area overlaps neither the train track nor the road, in which case no mode can be assigned to the cell.
Equation 9.1 defines the ratio between the shortest distances to the train track and to the road. All variables are visualized in figure 9.1, and both ratios are used in the equations that follow: y1 is the ratio of the distances from the BSCM, and y2 is the ratio of the distances from the cell site. Only the parts of the train track and road within the cell area c are used, so the shortest distance is calculated from the polygon b or the point s to the line segments r and t within c. A ratio cannot be calculated when its denominator is zero; it is then not used in the next steps.
Figure 9.2: Decision model mode algorithm, F1 and F2 will be explained in equations 9.2 and 9.3 respectively
$$y_1 = \frac{|bt|_{\in c}}{|br|_{\in c}}, \qquad y_2 = \frac{|st|_{\in c}}{|sr|_{\in c}} \tag{9.1}$$
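Equation 9.1 can be made concrete with a small Python sketch for the cell-site ratio y2. It assumes that the road and train track inside the cell area have already been clipped and are given as polylines of planar (x, y) coordinates in meters; the function names are hypothetical. The BSCM ratio y1 would additionally need a polygon-to-line distance (in a spatial database such as PostGIS, for example ST_Distance), which is not reproduced here.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]

def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest planar distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:                 # degenerate segment: a single point
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def point_polyline_distance(p: Point, line: Sequence[Point]) -> float:
    """Shortest distance from p to a polyline with at least two vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))

def distance_ratio(d_train: float, d_road: float) -> Optional[float]:
    """y = |.t| / |.r| as in equation 9.1; None when the denominator is zero."""
    if d_road == 0.0:
        return None
    return d_train / d_road
```

For a cell site at the origin with the road 150 m away and the train track 800 m away, `distance_ratio` returns 800/150, approximately 5.3.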
The algorithm checks whether the train track and the road are within the cell area. If only one of the two tracks is within the cell area, F1 gets the value 3 (only the road, favouring car) or -3 (only the train track, favouring train), see equation 9.2. If F1 is 3 or -3, the mode is fixed by equation 9.5, because F2 and F3 can each only be -1, 0 or 1: their sum is at most 2 in absolute value, so adding them cannot change the sign of M.
$$F_1 = \begin{cases} 3 & \text{if } c \in r \text{ and } c \notin t \\ -3 & \text{if } c \in t \text{ and } c \notin r \\ 0 & \text{if } c \in r \text{ and } c \in t \end{cases} \tag{9.2}$$
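A minimal Python sketch of equation 9.2, assuming the two overlap tests (cell area with road, cell area with train track) have already been evaluated, for example with a spatial intersection query; the function name is hypothetical. Returning None for a cell that overlaps neither track mirrors the branch in figure 9.2 where no mode can be assigned. The loop at the end checks the argument in the text that F1 = 3 or -3 always fixes the sign of M.

```python
from typing import Optional

def factor_f1(overlaps_road: bool, overlaps_train: bool) -> Optional[int]:
    """F1 from equation 9.2 (positive favours car, negative favours train)."""
    if overlaps_road and not overlaps_train:
        return 3        # only the road is inside the cell area
    if overlaps_train and not overlaps_road:
        return -3       # only the train track is inside the cell area
    if overlaps_road and overlaps_train:
        return 0        # both tracks present: F2 and F3 decide
    return None         # neither track: no mode can be assigned

# F2 + F3 lies between -2 and 2, so it never flips the sign of
# M = F1 + F2 + F3 when F1 is +3 or -3.
for f2 in (-1, 0, 1):
    for f3 in (-1, 0, 1):
        assert 3 + f2 + f3 > 0 and -3 + f2 + f3 < 0
```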
Next, the availability of a BSCM for the cell is tested. If the BSCM is available, the value of F2 is calculated; otherwise F2 is zero, see equation 9.3. The value of y1 cannot be calculated when its denominator is zero, which means that the BSCM overlaps the road, so the shortest distance from the BSCM to the road is zero. F2 then automatically becomes 1, so the mode car is favoured over the mode train, provided the distance from the BSCM to the train track is not zero. The same principle holds for the train: when the shortest distance from the BSCM to the train track is zero, y1 is zero and F2 becomes -1, so the mode train is favoured over the mode car. When both shortest distances are zero, or the ratio between the distances is between 0.4 and 2.5, F2 is zero and neither mode is favoured. The calibration analyses for the threshold values in equation 9.3 can be found in Appendix C.
$$F_2 = \begin{cases} 1 & \text{if } y_1 > 2.5 \text{ or } (|br| = 0 \text{ and } |bt| \neq 0) \\ -1 & \text{if } y_1 < 0.4 \\ 0 & \text{if } 0.4 \le y_1 \le 2.5 \text{, or BSCM } (b) \text{ does not exist, or } (|br| = 0 \text{ and } |bt| = 0) \end{cases} \tag{9.3}$$
Next, the shortest distances from the cell site are taken into account. These distances can also be zero, and the same principle as described in the previous paragraph is used to assign a value to F3, which is determined with the variable y2, see equation 9.4. When both shortest distances are zero, or the ratio between the shortest distances from the cell site is between 2/3 and 1.5, F3 is zero. The calibration of the threshold values in equation 9.4 can be found in Appendix C.
$$F_3 = \begin{cases} 1 & \text{if } y_2 > 1.5 \text{ or } (|sr| = 0 \text{ and } |st| \neq 0) \\ -1 & \text{if } y_2 < \tfrac{2}{3} \\ 0 & \text{if } \tfrac{2}{3} \le y_2 \le 1.5 \text{ or } (|sr| = 0 \text{ and } |st| = 0) \end{cases} \tag{9.4}$$
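Equations 9.3 and 9.4 share one pattern, so a single hypothetical helper can implement both in Python. This sketch assumes that the lower thresholds are the reciprocals of the upper ones (0.4 = 1/2.5 and 2/3 = 1/1.5, which matches the values in the text) and assigns the exact boundary values to the neutral case 0; distances are in meters, and a missing BSCM is passed as None.

```python
from typing import Optional

def ratio_factor(d_train: float, d_road: float, upper: float) -> int:
    """Shared rule of equations 9.3 (upper = 2.5) and 9.4 (upper = 1.5).

    +1 favours car (train track relatively far away),
    -1 favours train (road relatively far away), 0 favours neither.
    """
    lower = 1.0 / upper
    if d_road == 0.0 and d_train == 0.0:
        return 0            # element touches both tracks
    if d_road == 0.0:
        return 1            # y undefined: element touches only the road
    y = d_train / d_road    # ratio of equation 9.1
    if y > upper:
        return 1
    if y < lower:
        return -1           # includes d_train == 0, where y = 0
    return 0

def factor_f2(d_bt: Optional[float], d_br: Optional[float]) -> int:
    """F2 from the BSCM distances; 0 when the cell has no BSCM (None)."""
    if d_bt is None or d_br is None:
        return 0
    return ratio_factor(d_bt, d_br, upper=2.5)

def factor_f3(d_st: float, d_sr: float) -> int:
    """F3 from the cell-site distances."""
    return ratio_factor(d_st, d_sr, upper=1.5)
```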
The value M in equation 9.5, the sum of all factors, defines the mode.
$$M = F_1 + F_2 + F_3 \tag{9.5}$$
When M < 0 the mode is train, and when M > 0 the mode is car, see equation 9.6. When M = 0, the distance from the cell site to the road or track determines the mode: if |st| > |sr| the mode is car, otherwise the mode is train.
$$\text{mode} = \begin{cases} \text{train} & \text{if } M < 0 \\ \text{car} & \text{if } M > 0 \\ \text{train} & \text{if } M = 0 \text{ and } |sr| > |st| \\ \text{car} & \text{if } M = 0 \text{ and } |st| > |sr| \end{cases} \tag{9.6}$$
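Equations 9.5 and 9.6 can be sketched in Python as follows, taking the three factors and the cell-site distances as inputs (the function name is hypothetical). Equation 9.6 does not state what happens when M = 0 and |st| = |sr|; this sketch assigns train in that case, which is an arbitrary choice.

```python
def event_mode(f1: int, f2: int, f3: int, d_st: float, d_sr: float) -> str:
    """Mode of a single event from equations 9.5 and 9.6.

    Positive M favours car and negative M favours train; a tie (M = 0)
    is broken by the distances from the cell site to the two tracks.
    """
    m = f1 + f2 + f3                  # equation 9.5
    if m > 0:
        return "car"
    if m < 0:
        return "train"
    # M = 0: the track closest to the cell site decides.
    return "car" if d_st > d_sr else "train"
```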
9.1.2
Example
The algorithm is illustrated for one of the cells, which is visualized with its components in figure 9.1. First, it is determined whether the cell area overlaps the train tracks and the roads, see figure 9.2. The cell area overlaps both the train tracks and the road, so F1 is zero, and the decision model asks whether a BSCM exists for this cell. Because the BSCM exists, both F2 and F3 have to be computed. Next, y1 and y2 are calculated. The BSCM overlaps both the train tracks and the road, so both distances (|br| and |bt|) are zero, y1 cannot be computed, and F2 is zero. For y2 the ratio is 800/150 ≈ 5.3, so the train tracks are more than five times further away than the road. Because y2 > 1.5, F3 becomes 1. The sum of the factors, M, determines the mode: M = 0 + 0 + 1 = 1, so the mode determined for this cell is car. In reality, connections with this cell were made both in the car and in the train. Nevertheless, one mode is chosen for each cell rather than both. When events are combined into a trip, the factors serve as weights for the reliability of the mode determination of each event.
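The arithmetic of this example, using only the values given above, can be summarized as:

$$\begin{aligned}
F_1 &= 0 && \text{(cell area overlaps both road and train track)}\\
F_2 &= 0 && (|br| = |bt| = 0,\ y_1 \text{ undefined})\\
y_2 &= \frac{|st|}{|sr|} = \frac{800}{150} \approx 5.3 > 1.5 \;\Rightarrow\; F_3 = 1\\
M &= F_1 + F_2 + F_3 = 0 + 0 + 1 = 1 > 0 \;\Rightarrow\; \text{mode} = \text{car}
\end{aligned}$$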
9.1.3
Validation results mode algorithm
With the algorithm described above, every cell is assigned to one mode. The assigned mode is compared with the actual mode. Table 9.2 shows the results of the algorithm for all events; multiple events can be connected to the same cell. On average, 91% of all events are estimated correctly. Car events are estimated slightly better than train events.
Table 9.2: Result mode algorithm for all individual events (C = car, T = train)
Used mode   Algorithm mode   Number of events   Percentage correct
C           C                1953               92%
C           T                180
T           T                1753               90%
T           C                191
All events:                                     91%
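The percentages in table 9.2 follow directly from the event counts. The short Python check below recomputes them, and also those of table 9.3 from the cell counts given there; the dictionary layout is only an illustration.

```python
def percent_correct(correct: int, wrong: int) -> float:
    """Share of correctly assigned items, in percent."""
    return 100.0 * correct / (correct + wrong)

# (correctly assigned, wrongly assigned) per used mode
events = {"car": (1953, 180), "train": (1753, 191)}   # table 9.2
cells = {"car": (213, 28), "train": (312, 44)}        # table 9.3

def summarize(counts):
    """Percentage correct per mode and over both modes together."""
    per_mode = {mode: percent_correct(*c) for mode, c in counts.items()}
    overall = percent_correct(sum(c for c, _ in counts.values()),
                              sum(w for _, w in counts.values()))
    return per_mode, overall

for name, counts in (("events", events), ("cells", cells)):
    per_mode, overall = summarize(counts)
    print(name, {m: round(p) for m, p in per_mode.items()}, round(overall))
# events {'car': 92, 'train': 90} 91
# cells {'car': 88, 'train': 88} 88
```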
Table 9.3 shows the result of the algorithm for all cells. Because multiple events use the same cell, the number of cells is lower than the number of events. The percentages are lower than those for the events, which means that more events are connected to cells that are assigned correctly. Table 8.12 lists the cells that are connected both in the train and in the car. In total, 11% of all cells are seen during trips in the train as well as during trips in the car, so a large part of the error of 12% can be explained by cells seen both in the train and in the car. Of the 59 cells seen in both modes, 34 are detected as mode car and the remaining 25 as mode train. For the cells detected as mode car, 75% of the events were actually created in the car; for the cells detected as mode train, 48% of the events were actually created in the train. Taken together, only 36% of the events connected to cells seen in both the train and the car are estimated incorrectly. This is also the reason why the percentage for all events is higher than for all cells: cells seen both in the train and in the car are counted in both categories, correctly estimated and incorrectly estimated. Most of the other wrongly assigned cells, which are not seen in both train and car, are simply located further from the track of the real mode than from the track of the assigned mode. Table 9.2 gives the result for each individual event, and therefore its percentage is higher.
Table 9.3: Result mode algorithm for all cells (C = car, T = train)
Used mode   Algorithm mode   Number of cells   Percentage correct
C           C                213               88%
C           T                28
T           T                312               88%
T           C                44
All cells:                                     88%
9.2
Mode algorithm - trip level
The mode has been assigned to each individual cell and therefore to each individual event. However, users make trips that consist of multiple events. To detect the mode of a trip, the modes determined for its events with the mode algorithm described in the previous section are combined. It is assumed that users do not change mode during their trips; the use of, for example, Park and Ride facilities, where users change from car to train or the other way round, is not taken into account in this algorithm. The mode is detected for combinations of cells of the same mode, drawn from all trips. Because approximately 250 cells were collected for each mode during the data collection, a Monte-Carlo simulation is used to simulate the different combinations of events.
9.2.1
Algorithm
First, the mode is determined with the value of F1 where possible. F1 indicates whether the cell area overlaps the track of only one mode; if F1 is not zero for a cell, the mode of that cell is certain. Equation 9.7 defines the maximum and minimum of F1 over the cells of a trip.
$$F_{1,\max} = \max\left(\{F_{1,1}, \dots, F_{1,n}\}\right), \qquad F_{1,\min} = \min\left(\{F_{1,1}, \dots, F_{1,n}\}\right) \tag{9.7}$$
with i = 1 to n cells and F1 from equation 9.2
For n cells, the combined mode is determined. First, the factors F1, F2 and F3 of each event are computed with equations 9.1 to 9.4 and combined into the weighted sum Mn of equation 9.8; the sign of Mn determines the mode, analogous to M in equation 9.6. The weights and threshold values were varied to obtain the best results, see Appendix C.
$$M_n = \sum_{i=1}^{n} \left(0.7 \cdot F_{1,i} + 0.2 \cdot F_{2,i} + 0.1 \cdot F_{3,i}\right) \tag{9.8}$$
with i = 1 to n events, F1 from equation 9.2, F2 from equation 9.3 and F3 from equation 9.4
The mode is then determined with equation 9.9. In most cases the value of Mn defines the mode. However, if F1,max = 3 and F1,min = 0, or if F1,max = 0 and F1,min = -3, the mode is defined without Mn. When F1,max = 3 and F1,min = 0, one or more events of the trip have a cell area that overlaps only the road and not the train track, while none of the events overlaps only the train track; other events may overlap both the train track and the road. When F1,max = 0 and F1,min = -3, it is the other way around: one or more events overlap only the train track and none overlaps only the road.
$$\text{mode} = \begin{cases} \text{train} & \text{if } M_n < 0 \text{ or } (F_{1,\min} = -3 \text{ and } F_{1,\max} = 0) \\ \text{car} & \text{if } M_n > 0 \text{ or } (F_{1,\min} = 0 \text{ and } F_{1,\max} = 3) \\ \text{random} & \text{if } M_n = 0 \end{cases} \tag{9.9}$$
If Mn = 0, the mode is picked randomly. However, it is better not to use combinations with Mn = 0 for further analyses, because the actual mode is uncertain.
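A Python sketch of equations 9.7 to 9.9 for one trip, under stated assumptions: each event is given as its tuple (F1, F2, F3); events whose cell overlaps neither track are assumed to have been removed beforehand; the two F1 conditions are checked before Mn, following the text that these cases are decided without Mn; and the weights are scaled by ten (7, 2, 1) so that the sum stays an integer and the test Mn = 0 is exact. Scaling does not change the sign of Mn. The function name is hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple

Factors = Tuple[int, int, int]  # (F1, F2, F3) of one event

def trip_mode(events: Sequence[Factors],
              rng: Optional[random.Random] = None) -> str:
    """Mode of a trip from the factors of its events (equations 9.7 to 9.9)."""
    f1_values = [f1 for f1, _, _ in events]
    f1_max, f1_min = max(f1_values), min(f1_values)      # equation 9.7
    # Some cells overlap only the train track and none only the road.
    if f1_min == -3 and f1_max == 0:
        return "train"
    # Some cells overlap only the road and none only the train track.
    if f1_min == 0 and f1_max == 3:
        return "car"
    # Equation 9.8 with weights 0.7/0.2/0.1 scaled by 10 (same sign as Mn).
    m_n_scaled = sum(7 * f1 + 2 * f2 + f3 for f1, f2, f3 in events)
    if m_n_scaled > 0:
        return "car"
    if m_n_scaled < 0:
        return "train"
    # Mn = 0: pick at random; such trips are better left out of analyses.
    return (rng or random.Random()).choice(["car", "train"])
```

For example, `trip_mode([(0, 0, 1), (0, 1, 1)])` returns "car", because no F1 condition applies and the weighted sum is positive.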
9.2.2
Validation results
A Monte-Carlo simulation is used to obtain the results for combinations of cells. Each combination consists of cells of one mode, picked randomly from all trips. In total, one million combinations are simulated to test the performance of the algorithm. Figure 9.3 visualizes the result of the simulation. The percentage of correctly determined modes increases when more cells are available. Combinations of cells from car events are slightly more often estimated correctly than combinations of cells from train events. Although this difference is marginal, it is significant for most numbers of events; only for five events is the difference not significant. The cause of the difference is that cells seen in both the car and the train are more often assigned correctly for the car than for the train, as explained in section 9.1.3.
On the route Amsterdam-Eindhoven a user creates on average two events, see Appendix B. With two events, the mode is determined correctly in approximately 93% of the cases. For the mode train, this percentage is slightly lower than for the mode car.
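The Monte-Carlo validation can be sketched in Python as below. The pool of cells, the number of draws and the choice to draw distinct cells (sampling without replacement) are assumptions for illustration; the classifier is a compact copy of the trip-level rule of equations 9.7 to 9.9 with weights scaled by ten. All names are hypothetical, and the thesis itself simulated one million combinations.

```python
import random
from typing import Sequence, Tuple

Factors = Tuple[int, int, int]  # (F1, F2, F3) of one cell

def trip_mode(events: Sequence[Factors], rng: random.Random) -> str:
    """Compact trip-level rule (equations 9.7 to 9.9), weights scaled by 10."""
    f1 = [e[0] for e in events]
    if min(f1) == -3 and max(f1) == 0:
        return "train"
    if min(f1) == 0 and max(f1) == 3:
        return "car"
    m = sum(7 * a + 2 * b + c for a, b, c in events)
    if m != 0:
        return "car" if m > 0 else "train"
    return rng.choice(["car", "train"])        # tie: random pick

def monte_carlo_accuracy(cells: Sequence[Factors], true_mode: str,
                         n_events: int, n_draws: int = 10_000,
                         seed: int = 0) -> float:
    """Share of random n-cell combinations that get the true mode."""
    rng = random.Random(seed)
    pool = list(cells)
    hits = 0
    for _ in range(n_draws):
        combo = rng.sample(pool, n_events)     # distinct cells per draw
        if trip_mode(combo, rng) == true_mode:
            hits += 1
    return hits / n_draws
```

Evaluating `monte_carlo_accuracy` for increasing `n_events`, once with the car cells and once with the train cells of the data collection, would give curves of the kind shown in figure 9.3.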
Figure 9.3: Percentage correct mode algorithm using a Monte Carlo simulation for different number of
cells
9.3
Mapping algorithm
The mapping algorithm creates a point on the road or the rail track, depending on the determined mode. Each cell is nevertheless mapped on both the road and the rail track, because some cells are seen in both the train and the car; the mode determination then decides which mapping point is used. First the mapping algorithm is explained, after which the results of the algorithm for the case study are shown.
9.3.1
Algorithm
First, the availability of the Voronoi diagram and the BSCM is checked for each cell, see figure 9.4. This check determines which combination of spatial elements is used for mapping. If both the Voronoi diagram and the BSCM exist, these two elements are used and the location of the cell site is not. Table 8.11 shows the best mapping elements: both the Voronoi diagram and the BSCM map more accurately than the cell site. To limit the influence of outliers, a combination of two spatial elements is always used, and weights give the better spatial element of the combination more influence than the other.
Figure 9.4: Decision model mapping algorithm
Three combinations of the spatial elements Voronoi diagram, BSCM and cell site exist for mapping. For each combination, weights are given to the spatial elements, see table 9.4. The calibration analysis for these weights can be found in Appendix C.
Table 9.4 shows the weights for every mapping combination. In Postgres the spatial elements are combined with the ST_Closestpoint function; for more information see Ramsey [2005] and The PostGIS Development Group. The code determines, for each spatial element, the closest point on the tracks. From these two points on the tracks, a centroid is calculated with the corresponding weights. This centroid may not lie on the track, so the closest point on the track to the centroid is used as the final mapping point. Only the parts of the tracks within the cell area are used to detect the closest point on the tracks. The weight of the BSCM is always larger than that of the other spatial element in the combination. In the combination site distance - Voronoi, the Voronoi diagram gets weight two and the site distance weight one. Note that for the Voronoi diagram and the BSCM the whole polygon is used, whereas for the cell site a point is used. All weights were varied, see Appendix C.
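The weighted-centroid mapping described above can be sketched in plain Python. As a simplifying assumption, each spatial element is represented here by a single reference point, whereas the thesis uses the whole Voronoi or BSCM polygon and PostGIS (ST_Closestpoint) to find the closest point on the tracks. The track is a polyline of planar coordinates in meters, already clipped to the cell area, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]

def closest_point_on_segment(p: Point, a: Point, b: Point) -> Point:
    """Point on segment a-b closest to p."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    t = max(0.0, min(1.0, t))                   # stay on the segment
    return (a[0] + t * dx, a[1] + t * dy)

def closest_point_on_track(p: Point, track: Sequence[Point]) -> Point:
    """Point on a polyline track closest to p."""
    candidates = [closest_point_on_segment(p, a, b)
                  for a, b in zip(track, track[1:])]
    return min(candidates,
               key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)

def map_cell(track: Sequence[Point], element_a: Point, weight_a: float,
             element_b: Point, weight_b: float) -> Point:
    """Final mapping point for a combination of two spatial elements."""
    pa = closest_point_on_track(element_a, track)  # mapping point of element a
    pb = closest_point_on_track(element_b, track)  # mapping point of element b
    total = weight_a + weight_b
    centroid = ((weight_a * pa[0] + weight_b * pb[0]) / total,
                (weight_a * pa[1] + weight_b * pb[1]) / total)
    # The weighted centroid may lie off a curved track, so snap it back.
    return closest_point_on_track(centroid, track)
```

For the combination BSCM - Voronoi, table 9.4 gives the BSCM weight 2 and the Voronoi diagram weight 1, so a call would look like `map_cell(track, voronoi_point, 1, bscm_point, 2)`.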
Table 9.4: Factors of the different combinations of mapping
Combination               Cell distance   Voronoi   BSCM
Site distance - Voronoi   1               2         -
Site distance - BSCM      1               -         2
BSCM - Voronoi            -               1         2
9.3.2
Example
The mapping algorithm is illustrated for the same cell as in the example of the mode algorithm. Figure 9.5 shows the mapping points of the different spatial elements for this cell. Because the cell is seen in both the train and the car, mapping points are created on both the train track and the road. The blue mapping points are created from the position of the cell site: the point on the track closest to the cell site is chosen as the mapping point. The red points are created from the Voronoi polygon and the yellow points from the BSCM polygon. Following figure 9.4, the combination Voronoi-BSCM is chosen. With the corresponding weights of the spatial elements, the light blue points are created, which define the final mapping points. Because of the weights, the final mapping points lie between the Voronoi and BSCM mapping points, closer to the BSCM mapping points than to the Voronoi mapping points.
Figure 9.5: Mapping points for different spatial elements, with b = BSCM, v = Voronoi diagram, c = cell area, s = cell site, MP-s-r/t = cell site mapping points, MP-v-r/t = Voronoi mapping points, MP-b-r/t = BSCM mapping points and MP-vb-r/t = final mapping points (combination Voronoi-BSCM)
9.3.3
Validation results
The distance from the GPS position to the mapping point is calculated and linked to the different parts of the route, see figure 9.6. The average distance is approximately 421 meters for the train and approximately 526 meters for the car. The average distance does not differ during the day, but, as shown in the figure, the sawtooth pattern is also visible for the mapping points: in the city a lower average distance is found. This could be caused by the smaller average cell radius in the city, see section 8.2.4.
Figure 9.6: Average distance from GPS position to the mapping points for both modes for different parts
of the route
Besides the average distance between the mapping position and the GPS position, the average speed is computed for this algorithm. The time between two events is known exactly, so whether the travel time can be determined depends on how well the distance between the mapped positions matches reality; the speed combines both and is therefore used as a check. The average speed takes the direction of travel into account: when, for example, the driving direction is north and both mapping points lie the same distance north of their GPS positions, the algorithm gives the same speed as the GPS. For every combination of two events, the average speed along the track of the mode is determined. In total almost one million combinations are found; for each combination, the events are first mapped onto the track, after which the distance and time between the events are calculated. The average speed determined in this way is compared with the average speed between the GPS points recorded at the same times as the events. The differences have a median of 1.5 km/h and a 95th percentile of 16.5 km/h, see figure 9.7. Differences of 0 km/h occur because cells with the same site are coupled: such cells have the same mapping point, and if the GPS points are also at the same location, a speed difference of 0 km/h is calculated. The difference between the average GPS speed and the average speed from the mapping positions is also related to the GPS speed: when the speed increases, the average speed difference increases as well, although the speed difference is scattered at all speeds. The average covered distance in the mapping algorithm is 30 kilometres.
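The speed check can be sketched in Python as follows. It assumes that every mapped position and every GPS position has been converted to a distance along the track (a chainage, in meters), so that a signed displacement keeps the direction of travel. The record layout, the use of absolute differences and the interpolation method of Python's `statistics.quantiles` (here `method="inclusive"`) are assumptions, and the names are hypothetical.

```python
import statistics
from typing import Iterable, List, Tuple

# (mapped chainage a, mapped chainage b, GPS chainage a, GPS chainage b,
#  time a, time b): chainages in meters along the track, times in seconds.
Pair = Tuple[float, float, float, float, float, float]

def speed_kmh(chainage_a: float, chainage_b: float,
              t_a: float, t_b: float) -> float:
    """Signed average speed along the track in km/h (requires t_b > t_a)."""
    return (chainage_b - chainage_a) / (t_b - t_a) * 3.6

def speed_differences(pairs: Iterable[Pair]) -> List[float]:
    """Absolute difference between mapped speed and GPS speed per pair."""
    return [abs(speed_kmh(ma, mb, ta, tb) - speed_kmh(ga, gb, ta, tb))
            for ma, mb, ga, gb, ta, tb in pairs]

def summarize(diffs: List[float]) -> Tuple[float, float]:
    """Median and 95th percentile of at least two speed differences."""
    p95 = statistics.quantiles(diffs, n=20, method="inclusive")[-1]
    return statistics.median(diffs), p95
```

Because the displacement is signed, a mapped pair that moves backwards along the track while the GPS moves forward shows up as a large difference instead of being hidden.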
Figure 9.8 gives the relation between the logarithm of the average speed difference and the real speed. The extreme values of the mapping algorithm only occur at high speeds; lower speeds have a low speed difference. The real speed is calculated with GPS points mapped onto the track, so the real speed shown in the figure could deviate from the speed actually travelled.
Figure 9.7: Cumulative percentage related to the speed difference for all possible combinations
Figure 9.8: The logarithmic average speed difference in relation to the real speed
9.4
Conclusion
All algorithms show good results for determining the mode and speed. In this chapter the first three parts of the framework, see figure 6.2, are developed. The mode can be determined with an accuracy of 91% for a single event from the data collection, with no large difference between the modes train and car. For the mode algorithm at trip level, which combines different events, a Monte-Carlo simulation shows that the mode is determined correctly in 97% or more of the cases with three or more events. The mapping algorithm positions travellers onto the road or rail track, depending on the determined mode. The average distance from the GPS position to the mapping position is approximately 470 meters, which is half the average distance between the cell site and the GPS position. To compare two mapping positions, the speed is calculated as a traffic measure to check whether the algorithm works. Over all combinations of events, the speed difference has a median of 1.5 km/h and a 95th percentile of 16.5 km/h. These values are promising for determining the travel time once the complete application has been created. In the next chapter all algorithms are applied in the Teradata database.
These algorithms were developed for the case-study. Both algorithms can be generalized to the whole of the Netherlands and are expected to give similar results, because the chosen corridor in the case-study is representative of the whole country: it includes both rural and urban areas, and the road and rail line run next to each other on some parts of the corridor. The algorithms so far show promising results for developing the application explained in Chapter 7. However, the route detection and the combination of travel times, see figure 6.2, will decide whether the travel times are accurate enough.
10
Results implementation application
between Amsterdam and Eindhoven
After the algorithms were calibrated and validated in the previous chapter, they are applied in the Teradata database. This means that the mapping position and mode are computed for all users in the database of Mezuro. Only the route Amsterdam-Eindhoven is investigated. This is the same corridor as in the case-study; the algorithms are computed only for this route because mobile phone location data over a longer period is saved only for this particular route and not for all routes. If the data were saved for the whole Netherlands, other corridors could be chosen as well. However, because the route is not yet determined for each user, the algorithms can only be computed for corridors. Therefore, a maximum travel time and a minimum travel distance are introduced in this chapter. Once the route detection is available, the algorithms can be performed for all routes in the Netherlands. In this chapter, first the method of performing the algorithms in the Teradata database is described, and then the results are presented.
10.1
Method
Mapping algorithm
All cells seen in the case-study are used to detect users travelling on the corridor Amsterdam-Eindhoven. First, the method for using the mapping algorithm in the Teradata database is explained; see figure 10.1 for the method and section 9.3 for the algorithm. If two cells from the case-study are seen for one user on one day, both cells are used in the algorithm. The distance between the cells (cell sites or mapping points) is calculated. If the distance between the cell sites is less than 10 kilometres, the combination of cells is not used, because the user may have been at the same place during the two events. Besides a user who is not moving, users may travel by bike along the road or rail track, or only via local roads and not via the highway. The threshold of 10 kilometres prevents, as far as possible, that users who are actually on local roads are detected by the algorithms. If two cells are found, the time interval between them has to be less than 5400 seconds, because the maximum travel time between Eindhoven and Amsterdam is 1.5 hours. As a consequence, users travelling the whole route between Amsterdam and Eindhoven are excluded when they are delayed by traffic jams or train delays. However, if the time limit were increased, users making, for example, a return trip would be included. Therefore the time limit of 5400 seconds is used, also because most users do not cover the whole route. Finally, the mapping algorithm requires that the modes detected by the mode algorithm are the same for both cells.
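In the thesis this filtering is done with queries in the Teradata database (figure 10.1). The Python sketch below only illustrates the same three rules on an in-memory list; the `Event` record, its fields and the use of the straight-line distance between cell sites in planar coordinates are assumptions, and the names are hypothetical.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

MIN_DISTANCE_M = 10_000     # cell sites closer than 10 km are discarded
MAX_INTERVAL_S = 5_400      # 1.5 hours

@dataclass(frozen=True)
class Event:
    user: str
    day: str
    time_s: float      # seconds since midnight
    x: float           # cell site position in meters
    y: float
    mode: str          # result of the mode algorithm

def valid_pairs(events: Iterable[Event]) -> List[Tuple[Event, Event]]:
    """Pairs of events of one user on one day that pass all three rules."""
    by_user_day: Dict[Tuple[str, str], List[Event]] = {}
    for e in events:
        by_user_day.setdefault((e.user, e.day), []).append(e)
    pairs = []
    for group in by_user_day.values():
        group.sort(key=lambda e: e.time_s)
        for a, b in combinations(group, 2):
            if math.hypot(b.x - a.x, b.y - a.y) < MIN_DISTANCE_M:
                continue                      # possibly not moving, or local roads
            if not 0 < b.time_s - a.time_s < MAX_INTERVAL_S:
                continue                      # too long apart, e.g. a return trip
            if a.mode != b.mode:
                continue                      # modes of both cells must agree
            pairs.append((a, b))
    return pairs
```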
Figure 10.1: Decision tree explaining the method using the mapping algorithm in the Teradata database
Mode algorithm - trip level
The mode algorithm for a trip, explained in section 9.2, also needs some assumptions before it can be performed, see figure 10.2. These assumptions replace the trip detection explained and visualized in Chapter 7. If two or more cells of a user match cells seen in the case-study, these cells are used in the algorithm, subject to a time and a distance requirement: the events have to lie within a time frame of at most 1.5 hours, and the largest distance between the mapping points of the cells has to be larger than 10 kilometres. The method allows multiple trips on one day, using the time frame of 1.5 hours to determine the start of the next trip.
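One possible reading of these rules is sketched in Python below: a user's matched events of one day are sorted in time, a new trip starts when an event lies more than 1.5 hours after the first event of the current trip, and a trip is kept only if it has at least two events whose mapping points lie more than 10 kilometres apart. The tuple layout and the function names are hypothetical, and the thesis implements this with database queries (figure 10.2).

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

TIME_FRAME_S = 5_400      # 1.5 hours
MIN_SPAN_M = 10_000       # 10 kilometres

Event = Tuple[float, float, float]  # (time in s, x in m, y in m) of a mapping point

def split_trips(events: Sequence[Event]) -> List[List[Event]]:
    """Group one user's events of one day into consecutive 1.5-hour trips."""
    trips: List[List[Event]] = []
    for event in sorted(events):
        if trips and event[0] - trips[-1][0][0] <= TIME_FRAME_S:
            trips[-1].append(event)
        else:
            trips.append([event])         # start a new trip
    return trips

def largest_distance(trip: Sequence[Event]) -> float:
    """Largest distance between any two mapping points of a trip."""
    return max((math.hypot(b[1] - a[1], b[2] - a[2])
                for a, b in combinations(trip, 2)), default=0.0)

def usable_trips(events: Sequence[Event]) -> List[List[Event]]:
    """Trips with at least two events spanning more than 10 km."""
    return [trip for trip in split_trips(events)
            if len(trip) >= 2 and largest_distance(trip) > MIN_SPAN_M]
```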
Figure 10.2: Decision tree explaining the method using the mode algorithm for multiple events in the
Teradata database
10.2
Results
In this section the results are explained for both algorithms. First the results of the mapping algorithm
are described and then the results of the mode algorithm are shown.
10.2.1
Mapping algorithm
After the queries have been computed in the database, only results based on more than 15 users can be shown, as prescribed by the privacy requirements. The mapping algorithm is performed for three days in the real database. In total approximately 60,000 users were found; extrapolated to a whole week, this is approximately 140,000 users. See table 10.1 for the first results. The table shows, for each mode, the average speed, the average distance and the average number of minutes between the mapping points of a combination of events of a user. The average speed is 83 km/h for the car, while train users have a lower average speed of 61 km/h. The results do not distinguish the direction of the detected users.
In the data collection, the average speeds for the modes car and train were 80 and 86 km/h, respectively, so the speed of the train found here is much lower. One of the main explanations is that some users take the sprinter, a train that stops at every station, whereas in the case-study only the intercity was used, which has a much higher average speed. Based on the timetable, the average speeds of the sprinter and the intercity, including stops, are 55 and 90 km/h, respectively. So the difference could be explained by users who travel a short distance and take the sprinter. Differences in speed between the days could be caused by a traffic jam or an interruption of the train traffic, although this cannot be verified. The number of combinations differs greatly between train and car: more than twice as many train combinations as car combinations are found. This could be caused by more events being produced in the train than in the car. The average distance between the events is 33 kilometres for both train users and car users. The relation between the travelled distance between two events, grouped per 5000 meters, and the number of detected users has a single peak for each mode: most train users travel around 20 kilometres and most car users around 25 kilometres between two events. The average time between the events is approximately half an hour. The number of detected users decreases linearly with the average time, grouped per 10 minutes, so most detected users have a travel time between two events of less than 10 minutes.
The number of detected users for one day for the whole route is approximately 20,000, for the modes train and car
together. In reality, the average vehicular flow between Amsterdam and Utrecht for both directions is, for example,
170,000¹. For the train no actual numbers could be found; however, assuming on average 500 people per train,
eight trains per hour (intercity and sprinter) and 18 hours of service, the number of train users is approximately
72,000. So in total, approximately 250,000 people travel between Amsterdam and Utrecht each day.
Comparing the real numbers with the found numbers, less than 10% of the travellers are detected. One
explanation is that users have another telecom provider. The dataset contains almost six million users,
so compared with the total population of the Netherlands, the found numbers could be multiplied by three.
Even with this multiplication the actual numbers are not reached. Also, the total population includes people
who do not have a mobile phone, for example children, and some people have two mobile phone numbers for
home and work purposes. Another explanation is that users do not create any events at all when travelling.
Finally, users can connect to cells that were not seen in the data collection but that can be connected to
when travelling between Amsterdam and Eindhoven.
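The estimate in the paragraph above can be checked with a few lines of arithmetic. This is a sketch that only restates the figures given in the text; it follows the text in adding the 170,000 road figure to the daily train estimate, and the factor three for market share is the one stated above:

```python
# Inputs taken from the paragraph above; the train figure rests on its stated assumptions.
PASSENGERS_PER_TRAIN = 500
TRAINS_PER_HOUR = 8        # intercity and sprinter together
SERVICE_HOURS = 18
ROAD_FLOW = 170_000        # MTR+ road figure, added to the daily train estimate as in the text
DETECTED_USERS = 20_000
MARKET_SHARE_FACTOR = 3    # ~6 million users in the dataset versus the Dutch population

train_users = PASSENGERS_PER_TRAIN * TRAINS_PER_HOUR * SERVICE_HOURS  # 72,000
total_travellers = ROAD_FLOW + train_users                            # 242,000, about 250,000
detection_rate = DETECTED_USERS / total_travellers                    # roughly 8 %
scaled_detected = DETECTED_USERS * MARKET_SHARE_FACTOR                # 60,000

print(f"train users: {train_users:,}")
print(f"total travellers: {total_travellers:,}")
print(f"detection rate: {detection_rate:.1%}")
print(f"scaled detected users: {scaled_detected:,} (still below the total)")
```

Even after scaling by three, the detected users cover only about a quarter of the estimated travellers, which is why the paragraph looks for further explanations.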
Table 10.1: Average speed for three days of the week for different modes
Mode  | Average speed (km/h) | Average distance between mapping points (m) | Average time between events (minutes) | Number of detected users
Car   | 82.58                | 32,967                                      | 25.39                                 | 18,314
Train | 60.57                | 32,598                                      | 30.55                                 | 41,880
Both  | 70.02                | 32,757                                      | 28.33                                 | 60,194
The average speed for each hour is calculated as well, see figure 10.3. In the missing hours, no combinations or
fewer than 15 combinations are found. For the car one peak period is clearly visible: in the morning peak two hours
have a lower value, and the average speed drops to 73 km/h between 8:00 and 9:00. The speed then remains roughly
constant until the evening, at approximately 80 km/h. In the evening peak no remarkable difference is visible.
Later in the evening the average speed increases to 102 km/h between 0:00 and 1:00. The drop visible between
1:00 and 2:00 cannot be explained. Most likely an outlier causes the lower value, but the individual speeds
cannot be assessed. The number of users during the night hours is on average 30. The first hour in the morning
shows a high average speed as well. Unfortunately, the median cannot be calculated because the software lacks
this function, and for privacy reasons no workaround can be created.
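The hourly aggregation behind figure 10.3 can be sketched as follows, assuming each event combination is available as a hypothetical (hour of day, speed in km/h) pair. Only the threshold of 15 combinations comes from the text; the function name and the sample data are illustrative:

```python
from collections import defaultdict
from statistics import mean

MIN_COMBINATIONS = 15  # hours with fewer combinations are left out of the figure

def hourly_average_speed(combinations, min_count=MIN_COMBINATIONS):
    """combinations: iterable of (hour_of_day, speed_kmh) pairs, one per event combination.
    Returns {hour: mean speed} for the hours with at least min_count combinations."""
    by_hour = defaultdict(list)
    for hour, speed in combinations:
        by_hour[hour].append(speed)
    return {
        hour: mean(speeds)
        for hour, speeds in sorted(by_hour.items())
        if len(speeds) >= min_count
    }

# Hypothetical input: 20 combinations between 8:00 and 9:00, only 10 between 3:00 and 4:00
sample = [(8, 73)] * 20 + [(3, 100)] * 10
print(hourly_average_speed(sample))  # hour 3 is dropped because it has fewer than 15 combinations
```

Where the software allows it, `statistics.median` could replace `mean` to make the hourly value less sensitive to a single outlier; as noted above, that was not possible in the environment used here.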
1 MTR+, http://www.rijkswaterstaat.nl/zakelijk/efficient_onderhoud/maandelijke_telpuntrapportages/technische_informatie_mtr.aspx,
Accessed on 25 February 2014
In the train the average speed is around 60 km/h during the day. However, the average speed decreases at night
after 0:00. This decrease cannot be explained, and it is unknown why the average speed is so low. The number of
users after midnight is approximately 100 per hour, so a single outlier could not cause the decrease of the average speed.
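The argument that one outlier cannot explain the night-time drop can be made concrete: if one of n speeds is an outlier instead of a typical value, the mean moves by (outlier - typical) / n. A small sketch with hypothetical speeds; only the user counts of roughly 30 and 100 per night hour come from the text:

```python
def mean_shift_from_one_outlier(typical_kmh, outlier_kmh, n):
    """Change in the mean of n speeds when one of them is outlier_kmh instead of typical_kmh."""
    return (outlier_kmh - typical_kmh) / n

# Hypothetical: typical speed 60 km/h, one combination mapped at 0 km/h
for n in (30, 100):  # roughly the night-time user counts for car and train mentioned in the text
    shift = mean_shift_from_one_outlier(60, 0, n)
    print(f"n={n}: mean changes by {shift:.1f} km/h")
```

With about 100 values even an extreme single speed moves the mean by less than 1 km/h, in line with the text; with about 30 values the same outlier has more than three times the effect.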
Figure 10.3: Average speed for train and car users with the mapping algorithm
10.2.2
Mode algorithm - trip level
The mode algorithm with multiple events is performed in the Teradata database for three days. If a user
has two or more events with cells seen in the data collection, the mode algorithm is applied. Because
the average running time of the algorithm is about eight hours, running it for more days is not feasible for
now. The total number of users during the three days is 9,900, see table 10.2. Extrapolating this number to
a whole week gives approximately 23,000 users. More train users are detected than car users; one of the
reasons mentioned earlier is that train users create more events than car users. The average distance
is approximately 43 kilometres. The average distance is this high because distances below 10 kilometres
are not included in the results. Also, the average distance is calculated between the first and the last event,
so it is higher than the average distance found with the mapping algorithm, see table 10.1. The average
number of SMS events is 0.9 per trip. The average numbers of data and call events are much higher,
respectively 5 and 2.9 events per trip. The numbers of SMS and data events are higher in the train than in
the car, which can be explained by the fact that drivers are not allowed to hold a mobile phone while driving.
Note that passengers have no such restriction, and no distinction can be made between drivers and passengers,
so all users in the car are included in the algorithm, causing an overestimation of the flow. The average number
of call events is higher in the car than in the train: calling is more comfortable in the car, because in the train
other passengers can overhear the conversation. More train users were detected, but because the average number
of events is the same for both modes, this contradicts the explanation
that train users produce more events. The ratio between the average numbers of data, call and SMS events
is confirmed by the statistics of all events, see Appendix B.
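Two figures in this paragraph can be checked directly: the total number of events per trip for each mode, taken from table 10.2, and the weekly extrapolation. The sketch assumes, as the text implicitly does, that the three measured days are representative for every day of the week:

```python
# Per-trip averages from table 10.2
events_per_trip = {
    "car":   {"data": 4.8, "sms": 0.7, "call": 3.3},
    "train": {"data": 5.2, "sms": 1.0, "call": 2.5},
}
total_per_trip = {mode: round(sum(c.values()), 1) for mode, c in events_per_trip.items()}
print(total_per_trip)  # the totals differ by only 0.1 event per trip

# Weekly extrapolation, assuming the three measured days are representative for every day
USERS_THREE_DAYS = 9_900
users_per_week = USERS_THREE_DAYS / 3 * 7
print(f"{users_per_week:,.0f} users per week")
```

The near-equal totals are what makes the "train users produce more events" explanation doubtful.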
Table 10.2: Number of users and average number of events per day for different event types for the modes car and train
Mode  | Number of users | Average distance (m) | Average number of data events | Average number of SMS events | Average number of call events
Car   | 1,020           | 41,515               | 4.8                           | 0.7                          | 3.3
Train | 2,292           | 44,369               | 5.2                           | 1.0                          | 2.5
Besides the distinction in mode, a distinction is made by time of day, see table 10.3. After 12:00
the number of detected users is remarkably lower than before 12:00, while the average distance is slightly
higher. No explanation for these results has been found.
Table 10.3: Number of users and average distance for different times of the day during the three days
Time of day  | Number of users | Average distance (m)
Before 12:00 | 6,062           | 42,229
After 12:00  | 3,875           | 43,655
10.3
Conclusion
The mapping algorithm shows a low average speed of 70 km/h compared with the maximum speed on the
train tracks and roads. However, trains stop at stations, for example, which decreases the average speed.
For the mode car the morning peak is clearly visible, while the train shows the same average speed during the day.
The evening peak is not visible in the data; in the Netherlands the evening peak is less severe than
the morning peak. The average covered distance, approximately 33 kilometres, is the same as the average
distance found in the data collection. The number of detected users is much lower than the real number
of travellers between Amsterdam and Eindhoven. So most users either produce fewer than two events during
the trip, have another telecom provider or do not connect to the cells of the data collection.
The mode algorithm at trip level shows unexpected results. First, more users are detected before 12:00,
while most of the events happen after 12:00, see Appendix B. Second, the average number of call events is
higher than the average number of data events, whereas in the total database this order is reversed. Because
the network architecture is partly a black box, it is unknown which cells were missed during the data collection.
More train users have been detected than car users, which can be explained by the fact that it is easier for
train users to use a mobile phone when travelling. The number of users detected by the algorithm is not very
high. Possible reasons are that only cells related to an event in the case study have been used and that users
need at least two events on the corridor Amsterdam-Eindhoven.
When the algorithms are performed for the whole of the Netherlands, in combination with the route detection
and trip detection that have not yet been developed, a better statement can be made about the results described
in this chapter. The trip detection, see figure 6.2, makes the minimum and maximum travel time redundant,
leading to more detected users. However, a minimum travel distance is still necessary to make sure the user is
not travelling via local routes or by another mode.
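The filter this implies, a minimum travel distance and no minimum or maximum travel time, can be sketched as below. The 10 km threshold is the one used in this chapter; the `Trip` record and its fields are hypothetical, and the sketch assumes a separate trip detection step has already split each user's events into trips:

```python
from dataclasses import dataclass

MIN_DISTANCE_M = 10_000  # trips shorter than 10 km are excluded in this chapter

@dataclass
class Trip:
    user: str           # hypothetical (hashed) user identifier
    distance_m: float   # distance between the first and last mapping point of the trip
    minutes: float      # time between the first and last event

def keep_long_distance(trips, min_distance_m=MIN_DISTANCE_M):
    """Keep trips that cover at least min_distance_m.
    No minimum or maximum travel time is applied: this assumes a separate trip detection
    step has already cut the events of each user into individual trips."""
    return [t for t in trips if t.distance_m >= min_distance_m]

trips = [
    Trip("a", 5_000, 12),   # too short: could be local traffic or another mode
    Trip("b", 10_000, 9),
    Trip("c", 43_000, 40),
]
print([t.user for t in keep_long_distance(trips)])
```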
11
Conclusions and recommendations
Part II
In this part the first steps of the chosen application, computing travel times from mobile phone data, are
developed. A framework is established that defines the total development of the application with mobile
phone data. Only a part of this framework is developed, see figure 6.2. Furthermore, a case study is done
in order to validate and calibrate the algorithms. These algorithms are used to detect the transportation
mode and to calculate the travel time. This chapter first gives the conclusions of this part and then the
recommendations for further research.
11.1
Answer main research question
To what extent can the mobile phone data of Mezuro be used to create the application ’Travel time information’?
The application’s framework, as shown in figure 6.2, demonstrates that the application ’travel time
information’ can be developed with mobile phone location data, and the framework will be a useful
guideline to create the complete application. The aim of the application is to calculate the estimated
travel time for different trajectories on the highways or the train routes. The first parts of the application
have been created using the mobile phone dataset of Mezuro, and these parts show good validation
results, with small differences compared with the collected GPS data. Two caveats apply. First, it is
uncertain whether the estimated travel time is more accurate than that of existing techniques such as loop
detectors. Second, the number of detected users in the mobile phone dataset of Mezuro is low compared
with the actual number of travellers. Once the application is completed with the dataset of Mezuro and
its accuracy has been compared with that of the other data sources, a judgement can be made on whether the
application developed with the dataset of Mezuro is useful. However, the main advantage of mobile
phone location data is the possibility to predict travel times using weekly travel patterns and
home-work locations. Other existing techniques cannot use this type of information to predict travel
times.
11.2
Findings
The data collection results demonstrate that the telecommunications network architecture involves
many uncertainties. Some GPS points belonging to an event are positioned outside the cell area, so the
cell area is apparently only theoretical. This can also be seen in poor connections at some places in the
Netherlands, even though multiple theoretical cell areas overlap those places. However, more detailed data, for
example a more practical radius, will be available in the future. Furthermore, it is unknown when and how data
events are generated; therefore, the data events from the data collection are not used.
The analyses of the data collection show an average distance of 930 meters from the GPS position to the site.
The spatial elements BSCM and Voronoi diagram reduce the average distance between the mapping point and the
GPS position, compared with using the cell site as the mapping point. Moreover, statistically significant relations
exist between the distance to the site and the event type, and the distance to the site differs significantly inside
and outside the cities. The analysis shows that, at a 95% confidence level, the distance to the site is at most half
of the radius. So for most cells, especially cells with a radius larger than 10 kilometres, the maximum theoretical
radius is never used in practice.
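A minimal sketch of the measure behind this finding: the distance from a GPS fix to the cell site, expressed as a fraction of the theoretical radius, and a percentile of that fraction. The spherical (haversine) distance, the nearest-rank percentile and the sample positions are choices made for this illustration only; the statistical test used in the analysis is not reproduced here:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearest_rank_percentile(values, q):
    """Nearest-rank percentile, q between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]

def distance_to_radius_ratios(events):
    """events: iterable of (gps_lat, gps_lon, site_lat, site_lon, radius_m).
    Returns, per event, the distance to the cell site divided by the theoretical radius."""
    return [haversine_m(glat, glon, slat, slon) / radius_m
            for glat, glon, slat, slon, radius_m in events]

# Hypothetical events near one site with a 2 km theoretical radius
site = (52.30, 4.90)
events = [(52.30 + dlat, 4.90, *site, 2_000) for dlat in (0.001, 0.003, 0.005, 0.007, 0.009)]
ratios = distance_to_radius_ratios(events)
print(f"95th percentile of distance / radius: {nearest_rank_percentile(ratios, 95):.2f}")
```

A 95th percentile at or below 0.5 would correspond to the finding that the outer half of the theoretical radius is hardly used.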
The algorithm for determining whether the user travels by car or by train gives good results. In
the case study on the route Amsterdam-Eindhoven the mode is determined correctly in almost 90% of the cases.
On half of the route the trajectories of the two modes lie within one kilometre of each other, which partly
explains the 10% of incorrect determinations. One comparable study was found in the literature
[Doyle et al., 2011], and it has the same level of incorrectness. On average, a user creates two events on the
route Amsterdam-Eindhoven. With two events the mode is determined correctly in approximately 93% of the cases,
so the determination of the mode is useful for the development of the application. If more events are combined,
the percentage of correctly determined modes rapidly increases to 100%.
The mapping algorithm shows a median speed difference of 1.5 km/h, and in 95% of the cases the speed
difference is at most 16 km/h. At high speeds the deviation is small; at low speeds the deviation from the
real speed is larger. The algorithm cannot detect low speeds with a high accuracy: in a traffic jam multiple
events connect to the same cell, so the algorithm detects a speed of zero while in reality the user drives slowly.
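The speed computation and the two accuracy measures can be sketched as follows. The sketch assumes mapping points are expressed as hypothetical positions along the track (chainage in metres); how those points are obtained with BSCM, Voronoi diagram and site is not reproduced. It also shows why two events in the same cell yield a speed of zero:

```python
import math
from statistics import median

def mapped_speed_kmh(chainage1_m, chainage2_m, minutes):
    """Speed between two events whose mapping points are given as positions along the track (m)."""
    if minutes <= 0:
        raise ValueError("the two events must be separated in time")
    return abs(chainage2_m - chainage1_m) / 1000 / (minutes / 60)

def speed_difference_summary(gps_kmh, mapped_kmh):
    """Median and nearest-rank 95th percentile of the absolute speed differences."""
    diffs = sorted(abs(g - m) for g, m in zip(gps_kmh, mapped_kmh))
    rank = max(1, math.ceil(0.95 * len(diffs)))
    return median(diffs), diffs[rank - 1]

# Two events 20 minutes apart, mapped 30 km apart along the track
print(mapped_speed_kmh(0, 30_000, 20))
# Two events in a traffic jam served by the same cell map to the same point: 0 km/h
print(mapped_speed_kmh(12_000, 12_000, 10))
# Hypothetical GPS-based and mapping-based speeds for three combinations
print(speed_difference_summary([100, 80, 50], [98, 80, 30]))
```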
The algorithms are executed on the dataset of Mezuro, and a clear decrease in the average speed is
visible in the morning peak for the transportation mode car. The algorithm detects more covered routes
before 12:00, which contradicts the distribution of the average number of events per hour. More than twice as
many train passengers as motorists are detected; it seems that train passengers use their mobile phone more
often than motorists. Also, more calls are made in the car than in the train, most likely because motorists
can speak in private.
11.3
Conclusions
The proposed framework gives a guideline to develop the application with the dataset of Mezuro. The
framework can be extended and adjusted for other datasets or other types of travel time. The algorithms
described in this study form the basis of the first part of the framework, and the validation of the
algorithms in the case study demonstrates that they can be used in the application. When performing
the algorithms on the dataset of Mezuro between Amsterdam and Eindhoven, no clear statements can be
made about the extent to which the application can be applied with the dataset. The other parts of the
framework, especially the trip detection, have to be constructed before that statement can be made.
The route taken in the case study is representative of the whole of the Netherlands. The routes of the
different modes in the case study are next to each other at some points, while on other parts of the routes
the train track and the road are further apart. Also, the corridor covers both urban and rural areas.
However, the study does not take into account intersections and overlap with other roads or train tracks.
In that case, an algorithm needs to determine which route is taken. The route detection is already included
in the framework, see figure 6.2. In this research only the modes train and car were studied to calculate the
travel time. Because events are not generated continuously, the departure and arrival times are unknown.
This makes it difficult to estimate the mode.
If the distance is shorter than 10 kilometres, the user is not detected by the algorithms. Distances
shorter than 10 kilometres could be included, but then it has to be certain whether the user is actually moving.
The user could also use another mode than the detected one, such as cycling or walking. Another requirement is
that the user is not travelling via local roads. If either of these phenomena occurs, it produces a false positive
in the data. For travel times of pedestrians and cyclists, mobile phone location data are not suitable.
As said, when the travelled distance is less than 10 kilometres, it is uncertain which mode is used and
whether the user is travelling at all. Based on the average speed, an assumption can be made about which mode
is used when the travelled distance is less than 10 kilometres.
11.4
Recommendations for further research
Full development
The application is only partly developed, which is why the final results show no travel times. The
next steps in the framework (figure 6.2) are the route detection and the calculation of travel times. The
route detection will determine on which route, and thus on which trajectories, the user is travelling. For
predefined trajectories the travel time is calculated at the individual level. Finally, all travellers will be
grouped to create the average travel time. Research has to examine whether the number of detected users is
sufficient to determine accurate travel times.
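The final grouping step, from individual travel times to an average per predefined trajectory, can be sketched as below. The record layout, the trajectory identifiers and the minimum of five users are hypothetical; the text leaves open how many detected users are sufficient:

```python
from collections import defaultdict
from statistics import mean

MIN_USERS = 5  # hypothetical minimum number of users before an average is reported

def average_travel_times(records, min_users=MIN_USERS):
    """records: iterable of (trajectory_id, user_id, travel_time_min) at individual level.
    Returns {trajectory_id: (mean travel time in minutes, number of distinct users)},
    leaving out trajectories observed for fewer than min_users users."""
    times = defaultdict(list)
    users = defaultdict(set)
    for trajectory, user, minutes in records:
        times[trajectory].append(minutes)
        users[trajectory].add(user)
    return {
        trajectory: (mean(times[trajectory]), len(users[trajectory]))
        for trajectory in times
        if len(users[trajectory]) >= min_users
    }

records = [("T1", f"u{i}", tt) for i, tt in enumerate([20, 22, 24, 26, 28])]
records += [("T2", "u1", 35), ("T2", "u2", 37)]   # too few users for T2
print(average_travel_times(records))
```

Reporting the number of users next to the average makes it possible to judge afterwards whether the sample for a trajectory is large enough.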
Predicting travel times
The proposed framework will estimate travel times. Because instantaneous travel times are currently
collected with a high accuracy, for example by loop detectors, the added value of the application is
unknown. Therefore, an application that calculates predicted travel times on a short-term or long-term
basis would create more value. The added value of mobile phone location data for computing travel times
lies in the mobility patterns known over the month. For one user the daily trips can be detected, and with
these known trips a prediction can be made of whether and when the user is travelling. Because the home
locations are known, the route of a user can be predicted. For such an application, the framework has to be
extended with, for example, the predicted trips of users and their home locations. Also, the dataset of Mezuro
has to generate events in real time or with a very short delay.
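As a very naive illustration of using known trips for prediction, the share of, say, Mondays in a recent window on which a user made a given trip can serve as a first estimate of the chance that the user travels next Monday. This is a hypothetical sketch, not part of the proposed framework; the thirty-day window mirrors the period for which the Hashed ID can be saved, and the dates are invented:

```python
from datetime import date, timedelta

def weekday_trip_share(trip_dates, window_end, weekday, window_days=30):
    """Share of a given weekday (0 = Monday) within the last window_days days
    on which the user made the trip; a naive estimate of the chance of travelling."""
    window = {window_end - timedelta(days=i) for i in range(window_days)}
    candidate_days = {d for d in window if d.weekday() == weekday}
    if not candidate_days:
        return 0.0
    return len(candidate_days & set(trip_dates)) / len(candidate_days)

# Hypothetical history: the user made the trip on three of the four Mondays and on one Tuesday
history = [date(2014, 3, 3), date(2014, 3, 10), date(2014, 3, 17), date(2014, 3, 18)]
print(weekday_trip_share(history, window_end=date(2014, 3, 28), weekday=0))
```

A real prediction would also need the departure time and the route, which is why the framework would have to be extended with predicted trips and home locations.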
Telecommunications network properties
The network configuration of the cells in this research is provided by a telecom company. The analyses show
that the given radius is only theoretical, because the GPS positions of events within a cell show that 10% of
these events lie outside the given cell area. Therefore, more research needs to be performed to test the real
configuration of the cells and the parameters of each cell. Hopefully, with more research or network data, a
better explanation can be given for the actual cell area.
The cells connected during the data collection are used to test whether a user travels on the route
between Amsterdam and Eindhoven. Cells not connected during the data collection but close to the
tracks or road are not included in the algorithm. This is done to make sure that an event with the used cells
could be created on the routes. In the data collection no data events were generated, and for that reason
cells handling a lot of data traffic could have been missed as well. Therefore, the algorithms have to be
generalized to all cells along a corridor to capture all users.
As the analysis in Chapter 8 shows, the radius of the cell area could be reduced by 50%. The mode algorithm
can use half of the radius to detect the mode in the first step. In this first step the algorithm uses an explicit
rule: if the cell area does not overlap one of the modes, the other mode is chosen. With a smaller radius the rule
applies more often, resulting in a higher percentage of determined modes.
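The explicit first-step rule, and the effect of halving the radius, can be sketched as follows. The cell area is modelled as a circle and the tracks as polylines in planar metres, both simplifying assumptions; combining several events by keeping only the modes that overlap every cell is this sketch's own generalisation of the rule, and the later steps of the mode algorithm are not reproduced:

```python
import math

Point = tuple[float, float]  # planar coordinates in metres

def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length2 = dx * dx + dy * dy
    if length2 == 0:
        return math.dist(p, a)
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))

def overlaps(site: Point, radius: float, track: list[Point]) -> bool:
    """True if the circular cell area touches the polyline track."""
    return any(point_segment_distance(site, a, b) <= radius for a, b in zip(track, track[1:]))

def determine_mode(cells, tracks, radius_factor=1.0):
    """cells: list of (site, radius) for one user's events; tracks: {mode: polyline}.
    Keeps only the modes whose track overlaps every (scaled) cell area and returns
    that mode if exactly one remains, otherwise None (mode undetermined)."""
    candidates = set(tracks)
    for site, radius in cells:
        r = radius * radius_factor
        candidates &= {mode for mode, track in tracks.items() if overlaps(site, r, track)}
    return next(iter(candidates)) if len(candidates) == 1 else None

tracks = {"train": [(0, 0), (100_000, 0)], "car": [(0, 2_000), (100_000, 2_000)]}
cell = ((50_000, 1_800), 2_000)
print(determine_mode([cell], tracks))                     # both tracks overlap: undetermined
print(determine_mode([cell], tracks, radius_factor=0.5))  # halved radius only overlaps the road
```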
Case-study
Only the route between Amsterdam and Eindhoven is used in the case study. The route consists of a train
track and a road, both without intersections. If more routes are added to the algorithm, the routes need to
be defined, and extra research is necessary to create an application that determines the travel times.
Furthermore, at intersections a cell could overlap multiple roads or train lines, and an algorithm needs to
detect the right track onto which the cell is mapped. Denser networks will make the mapping algorithm more
difficult. If the cell area overlaps both a local road and a highway, the distinction between them can only be
assumed based on the time interval between events. However, the mode algorithm could be extended to detect
local roads or highways and their likelihood. When events are combined, the route can be determined, including
the main local routes. The aim of the application is long-distance travel times on the highways.
Advanced location methods
A combination of the spatial elements BSCM, Voronoi diagram and cell site is used in the mapping algorithm.
More spatial elements could be created to improve the algorithm, for example the weighted Voronoi
tessellation or the smooth Voronoi tessellation, explained in Section 2.4. Whether and to what extent other
spatial elements improve the mapping algorithm is unknown. The number of spatial elements used can be
increased to three or more, because the existing algorithm uses at most two types of spatial elements.
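The difference between an ordinary and a weighted Voronoi tessellation lies in the assignment rule. The sketch below shows the ordinary rule (nearest site) next to a multiplicatively weighted one (smallest distance divided by weight); which weight to use per cell, for example the cell radius, is an assumption, and the smooth Voronoi tessellation is not covered:

```python
import math

def voronoi_cell(point, sites):
    """Ordinary Voronoi tessellation: a point belongs to the cell of its nearest site."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))

def weighted_voronoi_cell(point, sites, weights):
    """Multiplicatively weighted Voronoi tessellation: minimise distance / weight,
    so a site with a larger weight claims a larger region."""
    return min(range(len(sites)), key=lambda i: math.dist(point, sites[i]) / weights[i])

sites = [(0.0, 0.0), (10_000.0, 0.0)]
point = (4_000.0, 0.0)
print(voronoi_cell(point, sites))                               # closer to the first site
print(weighted_voronoi_cell(point, sites, weights=[1.0, 2.0]))  # the heavier second site wins
```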
Conclusions
This final chapter presents the overall conclusions. The purpose of this research was to test the usability
of mobile phone data for Location-Based Applications. With a literature study and a simple analysis the
possible applications are identified. One of the applications, ’travel time information’, is partly developed
with mobile phone data of Mezuro using a framework. First, the findings of both parts are given; thereafter,
all conclusions are combined into an overall conclusion on developing Location-Based Applications with
mobile phone data.
Main findings
Mobile phone location datasets are based on many types of collection techniques and location techniques. In
this research a classification is proposed that distinguishes datasets by accuracy and update frequency.
It provides a method to compare the different types of mobile phone datasets, and with the classification
each type can be matched to the applications it supports.
The results of the case study show that the practical distance to the site is smaller than 50%
of the theoretical radius. The distance to the site differs significantly inside and outside the city. However,
the functioning of the network is otherwise unknown and unpredictable. For example, some of the recorded
GPS points are outside the cell area, at some places the coverage is only theoretical instead of practical,
and it is unknown when data events are generated. So the theoretical network characteristics are not
consistent with the practical network characteristics.
A part of the application has been developed with the proposed framework as a guideline, see figure 6.2. The
created algorithms are validated with a case study and show practicable results. The mode algorithm detects
the mode with an accuracy of more than 90%. With the mapping algorithm, the difference between the average
speed of the GPS positions and that of the mapping positions is 16 km/h or lower in 95% of the combinations
of events.
Performing the algorithms on the database of Mezuro, the results show that more train users than car users
are detected on the route between Amsterdam and Eindhoven, based on the events while travelling. In reality,
the vehicular flow is higher than the number of train passengers. The average speed on the route is approximately
70 km/h, and the average speed in the car is higher than in the train, which is the same as in reality.
Mobile phone location data could be useful for the application ’travel time information’. More research
and development of the complete application are needed to conclude whether mobile phone location data
are usable for the estimation of long-distance travel times. The accuracy of applications based on existing
techniques such as loop detectors and Bluetooth travel time systems is high, and therefore the application
developed with mobile phone data adds an unknown, but small, extra value to the existing applications.
However, if travel times could be predicted in the future, the added value of mobile phone location data
compared with existing data sources would be larger. When predicting travel times with the dataset of
Mezuro, the daily mobility patterns can be used.
Overall conclusions
The main contributions of this study are the development of the classification, the proposed framework to
develop the application and the two methods to create the first part of the application. In the first method
a travelling user is assigned to one of the two most common long-distance modes in the Netherlands, the car
or the train. In the second method the user is located from the known cell site onto the road or train track,
depending on the determined mode. First, besides their use in the application ’travel time information’,
the algorithms can be applied in other applications, such as the applications ’public transport lines
optimization’ and ’transportation planning’. Second, more datasets have to be classified using the
classification, and with more detailed datasets, more applications can be created using mobile phone location data.
Once the application is completed with the dataset of Mezuro, a better judgement can be given
of the added value of the application compared with the existing applications that estimate travel times.
However, with the dataset of Mezuro the travel times cannot be predicted, because the data are generated
on a daily basis. Therefore, constructing other applications that do not need real-time data with the
dataset of Mezuro could have a higher benefit than developing the application ’travel time information’.
The above-mentioned applications ’public transport lines optimization’ and ’transportation planning’ are
examples of such applications, all the more because the algorithms developed in this study could be used
for them. If a new dataset with real-time mobile phone location data becomes available in the future,
the application ’travel time information’ can be created with the proposed framework.
Privacy is a main point that needs to be respected. Personal applications will never be created with
mobile phone data without the explicit permission of the user, because that would harm the user's privacy.
In the future telecom providers will hopefully release more accurate data; however, the hashed identification
number will change more frequently. Some other countries already make more accurate mobile phone data
available for research, see for example Calabrese et al. [2011a]. With more detailed data, applications that
were rejected earlier in this research could be created. The dataset of Mezuro has a low positioning accuracy;
however, its benefit is the possibility of saving the Hashed ID for thirty days.
Looking back at the applications shown in Part I and their accuracy levels, the spatial accuracy
of the dataset of Mezuro turned out to be approximately 1000 meters in Part II. After mapping the points
onto the road or rail, the accuracy improves to 500 meters. After the analyses in Part II, the accuracy of the
dataset falls in the category 100 - 1000 meters or higher of the classification in table 3.2. Mapping the points
and using advanced location techniques demonstrate that the accuracy is between 100 and 1000 meters.
The update frequency is set to more than 15 minutes, because combinations of events from users can be
created and the average frequency is two events per hour. After comparing the results of Part I with the
new results of Part II, applications in classes 4 to 7, shown in table 3.3, can be created with the mobile
phone location data of Mezuro, which is consistent with the classification of the dataset in Part I. The
question mark for class 4 should be a check mark, so the application created in Part II should
have sufficient results.
In the future the number of cells and the data traffic will increase, which is an advantage for mobile phone
location data. During the research the number of 4G cells increased: the telecom provider wanted to
create nationwide coverage and therefore many more cells are needed¹. In the cities so-called femto
cells and micro cells are present. Femto and micro cells cover a small area, for example in shopping malls,
stations or even homes. These cells increase the signal strength and have a smaller cell area than normal
cells, so the localization accuracy with those cells increases. More information on the network characteristics,
for example the practically used radius, exists but was not available for this research; hopefully these data
can be used in the future. Finally, data traffic is increasing², so more events are generated and available
for research.
1 http://tweakers.net/tag/4g/nieuws/. Accessed on 31 January, 2014
2 http://cloudworks.nu/2014/02/06/wereldwijd-mobiel-dataverkeer-groeit-tussen-2013-en-2018-bijna-met-factor-11/. Accessed on 1 February, 2014
Bibliography
R. Ahas, A. Aasa, U. Mark, T. Pae, and A. Kull. Seasonal tourism spaces in Estonia: Case study with mobile
positioning data. Tourism Management, 28(3):898–910, 2007.
R. Ahas, A. Aasa, A. Roose, U. Mark, and S. Silm. Evaluating passive mobile positioning data for tourism
surveys: An Estonian case study. Tourism Management, 29(3):469–486, 2008.
M. Alger, E. Wilson, T. Gould, R. Whittaker, and N. Radulovic. Real-time traffic monitoring using mobile
phone data. Proceedings on 49th European Study Group with Industry, Oxford, United Kingdom, 2004.
I. Anderson and H. Muller. Practical activity recognition using GSM data, 2006.
Y. Asakura and E. Hato. Tracking survey for individual travel behaviour using mobile communication
instruments. Transportation Research Part C: Emerging Technologies, 12(3):273–291, 2004.
V. Astarita and M. Florian. The use of mobile phones in traffic management and control. 2001 IEEE
Intelligent Transportation Systems - Proceedings, pages 10–15, 2001.
A-E Baert and D. Seme. Voronoi mobile cellular networks: topological properties. In Parallel and Distributed
Computing, 2004. Third International Symposium on Algorithms, Models and Tools for Parallel Computing
on Heterogeneous Networks, 2004., pages 29–35. IEEE, 2004.
H. Bar-Gera. Evaluation of a cellular phone-based system for measurements of traffic speeds and travel
times: A case study from Israel. Transportation Research Part C-Emerging Technologies, 15(6):380–391,
2007.
M. Berlingerio, F. Calabrese, G. Di Lorenzo, R. Nair, F. Pinelli, and M-L. Sbodio. Allaboard: a system for
exploring urban mobility and optimizing public transport using cellphone data. In Machine Learning
and Knowledge Discovery in Databases, pages 663–666. Springer, 2013.
V. Blondel, M. Esch, C. Chan, F. Clerot, P. Deville, E. Huens, F. Morlot, Z. Smoreda, and C.
Ziemlicki. Data for development: the D4D challenge on mobile phone data, 2012.
R. Borgaonkar and K. Redon. Femtocell: Femtostep to the holy grail. Presented at TROOPERS 2011, 30
March 2011, 2011.
J. Borkowski and J. Lempiäinen. Pilot correlation positioning method for urban UMTS networks. In
Wireless Conference 2005-Next Generation Wireless and Mobile Communications and Services (European
Wireless), 11th European, pages 1–5. VDE, 2005.
N. Caceres, J. P. Wideberg, and F. G. Benitez. Deriving origin-destination data from a mobile phone
network. IET Intelligent Transport Systems, 1(1):15–26, 2007.
N. Caceres, J. Wideberg, and F. Benitez. Review of traffic data estimations extracted from cellular networks.
IET Intelligent Transport Systems, 2(3):179–192, 2008.
N. Caceres, L. Romero, and F. Benitez. Inferring origin–destination trip matrices from aggregate volumes
on groups of links: a case study using volumes inferred from mobile phone data. Journal of Advanced
Transportation, 2011.
F. Calabrese. Urban sensing using mobile phone network data. Ubicomp 2011 Tutorial, 2011.
F. Calabrese, F. Pereira, G. Di Lorenzo, L. Liu, and C. Ratti. The geography of taste: analyzing cell-phone
mobility and social events, pages 22–37. Springer, 2010.
F. Calabrese, M. Colonna, P. Lovisolo, D. Parata, and C. Ratti. Real-time urban monitoring using cell
phones: A case study in Rome. IEEE Transactions on Intelligent Transportation Systems, 12(1):141–151,
2011a.
F. Calabrese, G. Di Lorenzo, L. Liu, and C. Ratti. Estimating origin-destination flows using mobile phone
location data. IEEE Pervasive Computing, 10(4):36–44, 2011b.
F. Calabrese, M. Diao, G. Di Lorenzo, J. Ferreira, and C. Ratti. Understanding individual mobility patterns
from urban sensing data: A mobile phone trace example. Transportation Research Part C-Emerging
Technologies, 26:301–313, 2013.
J. Candia, M. González, P. Wang, T. Schoenharl, G. Madey, and A-L Barabási. Uncovering individual
and collective human dynamics from mobile phone records. Journal of Physics A: Mathematical and
Theoretical, 41(22):224015, 2008.
M. Chen, T. Sohn, D. Chmelev, D. Haehnel, J. Hightower, J. Hughes, A. LaMarca, F. Potter, I. Smith, and
A. Varshavsky. Practical metropolitan-scale positioning for GSM phones, pages 225–242. Springer, 2006.
P. Cheng, Z. Qiu, and B. Ran. Particle filter based traffic state estimation using cell phone network data.
In Intelligent Transportation Systems Conference, 2006. ITSC’06. IEEE, pages 1047–1052. IEEE, 2006.
J. Costabile. Wireless position locations. Presented at Virginia Tech Wireless Symposium 2010, 2010.
K. Curran and S. Hubrich. Optimising mobile phone self-location estimates by introducing beacon
characteristics to the algorithm. Journal of Location Based Services, 3(1):55–73, 2009.
J. Doyle, P. Hung, D. Kelly, S. McLoone, and R. Farrell. Utilising mobile phone billing records for travel
mode discovery. In ISSC 2011, June 23–24, Trinity College Dublin, 2011.
N. Eagle, J. Quinn, and A. Clauset. Methodologies for continuous cellular tower data analysis, pages 342–353.
Springer, 2009.
P. Fiadino, D. Valerio, F. Ricciato, and K-A Hummel. Steps towards the extraction of vehicular mobility
patterns from 3G signaling data, pages 66–80. Springer, 2012.
R. Fidel. The case study method: a case study. Library and Information Science Research, 6(3):273–288,
1984.
M. Fontaine, A. Yakkala, and B. Smith. Probe sampling strategies for traffic monitoring systems based on
wireless location technology. Technical report, 2007.
V. Frias-Martinez, J. Virseda, A. Rubio, and E. Frias-Martinez. Towards large scale technology impact
analyses: Automatic residential localization from mobile phone-call data. In Proceedings of the 4th
ACM/IEEE International Conference on Information and Communication Technologies and Development,
page 11. ACM, 2010.
M. Friedrich, K. Immisch, P. Jehlicka, T. Otterstätter, and J. Schlaich. Generating origin-destination
matrices from mobile phone trajectories. Transportation Research Record, (2196):93–101, 2010.
H. Gao and F. Liu. Estimating freeway traffic measures from mobile phone location data. European Journal
of Operational Research, 229(1):252–260, 2013.
F. Girardin, A. Vaccari, A. Gerber, A. Biderman, and C. Ratti. Quantifying urban attractiveness from
the distribution and density of digital footprints. International Journal of Spatial Data Infrastructures
Research, 4:175–200, 2009.
M. Gonzalez, C. Hidalgo, and A-L Barabasi. Understanding individual human mobility patterns. Nature,
453(7196):779–782, 2008.
D. Gundlegård and J. Karlsson. Road traffic estimation using cellular network signaling in intelligent
transportation systems. Wireless technologies in Intelligent Transportation Systems, 2009.
A. Haghani, M. Hamedi, K. F. Sadabadi, S. Young, and P. Tarnoff. Data collection of freeway travel
time ground truth with Bluetooth sensors. Transportation Research Record: Journal of the Transportation
Research Board, 2160(1):60–68, 2010.
M. Hata. Empirical formula for propagation loss in land mobile radio services. Vehicular Technology, IEEE
Transactions on, 29(3):317–325, 1980.
S. Hoteit, S. Secci, S. Sobolevsky, G. Pujolle, and C. Ratti. Estimating real human trajectories through
mobile phone data. In Mobile Data Management (MDM), 2013 IEEE 14th International Conference on,
volume 2, pages 148–153. IEEE, 2013.
W. Huang, Z. Dong, N. Zhao, H. Tian, G. Song, G. Chen, Y. Jiang, and K. Xie. Anchor points seeking of
large urban crowd based on the mobile billing data, pages 346–357. Springer, 2010.
W. Huayong, F. Calabrese, G. Di Lorenzo, and C. Ratti. Transportation mode inference from anonymized
and aggregated mobile phone call detail records. In Intelligent Transportation Systems (ITSC), 2010 13th
International IEEE Conference on, pages 318–323, 2010.
C. Iovan, A-M Olteanu-Raimond, T. Couronné, and Z. Smoreda. Moving and Calling: Mobile Phone
Data Quality Measurements and Spatiotemporal Uncertainty in Human Mobility Studies, pages 247–265.
Springer, 2013.
S. Isaacman, R. Becker, R. Cáceres, S. Kobourov, M. Martonosi, J. Rowland, and A. Varshavsky. Identifying
important places in people’s lives from cellular network data, pages 133–151. Springer, 2011.
O. Jaerv, R. Ahas, E. Saluveer, B. Derudder, and F. Witlox. Mobile phones in a traffic flow: A geographical
perspective to evening rush hour traffic analysis using call detail records. PLoS ONE, 7(11), 2012.
A. K. Jain, M. N. Murty, and P. J. Flynn. Data clustering: a review. ACM Computing Surveys
(CSUR), 31(3):264–323, 1999.
A. Janecek, K. Hummel, D. Valerio, F. Ricciato, and H. Hlavacs. Cellular data meet vehicular traffic
theory: location area updates and cell transitions for travel time estimation, 2012.
Z. Koppanyi, T. Lovas, A. Barsi, H. Demeter, A. Beeharee, and A. Berenyi. Tracking vehicle in GSM network
to support intelligent transportation systems. Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci.,
XXXIX-B2:139–144, 2012. ISPRS Archives.
J. Krisp. Planning fire and rescue services by visualizing mobile phone density. Journal of Urban Technology,
17(1):61–69, 2010.
M. Kwan, W. Cartwright, and C. Arrowsmith. Tracking Movements with Mobile Phone Billing Data: A Case
Study with Publicly-Available Data, pages 109–117. Springer, 2012.
S. Lakmali and D. Dias. Database correlation for GSM location in outdoor & indoor environments. In
Information and Automation for Sustainability, 2008. ICIAFS 2008. 4th International Conference on, pages
42–47. IEEE, 2008.
C. Laoudias. Overview of outdoor and indoor positioning technologies and systems. European University
Cyprus IEEE Student Section Activities, 2012.
J. Laurila, D. Gatica-Perez, I. Aad, J. Blom, O. Bornet, T-M-T Do, O. Dousse, J. Eberle, and M. Miettinen.
The mobile data challenge: Big data for mobile computing research. In Proceedings of the Workshop
on the Nokia Mobile Data Challenge, in Conjunction with the 10th International Conference on Pervasive
Computing, pages 1–8, 2012.
D-B Lin and R-T Juang. Mobile location estimation based on differences of signal attenuations for GSM
systems. Vehicular Technology, IEEE Transactions on, 54(4):1447–1454, 2005.
J. Lin. Probe based arterial travel time estimation and prediction: A case study on using Chicago Transit
Authority bus fleet as probes. In CTS-IGERT Seminar, 2009.
Y-B Lin, Y-R Haung, Y-K Chen, and I. Chlamtac. Mobility management: from GPRS to UMTS. Wireless
Communications and Mobile Computing, 1(4):339–359, 2001.
S. Maerivoet and S. Logghe. Validation of travel times based on cellular floating vehicle data. In Proceedings
from 6th European congress and exhibition on intelligent transport systems and services, 2007.
R. Maestre, R. Lario, M. Munoz, R. Abad, J. Gonzalez, A. Martin, E. Perez, J. L. Fdez-Pacheco, and Fdez-Pacheco. D4D challenge commuting dynamics 4 change. In Mobile Phone data for Development, volume
2013, pages 531–553, 2013.
A. Milani, E. Gentili, and V. Poggioni. Cellular flow in mobility networks. IEEE Intelligent Informatics
Bulletin, 10(1):17–23, 2009.
R. Montoliu, J. Blom, and D. Gatica-Perez. Discovering places of interest in everyday life from smartphone
data. Multimedia Tools and Applications, 62(1):179–207, 2013.
E. Murray. Performance of network-based mobile location techniques within the 3GPP UTRA TDD standards.
Third International Conference on 3G Mobile Communication Technologies (Conf. Publ. No. 489), 2002.
M. Nanni, R. Trasarti, R. Furletti, G. Gabrielli, P. van der Mede, J. de Bruijn, E. de Romph, and G. Bruil.
MP4-A project: Mobility planning for Africa. In Mobile Phone data for Development, volume 2013, pages
423–446, 2013.
J. Novak, R. Ahas, A. Aasa, and S. Silm. Application of mobile phone location data in mapping of commuting
patterns and functional regionalization: a pilot study of Estonia. Journal of Maps, 9(1):10–15, 2013.
M. Olama, S. Djouadi, I. Papageorgiou, and C. Charalambous. Position and velocity tracking in mobile
networks using particle and Kalman filtering with comparison. Vehicular Technology, IEEE Transactions
on, 57(2):1001–1010, 2008.
J. Paek, J. Kim, and R. Govindan. Energy-efficient rate-adaptive GPS-based positioning for smartphones.
In Proceedings of the 8th international conference on Mobile systems, applications, and services, pages
299–314. ACM, 2010.
T. Pei, S. Sobolevsky, C. Ratti, S.-L. Shaw, and C. Zhou. A new insight into land use classification based on
aggregated mobile phone data. arXiv preprint arXiv:1310.6129, 2013.
S. Phithakkitnukoon, Z. Smoreda, and P. Olivier. Socio-geography of human mobility: A study using
longitudinal mobile phone data. PLoS ONE, 7(6), 2012.
A. Poolsawat, W. Pattara-Atikom, and B. Ngamwongwattana. Acquiring road traffic information through
mobile phones. In ITS Telecommunications, 2008. ITST 2008. 8th International Conference on, pages
170–174. IEEE, 2008.
J. Portela and M. Alencar. Cellular network as a multiplicatively weighted Voronoi diagram. In Consumer
Communications and Networking Conference, 2006. CCNC 2006. 3rd IEEE, volume 2, pages 913–917. IEEE,
2006.
G. Qi, J. Wu, and Y. Du. Research on the traffic simulation platform based on the real-time mobile phone
data. Applied Mechanics and Materials, 253:1365–1368, 2013.
Z. Qiu and B. Ran. Kalman filtering applied to network-based cellular probe traffic monitoring. In
Transportation Research Board 87th Annual Meeting, number 08-1984, 2008.
D. Quercia, N. Lathia, F. Calabrese, G. Di Lorenzo, and J. Crowcroft. Recommending social events from
mobile phone location data. In Data Mining (ICDM), 2010 IEEE 10th International Conference on, pages
971–976. IEEE, 2010.
K. Raja, W. Buchanan, and J. Munoz. We know where you are [Cellular location tracking]. Communications
Engineer, 2(3):34–39, 2004.
K. Ramm and V. Schwieger. Mobile positioning for traffic state acquisition. Journal of Location Based
Services, 1(2):133–144, 2007.
P. Ramsey. PostGIS manual. 2005.
C. Ratti, D. Frenchman, R. M. Pulselli, and S. Williams. Mobile landscapes: Using location data from cell
phones for urban analysis. Environment and Planning B-Planning & Design, 33(5):727–748, 2006.
J. Reades, F. Calabrese, A. Sevtsuk, and C. Ratti. Cellular census: Explorations in urban data collection.
IEEE Pervasive Computing, 6(3):30–38, 2007.
B. Rutten, J. Dixon, and P-J. Pauwels. Datafeeds for business and governmental bodies. 2006.
G. Sagl, M. Loidl, and E. Beinat. A visual analytics approach for extracting spatio-temporal urban mobility
information from mobile network traffic. ISPRS International Journal of Geo-Information, 1(3):256–271,
2012.
M. Samiei, M. Mehrjoo, and B. Pirzade. Advances of positioning methods in cellular networks. Faculty of
Electrical and Computer Engineering, University of Sistan & Baluchestan, Iran, 2010.
R.-P. Schafer, S. Lorkowski, P. Mieth, and I. Schurr. Travel time measurements using GSM and GPS probe
data. In 16th ITS World Congress and Exhibition on Intelligent Transport Systems and Services, 2009.
J. Schlaich. Analyzing route choice behavior with mobile phone trajectories. Transportation Research
Record, 2157:78–85, 2010.
J. Schlaich, T. Otterstätter, and M. Friedrich. Generating trajectories from mobile phone data. In
Proceedings of the 89th Annual Meeting Compendium of Papers, Transportation Research Board of the
National Academies, 2010.
Z. Shen and K-L Ma. MobiVis: A visualization system for exploring mobile data. In Visualization Symposium,
2008. PacificVIS’08. IEEE Pacific, pages 175–182. IEEE, 2008.
K. Sohn and D. Kim. Dynamic origin-destination flow estimation using cellular communication system.
IEEE Transactions on Vehicular Technology, 57(5):2703–2713, 2008.
T. Sohn, A. Varshavsky, A. LaMarca, M. Chen, T. Choudhury, I. Smith, S. Consolvo, J. Hightower, W. Griswold, and E. De Lara. Mobility detection using everyday GSM traces, pages 212–224. Springer, 2006.
C. M. Song, Z. H. Qu, N. Blumm, and A. L. Barabasi. Limits of predictability in human mobility. Science,
327(5968):1018–1021, 2010.
V. Soto, V. Frias-Martinez, J. Virseda, and E. Frias-Martinez. Prediction of socioeconomic levels using cell
phone records, pages 377–388. Springer, 2011.
J. Steenbruggen, M. Borzacchiello, P. Nijkamp, and H. Scholten. Real-time data from mobile phone
networks for urban incidence and traffic management: A review of application and opportunities.
Research Memorandum, 2010-3, 2010.
J. Steenbruggen, M. Borzacchiello, P. Nijkamp, and H. Scholten. The use of GSM data for transport safety
management: An exploratory review. Research Memorandum, 2011-32, 2011.
B. Swaans, P. de Wolff, and F. Schaapherder. Using floating car data on the basis of GSM's. In Control in
Transportation Systems, volume 11, pages 573–578, 2006.
Cambridge Systematics. Traffic congestion and reliability: Trends and advanced strategies for congestion
mitigation, volume 6. Federal Highway Administration, 2005.
T. Tettamanti, H. Demeter, and I. Varga. Route choice estimation based on cellular signaling data. Acta
Polytechnica Hungarica, 9(4), 2012.
The PostGIS Development Group. PostGIS manual. Last checked on 14 January 2014. URL
www.postgis.net/docs.
V. Traag, A. Browet, F. Calabrese, and F. Morlot. Social event detection in massive mobile phone data using
probabilistic location inference. In Privacy, Security, Risk and Trust (PASSAT), 2011 IEEE Third International
Conference on and 2011 IEEE Third International Conference on Social Computing (SocialCom), pages 625–628.
IEEE, 2011.
I. Trestian, S. Ranjan, A. Kuzmanovic, and A. Nucci. Measuring serendipity: Connecting people, locations
and interests in a mobile 3G network. IMC'09: Proceedings of the 2009 ACM SIGCOMM Internet Measurement
Conference, pages 267–279, 2009.
E. Trevisani and A. Vitaletti. Cell-ID location technique, limits and benefits: an experimental study. Sixth
IEEE Workshop on Mobile Computing Systems and Applications, Proceedings, pages 51–60, 2004.
University of Maryland Transportation Studies Center. Final evaluation report for the CAPITAL-ITS operational test and demonstration program. University of Maryland College Park, 2004.
D. Valerio. Road traffic information from cellular network signaling. Telecommunications Research Center
Vienna Technical Report FTW-TR-2009-003, 2009.
J.W.C. Van Lint. Empirical evaluation of new robust travel time estimation algorithms. Transportation
Research Record: Journal of the Transportation Research Board, 2160(1):50–59, 2010.
M-H Wang, S. Schrock, N. Vander Broek, and T. Mulinazzi. Estimating dynamic origin-destination data
and travel demand using cell phone network data. International Journal of Intelligent Transportation
Systems Research, pages 1–11, 2013.
J. White, J. Quick, and P. Philippou. The use of mobile phone location data for traffic information. In
Road Transport Information and Control, 2004. RTIC 2004. 12th IEE International Conference on, pages
321–325. IET, 2004.
M. Witteman. Efficient proximity detection among mobile clients using the GSM network. Master Thesis,
University of Twente, 2007.
R. Xie, H. Xu, and Y. Yue. Using Mobile Phone Location Data for Urban Activity Analysis, pages 30–43.
Springer, 2013.
K. Yadav, V. Naik, A. Singh, P. Singh, and U. Chandra. Low energy and sufficiently accurate localization
for non-smartphones. In Mobile Data Management (MDM), 2012 IEEE 13th International Conference on,
pages 212–221. IEEE, 2012.
H. Zang and J. Bolot. Anonymization of location data does not work: A large-scale measurement study.
In Proceedings of the 17th annual international conference on Mobile computing and networking, pages
145–156. ACM, 2011.
Y. Zhang. Travel demand modeling based on cellular probe data. PhD thesis, University of Wisconsin–Madison, Madison, Wis., 2012.
P. Zikopoulos and C. Eaton. Understanding big data: Analytics for enterprise class Hadoop and streaming
data. McGraw-Hill Osborne Media, 2011.
M. Zilske and K. Nagel. Building a minimal traffic model from mobile phone data. In Mobile Phone data
for Development, volume 2013, pages 504–514, 2013.