Peer-to-Peer Computing for Mobile Networks: Information Discovery and Dissemination Maria Papadopouli Henning Schulzrinne

– Monograph –
July 14, 2008
Springer
Berlin Heidelberg New York
Hong Kong London
Milan Paris Tokyo
This work is a metaphorical bridge for me, bringing together research
results obtained while I was at the following academic institutions:
• Computer Science Department of Columbia University, as a Ph.D. student
  (1996–2002)
• Department of Computer Science of the University of North Carolina at
  Chapel Hill (UNC), as an assistant professor (2002–2004)
• Department of Computer Science of the University of Crete and the
  Institute of Computer Science of the Foundation for Research and
  Technology-Hellas (FORTH-ICS), as an assistant professor (2004–)
Parts of this book are based on research results obtained in joint work with
Haipeng Shen, Merkourios Karaliopoulos, Félix Hernández-Campos, George
Tzagkarakis, Panagiotis Tsakalides, Elias Raftopoulos, Manolis Ploumidis,
Manolis Spanakis, Mark Lindsey, Francisco Chinchilla, Thomas Karagiannis, Charalampos Fretzagias, Niko Kotilainen, Lito Kriara, and Konstantinos
Vandikas. Haipeng Shen, Merkourios Karaliopoulos, and Félix Hernández-Campos played an important role in the wireless measurement and modeling
research. I am grateful to have had the opportunity to collaborate closely with
them.
Thanks go to Jim Gogan, Todd Lane, Kevin Jeffay, Don Smith and their
students for helping to set up the monitoring and data collection system for
our network measurement research while at UNC. I am also grateful to Diane
and Mark Pozefsky for their support while at UNC. FORTH-ICS has provided a state-of-the-art infrastructure to continue my research. Thanks to all
my colleagues at the University of Crete and FORTH-ICS for their support
that made the transition from the US to Greece easier. I would like to acknowledge the support of the director of FORTH-ICS, Constantine Stephanidis. I
am also grateful to Anthony Ephremides, Leandros Tassiulas, and Apostolos
Traganitis for their mentoring.
Elias Raftopoulos and Manolis Ploumidis—my first graduate students at
the University of Crete and FORTH—have been enthusiastically participating
in the wireless measurement project. Several other students also contributed
to the implementation of CLS, 7DS and applications that use the peer-to-peer paradigm: Denis Abramov and Stelios Sidiroglou-Douskos at Columbia
University; Mark Lindsey, Daniel Plaisted, Julien Jomier at the University
of North Carolina at Chapel Hill; Niko Kotilainen at Jyväskylä University;
and Konstantinos Vandikas, Lito Kriara and Sofia Nikitaki at the University
of Crete. It was a pleasure working with all of them.
I am grateful to my editorial assistant Anthony Griffin for reviewing this
manuscript several times and providing useful feedback. Several people reviewed the monograph and provided feedback: Thanasis Mouchtaris, Anargyros Papageorgiou, Haipeng Shen, Leandros Tassiulas, Apostolos Traganitis,
Panos Tsakalides, and George Tzagkarakis. I would like to acknowledge Antonis Makrogiannakis for helping me with LaTeX, and Mary-Rose James and
Vana Manasiadi for additional editorial suggestions. I also wish to thank my
literary agents Susan Lagerstrom-Fife and Sharon Palleschi for their patience.
Finally, I would like to gratefully acknowledge the support of several agencies:
• the Greek General Secretariat for Research and Technology (Regional of
  Crete Crete-Wise and 05NON-EU-238)
• the European Commission (MIRG-CT-2005-029186)
• the Department of Computer Science of the University of North Carolina
  at Chapel Hill for their generous startup fund and the UNC Junior Faculty
  Development Award
• the Institute of Computer Science of the Foundation for Research and
  Technology-Hellas, for their state-of-the-art infrastructure and generous
  startup fund
• IBM, for the IBM Faculty Awards in 2003 and 2004
This monograph also marks an academic journey that started and ended
in Crete. A major stop on this journey was Columbia University and New York
City, places that offered intellectual stimulation with such generosity. In 2002,
I arrived in a very warm and supportive academic family: the Department of
Computer Science at the University of North Carolina at Chapel Hill. Being
an assistant professor in this institution (my first real job) was a particularly
rewarding experience. I would really like to thank everyone there for their
support in several different ways. My return as a faculty member to the Department
of Computer Science at the University of Crete, where I had completed my
undergraduate studies, offered a sense of continuity that has had a strong impact
on me.
The following people made an immense impact on enabling this journey:
• Manolis Maragkakis, a beloved math teacher.
• Stelios Orphanoudakis, Professor of the Department of Computer Science
  at the University of Crete, a charismatic human being who made a large
  impact on the development of FORTH-ICS and forthnet S.A.
• George Papadopoulis, my father (at the age of 70 still creative and
  active), and Xacousti Papadopouli-Plevraki, my mother, for her kindness
  and generosity.
• Dr. Eva Papadopouli, my sister, for always being there, ready to help,
  advise, care, and love.
This monograph is dedicated to them.
Maria Papadopouli
Contents
1 Introduction
  1.1 Wireless data communications
  1.2 Mobile information access
    1.2.1 Wireless Internet via APs
    1.2.2 Infostations
    1.2.3 Peer-to-Peer systems
  1.3 Target mobile computing environment
    1.3.1 High spatial locality of information and queries
    1.3.2 Heterogeneity in application requirements
    1.3.3 Enhancement of information access
  1.4 Resource sharing using 7DS
  1.5 Overview of this monograph
    1.5.1 Outline
2 7DS architecture for information sharing
  2.1 Overview of 7DS architecture
    2.1.1 Communication
    2.1.2 Cache management
    2.1.3 Power conservation
  2.2 Preventing denial-of-service attacks
  2.3 Encouraging cooperation
    2.3.1 Micropayment mechanisms
    2.3.2 Reputation mechanisms
  2.4 Location-sensing using the peer-to-peer paradigm
    2.4.1 Overview of CLS
    2.4.2 Particle filter-based framework
    2.4.3 Performance of CLS and other related systems
  2.5 Applications using information sharing via 7DS
    2.5.1 Web browsing
    2.5.2 Notesharing and whiteboard tool
    2.5.3 Multimedia traveling journal
  2.6 Related mobile peer-to-peer computing systems
  2.7 Conclusions
3 Performance analysis of information discovery and dissemination
  3.1 Information discovery schemes
  3.2 Simulation assumptions
  3.3 Data dissemination benchmarks
  3.4 Density of dataholders
  3.5 Impact of energy conservation
  3.6 Average delay
  3.7 Scaling properties of data dissemination
  3.8 Models of information dissemination
    3.8.1 Simple epidemic model
    3.8.2 Diffusion-controlled process
  3.9 Discussion
4 Empirically-based measurements on wireless demand
  4.1 Introduction
  4.2 Campus-wide wireless infrastructure
  4.3 Monitoring and data acquisition
    4.3.1 Packet header traces
    4.3.2 http traces
    4.3.3 snmp traces
    4.3.4 syslog traces
    4.3.5 Privacy assurances
    4.3.6 Client identification
  4.4 State, history, visits and sessions
  4.5 Wireless traffic demand at APs
    4.5.1 Data acquisition
    4.5.2 Comparative analysis of wireless traffic load at APs
  4.6 Application-based characterization of wireless demand
  4.7 Locality of web objects
    4.7.1 http requests model
    4.7.2 Same-client repeated requests
    4.7.3 Same-AP repeated requests
    4.7.4 AP-coresident-client repeated requests
    4.7.5 Same-building and campus-wide repeated requests
  4.8 Discussion
5 Modeling the wireless user demand
  5.1 Introduction
  5.2 Client access patterns
    5.2.1 Session duration
    5.2.2 Transient sessions
    5.2.3 Revisits
  5.3 Roaming across APs
  5.4 Arrivals of wireless clients at APs
    5.4.1 Time-varying Poisson process
    5.4.2 Arrival process of visits at wireless hotspots
  5.5 Methodology for modeling user demand
    5.5.1 Sessions and flows
    5.5.2 Models of user demand
  5.6 Syntrig: a synthetic traffic generator
  5.7 Scalability and reusability in user demand models
    5.7.1 Variation of the session arrival rate within a day
    5.7.2 Variation of the session-level flow-related variables
  5.8 Evaluation of user demand models
    5.8.1 Statistical-based evaluation
    5.8.2 Systems-based evaluation
  5.9 Singular spectrum analysis of traffic at APs
  5.10 Related work
  5.11 Conclusions
6 Conclusions and future work
  6.1 Conclusions
    6.1.1 Mobile peer-to-peer computing
    6.1.2 Wireless measurements and modeling
  6.2 Directions for future research
    6.2.1 Increasing capacity
    6.2.2 Capacity planning
    6.2.3 Network interface and channel selection
    6.2.4 Monitoring
  6.3 Bio-inspired computing networks
  6.4 New horizons in cross-disciplinary research
Appendices
A Appendix
B Wireless measurement-based data repositories
References
Index
1 Introduction
“This world!
This small world the great!”
Odysseus Elytis
1.1 Wireless data communications
In the 19th century, the advent of the telegraph and telephone forever changed
how messages were transmitted around the world. Radio, television, computers, and the Internet further revolutionized communication in the 20th
century. Equally important, the effect of Moore’s law¹ is transforming a niche
technology into a ubiquitous one, expanding the innovations in an increasingly
networked world. Wireless devices are becoming smaller, easier to use and pervasive. In effect, people are depending more and more on wireless information
wherever they are. At the dawn of the 21st century, pervasive computing
weaves itself into our lives [351, 6, 4, 29, 42, 48, 23, 50, 47, 38, 19, 22, 18].
Today people access local and international news, traffic or weather reports, sports, maps, guide books, music, video files and games via the Internet [27, 52]. Data volume—medical data, personal multimedia, surveillance for
urban areas, web data—is exploding. Similarly, the importance of meta-data,
i.e., semantic annotations of what this data means, is also rapidly growing.
Analysts expect mobile location-based services in the European
market to reach 622 million euros in 2010, estimating that 18 million users in
Europe will subscribe to location-based billing plans by then. Similarly, there
is a growing interest in the transportation industry to equip vehicles with
navigation tools and location-based services [31, 27, 35, 32, 16, 25]; in the medical
community with patient monitoring and assistive technology [9, 7, 28, 21]; and
in the entertainment industry, environmental activities and emergency
situations for disaster relief. While approximately five million portable
navigation devices were shipped worldwide in 2004, this number had almost
quadrupled by 2006 and is forecast to reach 80 million in 2010. There
is also a large increase in the number of PDAs and smartphones worldwide
(Table 1.1 and Figure 1.1).
¹ Historically, according to Moore’s Law (posited by Intel co-founder Gordon Moore
in 1965), the number of transistors on a chip roughly doubles every two years,
resulting in more features, increased performance and decreased cost per transistor.
Examples of vehicle-based services are location tracking, maps, driving directions, driver or trip task lists, address lookup, traffic and routing information, fleet tracking, and inter-vehicle entertainment [32, 35]. Within Germany,
more than 4,000 motorway sensors nationwide gather data to inform motorists
of relevant developments as they happen. Location-aware services have been
deployed to provide information about over 1.5 million locations in North
America, including hospitals, hotels, banks, ATMs, golf courses, museums,
schools, shopping centers, and tourist attractions [32].
In environmental activities, sensors monitor light, temperature, humidity,
pollution, barometric pressure, and the presence of animals, reporting data
that are typically relayed to central points for analysis and interpretation.
Such mechanisms allow biologists to observe and protect habitat with minimal human interference. Entertainment industry uses include mobile gaming,
communication and social networking, such as “friends finder” services, posting messages to a map and deciding who can read them, and creating and swapping
location-tagged photos [53, 52]. The growth of wireless data communications
has amplified this trend by making information easier to share, and thus,
increasing the amount of information that is shared.
The use of wifi routers is becoming close to mainstream in the US and
Europe. In 2006, 8.4% and 7.9% of households in the US and Europe,
respectively, had deployed such routers, and in that year alone 200 million
chipsets were shipped worldwide, nearly half of the 500 million cumulative
total [1]. China already has the same number of mobile-phone users
(500 million) as the whole of Europe.
Popular applications and services from wired networks shift to the wireless
arena and new applications are increasingly being deployed. The proportion
of wireless streaming audio and video traffic increased by 405% between 2001
and 2003/2004, peer-to-peer from 5.2% in 2001 to 19.3% in 2003/2004, filesystems
from 5.3% to 21.5%, and streaming from 0.9% to 4.6%. Between January 2006
and March 2006, Verizon Wireless customers exchanged more than 171 million
picture and video messages over its nationwide network.
New applications and tools for storing and sharing information, such as
Flickr, YouTube, and Me.dium, have allowed the formation of new types of
social networks and online communities.
The value of the networking environments is growing as fast as the number of its users. However, as transistors continue to shrink, running at higher
speeds, power consumption and heat become potential limiting factors. More
importantly, the demand for information and power is accelerating with the
advancement of displays, graphics, and antennae, and the increase in bandwidth
capacity. Although there are improvements in energy consumption, battery
capacity grows slowly and power remains an important challenge in mobile
computing [46]. Furthermore, as the wireless demand grows, more possibilities
for single points of failure and service degradation exist. Current wireless
devices experience frequent disconnections, packet losses, and delays, while
wireless infrastructures are unable to successfully support applications with
real-time constraints. A denser deployment of wireless networks may alleviate
the problem of intermittent connectivity but would exacerbate the interference, if carried out indiscriminately.
Two distinct aspects of wireless communication that make wireless networks more vulnerable than the wired ones are the fading and the interference
between receiver and transceiver. The phenomenon of fading is characterized
by the time variation of the channel strengths due to the small-scale effect of
multipath fading or the larger-scale effects due to attenuation and shadowing
by obstacles. Examples of various wireless technologies with their bandwidth
requirements, frequency, and effective range are presented in Table 1.2.
Year         2001     2002     2003     2004     2006     2008
PDA          15,336   15,714   18,946   23,854   38,320   58,509
Phone-PDA    4.3      10.8     20.6     29.3     39.9     45
Table 1.1. PDAs and smartphones worldwide (thousands). Source: eTForecasts
report on “Worldwide PDA Markets” [14].
Fig. 1.1. The market growth in handheld devices (in millions).
Technology    Maximum bit-rate       Frequency        Effective range
Bluetooth     724 Kbps               2.4 GHz          10 m, 20 m, 100 m
Infrared      < 4 Mbps               > 10^5 GHz       10 cm − 2 m
ieee802.11b   1 Mbps                 2.4 GHz          outdoors 550 m, indoors 50 m
              11 Mbps                                 outdoors 160 m, indoors 50 m
3g            144 Kbps vehicle       1.885-2.2 GHz    50 km
              384 Kbps pedestrian
              1-2 Mbps stationary
cdpd          19.2 Kbps              1.8-2.5 GHz
Table 1.2. Examples of various wireless technologies with their bandwidth requirements, frequency, and effective range.
1.2 Mobile information access
Mobile information access refers to the underlying querying and data acquisition
mechanism via which a wireless device searches for and receives information from other devices while mobile. The mechanism describes the system
architecture, its main components, and its interactivity model. The latter
characterizes whether or not the communication between the “data-querier”
and “data-provider” is synchronous. In synchronous access, a user specifies a
data request in real-time, and the system accesses the information from the
source or its local cache. Thus, a dependency between the request for data and
the corresponding response exists. Alternatively, in the asynchronous case, the
request is triggered by an event or an application and the system does not
wait for a response.
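The difference between the two interactivity models can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory `SOURCE` dictionary stands in for a remote data provider, and the function names `fetch_sync` and `fetch_async` are hypothetical rather than part of any system described here.

```python
import queue
import threading

# Hypothetical in-memory data provider standing in for a remote source.
SOURCE = {"weather": "sunny", "traffic": "light"}
cache = {}

def fetch_sync(key):
    """Synchronous access: the caller blocks until the data is returned,
    served from the local cache if present, otherwise from the source."""
    if key not in cache:
        cache[key] = SOURCE[key]
    return cache[key]

def fetch_async(key, results):
    """Asynchronous access: issue the request and return immediately.
    The response is delivered later through the `results` queue."""
    def worker():
        results.put((key, SOURCE.get(key)))
    thread = threading.Thread(target=worker)
    thread.start()
    return thread

results = queue.Queue()
print(fetch_sync("weather"))              # the caller waits for the value
pending = fetch_async("traffic", results)  # the caller does not wait
pending.join()                            # joined only so the demo terminates cleanly
print(results.get())
```

In the synchronous call the response is tied to the request, whereas in the asynchronous call the requester continues and consumes the response whenever it arrives.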
Prefetching or hoarding is a type of asynchronous access in which, prior to
its disconnection, a device prefetches the data from the file system. It aims
to alleviate user-perceived latencies by providing data while the device remains disconnected and reintegrating upon reconnection [226, 242]. Hoarding
strategies exploit the detection of “file working sets” [337] and semantic relationships among files. Designed for traditional file-systems settings, hoarding
is appropriate when the system can predict and locate the information to be
prefetched. However, it can be inadequate in dynamic environments when a
device searches for new data while mobile.
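As a rough illustration of hoarding, the sketch below prefetches a working set before disconnection, serves reads from the local cache while disconnected, and reintegrates local writes upon reconnection. It is a simplified model under stated assumptions: the class `HoardingClient` and its methods are hypothetical, a plain dictionary stands in for the file system, and conflict resolution during reintegration is ignored.

```python
class HoardingClient:
    """Toy model of hoarding; `server` is a dict standing in for a file system."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.dirty = set()      # files modified while disconnected
        self.connected = True

    def hoard(self, working_set):
        """Prefetch the predicted working set before disconnecting."""
        for name in working_set:
            if name in self.server:
                self.cache[name] = self.server[name]

    def disconnect(self):
        self.connected = False

    def read(self, name):
        if name in self.cache:
            return self.cache[name]
        if self.connected:
            self.cache[name] = self.server[name]
            return self.cache[name]
        # The weakness noted above: data that was not predicted is unavailable.
        raise KeyError(f"{name} was not hoarded and the device is disconnected")

    def write(self, name, data):
        self.cache[name] = data
        if self.connected:
            self.server[name] = data
        else:
            self.dirty.add(name)

    def reconnect(self):
        """Reintegrate local changes (no conflict handling in this sketch)."""
        self.connected = True
        for name in sorted(self.dirty):
            self.server[name] = self.cache[name]
        self.dirty.clear()


server = {"notes.txt": "v1", "map.png": "tiles", "news.html": "headlines"}
client = HoardingClient(server)
client.hoard(["notes.txt", "map.png"])
client.disconnect()
print(client.read("notes.txt"))   # served from the hoarded copy
client.write("notes.txt", "v2")   # buffered locally while disconnected
client.reconnect()
print(server["notes.txt"])        # the local change has been reintegrated
```

A read of `news.html` while disconnected would raise an error in this model, which mirrors the limitation of hoarding when a device searches for new data while mobile.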
Mobile information access can be classified according to its dependency on an infrastructure and interactivity model into the following three
main categories, the first two of which require an infrastructure:
1. wireless Internet via APs
2. data access via infostations
3. data access using the peer-to-peer paradigm
1.2.1 Wireless Internet via APs
A wireless access point (AP) is a device that connects other wireless-enabled
devices in its wireless range to form a wireless network. Usually, it connects
to a wired network, and can relay data between wireless-enabled devices in
its range and devices of the wired network. Within the range of an AP, a
wireless end-user has a full network connection with the benefit of mobility.
Many APs can be connected together to create larger networks that allow
“roaming” between them; APs relay packets between each other, so that a
packet can be delivered to its final destination, a roaming client. In contrast
to infrastructure-based networks, ad-hoc networks operate in a self-organizing,
autonomous manner.
APs may also form mesh networks. In general, mesh networks are ad-hoc
multi-hop networks with a mesh topology, that consist of mostly stationary
wireless devices that cooperate with one another to route packets, forming the
network’s backbone. In addition to the routing capability for gateway/bridge
functions as in a conventional wireless network, these mesh routers support
routing mechanisms for mesh networking. With gateway functionality, mesh
routers can be connected to the Internet. Non-routing mobile devices or mesh
clients can connect to mesh nodes and use the backbone to communicate
with one another over large distances and with nodes on the Internet. Clients
with an Ethernet interface can be connected to mesh routers via Ethernet
links. Thus, mesh networks are heterogeneous, hybrid and possibly multi-operator networks, composed of wired and wireless, stationary and mobile
devices. Unlike mesh routers that may not have power constraints, typical
mobile clients require the support of power-efficient mechanisms.
Mesh networks extend high-speed local area networking services to a wider
area. A number of community wireless mesh networks exist, such as the Seattle
Wireless and Roofnet networks. The latter is a 38-node multi-hop ieee802.11
network spread over four square kilometers of an urban area. Commercial
mesh Internet access services and technologies include MeshNetworks Inc.,
Ricochet, Meraki Networks, and Tropos Networks.
The wireless Internet via APs aims at “continuous” wireless Internet access
broadly defined by three types of wireless networks, namely, wireless wide area
networks (WANs), wireless local area networks (LANs), and wireless personal
area networks (PANs). Examples include: cdpd, 3g wireless, ieee802.11 and
two-way pagers [137, 299]. Table 1.3 presents some examples of U.S. wireless
networks and their wireless transmission technology.
Wireless WANs are licensed, strictly regulated wireless networks used by
cell phones and wireless modems; examples include cdpd, tdma, gprs, gsm,
3g wireless, and two-way pagers. Wireless WAN access is typically characterized by low bit-rates and long delays. Unlike wireless WANs, wireless LANs,
such as ieee802.11, HiPerLan, and dect, operate in unlicensed spectrum.
Technology   Carrier
tdma         AT&T (Cingular), Digital PCS, CellularOne
gsm/gprs     Omnipoint, AT&T (Cingular), Voicestream, Unicel, PinPoint Wireless
cdma         AirTouch, Verizon, General Wireless, Sprint PCS, MCIWorldCom,
             Qwest, Bell Atlantic Mobile
cdpd         Digital PCS, BellAtlantic/Nynex, AT&T Verizon Wireless, Omnisky
Table 1.3. Examples of U.S. wireless networks and their wireless transmission technology.
In several cities worldwide, nonprofit, educational, and commercial organizations have installed ieee802.11 APs to provide free wireless access to
the Internet (e.g., Figure 1.2). In the late 1990s and early 2000s, APs grew
rapidly in popularity, as they were low-cost and simple mechanisms to expand
the wireless connectivity of an existing infrastructure.
Wireless PANs are short-range, low-power networks via bluetooth, homerf,
rfid, irda, and ieee802.15 technologies. Such networks are already deployed
in home and office environments.
These new technologies and uses raise new issues related to ethics, security, privacy, confidentiality, and legislation. Take as an example the rfid tagging: While there are interoperability issues, researchers predict that within
a twenty-year period, rf tags will be pervasive, first as passports, driver’s
licenses, medical bracelets, credit cards, and then, as implantable chips in
humans. Even more data will be captured, stored, and analyzed. Implanting
rf tags in humans provokes numerous ethical, legislative, and privacy-related
concerns [163, 15].
1.2.2 Infostations
An infostation is a wireless-enabled server attached to a data repository. Wireless devices in the range of an infostation can query the infostation to acquire
data. Although typical infostations are stationary, we can envision robots
roaming an area and acting as mobile infostations. Like APs, infostations can
be stand-alone servers or clustered with other infostations and connected over
terrestrial links, such as t1, sonet, and/or fiber.
Infostations located in popular areas, such as at traffic lights, building
entrances, cafes, and airport lounges, can provide information access to users
within their short range, operating according to the client-server paradigm.
In general, a client can acquire the data from an infostation in an asynchronous or synchronous manner. For instance, an infostation may multicast
the data periodically, while clients subscribe to this multicast channel to receive the relevant information. The infostation paradigm can be extended to
a network of infostations that act as proxies, caching data and forwarding
requests to other infostations or to the Internet. Infostations were first mentioned by Imielinski and Badrinath in the DataMan project [303, 198].
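The proxy behaviour of a network of infostations can be sketched as follows. This is only an illustrative model: the class `Infostation`, its `query` method, and the two-level chain in the usage example are assumptions, and a real deployment would also need cache expiry and wireless transport, which are omitted here.

```python
class Infostation:
    """Toy infostation: answers from its own repository or cache,
    otherwise forwards the request upstream and caches the reply."""

    def __init__(self, name, repository=None, upstream=None):
        self.name = name
        self.cache = dict(repository or {})
        self.upstream = upstream  # another Infostation, or None

    def query(self, key):
        if key in self.cache:
            return self.cache[key]
        if self.upstream is None:
            return None           # nobody along the chain holds the data
        value = self.upstream.query(key)
        if value is not None:
            self.cache[key] = value   # later clients are served locally
        return value


# Hypothetical two-level chain: a cafe infostation behind a gateway infostation.
gateway = Infostation("gateway", repository={"city-map": "map data"})
cafe = Infostation("cafe", upstream=gateway)
print(cafe.query("city-map"))     # forwarded to the gateway, then cached
print("city-map" in cafe.cache)   # the cafe can now answer on its own
```

Caching at the edge infostation means that repeated requests from nearby clients do not need to travel beyond the short-range wireless hop.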
Fig. 1.2. The New York City wireless public access points as of May 2002 [34]. The
wireless access points are depicted as solid triangles.
1.2.3 Peer-to-Peer systems
A peer-to-peer system is a distributed system without any centralized control
or infrastructure. The software running at each peer host is equivalent in
functionality, so that peers can dynamically share their resources by both
requesting and offering services, rather than being confined to either client
or server roles. Peer-to-peer systems are distinguished by the following main
criteria:
• self-organization
• autonomy
• symmetry
The peer-to-peer paradigm does not require the support of any infrastructure and is based on resource sharing among wireless devices. These
devices (or simply peers) cooperate dynamically based on some policies that
specify their cooperation and functionality. Unlike the traditional client-server
model, in peer-to-peer computing, there is no centralized powerful device or
cluster of devices and participants (peers) communicate to discover and share
resources. Examples of such resources are computing power, data, and network
bandwidth.
The peer-to-peer concept was originally introduced in the context of distributed systems, but in the mid-1980s, the term was used by local area network vendors to describe their connectivity architecture.
The term reappeared in 1999 with the widespread popularity of Napster [272] and by early 2001, Napster claimed over 60 million registered users
sharing terabytes of music files. Like Napster, Gnutella [159] and Freenet [49]
are two other peer-to-peer systems that gained popularity in the early 2000s by
enabling users to share data in a fixed wired network. While Napster
focused on sharing music files and Gnutella on sharing any type of file, Freenet
facilitated encrypted and anonymized distributed storage.
In early peer-to-peer systems, such as Gnutella and Freenet, peers were
“blindly” sending their requests to many other peers without keeping track
of which peer had a specific document, resulting in large searching delays.
Later peer-to-peer systems, such as CAN and Chord [334], imposed a consistent mapping between an object key and a peer in the network. Each peer
maintains information about a number of other peers in the system, creating
a logical topology that provides some guarantees about searching delays.
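The consistent mapping between object keys and peers can be illustrated with a minimal sketch. It assumes a toy identifier ring in which each key is owned by the first peer that follows it clockwise; the peer names, the 32-bit ring size, and the `Ring` class are hypothetical, and the sketch omits the routing tables that systems such as CAN and Chord use to bound lookup delays.

```python
import bisect
import hashlib


def ring_position(name: str, bits: int = 32) -> int:
    """Hash a peer name or object key onto a ring of size 2**bits."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


class Ring:
    """Toy identifier ring: each key is owned by its clockwise successor."""

    def __init__(self, peers):
        # Sorted (position, peer) pairs make successor lookup a binary search.
        self._points = sorted((ring_position(p), p) for p in peers)

    def owner(self, key: str) -> str:
        positions = [pos for pos, _ in self._points]
        i = bisect.bisect_left(positions, ring_position(key))
        # Wrap around to the first peer when the key hashes past the last one.
        return self._points[i % len(self._points)][1]


# Hypothetical peers and key, for illustration only.
ring = Ring(["peer-a", "peer-b", "peer-c", "peer-d"])
owner = ring.owner("song.mp3")
```

Because the owner depends only on the hash values, every peer that knows the membership computes the same owner for a key, and removing a peer that does not own the key leaves the mapping for that key unchanged.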
In the late 1990s, the research community had been investigating replicated
storage systems based on the peer-to-peer architecture meant for wide-scale,
Internet-based use. Examples of these research efforts include Ficus [281],
JetFile [165], and Bayou [332], with a main focus on update policies, data consistency, and reconciliation algorithms. Since then, research in peer-to-peer
systems has considered mostly wired-based infrastructure and use, aiming to
improve scalability, robustness, and efficiency in routing, indexing, and information searching and dissemination.
Skype is a popular Internet telephony program that applies the peer-to-peer paradigm. The peer-to-peer paradigm has also been used for content
distribution, such as OS or anti-virus updates, in a wired-based infrastructure
of PCs. Examples of such systems are Limewire [24], OpenFT [36], BitTorrent [199] and Avalanche [158].
BitTorrent has quickly emerged as a viable and popular alternative to file
mirroring for the distribution of large content [85]. To share a file or group
of files through BitTorrent, clients first create a file with meta-data, such
as a description of the files to be shared, the host that coordinates the file
distribution, suggested names for the files, their lengths, the piece length used,
and a sha-1 hash code for each piece to be used to verify the integrity of the
received data. After the creation of this file, a link to it is placed on a website
or elsewhere, and it is registered with a tracker which maintains lists of the
current participants. A client that has downloaded a file may also act as a
dataholder, providing a complete copy of the file.
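The per-piece integrity check described above can be sketched as follows. This is a simplified illustration, not the BitTorrent wire or file format: the `metadata` dictionary, its field names, and the example payload are assumptions made for the sketch, and only the use of one SHA-1 digest per fixed-length piece comes from the text.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list:
    """Split data into fixed-size pieces and return one SHA-1 digest per piece."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, hashes: list) -> bool:
    """Accept a received piece only if it matches the published digest."""
    return hashlib.sha1(piece).digest() == hashes[index]


content = b"example payload " * 1000      # stand-in for the shared file
metadata = {
    "name": "example.bin",                # suggested file name (hypothetical)
    "length": len(content),
    "piece_length": 4096,
    "pieces": piece_hashes(content, 4096),
}

# A downloader checks each piece as it arrives, independently of the others.
first_ok = verify_piece(0, content[:4096], metadata["pieces"])
```

Hashing pieces separately lets a client reject a corrupted piece and request it again from another participant without discarding the rest of the download.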
The information theory community has proposed some routing protocols
in ad-hoc networks based on cooperative diversity schemes. These schemes
send information through multiple relays concurrently. The destination can
then choose the best of many related packets or combine information from
multiple packets to reconstruct the original data. Avalanche, for example,
uses network coding techniques that allow each PC in the distribution network to generate and transmit blocks of information. Avalanche peers produce
linear combinations of the blocks they have already cached. Such combinations are distributed together with a tag that describes the parameters in the
combination. Any peer can generate new unique combinations from the combinations it already has. A peer can decode and build the original file when it
has sufficient independent combinations. The network encoding ensures that
any block uploaded by a given peer can be of use to any other peer.
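A minimal sketch of this idea follows. It assumes the simplest possible setting, in which blocks are integers and combinations are formed with XOR (coefficients in GF(2)); the field actually used by Avalanche is not stated here, and the function and class names are hypothetical. The tag is a bitmask that records which source blocks are part of a combination.

```python
import random


def encode(blocks, rng):
    """Return (tag, payload): a random XOR combination of the source blocks."""
    tag = 0
    while tag == 0:                        # skip the useless all-zero combination
        tag = rng.getrandbits(len(blocks))
    payload = 0
    for i, block in enumerate(blocks):
        if (tag >> i) & 1:                 # bit i set: block i is included
            payload ^= block
    return tag, payload


def recombine(coded, rng):
    """Mix combinations a peer already holds, without decoding them first."""
    tag, payload = 0, 0
    for t, p in coded:
        if rng.random() < 0.5:
            tag ^= t
            payload ^= p
    return tag, payload


class Decoder:
    def __init__(self, n):
        self.n = n
        self.rows = {}                     # highest set bit of tag -> (tag, payload)

    def add(self, tag, payload):
        """Insert a combination; return True if it was linearly independent."""
        for pivot in sorted(self.rows, reverse=True):
            if (tag >> pivot) & 1:
                t, p = self.rows[pivot]
                tag ^= t
                payload ^= p
        if tag == 0:
            return False                   # adds no new information
        self.rows[tag.bit_length() - 1] = (tag, payload)
        return True

    def decode(self):
        """Recover the source blocks once n independent combinations are held."""
        if len(self.rows) < self.n:
            raise ValueError("not enough independent combinations yet")
        solved = {}
        for pivot in range(self.n):        # back-substitute from the lowest bit
            tag, payload = self.rows[pivot]
            for lower in range(pivot):
                if (tag >> lower) & 1:
                    payload ^= solved[lower]
            solved[pivot] = payload
        return [solved[i] for i in range(self.n)]


rng = random.Random(7)
blocks = [0x11, 0x22, 0x33, 0x44]          # stand-ins for file blocks
decoder = Decoder(len(blocks))
while len(decoder.rows) < len(blocks):
    decoder.add(*encode(blocks, rng))
recovered = decoder.decode()
```

The decoder does not care which peer produced a combination or in what order combinations arrive; it only needs as many independent combinations as there are source blocks.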
Today, in wireless campus infrastructures, the web and peer-to-peer are
the most dominant application types both in terms of number of flows and
bytes. One of our recent measurement studies [300] showed that around 30% of
the flows (or 20% of total bytes) accessed via the wireless infrastructure of the
UNC campus in April 2005 had been generated by peer-to-peer applications
and around 70% of clients had at least one flow generated by a peer-to-peer
application. Additionally, BitTorrent peer-to-peer file sharing was found to be
the biggest consumer of bandwidth, accounting for about 30% of the total data
transferred in an application-based traffic classification study in the Roofnet
mesh network [85].
While web requests accounted for a minority of the data transferred, they
contributed a larger number of flows than any other application (68% of the
flows were web, compared to the 3% that were BitTorrent).
Since the appearance of wireless ad-hoc networks, the peer-to-peer paradigm
has been playing a prominent role in routing protocols for such networks [92,
324, 359, 139, 326, 297, 157, 194, 218, 192]. Typical ad-hoc networks assume
cooperative devices that will relay a packet until it reaches its final destination in dense, large-scale, mostly-connected wireless networks. More recently,
mesh networks have been instantiating the peer-to-peer paradigm with their
“grass-roots” approach to provide wireless access with a minimum infrastructure that creates a mostly stationary multi-hop network. Sensor networks, often composed of unattended devices with limited capabilities, may form
ad-hoc networks for monitoring various environmental conditions. There are
two clear trends on the networking horizon: networks are evolving
from centralized, to distributed, to autonomous, self-organized, and pervasive, and
devices are becoming smaller, more networked, and more programmable.
Another manifestation of the mobile peer-to-peer paradigm has taken place
in rural areas of developing nations with vehicles offering web content to computers with no Internet connection. Specifically, the United Villages project
[343] provides villagers in Asia, Africa, and Latin America with a digital
identity and access to locally-relevant products and services using a store-and-forward, “driven-by WiFi” technology. The mobile APs are installed on
existing vehicles (e.g., buses and motorcycles) and automatically provide access along the road. Whenever a mobile AP is within range of a real-time
wireless Internet connection, it transfers the data from and for those kiosks.
In this work, our attention shifts to wireless networks that are sparser and
frequently disconnected from the Internet. In such networks, a device is not
always connected to the Internet, nor within wireless range of another device.
Real-life networks exhibit a large diversity in application requirements, device
characteristics, connectivity, density and cooperation, and scale.
1.3 Target mobile computing environment
Environments that exhibit the following two characteristics particularly motivated this research:
• frequent disconnections from the wireless Internet due to mobility
• high spatial locality of information
A network of wireless devices is characterized by high spatial locality of information when wireless devices in close geographic proximity access similar
data. For example, devices running location-based services that are in close
proximity request similar types of data, such as traffic reports and information about popular
tourist sites.
This networking environment may encompass a wide range of wireless-enabled devices with different energy and storage constraints, various network
interfaces, mobility patterns, and incentives for cooperation with each other.
It may include handheld devices (such as iPAQs, Palm Pilots, and mobile
phones) with memory and power constraints, devices with higher availability
in storage and power (such as laptops or vehicular wireless-enabled systems),
and infostations with sufficient storage and no power constraints. Devices
may be autonomous, not necessarily connected to the Internet, mobile or
stationary.
Currently, mobile users access information using a wireless LAN or WAN
infrastructure. Most wireless data WAN access, such as Vindigo [348] or
RIM [315], is only available in major metropolitan areas. Although ieee802.11
networks that provide wireless LAN access have become widely available in universities, corporations, and public areas, areas abound in which communication
infrastructure is either not available or is overloaded and expensive to access.
Examples include emergency situations, disaster relief and rescue operations, tunnels, and rural areas.
Given the exorbitant license fees paid out in recent government auctions
of spectrum, the bandwidth expansion route is bound to be expensive. For
example, European telecommunications giants spent $100 billion in 2000 for
3G license fees [164]. Similarly, the cost of tessellating a coverage area with
a sufficient number of APs or infostations coupled with the cost of associated high speed wired infrastructure may be prohibitive. Though conditions
vary widely, building underground fiber networks in highly congested urban
areas can cost $100 or more per foot of cable installed. In contrast, placing
fiber underground in the suburbs costs $7 to $25 a foot. More importantly,
the deployment of APs without capacity planning or mechanisms for dynamic
AP-configuration and self-organization—in terms of power control, channel
selection, user admission control and bit-rate selection—may result in interference and degradation of the wireless access.
For the next few years, continuous connectivity to the Internet world-wide
will not be available at low cost for mobile users roaming a metropolitan area;
devices will continue to experience changes in the availability of bandwidth
and frequent interruptions of connectivity due to host mobility.
1.3.1 High spatial locality of information and queries
The growing popularity of location-dependent services, collaborative applications, peer-to-peer systems, and interactive games running on mobile devices
will result in high spatial locality of information. For instance, in an urban
environment, an airport, or a commercial center, users with wireless-enabled
devices access local and world news, sports news, train schedules, weather
reports, maps, and routes. Similarly, users in a corporation, in an academic
department, or at a gathering, may share photos or video clips from their
recent vacation; while people standing in line at a theater, or in front of a
sculpture in a museum, may share reviews about the play or exhibition.
[Figure 1.3: chart of unique users per month, October 2001 through March 2002; the "Unique Users" axis ranges from 1,540,000 to 1,680,000.]
Fig. 1.3. The number of unique Avantgo users that subscribe to the NYTimes on-line
news information. Source: The New York Times on the Web [279].
An increasing number of wireless Internet and information providers target handheld devices, e.g., Avantgo (Figure 1.3), Vindigo, and Omnisky Corp.
For example, Avantgo regularly listed The Wall Street Journal, The New York
Times, and USA Today among the top ten user sites at www.avantgo.com/channels.
Similarly, Vindigo licenses its technology to newspapers and hosts the service
on behalf of its partners. Newspapers simply supply the listings in a structured
format and update them periodically. In a different networking environment,
a highway, vehicles with wireless access request weather and traffic reports,
maps, and routes, generating queries with high spatial locality of information.
1.3.2 Heterogeneity in application requirements
Applications dictate particular requirements for reliability, delay, and bandwidth that can vary greatly. Unlike voice communication, many wireless applications are delay-tolerant, i.e., they possess loose delay constraints (of the
order of minutes). In pervasive computing, context-based information may
change dynamically and can be inherently imprecise. Depending on the application, users may have flexible requirements regarding information accuracy,
freshness, precision, and media quality. Often users may accept less timely or lower-resolution data in exchange for a shorter response
time. In other cases, up to a few hours of
delay can be tolerated, as long as messages eventually reach their destination
(e.g., tourists with wireless-enabled cameras that wish to send photographs
home).
1.3.3 Enhancement of information access
As discussed previously, mobile information access via an infrastructure of
APs or infostations exhibits frequent disconnections and low bit-rates. Our
main challenge is to provide complementary mechanisms that enhance the
information access when mobile devices face disconnections from the Internet.
To achieve this, we proposed a mobile peer-to-peer computing paradigm that
enables resource sharing when an infrastructure is not always available. This
paradigm was also analyzed, evaluated and compared with more traditional
mobile access methods, namely, via APs and infostations.
1.4 Resource sharing using 7DS
We propose 7DS, an architecture and set of protocols that enable resource
sharing among peers that are not necessarily connected to the Internet. 7DS
encompasses three facets of cooperation: data sharing, message relaying, and
bandwidth sharing.
7DS may relay, search for and disseminate information, and share bandwidth. It operates in a self-organizing manner, without the need for an infrastructure and serves as the underlying information and service discovery
protocol. We assume that 7DS runs in the middleware and 7DS-enabled devices communicate with each other via wireless LANs.
7DS stands for “Seven Degrees of Separation”, a variation of the “Six
Degrees of Separation” hypothesis, which states that any person can be connected to any other person through a chain of acquaintances with no more
than five intermediaries. An analogy to our system can be made, particularly
with respect to data recipients and the device with the original copy. The six
degrees of separation was a popularized version of the small world concept, a
term coined by the social psychologist Stanley Milgram in the 1960s in the context of
his experiments on the structure of social networks (see footnote 2). 7DS was inspired by the
idea that there will be a growing number of “on-line” communities of mobile
users that gossip, share information and resources via their wireless-enabled
devices.
7DS-enabled devices can interact either in a peer-to-peer (P-P) or server-to-client (S-C) manner. The S-C mode is asymmetric; there are 7DS-enabled
servers that respond to queries and non-cooperative, potentially resource-constrained clients.
Throughout the text, the terms 7DS node, 7DS host, or simply host are
used interchangeably to indicate any 7DS-enabled device, and 7DS peer or
simply peer any 7DS-enabled device that employs the peer-to-peer paradigm.
These different modes of operation allow 7DS to instantiate different mobile
information access schemes when possible, and provide complementary access
through peers, when an infrastructure is not available.
Hosts can be handheld devices that are mobile and power constrained, stationary PCs, and servers or infostations connected to the Internet and a power
outlet. A 7DS-enabled server can be either a dual-homed device connected to
the Internet or a wired infrastructure of other servers, or an autonomous infostation. It can be mobile or stationary. An example of a mobile server is a
robot that roams in a museum and disseminates information to visitors with
handheld devices. 7DS running on handheld devices will use different power
conservation and collaboration methods than 7DS-enabled servers.
7DS nodes can collaborate by data sharing, forwarding messages or caching
popular data objects. For example, an autonomous 7DS server may monitor
for frequently requested data, request it from other peers, and store it locally
to serve future queries. The fixed information server (FIS) is an instantiation
of the S-C scheme with a stationary server and is equivalent to the infostation
model. Thus, 7DS can be viewed as a generalization of the infostation concept.
In information sharing, peers query, discover, and disseminate information.
A 7DS host acquires data from other peers (in P-P) or from the infostation
(in S-C) within its wireless coverage using single-hop broadcast to periodically
query for data. Instead of operating with high transmission power to reach an
AP or an infostation that is far away, a host forwards its messages or requests
for data to its peers in close proximity. In that way, hosts can conserve more
power and better utilize wireless bandwidth. Replication introduces a tradeoff
among data consistency, security vulnerabilities, management overhead, and
availability. 7DS assumes that, in the face of disconnections, users are willing to trade
data consistency and currency for data availability.
2 We have not explored whether a similar hypothesis holds here.
Motivated by the high spatial locality, which is intrinsic in positioning,
we also applied the peer-to-peer concept to location-sensing. Specifically, we
designed a cooperative location-sensing system (CLS) that adaptively positions wireless-enabled devices using the existing communication infrastructure
and without the need of specialized hardware. CLS enables hosts to cooperate
by sharing their position estimates, and use these estimates along with signal
strength measurements, to iteratively determine their position.
To conserve power, 7DS periodically activates the network interface. During the on interval, 7DS hosts communicate with their peers. In its asynchronous mode, the on and off intervals are equal but not synchronized, while
in synchronous mode, the on and off intervals are synchronized among hosts
but not necessarily equal.
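The effect of the two modes on the time during which two hosts can communicate can be sketched as follows. The sketch assumes discrete time ticks and represents a schedule as a hypothetical `(on, off, offset)` tuple; the interval lengths are illustrative and are not values used by 7DS.

```python
def is_on(t, on, off, offset=0):
    """True if the network interface is active at integer tick t."""
    return (t - offset) % (on + off) < on


def overlap(a, b, horizon):
    """Count ticks in [0, horizon) during which both hosts are on."""
    return sum(1 for t in range(horizon) if is_on(t, *a) and is_on(t, *b))


# Asynchronous mode: equal on and off intervals, unsynchronized start times.
async_close = overlap((5, 5, 0), (5, 5, 2), horizon=100)
async_worst = overlap((5, 5, 0), (5, 5, 5), horizon=100)

# Synchronous mode: shared start time, on and off intervals may differ.
sync = overlap((3, 7, 0), (3, 7, 0), horizon=100)
```

In this toy model, two asynchronous hosts whose cycles are shifted by exactly half a period never overlap, whereas synchronized hosts overlap for their whole on interval even when that interval is short.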
When bandwidth sharing is enabled, 7DS allows a host to act as an
application-layer gateway and share its connection to the Internet with other
hosts. When a peer is unable to access the Internet, it may ask other peers to
act as gateways. Alternatively, hosts can buffer their messages locally and relay them to peers. Specifically, in message relaying, a host forwards its queued
messages to another peer or AP. To prevent message looping and better utilize
the buffer, 7DS may restrict the number of times that a message is forwarded
and delete old and duplicate ones.
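The buffer policy described above can be sketched as follows. The class and field names, the hop limit, and the age limit are hypothetical choices made for illustration; the text only states that forwarding may be bounded and that old and duplicate messages may be deleted.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: str
    payload: bytes
    created: float = field(default_factory=time.time)
    hops: int = 0                            # times forwarded so far


class RelayBuffer:
    def __init__(self, max_hops=3, max_age=3600.0):
        self.max_hops = max_hops
        self.max_age = max_age
        self._queue = {}                     # msg_id -> Message awaiting relay
        self._seen = set()                   # ids already handled (loop guard)

    def accept(self, msg, now=None):
        """Queue a message unless it is a duplicate, too old, or out of hops."""
        now = time.time() if now is None else now
        if msg.msg_id in self._seen:
            return False
        if msg.hops >= self.max_hops:
            return False
        if now - msg.created > self.max_age:
            return False
        self._seen.add(msg.msg_id)
        self._queue[msg.msg_id] = msg
        return True

    def drain(self, now=None):
        """Return queued messages for a peer or AP, each with one more hop."""
        now = time.time() if now is None else now
        out = [
            Message(m.msg_id, m.payload, m.created, m.hops + 1)
            for m in self._queue.values()
            if now - m.created <= self.max_age   # drop messages that aged out
        ]
        self._queue.clear()
        return out
```

Remembering the identifiers of messages that were already handled is what prevents a message from looping back through the same host, while the hop counter bounds how far a copy can travel.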
1.5 Overview of this monograph
This work explores two main research domains:
•
•
mobile peer-to-peer computing
wireless networking measurements and modeling
Its first part presents a novel framework for mobile wireless data access based
on the peer-to-peer paradigm. Unlike typical peer-to-peer approaches in wired
networks, 7DS does not try to establish permanent caching or service discovery mechanisms due to the highly dynamic environment. Instead, 7DS hosts
acquire the data from other peers within their wireless coverage using single-hop broadcast. The thrust of this research is the information dissemination
in mobile networks, which raises several questions: How fast does information
spread in such networks? What is the impact of cooperation, data popularity,
and wireless range on information diffusion? How do the different mobile information access and caching paradigms compare? The peer-to-peer paradigm
is then applied in location-sensing and its performance is evaluated. Given the
dearth of large-scale, non-controlled 7DS-like environments, we run extensive
simulations to study these issues. We also experiment with novel applications
that use 7DS as their underlying information discovery mechanism.
Although ieee802.11 APs and clients are rapidly deployed, there are still
areas of limited or no wireless coverage. 7DS can “bridge” the access via
wireless infrastructures and peers through caching and relaying.
To uncover the weaknesses and distinct characteristics of wireless networks,
measurement-based studies are critical. Eager to better understand the characteristics of wireless access and workload, we perform empirical analysis and
modeling studies. For this purpose, extensive real traces were acquired from
a large-scale campus-wide ieee802.11-based infrastructure. Their prevalence
impels us to analyze them and examine the spatial locality of the wireless
information, access patterns, and workload characteristics. Several questions
stimulate this research effort: How loaded are the APs and what type of applications are accessed? What is the impact of different caching paradigms
in wireless networks? How do users arrive at APs? How do they roam across
APs? What are the right structures to model the user-initiated activity in a
wireless network?
Most performance analysis studies of wireless networking protocols
model the traffic demand with traces based on various constant-bitrate udp and tcp flows or with “infinite” udp/tcp sources that simulate asymptotic
conditions. We are eager to explore models that can reflect realistic workload
conditions and at the same time are simple, flexible, and expressive enough to
allow us to “manipulate” them in order to simulate or emulate different conditions with respect to the application mix, roaming pattern, and traffic load.
We capture the user-initiated activity through flows, sessions, i.e., episodes of
continuous wireless access in the infrastructure, and disconnections. Furthermore, we present a methodology for modeling the demand and specifically,
the client associations at APs, sessions, and flows.
As more mobile peer-to-peer applications and delay-tolerant networks
(DTN) [12] are being deployed, it should be easier to acquire traces from
such testbeds and apply the proposed modeling methodology to study their
access patterns. Such models can then be imported in performance analysis
studies to offer more meaningful insight about the performance of various types
of protocols.
To summarize, this work presents the following:
1. The design and implementation of 7DS, a novel system that enables information dissemination and sharing among mobile hosts.
2. The evaluation of the impact of the wireless range, host density, querying
mechanism, power conservation, and cooperation on data dissemination
via extensive simulations.
3. A discussion on theoretical models for data dissemination that use random
walks and diffusion-controlled processes.
4. A brief presentation of CLS, a location-sensing system that employs the
peer-to-peer paradigm to enhance the position estimates.
5. A measurement-driven evaluation of the spatial locality property of web
requests and caching schemes in a large-scale wireless infrastructure.
6. A measurement-driven analysis and modeling of the access patterns and
user workload in large-scale wireless infrastructures.
7. Accurate and scalable models of user workload and a discussion of the
scalability and reusability tradeoffs.
8. A performance analysis of a wireless LAN that highlights the impact of
various traffic models.
1.5.1 Outline
Chapter 2 gives an overview of the main components of 7DS, CLS, and applications that have been integrated with 7DS. Its main results have also
appeared in [151, 220, 344, 293].
Chapter 3 evaluates several mobile information access schemes with extensive simulations and presents some theoretical data diffusion models. Most
of the results of Chapter 3 have appeared in [283, 284, 285].
The empirical studies included in this book used extensive traces collected
from the wireless infrastructure at UNC. Chapter 4 introduces the wireless
infrastructure, monitoring tools, data acquisition process and traces, and lists
the types of publicly available wireless traces. The main definitions and concepts for modeling the workload are presented, followed by an analysis and an
application-based characterization of the wireless workload. Finally, we also
examine the spatio-temporal locality of the web requests accessed from the
wireless infrastructure and evaluate several caching paradigms using extensive
traces. A detailed discussion of the workload analysis and characterization
study can be found in [115, 181, 300].
Chapter 5 discusses our multi-level modeling of the wireless demand,
namely, the associations and generated traffic in a large-scale wireless network. It provides an empirical modeling of wireless user access: arrivals at
APs and roaming patterns across APs. Specifically, it analyzes the duration
of a client association at an AP and the roaming between APs and proposes
an algorithm that predicts the next AP for a client. It then shifts the perspective from client- and AP-level to an infrastructure-wide view and models
main features of the wireless user activity, namely, the episodes of continuous
wireless access and the flows generated during those episodes. A more detailed
description of this research can be found in [115, 288, 182, 289, 179, 217, 342].
Finally, Chapter 6 summarizes our results and discusses directions for future work.
2 7DS architecture for information sharing
This chapter focuses on the architecture components that enable information sharing via 7DS. Firstly, the communication, cache management, and
power conservation are presented, followed by a discussion about mechanisms
to stimulate cooperation and prevent denial of service attacks. To support
location-based applications, we introduce a positioning system, the Cooperative Location-sensing System (CLS) that also applies the peer-to-peer
paradigm via 7DS. This chapter gives an overview of CLS, and shows how
7DS can act as the underlying information discovery mechanism for different
location-based and collaborative applications.
2.1 Overview of 7DS architecture
A major contribution of computer science—that has played a dramatic role
in society and other sciences—is the creation of new paradigms, technologies,
and tools for communication and interaction. The World Wide Web and Internet have been catalysts for the creation of collaborative applications and
tools. Powerful drivers for on-line collaborations have been “group-forming
networks” that allow users to self-organize and form groups, such as eBay,
Wikipedia, and the Open Source Initiative. On-line collaboration has been enriched with new applications and tools for storing, sharing, and experimenting
with multimedia data, such as Flickr, YouTube, Me.dium, MySpace, Facebook,
and JumpCut. These technologies have allowed the formation of new types
of social networks, interactions, and online communities. The communication
paradigms, interaction rules, and network topologies can vary and have a
great impact on the performance of the information diffusion. Social network
analysis has emerged not only as a popular topic of speculation, but also as a
key technique in sociology, anthropology, geography, economics, biology and
computer science.
7DS facilitates collaboration of mobile devices by instantiating three main
information access methods: via an AP, using an infostation, and applying
the peer-to-peer paradigm. It acts as the underlying information discovery
mechanism for applications that run on the local device, enabling the peer-to-peer data sharing when access via an AP or server fails.
The novel aspect of 7DS is its instantiation of the peer-to-peer paradigm
in a mobile wireless network. The design and implementation of this aspect
will be the focus of this chapter. When an application requests a data object,
7DS first checks its cache, and if the data is not available or has expired, it
tries to acquire it from the Internet. For example, in the case of web browsing,
a data object is a web page including all its embedded files. If the local web
browser fails to connect to the web server, 7DS attempts to acquire the page
from another peer in the wireless LAN.
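The order in which 7DS looks for a data object can be sketched as follows. The function name, the callables passed in, and the fixed time-to-live used to model expiration are assumptions made for illustration and do not describe the prototype's actual interfaces.

```python
def get_object(url, cache, fetch_from_internet, query_peers, now, ttl=300.0):
    """Return (data, source), trying the cache, the Internet, then peers.

    cache maps url -> (data, time_stored); an entry older than ttl seconds
    is treated as expired (a simplifying assumption of this sketch).
    """
    entry = cache.get(url)
    if entry is not None and now - entry[1] <= ttl:
        return entry[0], "cache"

    try:
        data = fetch_from_internet(url)      # may fail when disconnected
    except ConnectionError:
        data = None
    source = "internet"

    if data is None:
        data = query_peers(url)              # fall back to peers in the wireless LAN
        source = "peer"
    if data is None:
        return None, "unavailable"

    cache[url] = (data, now)                 # keep a copy to serve later requests
    return data, source
```

Storing the result in the local cache regardless of where it came from is what later allows the host to answer queries from other peers for the same object.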
Fig. 2.1. Example of information sharing using 7DS. The arrows show the message
exchange for the 7DS communication. The light-shaded area denotes the wireless
LAN, the darker-shaded area the Internet, and the thunderbolt-like shape the wireless WAN connection that is not currently available.
Figure 2.1 illustrates an example of 7DS use. Mobile host A (MH A) tries
to access a data object. The local 7DS instance running on host A detects
an unsuccessful attempt to connect to the Internet and tries to retrieve the
page from peers that are within its wireless range. Both hosts B and C (MH B
and MH C, respectively) are within the range of host A and receive the query.
[Figure 2.2: block diagram of 7DS peers. Recoverable labels: 7DS Peer, Application, Application Client, Application Server, HTTP, IP, 7DS, Position (GPS, CLS), Other.]
Fig. 2.2. 7DS architecture: an underlying information discovery mechanism for
location-based applications, in conjunction with positioning systems (e.g., GPS and
CLS).
Unlike host B, host C has a copy of the data in its cache and responds to host
A’s query.
To facilitate the interaction with 7DS, applications use pairs of attributes,
(name, value), to describe the data that they are willing to share with other
application instances running on peers. For each application, 7DS maintains
an index of the local cache that is populated with data that can be shared.
This data may have been acquired from other peers or servers. Figure 2.2
illustrates the general 7DS architecture coupled with positioning systems and
applications.
In contrast to Gnutella and other peer-to-peer mechanisms in wired networks, a 7DS peer does not maintain connections with other peers but only
multicasts its queries to a well-known multicast group. In addition, 7DS—in
the default mode—restricts the query propagation to the wireless LAN. Unlike
Napster, 7DS operates in a distributed fashion without the need for a central
indexing server. Napster also requires user intervention for uploading files,
whereas 7DS does this automatically. Furthermore, our setting is orthogonal
to the service discovery in the wide area network. In service discovery, there is
typically an infrastructure of cooperative servers that create indices to locate
data based on the queries and the content of the underlying data sources of
their local domain [106].
2.1.1 Communication
Applications in a 7DS-enabled system employ insert and query messages to
communicate with their local 7DS instance using soap, which is simply xml
over http. Specifically, insert messages indicate what data can be shared with
other peers and stored in its local cache (Figure 2.3).
The communication among 7DS peers is implemented by the following
message types, all in xml format:
• queries
• reports
• advertisements
Queries describe the requested data items with predefined application-specific
attributes, and are generated by the application when the relevant data cannot
be found locally. In addition, queries include attribute pairs with undefined
values to be bound during a matching process at a peer. 7DS supports various
types of queries, such as, queries with a list of attributes that must match,
nested boolean operations, and different types of matches (e.g., case-sensitive
exact match, regular expression match).
A 7DS host actively queries for a data object when it periodically multicasts a query for that object to a predefined group until it receives the
relevant data. Active querying is the default querying mechanism. After receiving a query, a peer extracts the embedded attributes and performs an
attribute-matching search in its local cache. In the case of a match, the peer
broadcasts a report that describes the relevant data found in its local cache.
This generated report reflects the received query, with a subset of its attributes bound via this matching process that is performed locally. A report
can be self-sufficient or include a url for a subsequent retrieval of the complete data object (e.g., Figure 2.4). After a predefined interval, the querying
host selects the most relevant report—among the received ones—based on
application-specific criteria.
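The query, match, and report cycle can be sketched as follows. The sketch represents messages as Python dictionaries rather than xml, supports only exact and regular-expression matches, and uses hypothetical function names; the attribute names are borrowed from the sample report in Figure 2.4, and the second cache entry is invented for illustration.

```python
import re

UNBOUND = None      # marks an attribute whose value the answering peer fills in


def matches(query, item):
    """True if every bound attribute of the query matches the cached item."""
    for name, wanted in query.items():
        if wanted is UNBOUND:
            continue                          # nothing to check, bound later
        if name not in item:
            return False
        if isinstance(wanted, re.Pattern):
            if wanted.search(item[name]) is None:
                return False
        elif item[name] != wanted:            # case-sensitive exact match
            return False
    return True


def make_report(query, item):
    """Mirror the query, binding its attributes to the cached item's values."""
    return {name: item.get(name, UNBOUND) for name in query}


def answer_query(query, cache):
    """Reports a peer would send for the items in its local cache."""
    return [make_report(query, item) for item in cache if matches(query, item)]


def select_report(reports, score):
    """Querying host: pick the best report by an application-specific score."""
    return max(reports, key=score) if reports else None


cache = [
    {"Type": "Meeting", "Description": "SVG Project Demo",
     "TimeLastModified": "1051890743424"},
    {"Type": "Lecture", "Description": "Wireless seminar",   # hypothetical entry
     "TimeLastModified": "1051800000000"},
]
query = {"Type": "Meeting", "Description": re.compile(r"(?i)demo"),
         "TimeLastModified": UNBOUND}
reports = answer_query(query, cache)
best = select_report(reports, score=lambda r: int(r["TimeLastModified"]))
```

Here the application-specific criterion is assumed to be recency, so the querying host prefers the report with the latest modification time.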
Advertisements are messages periodically multicast from the 7DS-enabled
servers to announce their presence. Upon the receipt of such advertisements,
a 7DS host may send its queries to the server. As opposed to active querying, this type of querying, defined as passive querying, is targeted at power-constrained devices that participate in 7DS only when the requested data is
likely to be available.
2.1.2 Cache management
Primary information propagation occurs through the use of caching rather
than reliable state maintenance, and 7DS does not attempt to resolve inconsistency among copies of a data object. 7DS organizes and indexes its cache,
2.1 Overview of 7DS architecture
21
7DS Peer
Fig. 2.3. Interaction of 7DS with applications. The communication between 7DS
and an application is via SOAP. Only the communication components of 7DS and its
interaction with an application are illustrated. The squares inside 7DS indicate the
logical modules of 7DS and the arrows the sequence of interaction between them.
which can be viewed, browsed, and managed through a graphical user interface (GUI). The current prototype displays the content of the cache in a
directory-like structure (Figure 2.5). The GUI can be extended to support
grouping of the cache content by predefined categories and searches using the
meta-data attributes of the stored objects. To protect the user’s privacy, 7DS
only shares reports and pages that correspond to publicly available objects.
The cache management includes setting of access permissions of files and directories, deleting expired objects or specific files, and updating the index.
7DS can be easily extended to support the prefetching operation. Through
a GUI, users can mark which pages need to be prefetched or updated regularly,
and upon their expiration 7DS will generate the corresponding queries.
2.1.3 Power conservation
Using a battery monitor and power management protocol, 7DS aims to adapt
its communication pattern to reduce energy consumption, especially when
<?xml version="1.0" encoding="UTF-8"?>
<ds:Report xmlns:ds="http://www.cs.unc.edu/~maria/7ds/">
  <ds:Object>
    <ds:ObjectType>Hypermap</ds:ObjectType>
    <ds:ID>300</ds:ID>
    <ds:SourcePeer>192.168.1.100</ds:SourcePeer>
    <ds:PathToFile>F8F84640FD800549694E2B4C5A6C7198.xml</ds:PathToFile>
    <ds:IsPrivate>false</ds:IsPrivate>
    <ds:Application>
      <Description>SVG Project Demo</Description>
      <Type>Meeting</Type>
      <StartTime>1051905600000</StartTime>
      <SVGMapYCoordinate>-1495.0</SVGMapYCoordinate>
      <EndTime>1051909200000</EndTime>
      <SVGMapID>0</SVGMapID>
      <SVGMapXCoordinate>2206.0</SVGMapXCoordinate>
      <TimeLastModified>1051890743424</TimeLastModified>
      <GUID>F8F84640FD800549694E2B4C5A6C7198</GUID>
      <EventDate>1051848000000</EventDate>
      <Creator>Tim Ross</Creator>
    </ds:Application>
  </ds:Object>
</ds:Report>
Fig. 2.4. Example of a 7DS report with attribute names, such as “Description”,
“Type”, “StartTime”, “TimeLastModified”, and their corresponding values in XML
format.
the expectation of successful data access is low. However, estimating this
likelihood presents a considerable challenge and the use of advertisements can
only provide some hints.
In general, the following parameters impact the power consumption of a
network interface:
• size of packets sent and received
• number of packets sent and received
• time the network interface is on
To reduce the power consumption, these parameters need to be kept low.
7DS can employ a simple mechanism that periodically activates the network interface, resulting in an alternation of on and off intervals, that takes
place in an asynchronous or synchronous manner. In its asynchronous mode,
on and off intervals are equal but not synchronized, while in synchronous mode,
the on and off intervals are synchronized among hosts, although not necessarily
equal.
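The effect of the two modes on the time during which two hosts can actually communicate can be illustrated with a small sketch. The function names, the tuple layout `(period, on_time, offset)`, and the integer tick unit are assumptions made for this example; the text does not prescribe a particular schedule representation.

```python
def interface_on(t, period, on_time, offset=0):
    """True if the interface is on at tick t (all values in integer ticks)."""
    return (t - offset) % period < on_time

def overlap_fraction(host_a, host_b, horizon):
    """Fraction of ticks in [0, horizon) during which both hosts are on.

    Each host is a hypothetical (period, on_time, offset) tuple.
    """
    both_on = sum(
        1 for t in range(horizon)
        if interface_on(t, *host_a) and interface_on(t, *host_b)
    )
    return both_on / horizon

# Asynchronous mode: equal on and off intervals, unsynchronized offsets.
async_overlap = overlap_fraction((1000, 500, 0), (1000, 500, 250), 10_000)

# Synchronous mode: shared offset, on interval shorter than the off interval.
sync_overlap = overlap_fraction((1000, 200, 0), (1000, 200, 0), 10_000)
```

In the asynchronous case the overlap depends on the (uncontrolled) offset between the two hosts, whereas in the synchronous case the full on interval is usable, even when it is short.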
Fig. 2.5. The interface for setting the permission of the cached objects in 7DS.
Hosts may potentially decide on a channel and time interval to communicate and turn their network interface on only during the agreed-upon interval,
further reducing the reception of unnecessary traffic. The creation of groups,
such that only members of the same group participate in data sharing during
the agreed-upon interval and at the specified channel via encrypted messages,
may reduce energy expenditure and protect privacy.
Based on the battery level and energy constraints, 7DS may adapt its
querying mode (active or passive), type of collaboration (data sharing and
forwarding), and power conservation. An evaluation of the different communication patterns is presented in Chapter 3.
When a lower-power wireless network interface is available in addition
to the IEEE 802.11 one, 7DS can use the low-power network interface for the
communication between server and peers to decide on data availability, while
the IEEE 802.11 radio remains mostly off, and is used only when a large data
object needs to be exchanged.
2.2 Preventing denial-of-service attacks
Mobile devices and wireless networks are vulnerable to different types of attacks
aiming to exhaust their resources. One such type is the denial-of-service attack, which may target different layers. For example:
• Physical layer: creating interference, exhausting the power of devices
• TCP/IP layer: SYN flood, SYN+ACK flood, TCP connection reset attack, bandwidth exhaustion attack
• Overlay network layer: routing attack, eliminating peers by exhausting their power, misbehaving relay devices and caches
• Application layer: application-specific attacks by disseminating false information, storage flooding attack
1. Host Q sends a query
2. Host R receives the query
3. Host R waits for a random time interval T
4. If no challenge for host Q was multicast during T, host R challenges host Q
5. Host Q sends its response
6. Host R verifies host Q’s response to the challenge
Fig. 2.6. Responder R challenges querier Q to prevent denial-of-service attacks.
Challenging a host using hash cash [69] is a typical method for preventing
denial-of-service attacks. These challenges force the host to execute a nontrivial computational task, such as discovering the input in a hash function
given the output and a part of the input, before the actual information sharing
(Figure 2.6). By challenging the querier at each query, 7DS penalizes malicious
users for overloading the network with queries. A potential problem arises
when a responder cooperates with a malicious querier, for example, by sending
“trivial” challenges or when the querier itself sends “trivial” challenges. To
forestall this problem, 7DS can force responders to sign their message, and in
that way, other hosts in the wireless LAN can verify the source of the challenge.
Furthermore, hosts can use the synchronous approach to reduce the impact
of flooding by a malicious user, since hosts will have their network interface
on only some specific periods of time—likely unknown to the malicious user.
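The kind of challenge described above (recovering a hash input from its output and a known part of the input) can be sketched as follows. This is an illustrative partial-preimage puzzle, not the exact scheme of [69] or of 7DS: the function names, the choice of SHA-256, and the parameter `hidden_bytes`, which controls the amount of work, are assumptions.

```python
import hashlib
import secrets

def make_challenge(hidden_bytes=2, total_bytes=16):
    """Responder: choose a secret input, then publish its hash together with
    all but the last hidden_bytes bytes of the input."""
    if not 1 <= hidden_bytes <= total_bytes:
        raise ValueError("hidden_bytes must be between 1 and total_bytes")
    secret = secrets.token_bytes(total_bytes)
    known_part = secret[:total_bytes - hidden_bytes]
    digest = hashlib.sha256(secret).hexdigest()
    return known_part, digest

def solve_challenge(known_part, digest, hidden_bytes=2):
    """Querier: recover the hidden bytes by brute force."""
    for guess in range(256 ** hidden_bytes):
        candidate = known_part + guess.to_bytes(hidden_bytes, "big")
        if hashlib.sha256(candidate).hexdigest() == digest:
            return candidate
    return None

def verify_response(response, digest):
    """Responder: a single hash computation checks the response."""
    return response is not None and hashlib.sha256(response).hexdigest() == digest

known_part, digest = make_challenge(hidden_bytes=2)
response = solve_challenge(known_part, digest, hidden_bytes=2)
```

The asymmetry is the point of the design: the querier performs up to 256^hidden_bytes hash computations, while the responder needs only one to verify the response.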
2.3 Encouraging cooperation
In peer-to-peer systems, cooperation is crucial. In the first generation of peer-to-peer systems, a large percentage of users shared no files [17]. With the
exception of BitTorrent, most of the peer-to-peer systems still do not give
incentives to users to cooperate. While devices are naturally motivated to cooperate in rescue operations, meetings, or home- or personal-area networks,
they may have fewer incentives to collaborate in other environments. This lack
of incentives is exacerbated by energy constraints and the possible presence of
selfish or malicious devices that falsely promise to cooperate, disseminate erroneous information, violate the protocols at different layers (e.g., by causing
interference and utilizing the shared resources in a selfish manner) or generate denial-of-service attacks [259, 26]. While poor protection of resources can
impede the use of a peer-to-peer system, high costs to access the resources
can dissuade users from participating. To encourage cooperation, two general approaches have
been introduced in the literature [37]:
• Micropayment-based (e.g., reward devices for relaying packets or responding to queries)
• Reputation-based (e.g., devices observe behavior of other devices and misbehaving devices are punished)
2.3.1 Micropayment mechanisms
The following micropayment mechanisms can be used in 7DS:
• electronic checks (e-checks)
• a token-based approach
In e-checks and token-based approaches, nodes remunerate each other for the
services they provide to each other. Whereas e-checks do not need trusted
hardware, the token-based approach requires a tamper-resistant hardware
module in each device for the management of tokens and cryptographic coding of messages, increasing the cost and energy expenditure of mobile devices.
Both approaches include an authentication, a micropayment, and an information exchange mechanism.
e-check mechanism
7DS could employ the e-check approach proposed in [88], where hosts sign up
for 7DS with a trustee entity or “bank” and acquire an amount of virtual currency as an e-check from that bank. To control the losses from uncollectible
transactions, the bank maintains an account limit for each host. As in typical
credit models, there is a risk factor, which 7DS can tolerate. e-checks are cryptographically bound to each transaction, which prevents forgery by another
host that overhears the exchange of an e-check.
A public-key credential-based architecture can be used: the bank acts as a
trusted third party, so that hosts can authenticate each other offline using appropriate
credentials. Each host has its own public key, which is encoded in the credentials along with some restrictions. To minimize losses, the credentials are
short-lived and thus frequently refreshed. 7DS downloads new credentials
when the host accesses the Internet, while the bank can limit the amount of
micropayments a peer may send to others during a period of time.
The number of credentials issued to a host depends on its usage pattern,
service, and trustworthiness. Furthermore, a bank may decline to issue new e-checks or extend the credit line to non-trustworthy hosts. The tradeoff between
reducing the loss and avoiding disruption of cooperation is an interesting
research topic.
1. Host R sends its credentials
2. Host Q verifies that host R is known to the bank and is authorized for 7DS
3. Host Q sends an e-check
4. Host Q waits for some time for the data from host R
5. If the time expires, host Q sends a NACK to host R
6. Host R verifies that the e-check is genuine, and if genuine, host R stores it and sends the data to host Q
7. If host R receives a NACK from host Q, it resends the data to host Q
Fig. 2.7. e-check payment for responding to a query: verification of credentials and
e-check exchange.
Let us now assume that 7DS multicast queries are free, but hosts pay to
receive the complete data objects after selecting a report that includes a URL
to that object. Moreover, let us consider that host Q has multicast a query,
and host R has responded by sending a report with the relevant data in its
local cache. In its report, host R also indicates the amount of payment required
for the transmission of the complete data. Hosts Q and R authenticate each
other, and then verify each other’s capabilities. Host Q verifies that host R
is known to the bank and is authorized to charge host Q’s account for this
transaction. Host R verifies that host Q is authorized by the bank to proceed
with the specific transaction. When a transaction is completed, host Q receives
the data object, and host R receives an e-check from host Q. The e-check is
encoded as credentials that authorize payment for that specific transaction.
Host Q creates its credentials signed with its RSA key [44, 328] and sends them
to host R.
The credentials include the time they were issued, thereby constraining
the amount of payment per responder during a time interval and limiting the
risk of double-depositing the e-checks by a responder. Note that there is no
guarantee that host R will transmit the data to host Q after receiving host
Q’s e-check.
The communication between the bank and hosts can take place using established cryptographic protocols, such as IPsec [65, 66]. Periodically, hosts
provide their collected e-checks to the bank that, in turn, verifies the transactions and updates the relevant accounts (in the above example, it increases
R’s account and decreases Q’s). The bank can employ the same verification
method that host R used to check Q’s credentials. Furthermore, it can generate short-term credentials for the host over the secure link, with a new public
key being refreshed each time.
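The settlement step at the bank can be sketched as follows. This is only an illustration of the bookkeeping described above, under several assumptions: the class and field names are hypothetical, the account limit is modeled as a maximum negative balance, the issue time is used to expire e-checks, and signature verification of the credentials is omitted entirely.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ECheck:
    check_id: str       # unique per transaction (hypothetical field)
    payer: str
    payee: str
    amount: int
    issued_at: float    # time at which the credentials were issued

@dataclass
class Bank:
    balances: dict
    account_limit: int = 10     # assumed: largest debt tolerated per host
    max_age: float = 3600.0     # assumed: lifetime of an e-check in seconds
    deposited: set = field(default_factory=set)

    def deposit(self, check, now):
        """Verify one collected e-check and update both accounts."""
        if check.check_id in self.deposited:
            return "rejected: already deposited"
        if now - check.issued_at > self.max_age:
            return "rejected: expired"
        if self.balances[check.payer] - check.amount < -self.account_limit:
            return "rejected: over account limit"
        self.deposited.add(check.check_id)
        self.balances[check.payer] -= check.amount   # decrease Q's account
        self.balances[check.payee] += check.amount   # increase R's account
        return "accepted"

bank = Bank(balances={"Q": 5, "R": 0})
check = ECheck("tx-1", payer="Q", payee="R", amount=3, issued_at=0.0)
first = bank.deposit(check, now=60.0)
second = bank.deposit(check, now=61.0)   # same e-check presented again
```

Recording the identifiers of deposited e-checks is one simple way to detect double-depositing; the time-based check reflects the short-lived nature of the credentials.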
An advantage of e-checks is that they do not need trusted hardware. On the
other hand, certain constraints discourage cooperation, namely the frequency
of contact with the bank in order to upload received e-checks and obtain
new ones, the account limit, and expiration of e-checks. The e-check system
is designed to tolerate manageable losses, rather than preventing them, and
does not provide anonymity.
Token-based micropayment approach
Unlike e-checks, the token-based mechanism assumes the existence of a secure
module (i.e., trusted and tamper-proof secure hardware), and a trustee agent
or “bank” that distributes some virtual currency or tokens. A token-based
micropayment approach was proposed by Buttyan et al. [97] to support message relaying in mobile ad-hoc networks. In their system, hosts register with
the trustee agent and receive a number of tokens, which are stored in their
“purse”, a counter that resides in the secure hardware and indicates the wealth
of the host. Tokens come in a single “denomination”, without any monetary
value, and can be employed by 7DS to pay hosts that respond to queries.
To prevent a node from illegitimately increasing its own counter, the
counter is maintained by the secure module in each node. The tokens that
are loaded into the packet are protected from illegitimate modification and
detachment from their original packet by cryptographic mechanisms.
A public key infrastructure with public key certificates to verify the public
key of a peer can be used. In its secure module, each host keeps its own public
and private key, a public key certificate from a certificate authority, and the
counter.
1. If counter is not sufficiently loaded, return warning
2. Verify host R’s public key certificate, if valid continue
3. Form query
4. Insert query in the list of pending queries
5. Send query to host R
6. If no data sent for pending queries within a predefined time interval, decrease counter, and send NACK
7. If data received for pending query, decrease counter, and send ACK
Fig. 2.8. The querier Q runs these steps on its secure module.
As soon as the querier successfully responds to the challenge, the micropayment and data exchange take place. Through an authenticated key agreement protocol, such as the authenticated Diffie-Hellman or Station-to-Station
1. Verify public key certificate. If valid, continue
2. Form response with data
3. Send data
4. If ACK received, increase counter
5. If NACK received, increase counter and resend data
Fig. 2.9. Operations running on the secure module of the responder R.
(STS) protocol [128], the two hosts can establish a shared key. Before sending a query, a host can run the STS protocol, so the parties’ key pairs can
be generated anew. The public keys are certified, so that the parties can be
authenticated. An STS session expires after some time, so hosts need to rerun
the protocol for each query. An STS channel is established between the secure modules of the two hosts and a shared key is generated to be used for encrypting
all messages exchanged between them. Through this secure module, 7DS can
prevent hosts from double-spending.
When requesting the complete data object, the querier (host Q) and the
responder (host R) perform the operations described in Figures 2.8 and 2.9,
respectively, on their secure module.
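The counter updates of Figures 2.8 and 2.9 can be sketched as follows. The class `SecureModule`, the function names, the `price` parameter, and the boolean that stands in for certificate verification are assumptions made for illustration; the cryptographic protection of tokens and the STS key agreement are not modeled.

```python
class SecureModule:
    """Hypothetical stand-in for the tamper-resistant module of one host."""
    def __init__(self, tokens, certificate_valid=True):
        self.counter = tokens
        self.certificate_valid = certificate_valid

def responder_handle_reply(responder, price, reply):
    """Figure 2.9, steps 4 and 5: credit the counter; resend on NACK."""
    responder.counter += price
    return reply == "NACK"          # True means the data must be resent

def querier_request(querier, responder, price, data_received):
    """Figure 2.8 as seen from the querier's secure module."""
    if querier.counter < price:
        return "WARNING"            # step 1: counter not sufficiently loaded
    if not responder.certificate_valid:
        return "ABORT"              # step 2: certificate check failed
    # Steps 3 to 5 (form, record, and send the query) are omitted here.
    querier.counter -= price        # steps 6 and 7 both decrease the counter
    reply = "ACK" if data_received else "NACK"
    responder_handle_reply(responder, price, reply)
    return reply

q = SecureModule(tokens=5)
r = SecureModule(tokens=0)
outcome_ok = querier_request(q, r, price=2, data_received=True)
outcome_timeout = querier_request(q, r, price=2, data_received=False)
outcome_broke = querier_request(q, r, price=2, data_received=True)
```

Because both counters live inside the secure modules, neither host can adjust its own balance outside of these steps, which is what the approach relies on to prevent double-spending.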
2.3.2 Reputation mechanisms
In reputation-based trust models, the higher the reputation, the more trustworthy the peer. To avoid malicious peers, peers communicate and share resources with only trustworthy devices. Reputation-based systems require stable identities to hold peers responsible for their actions. However, the creation
of multiple identities, the abuse of identities, and the provision of fake or dishonest feedback ratings can be relatively easy in ad-hoc wireless networks.
Thus, deploying reputation-based systems and, more generally, providing security
in such networks is arduous due to their offline nature, lack of continuous access
to a trustee entity, and power constraints of the devices [369, 195]. Furthermore, resource sharing in 7DS is a relatively short-term exchange. All the
above characteristics make the micropayment-based approach more appropriate and simpler to use in the context of 7DS.
2.4 Location-sensing using the peer-to-peer paradigm
Location-sensing has been impelled by the emergence of location-based services in the transportation industry, emergency situations for disaster relief,
the entertainment industry, and assistive technology in the medical community. To support location-dependent services, a device needs to estimate its
position. For example, GPS-enabled navigation systems allow users to compute a route to guide them. However, GPS typically breaks down near obstacles, such as trees and buildings, and does not work indoors.
Location-sensing systems can be classified according to their dependency
on and use of:
• specialized infrastructure and hardware
• signal modalities
• training
• methodology and/or use of models for estimating distances, orientation, and position
• coordinate system, scale, and location description
• localized or remote computation
• mechanisms for device identification, classification, and recognition
• accuracy and precision requirements
The distance can be estimated using time of arrival (e.g., GPS, PinPoint [365])
or signal-strength measurements (e.g., Radar [71], Ekahau [13]), if the velocity of the signal and a signal attenuation model for the given environment,
respectively, are known. The coordinate system can be absolute or relative,
while the location description can be physical or symbolic. Accuracy and precision are
typical metrics for evaluating a positioning mechanism. A result is considered
to be accurate, if it is consistent with the true or accepted value for that result.
Precision refers to the repeatability of a measurement and is an indication of
how sharply a result has been defined. It does not require us to know the
correct or true value. A survey of positioning systems can be found in [183].
Positioning systems may employ different modalities, such as:
• IEEE 802.11 (Radar [71, 171], Ubisense [39], Ekahau [13])
• infrared (Active Badge [323])
• ultrasonic (Cricket [301, 302], Active Bat [307])
• Bluetooth [171, 77, 148, 318, 94, 64]
• 4G [322]
• vision (EasyLiving [236, 29])
• physical contact with pressure (Smart Floor), touch sensors or capacitive detectors
A location-sensing system may infer the position using statistical analysis or
pattern matching techniques on measurements acquired during a training and
run-time phase. The popularity of the IEEE 802.11 network, its low deployment
cost, and the advantages of using it for both communication and positioning,
make it an attractive choice. Most of the signal-strength based localization
systems can be classified into the following two categories:
• signature or map-based
• distance-prediction based
The first type creates a signal-strength signature or map of the physical space
during a training phase and compares it with analogous run-time measurements [71, 239, 364]. To build such maps, signal-strength data is gathered
from beacons received from APs at various predefined checkpoints during a
training phase. Thus, each checkpoint in the map associates the corresponding
position of the physical space with statistical measurements based on signal-strength values acquired at those positions. Such maps can be extended with
data from different sources or signal modalities, such as ultrasound from deployed sensors, to improve location-sensing [301, 171].
In other situations, a dense deployment of a wireless infrastructure for
communication and location-sensing may not be feasible due to environmental, cost, and regulatory barriers. Ad hoc networks exploit cooperation by enabling devices to share positioning estimates [327, 185, 104, 275, 116, 146, 365].
CLS is a novel location-sensing system using two features:
• the peer-to-peer paradigm
• probabilistic-based frameworks for transforming measurements from various sources to position and distance estimates
CLS applies the peer-to-peer paradigm by enabling devices to gather positioning information from other neighboring peers, estimate their distance from
their peers based on signal-strength measurements, and position themselves
accordingly [151]. Periodically, CLS can refine its positioning estimates by
incorporating newly received information from other devices.
CLS adopts a grid-based representation of the physical space; each cell of
the grid corresponds to a physical position in the physical space. The cell size
reflects the spatial granularity/scale. Each cell of the grid is associated with a
value that indicates the likelihood that the node is in that cell. These values
are computed iteratively using one of the following approaches:
• A simple voting algorithm, through which a local CLS instance casts votes on cells of the grid. A vote on a cell indicates the likelihood that the local device is located in the corresponding area of that cell.
• A particle filter-based model.
CLS can incorporate additional information to improve its location estimates.
Examples of such information are position estimates from different network
interfaces (e.g., Bluetooth, RF tags, IEEE 802.11), contextual semantics (e.g.,
topological information about the environment, mobility patterns, hotspots
of the area), and signal-strength-based signatures of the physical space.
2.4.1 Overview of CLS
CLS aims to enable devices to determine their location in a self-organizing
manner without the need for extensive infrastructure or training. The design
of CLS was driven by the following desired properties:
• tolerance to multiple network failures (e.g., AP failures or disconnections)
• ability to incorporate application-dependent semantics and various types of measurements
• relatively low computational complexity
• use in both indoor and outdoor environments with pedestrian mobility
CLS can be integrated with a broad range of applications running on
devices of different computing capabilities. Some of these devices may have a
priori knowledge of their location that they can provide reliably. We refer to
them as landmarks. A device that runs CLS to position itself is referred to as
a node or non-landmark peer.
A node tries to position itself on its local grid through a voting process in
which devices participate by sending position information and casting votes
on specific cells. Each iteration of a local CLS instance (i.e., running at a peer)
consists of the following steps (Algorithm 1):
Algorithm 1 An iteration of the voting process at a CLS instance
1. Gather position information from other peers
2. Record measurements from the received messages
3. Transform this information to a probability of being at a certain cell of its local
grid
4. Add this probability to the existing value that this cell already has
5. Report a position that corresponds to the centroid of the set of cells with maximal
weight
At the beginning of a run, each peer broadcasts messages to its one-hop
neighbors that include its positioning information, specifically, its local ID,
maximum wireless range, and position, if known or computed. We refer to
this broadcast update as a positioning message.
We assume that an AP is configured with its position coordinates and can
act as a landmark and send positioning messages in the form of beacons. A
peer records the signal strength values with which it receives these messages
and responds by broadcasting its own position estimates.
Each local CLS instance transforms these signal-strength values to either
distance or position estimates based on a radio attenuation model or a pattern matching algorithm, respectively. Such algorithms relate signal-strength
measurements, acquired from messages exchanged between devices, to their
position on the terrain or their distance. Based on the position information of
the sender and this distance estimation, the receiver estimates its own position
on the local grid. When the local CLS estimates its own position, it broadcasts
this set of information, i.e., CLS entry, to its neighbors. Each node maintains
a table with all the received CLS entries. We denote the grid of node $k$
as $G_k$ and let $v(i, j)$ denote the probability that the cell $(i, j) \in G_k$ is the
position of node $k$. The region $G_{h,k}$ of the grid is the set of cells for which peer
$k$ votes as the possible region of node $h$.
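As an illustration of the signal-strength-to-distance transformation mentioned above, the following sketch inverts the widely used log-distance path-loss model, $RSS(d) = RSS(d_0) - 10\,n\,\log_{10}(d/d_0)$. The text does not state which radio attenuation model CLS uses, so the model itself, the function name, and the default parameter values (reference power, path-loss exponent $n$, reference distance $d_0$) are assumptions that would have to be calibrated for a given environment.

```python
def rssi_to_distance(rssi_dbm, rssi_at_ref_dbm=-40.0, path_loss_exponent=3.0,
                     ref_distance_m=1.0):
    """Estimate the sender-receiver distance (in meters) from a received
    signal strength, by inverting the log-distance path-loss model.

    All default values are illustrative assumptions, not measured values.
    """
    exponent = (rssi_at_ref_dbm - rssi_dbm) / (10.0 * path_loss_exponent)
    return ref_distance_m * 10.0 ** exponent

near = rssi_to_distance(-40.0)   # signal equal to the reference power
far = rssi_to_distance(-70.0)    # 30 dB weaker than the reference power
```

With a path-loss exponent of 3, every additional 30 dB of attenuation corresponds to a tenfold increase of the estimated distance.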
Fig. 2.10. An example of accumulation of votes on grid cells of a host at different
time steps. The brighter an area, the more voting weight has been accumulated on
the corresponding grid cells. The brightest area corresponds to a potential solution.
The grid cell is too small to be distinguishable.
Each node tries to position itself on its local grid. To determine its location,
each node h gathers position estimates from other peers, and computes its own
location using Algorithm 2.
Algorithm 2 Position estimation at node h
1. Initialize the values of the grid $G_h$ with all cells containing zeros.
2. If a signature of the environment is available, compare it with run-time measurements, and for each cell $c$ of the grid, assign a vote of weight $w(c)$ (according to specified criteria).
3. For each received distance estimation from a peer $k$ with a known or estimated position, perform the following steps:
   a) Transform the coordinates of peer $k$ to the coordinate system of the grid.
   b) Determine the region of the grid, $G_{h,k}$, i.e., the set of cells for which peer $k$ votes as possible region of node $h$. The determination can be based on a position-based or distance-based algorithm. If the peer $k$ is a non-landmark, the distance between the two peers can be computed according to a radio attenuation model or a pattern matching algorithm.
   c) Increase the value of each cell in $G_{h,k}$ by $v_k$, where $v_k$ is the voting weight of node $k$.
4. Assess the values of the cells in the grid and accept or reject the attempt for location-sensing.
This is essentially a voting process, in which a node casts votes on the cells
of its grid on behalf of other peers. Votes may have different weights. The larger
voting weight a cell has acquired, the more likely it is for the corresponding
node to be located in that cell. The set of cells in the grid with maximal
value indicates the potential region. Figure 2.10 shows a snapshot of the grid
as three landmarks vote on the location of an unsolved host. The brighter an
area, the more voting weight has been accumulated on the corresponding grid
cells. The brightest area corresponds to a potential solution.
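The vote accumulation can be sketched as follows for the distance-based case. This is a simplified illustration rather than the CLS implementation: the function names, the tolerance band `tol` around each estimated distance, the use of cell centers, and the square cells of side `cell` are assumptions.

```python
import math

def accumulate_votes(rows, cols, cell, estimates, tol):
    """Build the grid of node h from a list of (px, py, dist, weight) tuples,
    one per peer k with a known or estimated position (px, py).

    A cell receives the vote of peer k if the distance between its center
    and the peer lies within tol of the estimated distance dist.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    for px, py, dist, weight in estimates:
        for i in range(rows):
            for j in range(cols):
                cx, cy = (j + 0.5) * cell, (i + 0.5) * cell
                if abs(math.hypot(cx - px, cy - py) - dist) <= tol:
                    grid[i][j] += weight
    return grid

def potential_region(grid):
    """Cells with maximal accumulated voting weight, and that weight."""
    best = max(value for row in grid for value in row)
    cells = [(i, j) for i, row in enumerate(grid)
             for j, value in enumerate(row) if value == best]
    return best, cells

def centroid(cells, cell):
    """Centroid (x, y) of the centers of the given cells."""
    x = sum((j + 0.5) * cell for _, j in cells) / len(cells)
    y = sum((i + 0.5) * cell for i, _ in cells) / len(cells)
    return x, y

# Three landmarks vote on a node that is actually located at (5, 5).
landmarks = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
estimates = [(lx, ly, math.hypot(5.0 - lx, 5.0 - ly), 1.0) for lx, ly in landmarks]
grid = accumulate_votes(10, 10, 1.0, estimates, tol=0.75)
best, region = potential_region(grid)
position = centroid(region, 1.0)
```

Each landmark contributes a ring of votes around its own position; the cells where the rings overlap accumulate the largest weight, which corresponds to the brightest area in Figure 2.10.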
When a training phase prior to voting is feasible, CLS can build a map
or signature of a physical space, which is a grid-based structure of the space
augmented with measurements from peers. Examples of signal strength-based
signatures are: position-level and distance-level ones. At run-time, a local CLS
instance performs the following steps:
1. acquisition of signal-strength measurements from peers
2. creation of a signal-strength map of the space using these measurements
3. generation of a run-time signature
4. comparison of the run-time and training signatures
For the signature comparison, various criteria can be derived based on the statistical characteristics of the signal-strength measurements, such as confidence
intervals and percentiles [344].
34
2 7DS architecture for information sharing
Landmarks and nodes that are first to position themselves determine—to
some extent—the accuracy of the location estimation of the remaining nodes,
since their positioning estimates and errors are propagated in the network
through the voting process. To minimize the impact of such errors, CLS imposes the following two conditions:
• The number of votes in each cell of the potential region must be above a threshold. We refer to this threshold as the solution threshold (ST).
• The number of cells in the potential region must be below a threshold, denoted as the local error control threshold (LECT).
In effect, ST controls how many nodes with known location must agree with
the proposed solution. A high ST reduces the error propagation throughout
the network, but delays the positioning estimation. On the other hand, LECT
determines the precision of each step. Another metric for filtering the local
error can be the diameter of the region that corresponds to the maximum
Euclidean distance of cells with the maximal voting weight.
Additional distance estimates from nodes with known locations increase
the voting weight and narrow down the potential region. The values for the
ST and LECT could be determined based on network characteristics, such as
the density of nodes and landmarks, and accuracy of the distance estimations.
To prevent CLS from failing to report a position, both thresholds can be adaptively relaxed after rejecting potential solutions. Once the above conditions
are satisfied, CLS reports the centroid of the potential region as the estimated
location of the device.
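The two acceptance conditions and the adaptive relaxation can be sketched as follows. The function names and the relaxation rule (lower ST by one and double LECT after each rejection) are assumptions chosen for illustration; the text only states that both thresholds can be relaxed adaptively.

```python
def try_report(grid, st, lect):
    """Return the centroid (row, column) of the potential region, or None
    if the ST or LECT condition is not satisfied."""
    best = max(value for row in grid for value in row)
    region = [(i, j) for i, row in enumerate(grid)
              for j, value in enumerate(row) if value == best]
    if not best > st:            # votes per cell must be above ST
        return None
    if not len(region) < lect:   # region size must be below LECT
        return None
    ci = sum(i for i, _ in region) / len(region)
    cj = sum(j for _, j in region) / len(region)
    return ci, cj

def report_with_relaxation(grid, st, lect, max_rounds=5):
    """Relax both thresholds after each rejection (assumed rule)."""
    for _ in range(max_rounds):
        position = try_report(grid, st, lect)
        if position is not None:
            return position, st, lect
        st = max(0, st - 1)      # accept fewer agreeing votes
        lect = lect * 2          # accept a larger potential region
    return None, st, lect

grid = [[0, 1, 0],
        [1, 3, 1],
        [0, 1, 0]]
strict = try_report(grid, st=3, lect=2)
relaxed = report_with_relaxation(grid, st=4, lect=2)
```

A stricter ST makes the first call reject the region even though a single cell clearly dominates; the relaxation loop then trades some protection against error propagation for the ability to report a position at all.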
CLS can be implemented in a centralized or distributed fashion, depending on whether the computations are performed on a server or on the peers.
Furthermore, in the centralized case one or more servers can be deployed
depending on the topography of the terrain.
2.4.2 Particle filter-based framework
In probabilistic terms, CLS can be formulated as the problem of determining
the probability of a node being at a certain location given a sequence of signal-strength values. Assuming first-order Markov dynamics, the above problem
can be expressed using the network graph depicted in Figure 2.11, where $x_k$
is the node location (system state) at time instant $k = 1, \ldots, T$. Notice that
$x_k$ cannot be observed directly (it is “hidden”). Besides, for each location
$x_k$, a measurement vector $y_k$ (containing the signal-strength values) is available, which depends on the hidden variable according to a known observation
function.
Due to the Markov assumption, each node location, given its immediately
previous location, is conditionally independent of all earlier locations, that is
\[ P(x_k \mid x_0, x_1, \ldots, x_{k-1}) = P(x_k \mid x_{k-1}). \tag{2.1} \]
Fig. 2.11. State space model for the proposed location-sensing system, with hidden states $x_1, x_2, x_3, \ldots, x_T$ and observations $y_1, y_2, y_3, \ldots, y_T$. Clear circles indicate hidden state variables, grayed circles indicate observations, horizontal
arrows indicate state transition functions and vertical arrows indicate observation
functions.
Similarly, the observation at the $k$-th time instant, given the current state, is
conditionally independent of all other states:
\[ P(y_k \mid x_0, x_1, \ldots, x_k) = P(y_k \mid x_k). \tag{2.2} \]
Based on this model, location-sensing can be formulated as the problem
of computing the location $x_k$ of a node at time $k$, given the sequence of
observations $y_1, y_2, \ldots, y_k$ up to time $k$, that is, determining the a posteriori
distribution $P(x_k \mid y_1, y_2, \ldots, y_k)$.
To estimate the above a posteriori probability, which is actually a density
over the whole state space, we use a particle filter. Particle filtering is a technique for implementing a recursive Bayesian filter by Monte Carlo sampling.
According to this technique, the a posteriori $P(x_k \mid y_1, y_2, \ldots, y_k)$ is expressed
as a set of samples
\[ \mathbf{x}^{(L)} = (x, y)^{(L)}, \qquad L = 1, 2, \ldots, N \tag{2.3} \]
distributed over the whole state space. The denser the samples in a certain
region of the state space, the higher the probability that the node is located
in that region.
Unlike Kalman filters, particle filters do not impose any constraints on the
form of the involved distributions and noise models, or on the linearity of
the involved functions. This makes them particularly well-suited to location-sensing.
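The following is a minimal sketch of one predict, weight, and resample step of a generic bootstrap particle filter. It is not the CLS implementation. As illustrative assumptions, the observation is a vector of distances to known landmarks (a stand-in for the signal-strength vector), the motion model is a Gaussian random walk, and the names `particle_filter_step`, `estimate`, `motion_std`, and `obs_std` are hypothetical.

```python
import math
import random


def particle_filter_step(particles, observation, landmarks, rng,
                         motion_std=1.0, obs_std=2.0):
    """One predict/weight/resample step of a bootstrap particle filter.

    particles   -- list of (x, y) location hypotheses
    observation -- measured distance to each landmark (assumed stand-in for y_k)
    landmarks   -- known (x, y) landmark positions
    rng         -- a random.Random instance
    """
    # Predict: move each particle with a Gaussian random-walk transition model.
    moved = [(px + rng.gauss(0.0, motion_std), py + rng.gauss(0.0, motion_std))
             for px, py in particles]

    # Weight: Gaussian likelihood of the observation given each particle.
    weights = []
    for px, py in moved:
        log_w = 0.0
        for (lx, ly), measured in zip(landmarks, observation):
            expected = math.hypot(px - lx, py - ly)
            log_w += -0.5 * ((measured - expected) / obs_std) ** 2
        weights.append(math.exp(log_w))

    if sum(weights) == 0.0:
        # Every particle is implausible; fall back to uniform weights.
        weights = [1.0] * len(moved)

    # Resample: draw N particles with probability proportional to their weight,
    # so that dense regions correspond to high posterior probability.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Point estimate of the location: the mean of the particle set."""
    n = len(particles)
    return (sum(p[0] for p in particles) / n,
            sum(p[1] for p in particles) / n)
```

Calling `particle_filter_step` once per time instant k, with the latest observation, yields a particle set whose density approximates the a posteriori distribution. No linearity or Gaussianity is required of the real models, and only the likelihood computation would change for a signal-strength observation function.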
2.4.3 Performance of CLS and other related systems
Several CLS variants have been implemented and evaluated via extensive simulations and empirical measurements. For the empirical evaluation, we ran
experiments using IEEE 802.11 signal-strength measurements. CLS has a satisfactory accuracy level without the need for specialized hardware and extensive
training. It can be easily extended for outdoor environments and different
mobility patterns.
We found that the density of landmarks and peers has a dominant impact
on positioning. CLS can utilize signal-strength maps of the physical space by
superimposing statistical properties of the signal-strength values acquired during the training phase on their corresponding positions. Such maps can significantly improve its performance. Through empirical experiments, we showed
how the different statistical properties of signal-strength measurements, the
particle-filters model, the AP failures and additional peers affect the performance of CLS. Pre-processing the signal-strength measurements by removing
the outliers can further improve the accuracy of CLS.
Currently the training is static, in that it does not consider the placement
of rogue or new APs and changes in the configuration, position or orientation
of APs and density of users or objects in the area. Such changes may affect the
signal-strength values and the signal-strength matching process. A desirable
feature is a dynamic calibration phase in which CLS can detect changes in the
infrastructure (e.g., position of APs) and incorporate them into the map. The
tradeoff between the increased complexity and overhead of the training and
runtime phases and the improvements in the accuracy and precision needs to
be addressed.
Our simulation results indicate that topological information about the environment (e.g., about hotspot areas, presence information of users, existence
of walls, user mobility patterns) can enhance the performance of the system.
Part of our future research effort is to incorporate such heuristics into the
probabilistic framework of CLS and extend the performance analysis study.
Recently, significant work has been published in the area of location-sensing
using RF signals. Like CLS, Radar [71] employs signal-strength maps that integrate signal-strength measurements acquired during the training phase from
APs at different positions with the physical coordinates of each position. Each
measured signal-strength vector is compared against the reference map, and
the coordinates of the best match are reported as the estimated position.
Bahl et al. [72] improved Radar to alleviate side effects inherent to
signal-strength measurements, such as aliasing and multipath.
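The map-matching step described above can be illustrated with a nearest-neighbor search in signal space. This is a minimal sketch, not Radar's actual code. It assumes Euclidean distance as the matching criterion, and the reference map values and the name `nearest_fingerprint` are hypothetical.

```python
import math


def nearest_fingerprint(reference_map, measured):
    """Return the coordinates whose stored signal-strength vector is closest
    (Euclidean distance in signal space) to the measured vector."""
    best_position, best_distance = None, math.inf
    for position, fingerprint in reference_map.items():
        distance = math.dist(fingerprint, measured)
        if distance < best_distance:
            best_position, best_distance = position, distance
    return best_position


# Hypothetical training data: position (x, y) in metres mapped to the mean
# signal strength in dBm received from three APs at that position.
reference_map = {
    (0.0, 0.0): (-40.0, -70.0, -80.0),
    (5.0, 0.0): (-55.0, -60.0, -75.0),
    (5.0, 5.0): (-70.0, -50.0, -60.0),
}

estimated_position = nearest_fingerprint(reference_map, (-57.0, -62.0, -73.0))
```

A denser reference map gives finer position estimates at the cost of a longer training phase, which is the tradeoff between training overhead and accuracy noted earlier for CLS.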
Ladd et al. [239] proposed another location-sensing algorithm that utilizes
the IEEE 802.11 infrastructure. In its first step, a host employs a probabilistic
model to compute the conditional probability of its location for a number of
different locations, based on the received signal-strength measurements from
nine APs. The second step exploits the limited maximum speed of mobile
users to refine the results and reject solutions with a significant change in the
location of the mobile host.
Niculescu and Badri Nath [276] designed and evaluated a cooperative
location-sensing system that uses specialized hardware for calculating the angle between two hosts in an ad-hoc network. This can be done through antenna
arrays or ultrasound receivers. Hosts gather data, estimate their position, and
propagate them throughout the network. Previously, these authors [275] introduced a cooperative location-sensing system in which position information
of landmarks is propagated towards hosts that are further away, while at the
same time, closer hosts enrich this information by determining their own location. Another location-sensing system in ad-hoc networks performs positioning
without the use of landmarks or GPS and presents the tradeoffs among internal parameters of the system [104]. The location-sensing systems presented
in [327] and [176] are the closest to CLS and are compared in detail in [151].
Active Badge [350] uses diffuse infrared technology and requires each person to wear a small infrared badge that emits a globally unique identifier
every ten seconds or on demand. A central server collects this data from fixed
infrared sensors around the building, aggregates it and provides an application programming interface for using the data. The system suffers in the case
of fluorescent lighting and direct sunlight, because of the spurious infrared
emissions these light sources generate. A different approach, SmartFloor [20],
employs a pressure sensor grid installed in all floors to determine presence
information. It can determine positions in a building without requiring users
to wear tags or carry devices. However, it is not able to specifically identify
individuals.
Examples of localization systems that combine multiple technologies are
UbiSense [39] and Active Bats [8]. UbiSense can provide high accuracy using
a network of ultra-wideband (UWB) sensors installed and connected into a
building’s existing network. The UWB sensors use Ethernet for timing and
synchronization. They detect and react to the position of tags based on time
difference of arrival and angle of arrival. An RF tag is a silicon chip that emits
an electronic signal in the presence of the energy field created by a reader
device in proximity. Location can be deduced by considering the last reader
to see the card. RFID proximity cards are in widespread use, especially in
access control systems. The Active Bats architecture consists of a controller
that sends a radio signal and a synchronized reset signal simultaneously to the
ceiling sensors using a wired serial network. Bats respond to the radio request
with an ultrasonic beacon. Ceiling sensors measure time-of-flight from reset to
ultrasonic pulse. Active Bat applies statistical pruning to eliminate erroneous
sensor measurements caused by a sensor hearing a reflected pulse instead of
one that travelled along the direct path from the Bat to the sensor. A relatively
dense deployment of ultrasound sensors in the ceiling can provide position estimates within 9 cm
of the true position for 95% of the measurements.
Mathematical models that have been used extensively in localization
are Kalman filters, particle filters (e.g., [329, 141, 142, 184]), and Monte Carlo
algorithms, which are also based on particle filters (e.g., [212, 339, 191, 188]).
2.5 Applications using information sharing via 7DS
To demonstrate the information discovery and caching mechanisms of 7DS,
we implemented a prototype and experimented with web browsing and some
location-based and collaborative applications.
Fig. 2.12. 7DS configuration. Users can change 7DS parameters via this interface.
For example, they can set the frequency that a query is broadcast (BroadcastQueryInterval) to 15 s, or for web pages without any specified expiration field, the user
can set a default one.
The 7DS prototype was written in Java on Linux and also ported to Windows. The Glimpse search engine was used initially but was replaced with
Lucene [256] when it became a performance bottleneck. Lucene provides incremental indexing, persistent and non-persistent operations, a built-in lexical analyzer, and a small heap. 7DS peers operate in the ad-hoc mode of
IEEE 802.11.
Figure 2.12 presents the GUI of the current 7DS implementation that
allows users to configure parameters, such as the update view time period,
broadcast query time period, and query timeout. The central 7DS interface
that displays the queries and corresponding responses is shown in Figure 2.13.
Several aspects of the current 7DS implementation can be improved. For
instance, the code is large and complex. However, it can be simplified significantly using libraries included in recent Java versions. All methods required
by the applications could be collected into a single class. The time to load web
pages (or files for the supported applications) can also be improved. In addition, further experimentation and extension of 7DS to run on smaller devices,
such as smart phones and tablet computers, is required.
Fig. 2.13. 7DS main interface. In the upper part of the interface, the user can enter
a URL, form a keyword-based query, or view the cache manager or configuration.
Query results are in the lower part. In this example, query 1 is pending, whereas
there are responses for queries 0 and 2. Queries 0 and 2 have been expanded, showing
their received reports that include URLs.
2.5.1 Web browsing
Although the web is not primarily a location-dependent or collaborative application, it was selected because of its prevalence.
7DS instances share web pages by sending queries containing URLs. After
receiving a query, a peer searches its cache, and if a match exists, it forms
and broadcasts a report. Such reports can be viewed by the user, who may
select the most relevant report and initiate an HTTP GET request to acquire
the complete web page (Figure 2.13). Each 7DS instance runs a miniature web
server, which responds to the HTTP GET requests.
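The query, report, and fetch exchange described above can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the 7DS Java code. The classes `Peer` and `Report`, the function `browse`, and the automatic choice of the first report (in 7DS the user selects the report) are all hypothetical, and the network broadcast and the HTTP transfer are replaced by direct method calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    query_id: int
    url: str
    responder: str  # hypothetical identifier of the reporting peer


class Peer:
    def __init__(self, name, cache=None):
        self.name = name
        self.cache = dict(cache or {})  # URL -> page content

    def on_query(self, query_id, url):
        """Search the local cache; form a report on a match, else stay silent."""
        if url in self.cache:
            return Report(query_id, url, self.name)
        return None

    def http_get(self, url):
        """Stand-in for the miniature web server answering an HTTP GET."""
        return self.cache[url]


def browse(requester, neighbours, query_id, url):
    """Send a query to all neighbours, pick a report, and fetch the page."""
    reports = [r for r in (p.on_query(query_id, url) for p in neighbours) if r]
    if not reports:
        return None  # no report yet: the query stays pending
    chosen = reports[0]  # assumption: take the first report automatically
    responder = next(p for p in neighbours if p.name == chosen.responder)
    page = responder.http_get(chosen.url)
    requester.cache[url] = page  # assumption: the fetched page is cached locally
    return page
```

Separating the small report from the full page transfer means that only the peer the user selects has to send the complete web page.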
2.5.2 Notesharing and whiteboard tool
The notesharing and whiteboard applications attempt to improve the collaboration of participants in a seminar, classroom, or meeting by enabling them
to circulate a presentation, share and merge their notes for any slide, send
queries, and respond to queries. Apart from the core notesharing feature, the
Fig. 2.14. The main interface of the buddylist and whiteboard.
application includes a remote control presentation functionality, buddy list,
and virtual whiteboard. Notesharing uses Microsoft PowerPoint—a popular
format for presentations—and allows users to take notes for a particular slide.
Let us assume a setting in a classroom with audience and speaker running
the notesharing application on their 7DS-enabled devices. By default, the
speaker’s device is the master host, and whenever a slide changes, the system
notifies the peers in the (notesharing) multicast group about these changes.
Peers may remain synchronized with the current presentation or discard these
notification messages. Furthermore, a user may change a slide of the current
presentation being displayed by clicking a button on the main user interface.
The master host can disseminate the local presentation to peers. Users can
search for a specific slide while the application alerts the speaker by changing
the color of the title in slides with pending queries.
The buddy list implementation is similar to the messenger-type of applications available on the Internet. When a device joins the multicast group, the
name of its local user is added to the list, while it is removed when it leaves,
and highlighted when the local user sends a question.
Users can exchange notes in real-time during a meeting. The application
maintains an internal list of notes for every slide. Each note is an object that includes a list of
topics, author, slide number, and a description. Once a topic and a description
are set for a specific slide, the user may click “submit” to add the notes to
the internal list. The new notes are then sent to the multicast group and/or
added to the 7DS cache. Notes can be imported from Microsoft PowerPoint,
and exported back at the end of the presentation. Users can not only take
notes but also draw pictures on the virtual whiteboard, show their drawing
on the current screen, and clear the whiteboard (Figure 2.14).
The notesharing and whiteboard tool was developed using Microsoft Visual
C++ on a standard PC and the embedded version with Microsoft Embedded
Visual C++. Both versions—desktop and embedded—are based on the Microsoft Foundation Classes (MFC). More information about the notesharing
and whiteboard application can be found at [209].
2.5.3 Multimedia traveling journal
Fig. 2.15. The system can superimpose pictures or other multimedia files on a
Google map at certain positions that correspond to the locations at which the attached pictures were taken. A marker indicates the number of files associated with
that location. A user may click on a photograph to expand it or enter notes.
The multimedia traveling journal application enables users to build interactive multimedia journals that associate multimedia files with locations
on maps. It runs on top of 7DS, and through 7DS, it allows local peers to
share files associated with certain locations. The multimedia files and maps
are stored in the cache of the local 7DS instance. A user can add pictures to
a certain point on the map by clicking on the map and browsing the image
files corresponding to that location (Figure 2.15). Moreover, the user can add,
modify, or delete comments on a certain multimedia file, change its permission, and rate its content. A multimedia file can be public or private, and only
public files are shared with other peers.
The multimedia traveling journal searches other 7DS peers for multimedia
files associated with a given area which has been marked on the map by the
user. It forms a 7DS query and multicasts it in the 7DS manner. Furthermore,
it maintains and displays the list of neighboring 7DS peers, updating it upon
the receipt of a 7DS response. Areas on the map associated with multimedia
files can be distinguished by a marker that also indicates the number of the
available relevant files.
A user may search for multimedia content related to a certain location in
the following manner: First, the user indicates the region of interest by marking the corresponding area on the displayed map (e.g., the white rectangle
on the map illustrated in Figure 2.16). Then, the local 7DS instance will
search for relevant data in its cache, on the web, and in the cache of other
peers. Specifically, the local 7DS instance will first check its local cache for
multimedia files associated with that area. If the search is successful, it will
display a marker with a number indicating the number of multimedia files
associated with that location. In the case that no relevant data can be found,
7DS’s web client attempts to acquire it from the Internet by accessing a predefined web site. Finally, if the web client fails to acquire the requested data
(e.g., in the case of intermittent connectivity to the Internet or unavailability
of a web server), 7DS will form a media query and multicast it to its peers.
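The three-stage search order described above (local cache, then the predefined web site, then the peers) can be sketched as a simple fallback chain. This is an illustration under stated assumptions rather than the application's code. The function `find_media` and its callable parameters `fetch_from_web` and `query_peers` are hypothetical, and a failed web access is assumed to surface as an `OSError`.

```python
def find_media(region, local_cache, fetch_from_web, query_peers):
    """Look for multimedia files for a region, returning (source, files).

    local_cache    -- dict mapping a region key to a list of files
    fetch_from_web -- callable(region) -> list of files; may raise OSError
    query_peers    -- callable(region) -> list of files from neighbouring peers
    """
    # Stage 1: the local 7DS cache.
    files = local_cache.get(region)
    if files:
        return "cache", files

    # Stage 2: the predefined web site.
    try:
        files = fetch_from_web(region)
    except OSError:
        files = None  # e.g., intermittent connectivity or unavailable server
    if files:
        return "web", files

    # Stage 3: form a media query and multicast it to the peers.
    return "peers", query_peers(region)
```

Ordering the stages this way means the wireless medium is used for a multicast query only when both the cache and the Internet fail to provide the data.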
The queries are formed using location-based or rate-related criteria. The
response of a peer includes the multimedia files, reviews, and ratings and can
be displayed. The user frontend of the application is a web browser-based
interface that communicates with the local application server using http. It
consists of a Google Maps map frame on the right and a photo bar on the
left side of the window. It employs JavaScript and AJAX [3, 2] to produce a
dynamic and interactive application, instead of just a static web page. Its
backend runs on 7DS. It receives all queries from the frontend through 7DS’s
proxy server, and supports the typical 7DS functionality by adding or deleting
photos, querying photos from 7DS neighbors or handing out photos from the
local cache. 7DS can also cache Google Maps files, enabling the application to
work without an Internet connection.
CLS and/or GPS, running as underlying location-sensing mechanisms,
periodically record the coordinates of the current position of the device with
a timestamp in a positioning trace. Users can upload pictures and videos with
their associated timestamp. The multimedia traveling journal can correlate
the timestamp information of the multimedia content with the positioning
trace and associate the multimedia files with certain areas of a map. The
application can also display the user’s position- or movement-related information
on a map, provide “post-it” related functionality, and support various types
of devices. Specifically for thin clients (e.g., smart-phones), we implemented
the multimedia traveling journal using a more centralized approach, in which
a client acquires multimedia files from a predefined web server. A comparative performance analysis on delay characteristics of the peer-to-peer and
centralized approaches can be found in [231].
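The correlation between a file's timestamp and the positioning trace can be sketched as a nearest-in-time lookup. This is a minimal illustration, not the application's actual method. The function `locate`, the trace layout of (time, latitude, longitude) tuples sorted by time, and the `max_gap` threshold of 60 seconds are assumptions.

```python
import bisect


def locate(trace, timestamp, max_gap=60.0):
    """Return the (lat, lon) of the trace fix closest in time to `timestamp`.

    trace   -- list of (time, lat, lon) tuples sorted by time (seconds)
    max_gap -- assumed limit: if the closest fix is further away in time
               than this, the file is left without a location (None)
    """
    times = [t for t, _, _ in trace]
    i = bisect.bisect_left(times, timestamp)
    # The closest fix is either just before or just after the timestamp.
    candidates = trace[max(i - 1, 0):i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda fix: abs(fix[0] - timestamp))
    if abs(best[0] - timestamp) > max_gap:
        return None
    return best[1], best[2]
```

The more frequently CLS or GPS records a fix, the smaller the time gap and the more accurate the association between a picture and its location on the map.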
Fig. 2.16. The user can mark the area for which more pictures are requested. Via
the local 7DS, the application searches for pictures in the defined area and may
select them from a specific user. The local 7DS peers appear in a window.
2.6 Related mobile peer-to-peer computing systems
In the wired WAN domain, several peer-to-peer systems gained popularity in the early 2000s. A non-exhaustive list of them includes: Napster,
Gnutella, Freenet, Kazaa, BitTorrent, eDonkey2000, eMule, DC++, Morpheus, Bearshare, iMesh, Grokster, Ares, Soulseek, GreenTea, Shwup, and
Avalanche [316, 262]. Skype is a voice over IP system with a growing number of users that applies the peer-to-peer paradigm. In these systems, peers are
typically stationary clients. Measurement studies have also shown the use of
these applications over wireless LANs.
Unlike these systems that are used mostly over wired networks by stationary peers, the United Villages project applies the mobile peer-to-peer
paradigm by enabling APs that are installed on vehicles to transfer data
when they are within range of a real-time wireless Internet connection. In
the context of relaying, a mesh networking-related company PacketHop recently released a set of specialized software that can be embedded into mobile
devices, allowing the device to act as a relay node.
Imielinski and Badrinath were among the first to propose an infrastructure
for supplying information services, such as e-mail, fax, and web access to
mobile users by placing infostations at traffic lights and airport entrances.
Infostations—first mentioned in the context of the DataMan project [303]—
use a single server/multiple clients model in which the server broadcasts data
items based on received queries.
As in the case of 7DS, Portolano [138] also aims to provide service discovery
to mobile clients with intermittent connections, assuming a hybrid world of a
wired infrastructure and wireless links. Its emphasis is on user interfaces that
allow mobile clients to discover the semantics of any service and present an
interface suited to the client’s needs and resource limitations. Odyssey [277]
was one of the first platforms designed to enable applications on mobile devices
to adapt their media quality based on bandwidth availability without explicit
knowledge of one another. For example, when the bandwidth available to
a video player drops, it could switch to a video stream with fewer colors
and coarser resolution rather than stop its transmission completely. Mobile
Chedar [232]—an extension to the Chedar [67] peer-to-peer middleware—
provides mechanisms for data streaming between Mobile Chedar nodes and
between the Mobile Chedar and Chedar networks. Proem [230] is middleware
for developing applications for mobile ad-hoc networks, providing mechanisms
for presence and discovery services.
MOBY [187] enables access to services in wide-area networks using the
Jini technology. Unlike 7DS, which does not require any registration with an
external server, MOBY is based on a super-peer architecture: the network is
divided into domains by Mnode super-peers. As in a fixed overlay network,
the links between Mnodes are preconfigured. LightPeers [119] is a lightweight
platform for mobile peer-to-peer networking, targeted to enable mobile devices with limited capabilities to produce, organize, present, and share digital
material.
2.7 Conclusions
Peer-to-peer computing manifests several attractive characteristics:
• self-organization, autonomy, and decentralization
• relatively low cost of ownership and sharing by using existing infrastructures and by distributing the maintenance costs
• relatively low cost of accessing resources by enabling resource sharing and low-cost interoperability
To be effective, mobile peer-to-peer computing applications depend on substantial deployment, cooperation, interoperability, and scalability. In resource
allocation, a tension often exists between cooperation and competition. Typically, the scarcer the resources are, the less collaborative the systems tend
to be. In other cases, a system may adjust its cooperation-competition policies dynamically depending on the availability of a resource. Given the energy
constraints and the nondeterministic characteristics of the environment, such
resource allocation algorithms are non-trivial.
While poor protection of resources can impede the use of a peer-to-peer
system, high cost and strict conditions for accessing the resources can discourage
it. The design of a mobile peer-to-peer system needs to balance these two
needs. To prevent denial of service attacks, encourage cooperation, and better
allocate resources, the use of micropayment- and/or reputation-based mechanisms can be important. However, these mechanisms should have a relatively
low overhead, in order to not discourage the active participation of peers.
Security and game theory can be applied to address these problems [98].
Increasingly, wireless devices collect a large amount of information that
can be analyzed to reveal the personal and social context of the user. Such
information can be used to support various location-based and context-aware
services. At the same time, this abundance of information makes users vulnerable to privacy threats. These threats include the identification of
the position of the device and, potentially, the identity of the subject using the
device, which can be acquired directly or inferred using statistical analysis.
Malicious users can abuse such information by spamming users with advertisements or disclosing it inappropriately. Thus, there is a tradeoff between enhancing
information access and protecting private information: the more information is available, the better
the information access is likely to be, but the greater the exposure to privacy threats.
To sustain long-term use, mobile peer-to-peer systems need to be flexible
and dynamic, and privacy will play an important role in their adoption. Currently, 7DS offers a crude distinction between private and non-private objects,
and a finer way to describe their privacy requirements is needed. However,
privacy is context sensitive and depends on the social context, user activity,
ownership of the device, application, and personality of the user. Depending
on the context, the system—with or without any user intervention—may decide about the privacy and cooperation policies. Thus, it is critical to provide
mechanisms that allow a fine-level description of the privacy requirements and
draw a balance between enhancing the service and protecting user privacy.
Information retrieval and querying are at the core of 7DS. Providing
semantic-based annotation, discovery, and retrieval of the multimedia information can further enhance the access of information. The development of
methods for contextual-knowledge representation and reasoning that involves
modeling contextual aspects (e.g., people, devices, locations, and events) is
also necessary.
To assist the deployment of mobile peer-to-peer computing systems, a
fruitful approach would include the development of the following components:
• a general infrastructure for mobile peer-to-peer applications and a toolkit that new applications could use
• robust mobile peer-to-peer applications with friendly GUIs that can also control the distribution of data and form context- and semantic-based queries
• protocols that ensure anonymity and privacy
• mechanisms that encourage cooperation among peers in an energy-efficient manner
Mobile peer-to-peer computing may enhance the formation of on-line communities of mobile users and create new socio-technological paradigms. The
mobile peer-to-peer paradigm—with its distinct feature of cooperation—can
be applied to facilitate the information access and sharing among devices for
the support of context-aware services. An underlying objective of such services is the recognition and characterization of the users’ contexts without
interrupting them from their main tasks. This involves research in domains
that span from networking and systems to contextual information representation and reasoning, multi-modal user interfaces and graphics. Thus, mobile
peer-to-peer computing, combined with context-aware computing, opens up
exciting challenges in computer science, demanding interdisciplinary research
and innovative paradigms.
3 Performance analysis of information discovery and dissemination
This chapter focuses on the impact of the wireless coverage range, querying
mechanism, density of hosts, and cooperation on information dissemination. It
presents performance analysis results acquired via extensive simulations and
discusses theoretical models.
3.1 Information discovery schemes
Pervasive computing environments evolve rapidly, encompassing a range of
heterogeneous networked systems that have been integrated into physical objects. These environments include a plethora of new human-computer interfaces for seamless interaction across a range of devices, varying from wearable
platforms to large displays, sensors, and networked physical objects in interactive rooms, urban settings or rural areas. Examples of wearable platforms
include not only laptops and PDAs but also external, on-body sensors, various prosthetics and implantable electronics (e.g., cochlear implants, visual
prosthetics, ocular video implants).
Urban environments with users accessing wireless-enabled devices and running location-based services inspired the design of 7DS. We anticipate that
such environments (especially during rush hours on a train platform, in
a commercial center, or on a campus) will manifest high spatial locality of information. More generally, we expect that pervasive computing spaces with
wirelessly-enabled physical objects that exhibit high spatial locality of information can apply the peer-to-peer paradigm to enhance their wireless access,
particularly in areas with weak signal or limited access to the Internet.
Cooperation among mobile devices is a recurring theme throughout this work.
Cooperation in this context is realized through data sharing, querying and
data forwarding, relaying messages to an Internet gateway, and caching popular data objects. In general, 7DS devices operate in different modes based on
their cooperation strategies, power conservation schemes, and query mechanisms.
7DS-enabled devices can interact either in a peer-to-peer (P-P) or server-to-client (S-C) manner. In P-P, 7DS-enabled devices cooperate with each
other according to the peer-to-peer paradigm. Unlike P-P, S-C is asymmetric:
there are 7DS-enabled servers that respond to queries and non-cooperative,
potentially resource-constrained clients.
Throughout the text, the terms 7DS node, 7DS host, and simply host are
used interchangeably to indicate any 7DS-enabled device, while 7DS peer or
simply peer indicates any 7DS-enabled device that employs the peer-to-peer paradigm.
These different modes of operation allow 7DS to instantiate the different mobile information access schemes when possible, and provide complementary
access through peers, when an infrastructure is not available.
A 7DS host acquires the data from the local cache of a peer (P-P) or
server (S-C) within its wireless coverage using single-hop multicast. Due to
the highly dynamic environment, 7DS does not try to establish permanent
caching or service discovery “paths”.
To determine the impact of the various modes of operation on mobile
data access, variations on P-P and S-C are also proposed. For example, the
hybrid S-C schemes, an extension of S-C, allow some types of cooperation
among clients. Other P-P schemes enable forwarding of queries or responses
to extend their coverage.
7DS employs a simple mechanism that periodically activates the network
interface, resulting in an alternation of on and off intervals that takes place in
an asynchronous or synchronous manner (Table 3.1). During the on intervals,
nodes may communicate with their peers.
• In asynchronous mode, on and off intervals are equal but not synchronized.
• In synchronous mode, the on and off intervals are synchronized among hosts, although not necessarily equal.
Power conservation        Description of on and off intervals
Asynchronous (default)    equal but not synchronized
Synchronous (“sync”)      not equal but synchronized
Table 3.1. Power conservation schemes in 7DS.
Clients search for a data object using active or passive querying (Table 3.2).
• In passive querying, a client multicasts its queries only when it is in the range of an information server. A server announces its presence to clients in its wireless range through advertisement messages. Upon the receipt of such an advertisement, a client in passive querying responds by sending its queries.
• In active querying, a client periodically multicasts its query for a data object until it receives the relevant data. Active querying is the default querying mechanism.
Scheme              Query transmission
Active (default)    Periodic broadcast
Passive             Upon the receipt of an advertisement from a server
Table 3.2. Querying schemes in 7DS.
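The difference between the two querying schemes comes down to what triggers a transmission. The sketch below encodes that decision as a single function. It is an illustration, not 7DS code. The function name `should_send_query` and its parameters are hypothetical, and the default interval of 15 s is borrowed from the BroadcastQueryInterval example in Figure 2.12.

```python
def should_send_query(scheme, has_data, now, last_sent,
                      interval=15.0, advertisement_heard=False):
    """Decide whether a client transmits its query at time `now` (seconds).

    scheme              -- "active" or "passive"
    has_data            -- True once the requested data has been received
    last_sent           -- time of the previous transmission, or None
    advertisement_heard -- True if a server advertisement was just received
    """
    if has_data:
        return False  # querying stops once the data has been received
    if scheme == "active":
        # Periodic transmission: send at once, then once per interval.
        return last_sent is None or now - last_sent >= interval
    if scheme == "passive":
        # Transmit only in response to a server advertisement.
        return advertisement_heard
    raise ValueError(f"unknown querying scheme: {scheme!r}")
```

Passive querying avoids transmissions when no server is within range, whereas active querying keeps transmitting whether or not anyone is within range to answer.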
Depending on the type of cooperation, three variations of P-P are proposed:
• data sharing (DS)
• forwarding (FW)
• both data sharing and forwarding enabled (DS+FW)
When forwarding is enabled, upon the receipt of a query or data, 7DS peers
rebroadcast it. To prevent flooding the network, a host ignores the query
or data, if it has already rebroadcast this query or data during the last ten
seconds. For example, host A queries for the data and host B receives host
A’s query. Assuming that host B does not have the relevant data, it will
rebroadcast host A’s query. When another host residing in the range of host
B (e.g., host C) receives host B’s message, it will rebroadcast host A’s query, if
it does not have any relevant data. Host B will receive the query rebroadcast
by host C but will ignore it. In all P-P-based schemes, all nodes are mobile
with active querying enabled.
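The flood-prevention rule in the host A, B, C example can be sketched as a per-message timestamp table. This is a minimal illustration under stated assumptions, not the simulator's code. The class `ForwardingPeer`, the use of a message identifier to recognize a repeated query, and the "respond" outcome for a peer that holds the data are hypothetical. Only the ten-second window comes from the text.

```python
class ForwardingPeer:
    SUPPRESSION_WINDOW = 10.0  # seconds, the window stated in the text

    def __init__(self, has_data=False):
        self.has_data = has_data
        self._last_rebroadcast = {}  # message id -> time of last rebroadcast

    def on_query(self, message_id, now):
        """Return the action taken: 'respond', 'rebroadcast', or 'ignore'."""
        if self.has_data:
            return "respond"  # assumption: a data holder answers directly
        last = self._last_rebroadcast.get(message_id)
        if last is not None and now - last < self.SUPPRESSION_WINDOW:
            return "ignore"  # already rebroadcast within the last ten seconds
        self._last_rebroadcast[message_id] = now
        return "rebroadcast"
```

In the example from the text, host B rebroadcasts host A's query on first receipt and then ignores the copy rebroadcast by host C, because it arrives within the suppression window.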
Depending on the mobility of the information server, S-C schemes are
classified into two categories:
• mobile information server (MIS)
• fixed information server (FIS)
In straight S-C, clients are mobile, noncooperative, receiving data only from
the server via active querying, with the energy conservation mechanism disabled. The hybrid S-C schemes assume passive querying and fixed server.
Table 3.3 summarizes the 7DS schemes with their querying mechanism.
Scheme    Cooperation            Server mobility    Options          Querying
FIS       only server            stationary         DS               active
MIS       only server            mobile             DS               active
P-P       all hosts              N/A (no server)    DS, FW, DS+FW    active
Hybrid    server, all clients    stationary         DS, FW, DS+FW    passive
Table 3.3. Summary of the schemes with their querying mechanism.
To investigate the impact of transmission power, cooperation, querying,
and energy conservation, P-P and S-C schemes—along with their variants—
are evaluated. For example, to examine the impact of the cooperation, we
compare P-P with straight S-C. The contrast of MIS with FIS reflects the impact of server mobility, whilst the comparison of DS with DS+FW highlights
forwarding. Such performance analysis is amenable to an analytical solution
only for very simplified user mobility and interaction patterns. Furthermore,
modeling user mobility and interaction is challenging, not only due to the
difficulty of setting up large-scale testbeds for empirical studies, but also due
to the dependency on the specific environment. Thus, to assess the performance of information dissemination, extensive simulations for different modes
of operation of 7DS were performed. In this work, the emphasis is on the
short-term behavior of the information dissemination. Its long-term behavior
has been studied by another group [254] and a brief summary of their main results is also presented. Preliminary analytical results using diffusion-controlled
processes theory are also discussed.
3.2 Simulation assumptions
The simulations are not tied to the 7DS implementation, as we wish to uncover
the general trends and prominent parameters. To simplify the analysis, the
following assumptions are made:
• There is a single data object to be queried.
• At the beginning of each simulation experiment, only one node has the data object, while all the remaining ones are interested in acquiring it.
• In S-C, the servers are the original dataholders.
The performance of the data dissemination is evaluated using the following
parameters:
• the percentage of hosts that acquire the data as a function of time
• the average delay between sending their first query and successfully receiving the data
Our simulations assume a two-dimensional world, with nodes roaming in a 1 km × 1 km area according to the random waypoint mobility model. This random walk-based model is frequently used for individual (pedestrian) movement [92, 324, 359]. The random waypoint model breaks the movement of a mobile host into alternating motion and rest periods. Each mobile host starts from a different position and moves to a new randomly chosen destination. The coordinates of a destination are selected according to a uniform distribution from the interval [0, 1) km. Each node moves to its destination with a constant speed selected randomly from a uniform distribution in the interval (0, 1.5) m/s. When a mobile host reaches its destination, it pauses for a fixed amount of time, then chooses a new destination and speed (as in the previous step) and continues moving.
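The mobility model described above can be sketched in a few lines. This is an illustrative generator, not the ns-2 scenario code used for the experiments; the function name and the trace format are assumptions, and the 50 s pause is the value listed in Table 3.4.

```python
import math
import random

AREA_M = 1000.0   # side of the 1 km x 1 km area, in meters
MAX_SPEED = 1.5   # m/s, upper end of the speed interval
PAUSE_S = 50.0    # fixed pause time (Table 3.4)


def random_waypoint(duration_s, rng):
    """Return a list of (time, x, y) waypoints for one host."""
    t = 0.0
    x, y = rng.uniform(0, AREA_M), rng.uniform(0, AREA_M)
    trace = [(t, x, y)]
    while t < duration_s:
        nx, ny = rng.uniform(0, AREA_M), rng.uniform(0, AREA_M)
        # Speed is drawn from the open interval (0, 1.5); redraw an exact zero.
        speed = rng.uniform(0, MAX_SPEED)
        while speed == 0.0:
            speed = rng.uniform(0, MAX_SPEED)
        t += math.hypot(nx - x, ny - y) / speed  # motion period
        x, y = nx, ny
        trace.append((t, x, y))
        t += PAUSE_S                             # rest period
        trace.append((t, x, y))
    return trace


trace = random_waypoint(25 * 60, random.Random(1))
```

Each leg contributes two trace entries, one on arrival and one at the end of the pause, so positions between entries can be recovered by linear interpolation.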
The query interval consists of an on and off interval. The broadcast is
scheduled at a random time during the on interval. The asynchronous mode
is the default power conservation method, while schemes with the synchronous
mode enabled are explicitly denoted with the word “sync”. In schemes with no power conservation, the off interval is equal to 0, so the on interval coincides with the query interval, during which the exchange of queries, reports, and advertisements takes place. A cooperative dataholder responds to a query by sending the data object.
As all simulations assume one data object, the host density reflects the
popularity of the data. By varying this density, the impact of the popularity of the data can be highlighted. For example, a density of ten nodes per
square kilometer may correspond to the dissemination of a local news article across users with wireless-enabled devices during a rush hour at Grand
Central Station in Manhattan.
We used the ns-2 simulator [144] with the mobility and wireless extensions
from the CMU Monarch project [54]. 300 different scenarios were generated,
each defining the distribution, movement, wireless range, and type of each
host that participates in an experiment. Simulations were run using these
scenarios, for the different schemes of Table 3.3.
The radio propagation is based on the two-ray ground reflection model, in which the received power $P_r$ at a distance d is estimated by
\[ P_r(d) = \frac{P_t G_t G_r h_r^2 h_t^2}{d^4} \tag{3.1} \]
where $P_t$ is the power of the transmitted signal, $h_r$ and $h_t$ are the heights of the receiver and transmitter antennas, respectively, and $G_r$ and $G_t$ are the gains of the signal at the receiver and transmitter, respectively [311]. Varying the transmission power through the high, medium, and low levels yields wireless ranges of approximately 230 m, 115 m, and 57.5 m (Table 3.4). The wireless LAN is based on IEEE 802.11.
Parameter                       Value
Pause time                      50 s
Mobile user speed               (0, 1.5) m/s
Server advertisement interval   10 s
Forward message interval        10 s
Transmission power              281.8 (high), 281.8/2^4 (medium), 281.8/2^8 (low) mW
Wireless ranges (trx power)     230 (high), 115 (medium), 57.5 (low) m
Table 3.4. Simulation parameters in 7DS.
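The relation between the transmission power levels and the wireless ranges follows from Eq. 3.1: with all other factors fixed, the received power falls off as the fourth power of the distance, so the range for a given receive threshold scales with the fourth root of the transmission power. The sketch below checks this, assuming that the medium and low levels are the high level divided by 2^4 and 2^8, respectively; the function name and the choice of the high level as reference are illustrative.

```python
def range_for_power(p_mw, ref_power_mw=281.8, ref_range_m=230.0):
    """Range at which the received power matches the reference setting.

    Under Eq. 3.1, P_r is proportional to P_t / d**4, so for a fixed
    receive threshold the range scales as the fourth root of P_t.
    """
    return ref_range_m * (p_mw / ref_power_mw) ** 0.25


high = range_for_power(281.8)
medium = range_for_power(281.8 / 2**4)
low = range_for_power(281.8 / 2**8)
print(round(high, 1), round(medium, 1), round(low, 1))
```

Dividing the power by 16 halves the range and dividing it by 256 quarters it, which is consistent with the 230 m, 115 m, and 57.5 m ranges listed in Table 3.4.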
Parameter                  Value
Power conservation         Asynchronous
Query interval             15 s
Simulation time            25 min
Shape of the environment   1 km × 1 km
Table 3.5. Default setting in 7DS simulations.
3.3 Data dissemination benchmarks
The performance analysis focused on the following benchmarks:
• the percentage of nodes that acquire the data object (i.e., have become dataholders) as a function of time
• the average delay for a mobile host to receive the data object since the transmission of its first query
The percentage of new dataholders reported in the plots was computed excluding the original dataholder. Only these dataholders were considered for
computing the average delay.
To explore the temporal evolution of data diffusion, the simulation time
was varied from 25 minutes to 50 minutes. The 95% confidence interval for the
average percentage of dataholders is within 0-11% of the computed average,
with the variance tending to be higher for low host density.
3.4 Density of dataholders
7DS proves to be an effective data dissemination tool for high transmission
power. Even for a sparse network, 77% of nodes will acquire the data during
the experiment, while for denser networks, this percentage becomes 96% or
more. The effect of cooperation can be highlighted by the comparison of P-P
with FIS. Figures 3.1 and 3.2 show the percentage of dataholders as a function
of the density of hosts in P-P and S-C with a query interval of 15 s. For
example, in a setting of 25 hosts, P-P outperforms FIS by 55%. In particular,
in P-P, 99.9% of hosts will acquire the data after 25 minutes, compared to
42% of the users in FIS. For lower transmission power, P-P outperforms FIS
by up to 70% (e.g., DS for 25 peers and medium transmission power, as shown
in Figure 3.2). The impact of data sharing among peers is also apparent in
hybrid schemes, becoming more evident in settings with ten hosts or more per
square kilometer and medium or high transmission power.
Forwarding in addition to data sharing does not result in any substantial performance improvement due to the low probability for a querier to reach a dataholder during a simulation run only via a multi-hop neighbor. An nth-hop neighbor of host A is any host B that can reach host A via at least n hops, where the hops are relay hosts different from A and B. In settings with a larger number of relay hosts, the impact of forwarding is expected to be more significant. Forwarding without data sharing also improves performance. For example, hybrid schemes with forwarding enabled outperform FIS by up to 40%, depending on transmission power and peer density.
[Figure 3.1: dataholders (%) versus density of hosts (#hosts/sq.km), from 5 to 25 hosts. Curves: P-P: DS; P-P: DS (power cons.); P-P: DS+FW (power cons.); S-C: FIS; S-C: MIS; Hybrid: DS+FW; Hybrid: FW; Hybrid: DS+FW (power cons.).]
Fig. 3.1. Percentage of dataholders after 25 minutes for high transmission power.
The performance of P-P improves substantially as the number of hosts increases. On the other hand, the performance of FIS and MIS remains constant
as the number of hosts increases, given that a data exchange takes place only
when a querier is in close proximity to the server. Depending on the transmission power, MIS outperforms FIS by approximately 22%, 16%, and 6%,
respectively. The only difference between MIS and FIS is the mobility of the
server, and thus, the relative speed between the server and a client. Due to
the higher relative speed, a client is in the range of the server more frequently
in MIS than in FIS and thus acquires the data faster.
Power conservation     On period   Off period
Asynchronous           7.5 s       7.5 s
Synchronous (“sync”)   1.5 s       13.5 s
Table 3.6. Power conservation scheme parameters in 7DS.
[Figure 3.2: dataholders (%) versus density of hosts (#hosts/sq.km) in two panels, (a) medium transmission power and (b) low transmission power. Curves: P-P: DS; P-P: DS (power cons.); P-P: DS+FW (power cons.); S-C: FIS; S-C: MIS; Hybrid: DS+FW; Hybrid: FW; Hybrid: DS+FW (power cons.).]
Fig. 3.2. Percentage of dataholders after 25 minutes.
[Figure 3.3: dataholders (%) versus time (s) in two panels, (a) ten cooperative hosts and (b) twenty-five cooperative hosts. Curves: DS and DS+FW, each for high, medium, and low trx power.]
Fig. 3.3. Effect of forwarding on density of dataholders in peer-to-peer with data sharing enabled (DS).
[Figure 3.4: average delay (s) versus time (s) in two panels, (a) ten cooperative hosts and (b) twenty-five cooperative hosts. Curves: DS and DS+FW, each for high, medium, and low trx power.]
Fig. 3.4. Effect of forwarding on delay in peer-to-peer with data sharing enabled (DS).
3.5 Impact of energy conservation
Empirical studies have shown that wireless network interfaces consume substantial power even in an idle state [147]. Asynchronous energy conservation results in 50% energy savings but also some degradation in data dissemination, as the network interface is on only half the time. Figures 3.1 and 3.2 illustrate the relatively small degradation in data dissemination due to the reduced time interval in which hosts can communicate.
For a fixed query interval, the smaller the on interval, the higher the energy savings but also the larger the degradation of data dissemination. To
ameliorate the performance, the synchronous mode is enabled, and the on
and off intervals of all hosts are synchronized. In that case, even a small on
interval does not appear to cause any degradation of the data dissemination.
More specifically, Figure 3.5 (a) illustrates P-P schemes with data sharing
and Figure 3.5 (b) hybrid schemes with data sharing and forwarding. The
query interval is 15 s, in which, during the first 1.5 s the network interface is
on, and switches off during the remaining time (13.5 s), as summarized in Table 3.6. In an ideal setting without packet losses and need for retransmission,
the number of messages exchanged in P-P without power conservation and
those with synchronous power conservation is the same. Therefore, the power
consumption due to packet transmission and reception is the same, while the
power spent on keeping the network interface on is reduced. Given that the
network interface is on for only 10% of the time, the synchronous mode may
result in up to 90% reduction in energy dissipation. However, in networks
with high traffic demand, retransmissions cause further energy expenditure.
In such cases, the availability of a lower-power network interface for control packets, in addition to the regular one for the data exchange, can be crucial. Furthermore, in situations with high traffic demand, the efficient channel assignment (or network interface selection) to a group of peers that is likely to share information (or, more generally, resources) can be important. In these cases, it is necessary to evaluate the different modes of operation under more realistic traffic loads, mobility patterns, and link conditions.
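The savings figures quoted above follow directly from the duty cycle of the network interface. The sketch below computes the fraction of interface-on time that is avoided for the two modes of Table 3.6; it is a simplification that counts only idle (interface-on) time and ignores the energy of transmissions, receptions, and retransmissions. The function name is illustrative.

```python
def idle_time_saving(on_s, off_s):
    """Fraction of the query interval during which the interface is off."""
    query_interval_s = on_s + off_s
    return off_s / query_interval_s


async_saving = idle_time_saving(on_s=7.5, off_s=7.5)   # asynchronous mode
sync_saving = idle_time_saving(on_s=1.5, off_s=13.5)   # synchronous mode
print(f"async: {async_saving:.0%}, sync: {sync_saving:.0%}")
```

Both modes use the same 15 s query interval; only the split between the on and off periods differs.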
Let us now highlight the performance of data dissemination as a function of
the query interval. Its degradation as the query interval increases is relatively
small in FIS compared to P-P due to the fact that opportunities of data
exchange occur less frequently in FIS than in P-P: a querier will be in the
range of a dataholder less frequently in FIS than in P-P.
Figures 3.6 (a) and 3.7 correspond to a relatively sparse network of five
hosts per square kilometer while Figures 3.6 (b) and 3.8 show the results
for a denser network consisting of 25 hosts per square kilometer. In P-P, the
impact of the query interval is more prominent. For example, in the case of
medium transmission power, when the query interval increases from 15 seconds
to 3 minutes, the degradation in the performance of data dissemination is approximately 17%. Further analysis to estimate the optimal querying mechanism, taking into consideration the traffic in the wireless LAN and host coresidency time, is required.
[Figure 3.5: dataholders (%) versus density of hosts (#hosts/sq.km) in two panels, (a) P-P with data sharing and (b) hybrid scheme with data sharing and forwarding. Curves: no power conservation and sync power conservation, each for high, medium, and low trx power.]
Fig. 3.5. Impact of synchronous mode on data dissemination.
[Figure 3.6: dataholders (%) versus query interval (s) in two panels, (a) 5 hosts per km^2 and (b) 25 hosts per km^2. Curves: P-P: DS; P-P: DS (power cons.); P-P: DS+FW (power cons.); S-C: FIS.]
Fig. 3.6. Percentage of dataholders as a function of the query interval. Schemes with power conservation enabled use the sync mode and high transmission power.
[Figure 3.7: dataholders (%) versus query interval (s) in two panels, (a) medium transmission power and (b) low transmission power. Curves: P-P: DS; P-P: DS (power cons.); P-P: DS+FW (power cons.); S-C: FIS.]
Fig. 3.7. Percentage of dataholders as a function of the query interval with five hosts per km^2. Schemes with power conservation enabled use the sync mode.
[Figure 3.8: dataholders (%) versus query interval (s) in two panels, (a) medium transmission power and (b) low transmission power. Curves: P-P: DS; P-P: DS (power cons.); P-P: DS+FW (power cons.); S-C: FIS.]
Fig. 3.8. Percentage of dataholders as a function of the query interval with 25 hosts per km^2. Schemes with power conservation enabled use the sync mode.
3.6 Average delay
An important performance metric is the average delay a host experiences from the first query until it receives the data. For each test, the average delay was computed over only the hosts that had acquired the data by the end of the simulation. The average of all 300 sets, excluding the ones without new dataholders, was reported. For a 25-minute simulation time, the average delay as a function of the probability of acquiring the data was computed.
In P-P with data sharing, no energy conservation, and high transmission
power, the average delay is as high as 6 minutes for sparse networks and drops
to 77 seconds for dense networks, while for low transmission power, it climbs
to 13 minutes. For high transmission power in FIS, it is 6 minutes, while for
low transmission power, it reaches 9 minutes. Evidently, (sync) P-P with data
sharing performs better than FIS, even in the case of low host density. For
example, for the same average delay (6 minutes), the probability of acquiring the data in P-P doubles. This becomes clear when we compare P-P in
Figures 3.5 (a) and 3.10 and FIS in Figure 3.9 (a).
Figures 3.9, 3.11, and 3.12 compare FIS and P-P with data sharing and
no power conservation enabled. To attain these figures, the simulation results
for the probability that a host acquires the data, and the average delay it
experiences have been combined. For example, in the case of one server in a
square kilometer area with high transmission power, each “point” (data entry)
of the curves in Figure 3.9 corresponds to a distinct simulation time. Each
point combines the percentage of dataholders at the corresponding simulation
time and their average delay until they become dataholders. The percentage
of hosts that acquire the data in P-P with high transmission power reaches
40% with an average delay of 135 s while for the same delay, 30% of hosts
will acquire the data in FIS. In FIS, a 40% probability of acquiring data
corresponds to an average delay of 6 minutes, whereas using (sync) P-P this
probability doubles, even for low densities of peers. For a higher average delay
of 10 minutes, 85% of hosts will acquire the data using P-P, and 50% using
FIS. In the case of medium transmission power, with an average delay of 315 s,
a host will get the data with a probability of 15% and 22% using FIS and P-P,
respectively.
3.7 Scaling properties of data dissemination
In both P-P and FIS, when the area is expanded but the density of hosts and their transmission power are kept fixed, the performance of data dissemination remains the same, indicating the robustness of our simulation results. Figure 3.9 shows this scaling property in FIS for high and medium transmission power.
[Figure 3.9: average delay (s) versus dataholders (%) in two panels, (a) all hosts with high transmission power and (b) all hosts with medium transmission power. Curves: one server in an area of 1x1; four servers in an area of 2x2; nine servers in an area of 3x3.]
Fig. 3.9. Scaling property in FIS: fixed density of servers. Average delay of FIS as a function of the percentage of dataholders with one server per square kilometer.
[Figure 3.10: average delay (s) versus density of hosts (#hosts/sq.km). Curves: no power conservation and sync power conservation, each for high, medium, and low trx power.]
Fig. 3.10. Average delay for P-P with data sharing.
Another interesting scaling property is related to the effect of the density
of cooperative hosts and their wireless coverage density, whilst keeping the
total area of wireless coverage fixed.
Let us assume two deployments of servers with different density of servers
and transmission power but of the same aggregate wireless coverage. For simplicity, let us also consider the free-space model for the radio communication.
The deployment with the larger density of servers is more effective in terms of power expense and wireless throughput utilization. We found that for fixed total wireless coverage, the higher the density of cooperative hosts, the better the performance in FIS and P-P. An intuitive explanation is that the two deployments become equivalent by “scaling down” the deployment with the lower density of servers to match the other. After this “scaling”, the speed of the hosts in the deployment with the initially lower density of servers doubles. Thus, this setting “becomes” the same as the other one in terms of area, transmission coverage of each server, and server density, but with hosts moving faster. Therefore, the probability that a host enters the coverage area of a server increases.
Figure 3.11 compares two FIS settings with the same total wireless coverage but different densities of cooperative hosts (servers). The first includes one server in a 2 km × 2 km area with high transmission power and the second four servers in a 2 km × 2 km area with medium transmission power. The setting with a higher density of servers performs better. For example, for a 20% probability of acquiring the data, FIS with a higher density of servers produces an average delay of 500 s. For the same wireless coverage but with a lower density of servers, the average delay doubles. A similar phenomenon holds for P-P, as illustrated in Figure 3.12 for various host densities.
[Figure 3.11: average delay (s) versus dataholders (%). Curves: one server in 2x2 (high trx power); four servers in 2x2 (medium trx power).]
Fig. 3.11. Scaling property in FIS: fixed total wireless coverage of servers. Average delay to receive the data and percentage of dataholders for different densities of information servers.
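The two FIS settings compared here can be checked for equal aggregate coverage with a short calculation. The sketch assumes ideal circular coverage disks with the ranges of Table 3.4 and ignores overlap between disks and clipping at the border of the area; the function name is illustrative.

```python
import math


def aggregate_coverage_m2(num_servers, range_m):
    """Total area of the coverage disks, ignoring overlap and borders."""
    return num_servers * math.pi * range_m ** 2


sparse = aggregate_coverage_m2(num_servers=1, range_m=230.0)  # high power
dense = aggregate_coverage_m2(num_servers=4, range_m=115.0)   # medium power

# Length factor by which the sparse deployment is "scaled down" so that its
# server density and per-server range match the dense one.
length_scale = math.sqrt(4 / 1)
print(math.isclose(sparse, dense), length_scale)
```

Shrinking all lengths of the sparse deployment by this factor while keeping the simulated time unchanged makes its hosts cover the rescaled area twice as fast, which is the doubling of speed mentioned in the intuitive explanation.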
3.8 Models of information dissemination
This section discusses our initial efforts to analytically study the wireless data dissemination and further generalize our simulation results. Information dissemination can be realized through gossiping algorithms, which have been studied analytically. For example, Ravishankar and Singh [312, 313, 314] presented an optimal broadcasting algorithm, considering a one-dimensional world in which nodes are placed on a line. Percolation theory [131] has also been employed for estimating the expected time for a message to spread among all nodes placed on a lattice. Such studies use the shape theorem, which typically assumes a system in two-dimensional coordinates in which each lattice site is either empty or occupied, and in which the set of occupied sites $A_t$ at time t grows and attains a limiting geometry. Simple epidemic models have also appeared in [283] and diffusion-controlled processes in [284, 286].
[Figure 3.12: average delay (s) versus dataholders (%) in two panels, (a) 5 cooperative hosts per square kilometer and (b) 20 cooperative hosts per square kilometer. Curves in (a): 1 initial dataholder & 5 cooperative hosts in 2x2 (high trx power); 1 initial dataholder & 5 cooperative hosts in 1x1 (medium trx power). Curves in (b): the same settings with 20 cooperative hosts.]
Fig. 3.12. Average delay to receive the data as a function of the percentage of dataholders in P-P with data sharing schemes.
[Figure 3.13: dataholders (%) versus time (s) in two panels, (a) high transmission power and (b) medium transmission power. Curves: 5, 10, 15, 20, and 25 hosts.]
Fig. 3.13. Performance of P-P with DS and power conservation enabled as a function of simulation time and various cooperative host densities.
Section 3.8.1 presents a simplistic epidemic model and Section 3.8.2 discusses a novel approach to model data dissemination borrowed from particle
kinetics as well as diffusion-controlled processes.
3.8.1 Simple epidemic model
Mathematical models for the spread of epidemic diseases have been widely
studied [73, 140]. In our case, a disease is equivalent to a data object. These
models analyze the fraction of infected individuals (i.e., mobile nodes) among
a finite population (e.g., an ad-hoc network) and the probability with which
the entire population is infected after a given time.
7DS aims to prefetch and disseminate data for mobile hosts not necessarily connected to the Internet. Its effectiveness, as a data dissemination and
prefetching tool, depends on a variety of parameters, such as node density
in a certain region, node mobility, transmission power, cooperation strategy,
querying mode, and energy conservation. It does not appear to be amenable
to an analytical solution except for simplified versions.
The assumption that in any time interval h, any given dataholder will
transmit data to a querier with probability hα + o(h) can substantially reduce
the complexity. Note that in order for a function f (.) to be o(h), it is necessary
that the limit of f (h)/h is equal to zero as h goes to zero. But if h goes to
zero, the only way for f (h)/h to approach zero is for f (h) to go to zero faster
than h does. That is, for h small, f (h) must be small compared with h [319].
A simple epidemic model can be then used to compute the expected delay
for a message to be propagated to the population of an area, as described in
[321]. For the epidemic model, the following assumptions are made:
• A population of N peers at time 0 consists of one dataholder (the “infected” node) and N − 1 queriers (the “susceptible” ones).
• Once a peer acquires the data, the data will be locally stored permanently.
• In any time interval h, any given dataholder will transmit data to a querier with probability hα + o(h).
If X(t) denotes the number of dataholders in the population at time t, the process {X(t), t ≥ 0} is a pure birth process with rate
\[ \lambda_k = \begin{cases} (N-k)\,k\,\alpha, & k = 1, \ldots, N-1 \\ 0, & \text{otherwise} \end{cases} \tag{3.2} \]
Thus, when there are k dataholders, each of the remaining N − k mobiles will get the data at a rate equal to kα. If T denotes the time until the data has been spread amongst all the mobiles, then T can be represented as
\[ T = \sum_{i=1}^{N-1} T_i, \tag{3.3} \]
where $T_i$ is the time to go from i to i + 1 dataholders. As the $T_i$ are independent exponential random variables with respective rates $\lambda_i = (N-i)\,i\,\alpha$, i = 1, ..., N − 1, the expected value of T is given by
\[ E[T] = \frac{1}{\alpha} \sum_{i=1}^{N-1} \frac{1}{i(N-i)}. \tag{3.4} \]
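As a sanity check of the pure birth model, the expected spreading time can be evaluated numerically and compared with a direct simulation of the exponential stage times. The sketch below takes the rate of the stage with i dataholders to be (N − i)iα, as implied by the statement that each remaining mobile acquires the data at rate iα; the values of N and α and the function names are hypothetical and chosen only for illustration.

```python
import random


def expected_spread_time(n, alpha):
    """E[T] as the sum of the mean stage times 1 / ((n - i) * i * alpha)."""
    return sum(1.0 / (i * (n - i)) for i in range(1, n)) / alpha


def simulate_spread_time(n, alpha, rng):
    """Draw one realization of T as a sum of exponential stage times."""
    return sum(rng.expovariate((n - i) * i * alpha) for i in range(1, n))


n, alpha = 25, 0.001   # hypothetical population size and contact rate
analytic = expected_spread_time(n, alpha)

rng = random.Random(0)
runs = 20000
empirical = sum(simulate_spread_time(n, alpha, rng) for _ in range(runs)) / runs
print(analytic, empirical)
```

The sum also has the closed form (2/N)·H, where H is the harmonic number of order N − 1, which follows from the partial-fraction identity 1/(i(N − i)) = (1/N)(1/i + 1/(N − i)).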
3.8.2 Diffusion-controlled process
We apply a theoretical framework based on diffusion-controlled processes,
random walks and kinetics of diffusion-controlled chemical processes [280] to
model FIS. Let us first describe a diffusion process that is closely related to
information dissemination. Consider a diffusion process that takes place in
a medium with randomly distributed static traps and two types of particles,
namely, S-type (stationary traps or sinks) and M-type (mobile particles) [196].
In such a static trapping model, particles of M-type perform diffusive motion
in d-dimensional space while particles of S-type are static and randomly distributed in space. M-type particles are absorbed by S-type when they collide
with them. The simple trapping model assumes traps of infinite capacity.
The diffusion-controlled processes focus on the survival probability, that is the
probability that a particle will not get trapped as a function of time.
Rosenstock’s trapping model in d dimensions assumes a genuinely d-dimensional, unbiased walk of finite mean-square displacement per step and has a survival probability $\phi_t$ that for large t follows
\[ \log(\phi_t) \approx -\alpha \left[ \log\frac{1}{1-q} \right]^{\frac{2}{d+2}} t^{\frac{d}{d+2}} \tag{3.5} \]
where α is a lattice-dependent constant, and q denotes the concentration of the independently distributed, irreversible traps.
One question is: when is Eq. 3.5 a useful approximation? To answer this question, most studies have relied on simulations, but so far there is no information available on the range of validity of Eq. 3.5. In [174], Havlin et al. presented evidence suggesting that Eq. 3.5 is a useful approximation when
\[ \rho > 10 \tag{3.6} \]
where ρ is the scaling function
\[ \rho = \left[ \ln\frac{1}{1-q} \right]^{\frac{2}{d+2}} t^{\frac{d}{d+2}}. \tag{3.7} \]
This value of ρ corresponds to a survival probability which is equal to $10^{-13}$ in both two- and three-dimensional spaces. Havlin et al. argued that pure simulation techniques will always lead to an exponential decay at sufficiently long times, rather than to the correct decay given by the theoretically proven Eq. 3.5. Their evidence for the new lower value of ρ is based on two numerical techniques that they developed. One of these is practical for high trap concentrations only (q ≥ 0.9). This case of high trap concentrations is analogous to our case.
Information sharing in FIS takes place between the server and the querier.
When a 7DS querier is in the range of the server, it acquires the data. It is
easy to draw the analogy between FIS and the trapping model:
• The stationary information servers can be modeled as S-type particles (stationary traps) and the mobile clients as M-type particles.
• A data acquisition “corresponds” to a trapping event. When a querier acquires the data (or “an M-type particle gets trapped”), it remains a dataholder (or “is trapped”) for the remaining time, and thus the survival probability corresponds to the probability of not acquiring the data.
• The term $1 - \phi_t$ expresses the fraction of hosts that acquire the data at time t.
[Figure 3.14: dataholders (%) versus time (s). Curves: Trap model (high trx power, 4 servers in 2x2); FIS (high trx power, 4 servers in 2x2); FIS (high trx power, 1 server in 1x1); Trap model (medium trx power, 1 server in 1x1); FIS (medium trx power, 1 server in 1x1).]
Fig. 3.14. Simulation (FIS) and analytical trapping model results.
Figure 3.14 illustrates the analytical and simulation results for data dissemination as a function of time. The analytical results for the trapping model
are derived from Eq. 3.5 (Rosenstock’s trapping model) for high and medium
transmission power.
Let us define the wireless density of servers q as $\pi R^2 N_s / A^2$, where $N_s$ is the number of servers placed in an area of size A × A, and R is the wireless range, equal to 230 m and 115 m for high and medium transmission power, respectively. For the results in Figure 3.14, we used the FIS simulations described in Section 3.4.
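The analytical curve of the trapping model can be evaluated directly from Eq. 3.5 and the definition of q. The sketch below assumes natural logarithms, d = 2, and time measured in seconds; it uses α = 0.0332, the value estimated in this section for FIS with high transmission power, purely as an illustrative input. The function names are not from the original work.

```python
import math


def wireless_density(num_servers, range_m, side_m):
    """q = pi * R^2 * N_s / A^2 for an area of size A x A."""
    return math.pi * range_m ** 2 * num_servers / side_m ** 2


def dataholder_fraction(t_s, q, alpha, d=2):
    """1 - phi_t, with phi_t taken from Eq. 3.5 (natural logarithm assumed)."""
    rho = math.log(1.0 / (1.0 - q)) ** (2.0 / (d + 2)) * t_s ** (d / (d + 2))
    return 1.0 - math.exp(-alpha * rho)


# One server in a 1 km x 1 km area with high transmission power (R = 230 m).
q_high = wireless_density(num_servers=1, range_m=230.0, side_m=1000.0)
fraction_25min = dataholder_fraction(t_s=25 * 60, q=q_high, alpha=0.0332)
print(q_high, fraction_25min)
```

Because the same scaling function ρ appears in Eq. 3.7, the helper can also be used to check whether a given setting lies in the ρ > 10 regime.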
To investigate whether our simulation results were consistent with the diffusion
model of Eq. 3.5, our simulation scenarios were extended for longer time periods. We estimated the α of the survival probability with maximum likelihood,
and compared the simulation results with the theoretical data. Let us fix the
duration t, dimension d, and wireless density of servers q in Eq. 3.5. The
number of dataholders can be viewed as a binomial random variable whose
number of trials is the number of initial non-dataholders (N_ndh) in the scenario
and whose probability of success (the probability of acquiring the data) is
1 − φ_t. We ran the simulation scenario a number of times (e.g., 30). In each
iteration i (i = 1, 2, ..., 30), there are X_i dataholders at the end of the run
(simulation time t). In this case, the maximum likelihood estimate of φ_t can be
obtained analytically. We obtain
\[
\phi_t = \frac{\sum_i (N_{ndh} - X_i)}{\sum_i N_{ndh}} \tag{3.8}
\]
The intuitive explanation is that the probability of not becoming a dataholder
is the sample proportion of non-dataholders over the 30 iterations [105]. Then,
maximum likelihood can be used for the estimation of α. Following these steps,
in the case of FIS with high transmission power and a duration of 20,000 s,
the maximum likelihood estimation of α is equal to 0.0332. We repeated the
estimation for the FIS scenario, where there is one server in a square kilometer
area, varying only the duration. The average value of α was computed as
$\bar{\alpha} = \frac{1}{n} \sum_t \alpha_t$, where α_t is the value of α estimated for a simulation duration t = 2,000 s,
6,000 s, ..., 20,000 s, and n is the number of durations considered. Figure 3.15 does not indicate any convergence of α to a
specific value. Our conjecture is that this is due to the parameters of this
scenario, and particularly, the relatively small area of the terrain compared
to the large wireless coverage of the server (radius of 230 m compared to one
square kilometer of the terrain). Using Eq. 3.5, (1 − φt ) × 100% matches
our simulation results for the percentage of dataholders at time t for the FIS
scheme we described.
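The estimator of Eq. 3.8 can be sketched in a few lines. This is only an illustration: the function name and the example counts are hypothetical, and X_i is assumed to count the hosts, among the N_ndh initial non-dataholders, that acquired the data by the end of run i.

```python
from typing import Sequence


def estimate_survival_probability(dataholders_per_run: Sequence[int],
                                  initial_non_dataholders: int) -> float:
    """Pooled sample proportion of hosts that did not acquire the data (Eq. 3.8)."""
    runs = len(dataholders_per_run)
    if runs == 0 or initial_non_dataholders <= 0:
        raise ValueError("need at least one run and one initial non-dataholder")
    # Numerator of Eq. 3.8: hosts that remained non-dataholders, summed over runs.
    survivors = sum(initial_non_dataholders - x for x in dataholders_per_run)
    # Denominator of Eq. 3.8: N_ndh summed over the same runs.
    return survivors / (runs * initial_non_dataholders)


# Hypothetical counts from three runs with 10 initial non-dataholders each.
phi_t = estimate_survival_probability([4, 6, 5], 10)
fraction_acquiring = 1.0 - phi_t
```

With the estimate of φ_t in hand for a given duration, α would then be chosen so that Eq. 3.5 reproduces that value. That step is omitted here because it depends on the exact form of Eq. 3.5.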
To evaluate the goodness-of-fit of the trap model with our simulation data,
we computed the coefficient of determination between the average percentage
of dataholders in the simulations and the trapping model using α = ᾱ in
Eq. 3.5. The coefficient of determination was found to be 0.9921. Figure 3.16
shows the 95% confidence interval of each FIS simulation scenario and the
trapping model. The variance in our simulations is large but the model is
within the simulation envelope.
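For reference, a coefficient of determination between simulated and modeled percentages can be computed as sketched below. The text does not state which definition was used; this sketch assumes the common form 1 − SS_res/SS_tot, and the function name is illustrative.

```python
from typing import Sequence


def coefficient_of_determination(observed: Sequence[float],
                                 predicted: Sequence[float]) -> float:
    """Return R^2 = 1 - SS_res / SS_tot (assumed definition, see lead-in)."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equally long")
    mean_observed = sum(observed) / len(observed)
    # Residual sum of squares: model error at each sampled time.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot
```

Here `observed` would hold the average percentage of dataholders in the simulations at each sampled time, and `predicted` the value of (1 − φ_t) × 100% from the trapping model at the same times.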
Fig. 3.15. The α parameter of Eq. 3.5, estimated using maximum likelihood for
FIS, high transmission power, and varying the tracing period.
3.9 Discussion
7DS can instantiate a different data access mechanism using either the server-to-client or the peer-to-peer paradigm. This chapter focused on the performance analysis of some simple 7DS schemes, each using a different paradigm
(e.g., FIS, MIS, and P-P). Specifically, it analyzed the impact of the density
of servers (FIS and MIS schemes) and peers (P-P schemes), their wireless
range, querying frequency, and energy conservation on the performance of
information diffusion.
The server-to-client paradigm in the context of information access for mobile devices has been employed by infostations. A typical infostation is a server
that broadcasts data items based on received queries or a predefined schedule.
Imielinski and Badrinath were among the first to study the performance of infostations. In their research, they mostly addressed issues related to efficient
scheduling algorithms for the server broadcast that minimize the response
delay and power consumption of mobile devices and efficiently utilize the
bandwidth of the broadcasting channel [198, 303, 79]. Imielinski et al. [198]
explored methods for accessing broadcast data in such a way that running
time (which affects battery life) and access delay (waiting time to receive
[Figure 3.16: Dataholders (%) versus time (s, ×10^4). Curves: FIS (high trx power, 1 server); Trap Model (high trx power, 1 server).]
Fig. 3.16. Simulation (FIS) and diffusion model (Trap model) with one server of
high transmission power for a longer time horizon (95% confidence interval).
data) are minimized. The provision of an index- or hash-based access to the
data transmitted over the wireless channel can significantly improve the battery utilization. Barbara et al. [79] studied a taxonomy of cache invalidation
strategies and the impact of clients’ disconnection times on their performance.
Assuming a deployment of infostations that enables wide-area wireless network access, Ye et al. [362] evaluated a prefetching operation for mobile users.
They designed a prefetching algorithm for a map-on-the-move application
that exploits a hierarchical representation of information in multiple levels
of detail. Based on location, route, and speed information, their algorithm
predicts future data access and delivers maps on demand for instantaneous
route planning, at the appropriate level of detail. When a mobile device enters the infostation coverage area, it prefetches a fixed number of bytes that
corresponds to a map with a certain level of detail. The effectiveness of infostations was compared to a traditional wide-area wireless network, by varying
the infostation density and coverage. Unlike FIS, in which mobile hosts have
no wide-area network access, in [362] devices are constantly connected to a
low-speed wireless network. Specifically, when these devices are within the
infostation coverage, they use a high bandwidth link, whereas, outside these
regions, their requests are passed to the server via a conventional cellular base station. They also showed that it is more efficient to have a larger number of
infostations with small range than fewer infostations with large range.
The performance analysis study that is closest to the one presented in
this chapter is follow-up work by Lindemann and Waldhorst [254]. They
modeled the spread of multiple data items assuming finite buffer capacity at
mobile devices and a least-recently-used buffer scheme. Their analysis explored
several variants of 7DS with and without power conservation as well as with
and without support of fixed information servers. They mainly concentrated
on its long-run performance and reported the following interesting results:
• Neither the transmission range nor the selected variant of 7DS has a significant impact on the fraction of dataholders in the long run. However, for
high transmission ranges, the selected variant of 7DS does have a significant impact on the hit rate. Depending on the 7DS variant and the buffer
size, hit rates between 0.48 and 0.92 can be achieved.
• The medium transmission range yields higher hit rates than using aggressive power conservation at a high transmission range.
Recent studies have explored the analytical properties of information dissemination in constantly-connected networks that form various topologies, including scale-free and small-world networks (e.g., [61, 227, 248, 41]).¹ As mentioned in Section 3.8.2, theoreticians have also been studying the problem of
diffusion and particle kinetics. Recently Kesten and Sidoravicius [221] showed
that in the long run, all particles will be concentrated in an area that grows
linearly. An attractive feature of the diffusion-controlled process is that it can
provide elegant theoretical tools to investigate data dissemination for different
network topologies. However, extending these research efforts to incorporate parameters such as the expiration of data objects, buffer size, buffer
management policy, type of interaction among devices, cooperation strategy,
and time-varying network topologies poses several challenges.
To simplify the analysis of data dissemination among mobile peers in
DTN environments, this monograph considered that peers communicate via
broadcasts and restricted forwardings. More efficient routing protocols could
be adopted to facilitate the communication among peers. In general, routing,
epidemic, and gossiping algorithms for ad-hoc and sensor networks have received a lot of attention since the 1980s, and numerous routing protocols have been
proposed (e.g., AODV, DSR, TORA, DSDV, ADV, ZRP, LAR). To evaluate
their performance, comparative analysis studies have been performed, mostly
via simulations (e.g., [101, 207, 126, 193, 89, 298, 161, 228]) using metrics, such
¹ Small-world networks are characterized by a high degree of clustering and small
distances between any two nodes in the network. Scale-free networks exhibit a
power law degree distribution. A power law has a heavy tail, which makes values
far beyond the mean much more likely than for light-tailed distributions [166].
Unlike other distributions, such as the exponential distribution, which drops off
very quickly beyond the mean, power laws do not possess a characteristic scale.
as the energy dissipation, packet latency, routing overhead, and throughput
per flow. However, traditional routing protocols for ad-hoc networks do not
perform well in DTNs, since mobile peer-to-peer computing applications often form sparse, intermittently-connected networks with frequently unstable
paths. Flooding algorithms can also be problematic in DTNs of mobile devices with limited resources. Despite their simplicity, robustness, and relatively
low delay, their high energy, bandwidth, and memory requirements discourage
their use in such networks. Since the early 2000s, several routing approaches for
DTNs have been proposed, investigating the following parameters:
• caching policy, buffer size and management (e.g., [127, 254])
• use and control of relaying nodes (e.g., [367, 167, 366, 360, 252]) and
relaying policy (e.g., [97])
• use of knowledge about device mobility and location (e.g., [97, 252, 202,
360, 366])
The impact of mobility on the design of forwarding algorithms for DTNs has
also generated a lot of interest in the last few years (e.g., [108, 189, 270]).
For example, Chaintreau et al. [108] analyzed the inter-contact times using
traces from real-life testbeds with human mobility and evaluated their impact
on forwarding algorithms. The use of forwarding to enhance information
access in ad-hoc networks has also been studied theoretically. An influential
paper on the capacity of static ad-hoc networks impelled further theoretical
studies, some of them analyzing the use of relaying peers to improve the capacity in ad-hoc networks. Specifically, Gupta and Kumar [170] proved that
the average available throughput per node is inversely proportional to the
square root of the number of nodes in a static ad-hoc network. Equivalently,
the total network capacity increases at most as the square root of the number
of nodes. Extending these results, Grossglauser and Tse [167] showed that
the capacity of an ad-hoc wireless multi-hop network can be enhanced by
exploiting forwarding, in which a sender may forward its message further to
a mobile relay node. They evaluated the average per-session throughput and
its asymptotic performance in such multi-hop ad-hoc networks and showed
that the average throughput per source-destination pair of nodes can be kept
constant by increasing the density of nodes. However, the delay of a packet
may also increase substantially. To provide guarantees on delay, Bansal and
Liu proposed a routing algorithm that exploits the mobility patterns of devices, achieving a throughput that is only a poly-logarithmic factor from the
optimal [78].
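The equivalence between the two statements of the static-network result above follows from multiplying the per-node throughput by the number of nodes. In the short derivation below, n (the number of nodes) and λ(n) (the average available throughput per node) are symbols introduced only for this illustration.

```latex
\lambda(n) \propto \frac{1}{\sqrt{n}}
\quad\Longrightarrow\quad
n\,\lambda(n) \propto \frac{n}{\sqrt{n}} = \sqrt{n}
```

For example, under this scaling, quadrupling the number of nodes halves the throughput available to each node while only doubling the total network capacity.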
Mobile peer-to-peer systems and routing protocols have been evaluated
mostly via simulations. Analyzing their performance under more realistic conditions with respect to user access (e.g., co-residency times of peers, inter-contact times, arrivals in the range of information servers), traffic patterns,
link conditions, and network topology can reveal important aspects of their
performance. The need for such models motivated the research presented in
the following chapters.
4 Empirically-based measurements on wireless demand
This chapter presents our measurements of caching, access, and traffic demand in wireless networks. It describes the wireless infrastructure and data
acquisition methodology. Then, it provides an overview of the traffic demand
at APs in two major campus-wide wireless infrastructures and presents an
application-based characterization of traffic. It explores the spatial and temporal locality phenomena of the wireless information access and evaluates
the impact of caching paradigms. Finally, it discusses related work and main
conclusions.
4.1 Introduction
IEEE 802.11 networks have been rapidly deployed to provide wireless Internet access, especially in universities, corporations, and metropolitan areas.
Empirical and performance analysis studies indicate dramatically low performance of real-time constrained applications over wireless LANs (such as [62]
on VoIP), and large handoff delays [310, 264]. Moreover, mobile users still experience frequent loss of connectivity and high end-to-end delays when they
access the wireless Internet. For example, the overhead of scanning for nearby
APs is routinely over 250 ms, far longer than what can be tolerated by highly
interactive applications such as voice telephony.
Wireless LANs have more vulnerabilities and tighter bandwidth and latency constraints than their wired counterparts. It is critical to understand the performance and workload of wireless networks and develop wireless networks that
are more robust, easier to manage and scale, and more able to efficiently utilize their scarce resources. While in several cases over-provisioning in wired
networks is acceptable, it can become problematic in the wireless domain. A
number of mechanisms, such as capacity planning, resource reservation, device adaptation, and load balancing, need to be employed to support such
networks. Real-life measurement studies can be particularly beneficial in the
development and analysis of such mechanisms, as they can uncover deficiencies
of the wireless technology and different phenomena of the wireless access and
the workload. The existence of testbeds, tools, and benchmarks is of tremendous importance. Rich sets of data can impel modeling efforts to produce
more realistic models, and thus, enable more meaningful performance analysis studies. Recently there have been several empirical measurement studies
on the following issues:
• traffic load [338, 75, 74, 233, 177, 261, 282]
• user access [83, 74, 288, 340, 108, 223, 201, 203]
• handoff [310, 264]
• delay and packet losses in the MAC [134] and TCP connections [180]
• link quality and routing [122, 87, 57]
Measurements on IEEE 802.11-based mesh networks have also received a lot of
attention [57, 85, 87, 308, 257, 45, 43, 70, 129, 130, 45].
This chapter focuses on the wireless demand in large-scale wireless infrastructures and presents an exploratory analysis of the amount and type of
demand. Section 4.2 presents the wireless infrastructure and Section 4.3 describes the data acquisition methodology. The main terminology that will be
used in the empirical measurement studies included in this work is defined in
Section 4.4. Section 4.5 provides an overview of the workload of APs of two
major campus-wide wireless networks, in terms of the number of bytes sent
and received, number of packets sent and received, and number of associations and roaming operations. An application-based characterization of the
wireless demand is discussed in Section 4.6. Section 4.7 investigates the locality of the web URLs accessed from the wireless infrastructure and evaluates
several caching paradigms. Finally, Section 4.8 discusses related research and
summarizes our main conclusions and future work plans.
4.2 Campus-wide wireless infrastructure
UNC began the deployment of its wireless infrastructure in 1999, providing coverage for nearly every building on the 729-acre campus, encompassing a diverse academic environment, which includes university departments, programs, administration, activities, and residential buildings. In
these buildings, there are 26,000 students, 3,000 faculty members, and 9,000
staff/administrative personnel [5]. Of the 26,000 students, 61% are undergraduates, and more than 75% own a wireless laptop.
Most of the APs belong to three different series of the Cisco Aironet platform: the state-of-the-art 1200 Series, the widely-deployed 350 Series, and a
few older 340 Series. The 1200s and 350s run Cisco IOS, while the 340s run VxWorks. Since each AP has a unique IP address, we used an AP’s IP address to
determine its unique AP ID number. Each AP has a coverage area determined
by the radio propagation properties around the AP. Each IEEE 802.11-enabled
device that communicates with the campus wireless network is called a client,
is assumed to have a unique MAC address, and is assigned a positive unique
ID number based on its MAC address. A client communicates via the network
by associating itself with an AP; in this case we say that such a client visits
the AP.
The wireless infrastructure has expanded substantially during the last few
years. Table 4.1 shows the evolution of the wireless infrastructure and the
significant increase of APs and wireless clients.
Tracing period                   Clients   APs
February 10 - April 27, 2003       7,694   232
October 17-24, 2004                8,880   459
March 2-9, 2005                    9,049   532
April 13-20, 2005                  9,881   574
September 29 - November, 2005     14,712   574
Table 4.1. The evolution of the wireless infrastructure at UNC in terms of number
of APs and wireless clients.
4.3 Monitoring and data acquisition
We monitored the wireless infrastructure at UNC and collected extensive wireless traces, such as packet headers, syslog, simple network management
protocol (SNMP), TCP flow, and signal strength-based data.
Monitoring large-scale wireless networks comes with several challenges.
Often monitoring tools are limited in their capabilities because they cannot capture all relevant information due to either hardware limitations, the
proprietary nature of hardware and software, or hidden terminals. Furthermore, the implementations of many protocol features of IEEE 802.11, such as
rate adaptation and transmission power control, are vendor-specific and
their details are not publicly available. At the same time, wireless measurements feature high complexity due to transient phenomena, missing values,
and spatio-temporal dependencies. Transient phenomena are due to roaming
and radio propagation issues, while failures of the monitoring devices and
APs, lost UDP packets of syslog events or other measurement messages result in missing values in data traces. It is a non-trivial task to monitor areas
of intermittent connectivity and select the physical and network position of
monitors in large-scale infrastructures. The type of phenomena that needs to
be studied determines the amount of traffic at multiple locations that needs
to be captured, its resolution, and the correlation among multiple sources of
data that needs to be performed.
The next paragraphs describe the traces and the main terminology used
in the measurement-based studies of this work.
[Figure 4.1: diagram titled "Infrastructure of the University of North Carolina at Chapel Hill". Labeled components: Internet, WAN Router, Campus Router, Fiber Split, DAG-Based Packet Monitor, Ethernet Switch, Access Point, Wired Client, Wireless Client.]
Fig. 4.1. The campus-wide wireless infrastructure and packet monitor tool.
4.3.1 Packet header traces
The bulk of the campus wireless network has a single aggregation point that
connects to a gateway router. This router provides connectivity between the
wireless network and the wired links, including all of the campus computing
infrastructure and the Internet. Packet header traces were collected with a
high-precision DAG-based monitoring card (Endace 4.3GE). The card was
installed in a high-end FreeBSD server and captured all packets traversing
the link between UNC and the Internet in both directions (Figure 4.1). In
general, monitoring high-speed links with a software-only system may result
in inaccuracies. Specifically, the traffic has to be forwarded from the network
interface to the monitoring software, using the system bus which may not
be fast enough, especially when the monitored link is under heavy traffic
load conditions. As a result, the monitor will not record dropped packets. In
addition, the buffering that is involved across the different layers, from the network
interface to the operating system, may result in inaccurate timestamps. DAG
is specialized hardware, widely used in network measurement
projects, that overcomes these problems. The accuracy of DAG traces can
be on the order of nanoseconds [178].
The monitoring period was 178.2 hours in 2005 and 192 hours in 2006,
yielding 175GB and 365GB of packet headers, respectively. The sharp increase in the trace size indicates the significant growth of the wireless demand
between these periods.
4.3.2 HTTP traces
The HTTP traces were based on packet headers collected from the FreeBSD
monitoring system described in the previous section. The tracing tool tcpdump was employed to collect all TCP packets with payloads that begin with
the ASCII string “GET” followed by a space. The full frame was collected
as a potential HTTP request. We did not restrict our collection to the standard HTTP port, allowing us to record HTTP requests sent to servers on nonstandard ports, which include many common peer-to-peer file-sharing applications. The packet trace was then processed to extract the HTTP GET requests
contained therein.
From each packet, the following information was recorded:
• time of the packet’s receipt with one-second resolution
• hostname specified in the request’s Host header
• Request-URI
• hardware MAC address of the IEEE 802.11 client
If any of these items was not available in a packet, that packet was not included
in the recorded requests. Using these criteria, 8,358,048 requests for 2,437,736
unique URLs were traced and included in the analysis. By recording the traffic
before it had passed through an IP router, we were able to capture the original
MAC header, as generated by the IEEE 802.11 clients for transmission to the
gateway router.
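The extraction step described above can be sketched as follows. This is a minimal illustration rather than the tool used in the study: the record type, the function name, and the assumption that the client address is supplied separately from the captured frame are all hypothetical.

```python
from typing import NamedTuple, Optional


class GetRequest(NamedTuple):
    timestamp: int       # receipt time, truncated to one-second resolution
    host: str            # value of the Host header
    request_uri: str     # Request-URI from the request line
    client_mac: str      # hardware address of the wireless client


def parse_get_request(payload: bytes, timestamp: float,
                      client_mac: Optional[str]) -> Optional[GetRequest]:
    """Return the recorded fields, or None if the payload is not a complete GET."""
    # Only payloads that begin with "GET" followed by a space are candidates.
    if not payload.startswith(b"GET "):
        return None
    header_block = payload.split(b"\r\n\r\n", 1)[0]
    lines = header_block.split(b"\r\n")
    parts = lines[0].split(b" ")
    if len(parts) < 2 or not parts[1]:
        return None
    request_uri = parts[1].decode("latin-1")
    host = None
    for line in lines[1:]:
        name, separator, value = line.partition(b":")
        if separator and name.strip().lower() == b"host":
            host = value.strip().decode("latin-1")
            break
    # A packet missing any of the four items is not recorded.
    if not host or not client_mac:
        return None
    return GetRequest(int(timestamp), host, request_uri, client_mac)
```

For instance, the payload `b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n\r\n"` yields a record with host `example.org` and Request-URI `/index.html`, whereas a GET without a Host header is discarded.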
The HTTP traces were collected during the tracing period between February 26 and March 24, 2003. During that period, the campus used primarily
Cisco Aironet 350 802.11 APs, although some areas of the campus were serviced by older APs from other manufacturers. As the syslog traces indicated,
the infrastructure was accessed by 7,694 distinct wireless clients and 37% of
them made one or more HTTP requests during that period.
4.3.3 SNMP traces
SNMP is one of the most widely available monitoring services. Every AP on
the market supports monitoring using SNMP, so it is important to understand
how much operators and researchers can learn from SNMP data.
For the comparative study of the workload of wireless campus-wide networks, SNMP data was acquired using a non-blocking SNMP library for polling
every AP precisely every five minutes in an independent manner. This eliminated any extra delays due to the slow processing of SNMP polls by some of
the slower APs.
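The idea of independent, fixed-schedule polling can be illustrated with a small asynchronous sketch. It does not reproduce the SNMP library used in the study: `poll` is a hypothetical coroutine supplied by the caller that stands in for an actual SNMP query, and all names are illustrative.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def poll_periodically(ap: str,
                            poll: Callable[[str], Awaitable[None]],
                            period_s: float,
                            rounds: int) -> None:
    """Poll one AP on its own fixed schedule, whatever each poll takes."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    for k in range(rounds):
        # Wait for the k-th tick of this AP's schedule (no drift accumulates).
        delay = start + k * period_s - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        try:
            # A poll that outlasts the period is abandoned for this round.
            await asyncio.wait_for(poll(ap), timeout=period_s)
        except asyncio.TimeoutError:
            pass  # the slow AP misses one sample; other APs are unaffected


async def poll_all(aps: Iterable[str],
                   poll: Callable[[str], Awaitable[None]],
                   period_s: float = 300.0,
                   rounds: int = 1) -> None:
    """Run one independent polling task per AP (default period: five minutes)."""
    await asyncio.gather(*(poll_periodically(ap, poll, period_s, rounds)
                           for ap in aps))
```

Because each AP has its own task and its own tick times, a slow response from one AP does not delay the polls sent to the others. A call such as `asyncio.run(poll_all(ap_addresses, poll, rounds=12))` would cover one hour, given a caller-provided `poll`.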
4.3.4 syslog traces
The majority of APs on campus were configured to send trace data to a
syslog server in our department. There are seven types of events that trigger
an AP to transmit a syslog message. These messages and their corresponding
events are interpreted as follows:
Authenticated: A card must authenticate itself before using the network. Since a card still has to associate with an AP before sending and receiving data, we ignored any authenticated messages.
Associated: After it authenticates itself, a card associates with an AP.
Any data transmitted to and from the network is transmitted by that AP.
Reassociated: A card may reassociate itself with a new AP (usually due
to higher signal strength) or the current AP. After a reassociation with an
AP, any data transmitted to and from the network is transmitted by that AP.
Roamed: After a reassociation occurs, the old AP and sometimes the AP
with which the card has just reassociated send a roamed message. Since we
still receive the reassociated message, we can ignore this message as well.
Reset: When a card’s connection is reset, a reset message is sent.
Disassociated: When a card wishes to disconnect from the AP, it disassociates itself. We ignore any disassociated messages from a card if the previous
message for that card was a disassociated or deauthenticated message.
Deauthenticated: When a card is no longer part of the network, a
deauthenticated message is sent. It is not unusual to see repeated deauthenticated messages for the same card, with no other type of events for that card
in between. We ignore any deauthenticated messages for a card if the previous message for that card was a disassociated or a deauthenticated message.
A disconnection message describes either a disassociated or deauthenticated
message.
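The filtering rules above can be sketched as a single pass over the messages. The tuple layout and names are assumptions made for this example. Two further assumptions are labeled in the code: reset messages are dropped because Section 4.4 builds its structures from the other four event types, and "previous message" is read as the previous message of any type for that card.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (timestamp, card ID, AP ID, message type in lowercase).
Event = Tuple[float, str, str, str]

DISCONNECTIONS = {"disassociated", "deauthenticated"}
# Assumption: authenticated, roamed, and reset messages are not kept.
KEPT_TYPES = {"associated", "reassociated"} | DISCONNECTIONS


def filter_events(events: Iterable[Event]) -> List[Event]:
    """Keep only the messages that the rules above treat as informative."""
    kept: List[Event] = []
    previous_type: Dict[str, str] = {}  # last message type seen for each card
    for event in sorted(events, key=lambda e: e[0]):
        _, card, _, kind = event
        # A disconnection that directly follows another one is redundant.
        repeated = (kind in DISCONNECTIONS
                    and previous_type.get(card) in DISCONNECTIONS)
        if kind in KEPT_TYPES and not repeated:
            kept.append(event)
        # Assumption: "previous message" means any message type for that card.
        previous_type[card] = kind
    return kept
```

Applied to a card that authenticates, associates, disassociates, is deauthenticated, and then reassociates, the sketch keeps the associated, disassociated, and reassociated messages only.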
4.3.5 Privacy assurances
To avoid disclosure of the identity of individual users and of the sites that a
user has been visiting, we stored and used SHA-1 hashes of the client’s MAC
address, request hostname, and requested path of the HTTP requests.
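A minimal sketch of this hashing step, using Python's standard `hashlib` module, is shown below. The function names, the dictionary keys, and the lowercasing of the address and hostname before hashing are assumptions made for illustration. The text does not describe any normalization.

```python
import hashlib
from typing import Dict


def anonymize(value: str) -> str:
    """Return the hexadecimal SHA-1 digest of a string."""
    return hashlib.sha1(value.encode("utf-8")).hexdigest()


def anonymize_request(client_mac: str, hostname: str, path: str) -> Dict[str, str]:
    """Hash the three identifying fields of a recorded request."""
    return {
        # Assumption: addresses and hostnames are lowercased so that the same
        # client or site always maps to the same digest.
        "client": anonymize(client_mac.lower()),
        "host": anonymize(hostname.lower()),
        "path": anonymize(path),
    }
```

Since equal inputs produce equal digests, analyses such as counting unique clients or measuring URL locality can be carried out on the hashed values alone.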
4.3.6 Client identification
The MAC address uniquely identifies an IEEE 802.11-enabled device and is assumed to be coupled to a specific computer.
4.4 State, history, visits and sessions
Using the associated, reassociated, deauthenticated, and disassociated syslog events, the following structures were defined to characterize the access
pattern. Note that we assumed that each event occurs at the time of the
timestamp in the corresponding syslog entry.¹
State: A state represents the AP with which a client is currently associated.
When a client is connected to the network, its state is the numeric ID of the
AP with which it is currently associated (via an association or a reassociation).
When the client is disconnected from the network, its state is defined to be
“0”. Since we do not know where the clients are before the trace begins, each
client is considered to be in state 0 at the beginning of the trace. The
remaining structures are defined as follows:
State history: The state history of a client is the ordered sequence of states
that the client has visited.
Reconnection threshold: Sometimes a client will disassociate or deauthenticate for a single second and then associate or reassociate. We found that clients
were disconnected 71,988 times for one second or less and 104,763 times for
30 seconds or less (and reconnected after that). Whenever a client is disconnected for one second or less, we do not consider the client to have disconnected
from or left the network, but instead to be in the middle of a reconnection
process. We decided to use one second because it accounted for such a large
percentage (71,988 of 104,763, or about 69%) of all the times such short periods of disconnection occurred. We
believe that this represents more accurately the user’s intentions. These rules
left us with 2,474,394 useful syslog events for 6,186 clients (Table 4.2) and
allowed us to define the following terms:
Visit: A client begins a visit to an AP when a (re)association message is
received from that AP for that client and ends that visit when any message
from any AP is received for that client. The difference in the timestamp of
these two messages defines the duration of the visit. Each visit is “associated”
with a state.
Session: A session is a sequence of visits to APs and is used to capture an
episode of continuous wireless access to the infrastructure. A session begins
when a currently disconnected client receives a (re)association message and
ends when the next disconnection message is received. The difference in the
timestamps between the disconnection message and the first (re)association
message defines the duration of the session. A session can be mobile, roaming,
or stationary.
Inter-AP transition: If a client is currently associated to an AP, an interAP transition is defined as a (re)association to a different AP. The two APs
may or may not be in the same building.
¹ The exception is that if a client is deauthenticated due to an inactivity period of
thirty minutes (or more), the disconnection was considered to have occurred thirty
minutes before the timestamp that appears in the corresponding deauthenticated
syslog entry. The inactivity period of thirty minutes is a default value for most
of the clients in our infrastructure.
Event type       Events      Clients   APs   Buildings
Total syslog     8,158,341   7,694     222   79
Useful syslog    2,474,394   6,186     222   79
Table 4.2. Summary of syslog statistics.
Inter-building transition: If a client is currently associated to an AP at a
certain building, an inter-building transition is defined as a (re)association to
an AP located in a different building.
Roaming session: A roaming session is a sequence of consecutive visits to
two or more distinct APs.
Mobile session: A mobile session is a special type of roaming session that
comprises a sequence of consecutive visits to two or more APs located in
different buildings.
Roaming (mobile) client: A client with a roaming (mobile) session is called
a roaming (mobile) client.
Drop-in client: A drop-in client is a card that visits two or more buildings
in the period of time in question. Drop-in clients may have disconnections in
between the visits to these buildings.
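The definitions of state, reconnection threshold, session, and session type can be tied together in a short sketch. It is an illustration under stated assumptions: the event layout and function names are hypothetical, consecutive visits to the same AP are collapsed into one entry, and a "stationary" session is taken to be one that involves a single AP, which the text implies but does not define.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical layout: (timestamp in seconds, client ID, state),
# where state 0 means disconnected and any other value is an AP ID.
Event = Tuple[float, str, int]


def build_sessions(events: Iterable[Event],
                   threshold_s: float = 1.0) -> Dict[str, List[List[int]]]:
    """Return, for each client, the APs visited in each of its sessions."""
    sessions: Dict[str, List[List[int]]] = {}
    connected: Dict[str, bool] = {}
    last_disconnect: Dict[str, float] = {}
    for time_s, client, state in sorted(events, key=lambda e: e[0]):
        client_sessions = sessions.setdefault(client, [])
        if state == 0:
            if connected.get(client):
                connected[client] = False
                last_disconnect[client] = time_s
            continue
        if not connected.get(client):
            gap = time_s - last_disconnect.get(client, float("-inf"))
            # Gaps within the threshold count as reconnections, not new sessions.
            if not client_sessions or gap > threshold_s:
                client_sessions.append([])
            connected[client] = True
        visits = client_sessions[-1]
        if not visits or visits[-1] != state:
            visits.append(state)
    return sessions


def classify_session(visits: List[int], building_of: Dict[int, str]) -> str:
    """Label a session with the most specific type that applies."""
    if len(set(visits)) < 2:
        return "stationary"  # assumption: a single AP means stationary
    if len({building_of[ap] for ap in visits}) >= 2:
        return "mobile"      # a roaming session that spans buildings
    return "roaming"
```

For example, a client that associates with AP 1, disconnects for half a second, and then associates with AP 2 has a single session that visits both APs. The session is labeled mobile if the two APs are in different buildings and roaming otherwise.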
4.5 Wireless traffic demand at APs
Measurement studies indicate that several hotspot APs in campus-wide environments exhibit diurnal and weekly periodicities in their traffic load [177,
75, 287, 282]. This section examines the amount of traffic at APs, in bytes and
packets, and the number of association and roaming operations. It also presents
a comparative system-wide analysis that provides a useful view of the entire
utilization of two large-scale wireless networks from the perspective of APs.
4.5.1 Data acquisition
snmp data from the wireless infrastructures of UNC and Dartmouth was collected. The UNC dataset was collected between 9:09 AM, September 29th,
2004 and 12 AM, November 25th, 2004. The Dartmouth trace corresponds to
the dataset studied in [177] and was acquired using a similar approach. It was
collected between November 1st, 2003, and February 28, 2004, thus the duration of this trace is twice the duration of the UNC one. This trace includes
6,875 unique MAC addresses which were associated with one or more APs
during the data collection period. This number is larger for the UNC trace,
which reports on the activity of 14,712 unique MAC addresses. Thus, while the
number of APs in both campus networks is similar, there are twice as many
wireless clients in the UNC trace.
Fig. 4.2. Total wireless traffic sent and received per AP (by building type). Panels: (a) Dartmouth, (b) UNC.
Fig. 4.3. Ratio of sent to received traffic compared to received by AP. Panels: (a) UNC, (b) Dartmouth.
4.5.2 Comparative analysis of wireless traffic load at APs
A surprising degree of similarity in the characteristics of the UNC and Dartmouth wireless demand was found. Our results therefore provide strong evidence in support of the development of parsimonious workload models of
campus wireless networks.
Fig. 4.4. Total amount of traffic transferred at UNC and Dartmouth in each direction.
Specifically, our analysis reveals the following:
•
•
•
•
There is a wide range of workloads and that log normality is prevalent in
both the UNC and Dartmouth traces.
In general, the traffic load in both wireless infrastructures is light, although
there are long tails (Figures 4.4 and 4.6).
No clear dependency with the type of building at which the AP is located exists, although some stochastic ordering is present in the tail of the
distributions.
An interesting dichotomy among APs is prominent in both of the infrastructures: APs dominated by uploaders and APs dominated by downloaders (Figure 4.5). Specifically, we observed that as the total wireless traffic
received at an AP increases, there is also an increase in its total traffic sent (Figure 4.2) and, a simultaneous decrease in the sent-to-received
ratio (Figure 4.3).
• The number of non-unicast wireless packets is substantial. Furthermore,
  the number of unicast received packets is strongly correlated in the log-log
  scale with the number of unicast sent packets (Figure 4.7 (a)).
• While the majority of APs send and receive packets of relatively small size,
  a significant number of APs show rather asymmetric packet sizes, i.e., APs
  with large sent and small receive packets, and APs with small sent and
  large receive packets (Figure 4.8).
• The distribution of the associations and roaming operations was found to
  be quite heavy-tailed.
• There is a correlation between the traffic load and number of associations
  in the log-log scale (Figure 4.9).
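The log-log correlation between sent and received packet counts mentioned in the list above can be checked with a few lines of code. The following is only a minimal sketch: the per-AP counts are made-up values for illustration, not data from the UNC or Dartmouth traces, and the function names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def loglog_correlation(sent, received):
    """Correlation of log10(sent) and log10(received); APs with a zero count are skipped."""
    pairs = [(s, r) for s, r in zip(sent, received) if s > 0 and r > 0]
    log_sent = [math.log10(s) for s, _ in pairs]
    log_recv = [math.log10(r) for _, r in pairs]
    return pearson(log_sent, log_recv)

# Hypothetical per-AP unicast packet counts (illustrative only).
sent = [120, 950, 8_100, 77_000, 640_000]
received = [300, 2_100, 15_000, 160_000, 1_100_000]

rho = loglog_correlation(sent, received)
ratios = [s / r for s, r in zip(sent, received)]  # sent-to-received ratio per AP
```

A value of rho close to 1 indicates a strong linear relation between the logarithms of the two counts; the ratios list gives the per-AP sent-to-received ratio used to separate uploader-dominated from downloader-dominated APs.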
4.6 Application-based characterization of wireless demand
As the wireless user population increases, characterization of its workload
can facilitate more efficient network management and better utilization of
users’ scarce resources. While there have been several studies looking at the
application cross-section in wired networks [333, 341, 169, 109], such attempts
are limited in the case of wireless networks [177].
Using the port number to classify flows may lead to significant amounts
of misclassified traffic due to dynamic port usage, overlapping port ranges,
and traffic masquerading. Often, peer-to-peer and streaming applications use
dynamic ports to communicate, and even worse, the port ranges of different
applications may overlap. Furthermore, several malware or peer-to-peer applications may try to masquerade their traffic under well-known “non-suspicious”
ports, such as port 80. Besides the well-documented limitations of application
identification [216, 268, 215], additional complications inherent in wireless networks, such as the increasing overhead of data collection due to the need for
multiple monitoring points, cross-correlation of different types of traces, and
transient phenomena due to the radio propagation and mobility, have led the
community to assume that the expected workload of wireless networks follows
the general trends of Internet applications.
To avoid this “known-port limitation” [268, 215], we employed the blinc
tool [216] which performs classification of flows into applications based on the
transport-layer footprint of the various application types.
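The known-port limitation can be illustrated with a naive port-based classifier. This is a sketch of the approach that was avoided, not of the blinc tool; the port table is deliberately tiny and the three flows, including their ground-truth labels, are hypothetical.

```python
# Minimal well-known port table (assumption: just enough entries for the example).
WELL_KNOWN_PORTS = {21: "ftp", 80: "web", 443: "web"}

def classify_by_port(src_port, dst_port):
    """Label a flow by the first endpoint port found in the well-known table."""
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"

# Hypothetical flows with their true application.
flows = [
    {"true_app": "web", "src_port": 51234, "dst_port": 80},
    {"true_app": "peer-to-peer", "src_port": 40112, "dst_port": 80},     # masquerading on port 80
    {"true_app": "peer-to-peer", "src_port": 49877, "dst_port": 52011},  # dynamic ports
]

labels = [classify_by_port(f["src_port"], f["dst_port"]) for f in flows]
misclassified = sum(1 for f, label in zip(flows, labels) if label != f["true_app"])
```

In this toy example the masquerading flow is labeled as web traffic and the dynamic-port flow cannot be labeled at all, so two of the three flows are not classified correctly.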
For the application-based classification study, we processed packet-header
traces collected at one of the access routers at UNC and client-based snmp
data from all APs. The snmp data was used to associate each flow with the
corresponding mac address and AP information. Approximately 9,125 distinct
internal ips which were mapped to approximately 3,241 unique mac addresses
were observed in the traces. blinc was able to classify 86% of our flows into
application types. Some cases of misclassifications were due to outlying user
behavior. Nearly 5% of the users were responsible for 98% of misclassified web
traffic and thus all these flows were excluded.
Fig. 4.5. Ratio of total traffic sent and received compared to total traffic sent per AP: (a) Dartmouth, (b) UNC.
Fig. 4.6. Total amount of traffic transferred during 5-minute interval.
Fig. 4.7. Total number of packets sent and received by an AP at UNC: (a) unicast packets, (b) multicast/broadcast packets.
Fig. 4.8. Average size of packets sent and received by an AP at UNC in bytes.
Our main results are summarized as follows:
• The most popular applications are web browsing and peer-to-peer, accounting
  for approximately 81% of the total traffic. The traffic of most users is also
  dominated by these two applications.
• Network management and scanning activity are responsible for 17% of the
  total flows.
• While building-aggregated traffic application usage patterns appear similar,
  the application cross-section varies within APs of the same building.
• Most wireless clients appear to use the wireless network for one specific
  application that dominates their traffic share.
• File transfer flows, such as ftp and peer-to-peer, are heavier in the wired
  network than in the wireless one.
• The traffic share across applications is significantly affected when clients
  associate with new APs. This appears to be independent of the specific
  application type.
• There is a dichotomy among APs, in terms of their dominant application
  type and downloading and uploading behavior.
Fig. 4.9. Total wireless traffic and number of client associations at Dartmouth.
As new wireless applications and services are deployed—reshaping the wireless
arena—it would be interesting to observe and analyze the evolution of the
wireless access in the spatial and temporal domain.
4.7 Locality of web objects
The peer-to-peer paradigm exploits the spatial locality of queries and information access. Chapter 3 showed via simulations that in settings with high
spatial locality of information and frequent disconnections from the Internet,
these peer-to-peer systems can enhance information access by reducing the
average delay in receiving the data. Empirical studies in wireless networks
have indicated that web and peer-to-peer applications are among the most
prominent types of access [300, 234, 85].
This section examines the spatio-temporal characteristics of the wireless
access through measurements. Does the information access in wireless production networks exhibit spatial locality? How effective can different caching
schemes be and what would be the impact of the peer-to-peer paradigm in
such networks? Although the web is not primarily a location-dependent or
collaborative application, its prevalence motivated us to start our analysis by
focusing on web requests in a large-scale wireless network.
Fig. 4.10. AP cache: Devices request and acquire data from the cache of their local AP. (Diagram labels: Internet, wired network, router, switch, access point with AP cache, mobile hosts MH A and MH B; (1) request, (2) response.)
Fig. 4.11. Peer-to-peer caching: Devices request and acquire data from the caches of peers associated with the same AP. (Diagram labels: Internet, wired network, router, switch, wireless LAN, access point, mobile hosts MH A and MH B; (1) web request, (2) data.)
Fig. 4.12. Campus-wide cache: Devices request and acquire data from a campus-wide cache. (Diagram labels: Internet, wired network, campus-wide cache, router, switch, wireless LAN, access point, mobile hosts MH A and MH B; steps (1) request through (6) data.)
The temporal locality identifies the frequency and temporal aspects of repeated requests for certain information. The spatial locality focuses on the AP
and building in which a repeated request occurs and indicates if the repeated
request originated from a nearby client, a client within the same AP, or a
client in the same building.
Three main caching paradigms are explored: user cache, cache attached
to an AP or a building, and peer-to-peer caching. For the evaluation of these
paradigms, the following assumptions were made:
• A user cache is considered to be the web browser cache.
• A cache attached to an AP or a building will serve the wireless clients
  associated with that AP or with APs of that building, respectively.
• In peer-to-peer caching, clients associated with the same AP act as cooperative caches for each other.
Figures 4.10, 4.11, and 4.12 illustrate the AP cache, peer-to-peer caching, and
campus-wide cache paradigms, respectively.
Web requests may exhibit different locality characteristics and can be classified into the following categories:
• same-client
• same-AP
• AP-coresident-client
• same-building
• campus-wide
This classification is hybrid in that it exhibits both temporal and spatial
characteristics. The following sections discuss these characteristics and present
the locality characteristics of a large campus-wide wireless infrastructure.
4.7.1 http requests model
Two requests are considered to be from the same client, if they were generated
by clients that have the same hashed mac address, and two requests are
considered to be for the same url, if they have the same hashed hostname
and request path.
A post-processing phase was performed that conceptually examined every
request in the http trace and identified the AP via which it was made using
the syslog trace [115].
4.7.2 Same-client repeated requests
A same-client repeated request occurs when a single client requests an object
that it has requested in the past. The cause could be any of the following:
Subsequent request: A client intentionally requests an object that it has
requested in the past, and the request is not satisfied by the browser cache. Such a request would represent genuine ongoing interest by
that client.
Client reloads: A client reloads a page. This may occur when the page has
not been transmitted properly or to refresh content (e.g., live sports scores).
Automatic reloads: Many popular pages (such as headline-news and weather
sites) cause the browser to re-load the page periodically. While the page is displayed, the browser will periodically re-request it. Some of these requests could
also be considered indicative of continued interest by the client.
Packet retransmissions: If the first packet containing the request was not
known by the client to have reached its destination, tcp specifies that the
client retransmit the packet. Both requests are distinct requests. However,
such retransmissions are expected to be rare [255].
This study is subject to the effects of browser caching; if the requested object is in the browser’s cache, then no http request will be generated. Some,
but not all, browsers follow http’s specification for determining the freshness of a cached object. Also, we speculate that a percentage of the repeated
requests are conditional http get requests. This measure does not account
for the location of the client and therefore reveals temporal but not spatial
locality. The temporal locality of these requests was computed as follows: for
each request in the trace, we searched for previous references to the same url
made by that same client. If such a request was found, we recorded the time
elapsed since this request occurred.
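The computation described above can be sketched as a single pass over the trace. This is a minimal illustration under two assumptions that the text does not state explicitly: the trace is sorted by time, and the elapsed time is measured from the most recent earlier request. The client and url strings stand in for the hashed mac address and hashed url; all values are hypothetical.

```python
def same_client_elapsed(requests):
    """requests: (timestamp_seconds, client_id, url) tuples sorted by time.

    Returns, for every same-client repeated request, the time elapsed since
    the most recent earlier request for the same url by the same client.
    """
    last_seen = {}  # (client, url) -> timestamp of the latest request
    elapsed = []
    for ts, client, url in requests:
        key = (client, url)
        if key in last_seen:
            elapsed.append(ts - last_seen[key])
        last_seen[key] = ts
    return elapsed

# Hypothetical trace.
trace = [
    (0, "c1", "u/a"),
    (30, "c2", "u/a"),    # different client: not a same-client repeat
    (90, "c1", "u/a"),    # repeat after 90 s
    (100, "c1", "u/b"),
    (4000, "c1", "u/a"),  # repeat after 3910 s, outside a one-hour window
]

elapsed = same_client_elapsed(trace)
within_hour = sum(1 for e in elapsed if e <= 3600) / len(trace)
```

Binning the elapsed times per minute, and dividing by the total number of requests, gives fractions of repeated requests per interval of the kind reported later in this section.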
4.7.3 Same-AP repeated requests
When an object is requested multiple times within the same AP’s range, those
are called same-AP repeated requests. This measure does not account for the
client that makes the request; i.e., the repetition can occur due to a single
client or several clients requesting the same object within a single AP’s range.
4.7.4 AP-coresident-client repeated requests
A central question for motivating information sharing systems targeting mobile users is the following: How often are users who are interested in the same
things near one another? To answer this question, object and client-AP, as
well as object and client-building, correlations were examined. These spatial
locality properties of wireless web access can impact caching.
An AP-coresident client repeated request is said to occur when a client in
an AP’s area requests an object that has been requested at some time in the
past by another client who is in the same AP’s area at the time that the new
request is made. Note that the other client which requested the object in the
past may have requested the object while at a different location.
Fig. 4.13. Fraction of additional repeated requests within a one-hour interval. The number of requests considered is at least 7.6 million. Over 2,800 clients are represented. (Vertical axis: logarithmic, from 0.0001 to 1.)
For each request in the trace, we searched backwards in time for previous
references to the same object made by a different client that is currently
associated with the same AP. If such a request was found, the time that has
elapsed since this request occurred was recorded. Figure 4.13 displays the
fraction of same-client, same-AP, and AP-coresident client repeated requests,
respectively, for an interval equal to one hour. More specifically, the fraction of
repeated requests at each minute is equal to the additional repeated requests
that occur in that minute of the first hour. For example, within the first minute
the fraction of repeated requests is at least 0.19 for same-client and same-AP, and 0.01 for AP-coresident client. In the second minute, an additional
0.04 fraction of requests are same-client and same-AP repeated requests, and
the fraction of additional repeated requests is 0.006 for AP-coresident client
repeated requests.
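The backward search for AP-coresident-client repeated requests can be sketched as follows. The event format (interleaved association and request events, with None marking a disassociation) and all identifiers are assumptions made for this illustration; the actual study derived associations from the syslog trace. The sketch measures elapsed time from the most recent qualifying request.

```python
def coresident_elapsed(events):
    """events: time-ordered tuples, either
         ("assoc", ts, client, ap)   with ap=None for a disassociation, or
         ("req",   ts, client, url).

    Returns the elapsed times of AP-coresident-client repeated requests:
    the url was requested earlier by a *different* client that is associated
    with the requester's AP at the time of the new request.
    """
    current_ap = {}  # client -> AP it is associated with right now
    last_req = {}    # url -> {client: time of that client's latest request}
    elapsed = []
    for ev in events:
        kind, ts, client = ev[0], ev[1], ev[2]
        if kind == "assoc":
            current_ap[client] = ev[3]
            continue
        url = ev[3]
        ap = current_ap.get(client)
        earlier = [t for other, t in last_req.get(url, {}).items()
                   if other != client and ap is not None
                   and current_ap.get(other) == ap]
        if earlier:
            elapsed.append(ts - max(earlier))
        last_req.setdefault(url, {})[client] = ts
    return elapsed

# Hypothetical events: B requests the object at ap2, then roams to ap1.
events = [
    ("assoc", 0, "A", "ap1"),
    ("assoc", 0, "B", "ap2"),
    ("req", 10, "B", "u"),
    ("assoc", 50, "B", "ap1"),
    ("req", 70, "A", "u"),    # B is now coresident with A: elapsed 60 s
    ("assoc", 80, "B", None),
    ("req", 90, "A", "u"),    # B has left; A's own request does not count
]

result = coresident_elapsed(events)
```

Note that the earlier request by B was made at a different AP, which matches the remark above that the other client may have requested the object while at a different location.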
Same-client and same-AP repeated requests exhibit some five-, ten-,
fifteen-, and thirty-minute periodicities. Furthermore, the fraction of repeated
requests for same-client and same-AP is similar and higher than that of AP-coresident repeated requests. As many as 37% of all requests would be unnecessary if every object on the web had a cache lifetime of at least an hour.
This indicates the impact of the client’s web browser cache, assuming that all
browsers observe the http standard for caching.
The repeated requests follow a power law with exponents of
-1.31, -1.27, and -0.84 for same-client, same-AP, and AP-coresident client,
respectively. The coefficient of determination is at least 0.94 for all of them.
These exponents indicate that the temporal locality is more apparent in the
same-client than in the AP-coresident-client case.
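One common way to obtain such an exponent and its coefficient of determination is an ordinary least-squares fit in log-log space. The sketch below assumes this method (the text does not say which fitting procedure was used) and runs it on synthetic data generated from the same-client values quoted above, not on the trace itself.

```python
import math

def fit_power_law(ts, fs):
    """Fit f = scale * t**exponent by least squares on (log t, log f).

    Returns (exponent, scale, r_squared), with r_squared computed in log space.
    """
    xs = [math.log(t) for t in ts]
    ys = [math.log(f) for f in fs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, math.exp(intercept), 1 - ss_res / ss_tot

# Synthetic, noise-free data: fraction of repeated requests per minute.
minutes = list(range(1, 61))
fractions = [0.19 * m ** -1.31 for m in minutes]

exponent, scale, r2 = fit_power_law(minutes, fractions)
```

On real, noisy measurements the coefficient of determination would be below 1; a more negative exponent means that repeated requests are concentrated more strongly in the first minutes.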
The web requests exhibit a strong temporal locality highlighted by the
decreasing trend that becomes more prominent for larger time intervals. As
shown in Figure 4.14, within a day the percentage of repeated requests is
44% for same-client, 48% for same-AP repeated requests, and only 14% for
AP-coresident client repeated requests. On the second day, an additional fraction 0.02 of requests are same-client and same-AP repeated requests, and the
fraction of repeated requests is 0.02 for AP-coresident client repeated requests.
Our cache hierarchy consists of the client cache at the lower level of the
hierarchy, caches at APs, caches of co-resident peers, and a campus-wide cache.
Figure 4.15 focuses on the impact of each caching paradigm when there is a
miss in the other caches of the cache hierarchy. For example, it shows the
impact of the caches at APs for requests that cannot be served by the client
cache (“Same-AP ∩ ¬Same-client repeated requests”) as well as the impact of
the peer cache for requests that cannot be served by the client cache or the
cache of the local AP.
The hit ratio results are conservative, because they include compulsory
(cold start) misses. This effect is reduced by taking measurement traces over
26 days. On the other hand, we assumed infinite cache size and that shared
documents are cacheable, thus the following hit ratios are ideal hit ratios. A
cache at each AP would achieve an ideal hit ratio of 55% for the whole trace,
whereas a cache that serves the entire campus would achieve an ideal hit ratio
of 71%. There are APs with higher ideal hit ratios; for example, an AP in
an auditorium had an ideal hit ratio of 73%, corresponding to 40,064
requests made by six distinct users.
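Under the stated assumptions (infinite cache size, every object cacheable), an ideal hit ratio reduces to counting requests whose object was already requested through the same cache. The sketch below is an illustration with a hypothetical six-request trace; the first request for each object at each cache counts as a compulsory miss.

```python
def ideal_hit_ratio(requests, scope):
    """requests: (client, ap, url) tuples in time order.
    scope: function mapping a request to the cache that would serve it.

    Infinite cache, all objects cacheable: a request is a hit whenever the
    same url was requested earlier through the same cache.
    """
    seen = set()
    hits = 0
    for req in requests:
        key = (scope(req), req[2])
        if key in seen:
            hits += 1
        else:
            seen.add(key)  # compulsory (cold-start) miss
    return hits / len(requests)

# Hypothetical trace.
trace = [
    ("c1", "ap1", "u/a"),
    ("c2", "ap1", "u/a"),
    ("c3", "ap2", "u/a"),
    ("c1", "ap1", "u/b"),
    ("c3", "ap2", "u/b"),
    ("c3", "ap2", "u/a"),
]

per_client = ideal_hit_ratio(trace, lambda r: r[0])   # user cache
per_ap = ideal_hit_ratio(trace, lambda r: r[1])       # cache at each AP
campus = ideal_hit_ratio(trace, lambda r: "campus")   # single campus-wide cache
```

Because a wider scope pools the requests of more clients, the ideal hit ratio cannot decrease when moving from the user cache to the AP cache to the campus-wide cache, which is the ordering seen in the reported figures.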
We found that 8% of all requests refer to objects that have been requested
by a nearby client within the last hour. This proportion varies widely; at
some locations on the campus, 15% of all requests refer to such objects. Also,
a lower number of http requests and fraction of repeated requests are made
on weekends than on weekdays [255], and several repeated requests exhibit 24hour periodicity. Assuming that web objects remain in a client’s web browser
cache for the entire trace period, the AP-coresident-client cache would attain
an ideal hit ratio of 23%, which is less than the ideal hit ratio for same-client
and same-AP caches within two minutes.
4.7.5 Same-building and campus-wide repeated requests
Same-building repeated requests are all the requests for which, at some time
in the past, there was another request for the same url by a client from an
AP in the same building where the first request was made. The percentage of
such repeated requests (i.e., hit ratio) varies from 15% to 75%.
Fig. 4.14. Fraction of additional repeated requests within the entire trace. The number of requests considered is at least 7.6 million. Over 2,800 clients are represented. (Legend: same-AP repeated requests; same-client repeated requests; AP-coresident-client repeated requests.)
Fig. 4.15. Fraction of additional repeated requests within the entire trace. Impact of each caching paradigm on requests that had a miss on other levels of the cache hierarchy. (Legend: same-client repeated requests; Same-AP ∩ ¬Same-client repeated requests; AP-coresident-client ∩ ¬Same-client repeated requests; AP-coresident-client ∩ (¬Same-AP ∩ ¬Same-client) repeated requests.)
We investigated how the number of http requests and client population
of a building may affect this hit ratio. For each building, the total number of
distinct clients that have sent at least one request from an AP in that building
represents its client population, and the total number of requests sent from an
AP in that building represents its request demand. The client population varies from
one to 1,172 clients, and the request demand ranges from five to 1,929,399
requests. The buildings were sorted in decreasing order with respect to their
client population and request demand. In both cases, there is a trend of declining hit ratio. However, the hit ratios across the buildings exhibit high
variance and we cannot draw any strong conclusion. It is part of future work
to investigate possible correlations of the hit ratios with the building, session,
and application type.
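The three per-building quantities used above (client population, request demand, and same-building hit ratio) can be derived in one pass. This is a sketch with hypothetical building names and requests; it assumes the trace is sorted by time and that each request has already been mapped to the building of its AP.

```python
from collections import defaultdict

def building_stats(requests):
    """requests: (client, building, url) tuples in time order.

    For each building, returns the number of distinct clients, the number of
    requests, and the fraction of requests whose url was requested earlier
    from an AP in the same building.
    """
    seen = defaultdict(set)
    clients = defaultdict(set)
    demand = defaultdict(int)
    hits = defaultdict(int)
    for client, building, url in requests:
        clients[building].add(client)
        demand[building] += 1
        if url in seen[building]:
            hits[building] += 1
        seen[building].add(url)
    return {b: {"clients": len(clients[b]),
                "requests": demand[b],
                "hit_ratio": hits[b] / demand[b]}
            for b in demand}

# Hypothetical trace.
trace = [
    ("c1", "library", "u/a"),
    ("c2", "library", "u/a"),
    ("c1", "library", "u/a"),
    ("c3", "dorm", "u/a"),
    ("c3", "dorm", "u/b"),
]

stats = building_stats(trace)
ranked = sorted(stats, key=lambda b: stats[b]["clients"], reverse=True)
```

Sorting the buildings by client population or by request demand, as in the last line, gives the ordering against which the hit-ratio trend was examined.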
4.8 Discussion
The estimation of the wireless workload of an AP has been the epicenter
of several measurement-based studies in wireless networks [338, 75, 74, 177,
261, 224, 288, 289, 282, 181, 180]. Most of them present high-level, usually
aggregate statistics of the traffic load of APs in campus- or conference-wide
networks, or small-scale controlled environments [338, 75, 74, 233, 177, 261,
224, 288, 289, 282, 181, 180]. Temporal and spatial variations in the traffic demand across APs have been reported in several measurement studies
on various wireless infrastructures, such as campus WLANs [233, 177, 181],
enterprise WLANs [75], and conference hotspots [74, 204]. For instance, in
[233], Kotz and Essien characterized Dartmouth’s wireless network, examining aggregate traffic and AP utilization. Extending this work, Kotz et al.
[177] studied the evolution of the wireless network at Dartmouth College using syslog, snmp, and tcpdump traces. They reported the average number
of active cards per active AP per day (2-3 in 2001, and 6-7 in 2003/2004) and
average daily traffic per AP by category (2-3 times higher in 2003/2004; two
or three times more inbound than outbound traffic). Jarosh et al. examined
issues of congested ieee802.11b APs in an IETF meeting and made several
interesting observations [204]. Measurement-based studies have indicated that
several hotspot APs in campus-wide environments exhibit diurnal and weekly
periodicities in their traffic load [177, 75, 287, 282].
The application-based characterization of traffic has triggered several research efforts, most of them employing port-based criteria [338, 74, 177]. However, as shown in [268, 215], the majority of emerging applications use random
port numbers complicating further the classification problem. Tutschku [341]
examined the difference of the uploading from the downloading traffic of a
popular peer-to-peer application in a wired network and reported a significant
amount of uploaded peer-to-peer traffic. Such asymmetries appeared also in
BitTorrent traffic and were highly affected by high-speed downloading [169].
A characterization of online games in terms of user sessions and periodicities
of the workload can be found in [109].
Web browsing and peer-to-peer applications dominate the traffic mix in
campus-wide wireless infrastructures, accounting approximately for 81% of the
total traffic at UNC. These applications also dominate the traffic mix of most
clients. Network management and scanning activity are responsible for 17%
of the total flows in our trace. While building-aggregated traffic application
usage patterns appear similar, the application cross-section varies within APs
of the same building.
Our analysis of the workload of APs in large campus-wide networks revealed interesting structure due to heavy uploading behavior, pervasive lognormality in the system-wide load, and surprisingly heavy distributions of
total client associations and roaming operations.
The temporal and spatial locality phenomena of wireless information access and the impact of caching in a large-scale wireless infrastructure were
examined in this measurement-driven study. Each client frequently requests
objects that it has requested within the past hour, and occasionally requests
objects that had been requested by other nearby users within the past hour.
The overall ideal hit ratios of user cache, cache attached to an AP, and peer-to-peer caching (where peers are coresident within an AP) paradigms are 51%,
55%, and 23%, respectively. A cache at each AP would achieve an ideal hit
ratio of 55% for the entire trace. In general, same-AP caching is beneficial
for APs with high hit ratios; such APs were found in the UNC wireless infrastructure. On the other hand, a cache that serves the entire campus would
achieve an ideal hit ratio equal to 71%. For a similar user population size in
the wired infrastructure of a university campus, the UW study [356] reported
a 59% ideal hit ratio. As in the case of wired networks, the single-client locality is a primary factor in wireless data. Thus, there is an opportunity to
improve wireless access by more actively caching data in a user cache. Unlike
previous studies on wired networks, in which 25% to 40% of documents draw
70% of web accesses [91], our traces indicate that 13% of unique urls draw the
same share of web accesses. It would be interesting to examine the spatio-temporal phenomena per application type and wireless environment (such as
home, metropolitan, institute, conference, vehicle).
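The concentration of accesses on a small share of urls can be measured by ranking urls by popularity and accumulating their request counts. The following sketch uses a hypothetical list of requested urls; the function name and the example data are illustrative only.

```python
from collections import Counter

def url_share_for_access_share(urls, access_share=0.70):
    """Smallest fraction of unique urls, taken in decreasing order of
    popularity, whose requests add up to at least access_share of all requests.
    """
    counts = sorted(Counter(urls).values(), reverse=True)
    total = sum(counts)
    covered = 0
    for rank, count in enumerate(counts, start=1):
        covered += count
        if covered >= access_share * total:
            return rank / len(counts)
    return 1.0

# Hypothetical trace: 20 requests to 5 unique urls, one of them very popular.
urls = ["a"] * 15 + ["b"] * 2 + ["c", "d", "e"]
share = url_share_for_access_share(urls)
```

In this toy trace a single url out of five already covers more than 70% of the requests, so the returned share is 0.2; a lower share means a more skewed popularity distribution.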
The peer-to-peer caching systems that motivated this study require the
objects to be cacheable. Stale objects should not be distributed, but many
popular objects on the web are not cacheable by the http standard [132]. It
appears that content providers use cacheability to force reloads of their pages
for reasons other than document freshness (such as the distribution of new
advertisements). Although this use of the cacheability mechanisms works well
enough in fully connected environments, it is a limiting factor for weakly-connected systems, such as the ones described here. Ideally, an object should be
cached only for its true useful lifetime, while content providers receive the
feedback they need.
One of our earlier studies focused on large-scale passive measurements
of the characteristics of tcp connections, in terms of their volumes, delays,
losses, and lack of termination [180]. Our main findings are summarized as
follows:
• The wireless network introduced substantially higher delay variability, but
  its loss rates were only marginally above those observed for the wired LAN.
• Unnecessary retransmissions are significantly more frequent for wireless
  clients.
• The number of connections for which the wireless client did not take any
  action to terminate the connection is significantly larger than the corresponding
  number of connections of wired clients. The number of interrupted connections
  is higher for the wireless LAN than for the wired LAN.
Empirical and performance analysis studies indicate dramatically low performance of real-time constrained applications over wireless LANs (such as [62]
on VoIP), and large handoff delays [310, 264]. Moreover, mobile users still experience frequent loss of connectivity and high end-to-end delays when they
access the wireless Internet. For example, handoff between APs and across
subnets in wireless LANs can consume from one to multiple seconds, as associations and bindings at various layers need to be re-established. Unfortunately, such long delays cause disruptions in real-time and streaming applications, such as VoIP and video-on-demand. Examples of sources of delay include acquiring new ip addresses, with duplicate address detection, reestablishing security associations and discovering possible APs without scanning the whole frequency range. The probing operation in the handoff process
of the ieee802.11 mac is the primary contributor to the overall handoff latency
and can affect the quality of service for many applications [264]. As mentioned
earlier, the overhead of scanning for nearby APs is routinely over 250 ms, far
longer than what can be tolerated by highly interactive applications such as
voice telephony.
As popular applications and services from wired networks shift to the
wireless arena, new applications emerge, and the use of wireless-enabled devices evolves rapidly, it would be interesting to perform comparative analysis of traces collected from various networking environments. It is important
to understand which are the network performance characteristics that have
the most dominant impact on the performance of certain applications. Network benchmarks, such as jitter, latency, and packet loss, have been used to
quantify network performance. However, what is their impact on how a user
“perceives” the performance of its applications? Shifting our attention from
mac- and network-based metrics to application-based characteristics, we plan
to address the following issues:
• distinguish the metrics that indicate “extreme” network conditions (i.e.,
  conditions that degrade substantially the performance of applications)
• quantify user satisfaction and application requirements with more formal
  subjective and objective metrics/benchmarks
• evaluate the impact of these extreme network conditions on various
  application-based benchmarks
• understand how user behaviour changes depending on the network topology and technology characteristics
Understanding not only the user demand but also the performance of applications is critical for improving their quality of service and designing effective
monitoring and adaptation mechanisms.
5
Modeling the wireless user demand
This chapter focuses on modeling the wireless user demand, and particularly,
the client associations and user traffic demand in a wireless campus-wide
network. It provides a multi-level modeling of the traffic demand and explores
the statistical properties of the flows and sessions.
5.1 Introduction
To support wireless networks with better than best-effort service, the deployment of mechanisms for efficient roaming, resource reservation, admission control, caching and prefetching can be essential. For the design and evaluation
of those mechanisms, traffic and mobility models in different spatio-temporal
scales are required. For example, it would be important to understand the
client association patterns and flow demand at different APs. How do clients
arrive in a wireless infrastructure? What is the duration of their continuous
wireless access and for how long do they stay connected? How do they roam
across APs? How do their association patterns differ with respect to device usage pattern, location, and mobility? Which abstractions can be used to model
the traffic demand? The above questions drive this research.
It is common practice for a preliminary evaluation of a technology to explore its behavior under well-understood conditions and simple models. Most
of the performance analysis studies on wireless network protocols and mechanisms employ traffic models to simulate saturation conditions (asymptotic behavior), e.g., [237, 346, 370, 113, 263, 292, 114, 222, 84, 190, 267, 58, 112, 219,
93]. Other studies simulate udp flows with fixed packet rate and a few source
and destination pairs (e.g., [271, 274, 117, 357, 368]). There are only very few
previous studies that employ stochastic packet rate models, such as the Uniform, Poisson, Pareto, Autoregressive (AR), and Markov (e.g., [229, 68, 206]).
Examples of studies using tcp flows are described in [118, 155, 90, 68, 117],
while [210] presents a study which “replays” real-life traces.
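To make the difference between a fixed packet rate and stochastic packet rate models concrete, the sketch below generates packet inter-arrival times for three of the model families named above. The rate, shape, and scale parameters are arbitrary illustrative choices, not values taken from any of the cited studies.

```python
import random

def cbr_gaps(rate_pps, n):
    """Fixed packet rate: every inter-arrival time equals 1/rate."""
    return [1.0 / rate_pps] * n

def poisson_gaps(rate_pps, n, rng):
    """Poisson arrivals: exponentially distributed inter-arrival times."""
    return [rng.expovariate(rate_pps) for _ in range(n)]

def pareto_gaps(alpha, scale, n, rng):
    """Heavy-tailed inter-arrival times: Pareto with shape alpha, minimum scale."""
    return [scale * rng.paretovariate(alpha) for _ in range(n)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
cbr = cbr_gaps(10.0, 20000)
poisson = poisson_gaps(10.0, 20000, rng)
pareto = pareto_gaps(1.5, 0.01, 20000, rng)

mean_poisson = sum(poisson) / len(poisson)
```

The fixed-rate and Poisson sequences share the same mean gap of about 0.1 s but differ in variability, while the Pareto sequence occasionally produces very long gaps, which is the kind of behavior that fixed-rate models cannot reproduce.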
Currently, most of the simulators use quite simplistic models, as mobility,
topology, access, and traffic models are rich sub-fields on their own, and until
recently data from large-scale wireless networks was not available. Typical
models used in simulation studies on wireless networks are the following:
• constant bit-rate (CBR) models for traffic
• uniform distribution of clients in an area
• fixed or uniform distribution in the selection of sender and receiver pairs
• fixed arrival of clients at APs
• random-walk based mobility models
It is clear that in several cases the above models are unrealistic. For more
comprehensive performance analysis, it is necessary to use realistic and sophisticated models for the parameters of that technology. In general, models
should have the following properties:
• accuracy
• robustness
• scalability
• parsimony
• reusability
• “easy” interpretation
Rich sets of empirical traces, collected from large-scale wireless infrastructures, impel modeling efforts to produce more realistic models, and thus,
enable more meaningful performance analysis studies. We distinguish the following important dimensions in wireless network modeling, namely, user demand, mobility and access patterns, network topology, and channel conditions.
Depending on the environment, the device mobility could be group or individual, spontaneous or controlled, pedestrian or vehicular, known a priori or
dynamic. In general, network conditions can be characterized by link quality
criteria (e.g., packet losses, delays, signal-to-noise ratio), the spatio-temporal
distributions of traffic demand and application mix, and the distributions of
regions of weak connectivity or no signal (“deadspots”). Network topologies
can be described based on their connectivity and link characteristics, distribution and density of peers, degree of clustering, co-residency time, inter-contact
time, duration of disconnection from the Internet, and interaction patterns.
Highlighting the ability of empirical-based models to capture the characteristics of the user workload, and providing a flexible framework for using them
in performance analysis studies are the driving forces of this research.
Figure 5.1 illustrates an example of a wireless infrastructure and a client
(client B) roaming between APs before it gets disconnected. Client B first
associates with AP 1, it then associates with AP 2, and before getting disconnected from the Internet, it associates with AP 3. While the client is connected
to the Internet via the infrastructure, it produces flows by receiving and sending packets. An episode of continuous wireless connectivity via one or more
APs is called a session. The wireless life of a client is an alternation between
sessions and disconnections.
Fig. 5.1. Key components of our traffic models are the client associations at APs,
sessions, and flows.
Sessions and flows are key structures of the user
workload analysis, satisfying two objectives:
1. Sessions capture the interaction between the clients and the network infrastructure.
2. Flows are structures with the appropriate level of detail for traffic generation to analyze mechanisms, such as capacity planning, AP selection, and
admission control, that motivate this modeling study.
The inherent multi-level spatio-temporal nature of WLANs is intriguing. Our modeling efforts focus on traffic and access demand in large-scale
IEEE 802.11 networks, aiming to provide a multi-level perspective at different
spatio-temporal scales. Table 5.1 shows examples of various spatial and temporal granularities. In fact, selecting the appropriate spatio-temporal scales for
modeling the characteristics of user workload is an open question that largely
depends on the particular mechanism that needs to be analyzed. For example,
in the context of capacity planning or admission control, the AP-level can be
problematic, since minor changes in the AP infrastructure may significantly impact the workload distribution per AP. Higher levels of spatial aggregation,
such as buildings or building types, appear to be more appropriate. Similarly,
our attention is shifted from the packet-level dynamics and fine time-scales
to flow-level modeling. Packet-level dynamics are tightly dependent on the
user mobility, network topology, and channel conditions. Sessions and flows
allow us to model user workload, considering it as a principal building block,
independently of and complementary to other important dimensions (such as
network topology, channel and user mobility).
Spatial:  AP, client, infrastructure, clusters of APs, building
Temporal: client associations, flow, packet, session, time intervals
Table 5.1. Examples of various spatio-temporal granularities in modeling traffic
and access demand.
To evaluate the capability of models to capture the user demand dynamics, we employ various metrics. Although accuracy is an important modeling
objective, the scalability and tractability of a model are also critical. This
monograph evaluates the scalability characteristics of the contributed models and addresses the tradeoffs between accuracy and scalability. The models
have been extensively validated for different time periods, different spatial and
temporal scales, and periods of different workload demand.
Section 5.2 discusses the client access patterns, while Section 5.3 focuses
on roaming and models the transitions of a client between APs. It presents
and evaluates algorithms that predict the next association of a client. Shifting from the client’s perspective to that of the AP, Section 5.4 describes a
novel methodology for modeling the arrival process of clients at an AP. Section 5.5 outlines the principal aspects of our modeling approach and presents
the proposed models. A flexible framework in which synthetic traces based on
various models of user workload can be generated is introduced in Section 5.6.
Section 5.7 discusses the scalability and reusability aspects of the user workload. The models are evaluated using statistics- and systems-based metrics
in Section 5.8. Section 5.9 describes an analysis of the wireless traffic load at
APs using Singular Spectrum Analysis and discusses structural properties of
these time-series. Finally, Section 5.10 discusses related research efforts and
Section 5.11 summarizes the main contributions of our modeling efforts.
5.2 Client access patterns
A client initially disconnected from the Internet may associate with an AP in
its wireless range. During its visit to that AP, this client may generate traffic
by sending and/or receiving packets. Later, the client may reassociate with
another AP and prolong its wireless Internet access, or disconnect from the
wireless infrastructure. A transition is marked by two consecutive connections
to distinct APs. Various parameters can be used to characterize the mobility
or roaming activity of a client, such as
• duration of sessions and visits
• transitions between APs
• number of inter-building transitions
• duration of time spent and frequency of visits at a certain AP
• duration of disconnection
• predictability of the next AP associations
• arrival process at an AP
Statistics   Visits   Inter-AP transitions   Inter-building transitions
Mean         363      164                    32
Median       40       6                      0
Table 5.2. Statistics indicating the degree of mobility of wireless clients at UNC
considering our trace.
To understand the client access patterns, we analyzed the syslog messages
collected at UNC.1 Wireless clients at UNC exhibit relatively low mobility.
On a given day, 6.8% of the clients are roaming, 3.7% are drop-in, and 2% are mobile, on
average. As shown in Table 5.2, a client makes only 32 inter-building transitions on average, while the corresponding median is 0. When the average client
visits an AP, that AP differs from the one it is currently connected
to 48.3% of the time, and it is in a different building 13% of the
time. In the case of a visit to a different AP, the likelihood that this AP is in
a different building is equal to 20.2%.
The locality of the roaming behavior of a client can also be characterized based on the existence of an AP where that client spends most of its
wireless time. To analyze the locality of roaming, the duration-based homeAP
of a client was defined to be the AP (if any) at which this client spends at
least a given percentage of its wireless access time. Similarly, the number-of-visits-based homeAP of a client is the AP (if any) that this client visits most
frequently. The threshold for the percentage of wireless access time and the
number of visits may vary from 25% to 90%. The duration-based definition
is more relaxed than the frequency-based one. More than 50% of the clients
spend more than 75% of their time at a single AP, whereas 30% of them visit
the same AP more than 75% of the time (as shown in Figure 5.2).
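The two homeAP definitions can be contrasted with a minimal sketch. It assumes that a client's trace is available as a list of (AP, visit duration) pairs and that, when several APs reach the threshold, the one with the largest share is reported; the function names and the sample visits are hypothetical and not taken from the UNC trace.

```python
from collections import defaultdict

def home_ap(weights, threshold):
    """Return the AP with the largest share of the total weight, provided that
    share reaches `threshold` (a fraction in [0, 1]); otherwise return None."""
    total = sum(weights.values())
    if total <= 0:
        return None
    ap = max(weights, key=weights.get)
    return ap if weights[ap] / total >= threshold else None

def duration_based_home_ap(visits, threshold):
    """Duration-based homeAP: the weight of an AP is the time spent there."""
    time_at = defaultdict(float)
    for ap, duration in visits:
        time_at[ap] += duration
    return home_ap(time_at, threshold)

def visit_based_home_ap(visits, threshold):
    """Number-of-visits-based homeAP: the weight of an AP is its visit count."""
    count_at = defaultdict(int)
    for ap, _ in visits:
        count_at[ap] += 1
    return home_ap(count_at, threshold)

# Hypothetical client: one long visit at ap1, three short visits at ap2.
visits = [("ap1", 80), ("ap2", 5), ("ap2", 5), ("ap2", 10)]
print(duration_based_home_ap(visits, 0.75))  # ap1
print(visit_based_home_ap(visits, 0.75))     # ap2
```

The example client has a different homeAP under each definition, which illustrates why the fractions of clients with a homeAP differ between the two definitions.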
Fig. 5.2. Fraction of clients that have a homeAP for different thresholds according
to the two definitions.
1 These messages were generated from 232 APs between 12:00am on February 10th,
2003 and 11:59pm on April 27th, 2003.
To characterize the roaming of a wireless client, we also defined the AP path
of a client to be the sequence of continuous inter-AP transitions of that client.
Similarly, the building path of a client was defined as the sequence of continuous
transitions of that client between APs that are located in different buildings.
A client may potentially have more than one AP path and building path. For
example, if a wireless client that was originally disconnected connects to APs
1, 2, 1, 1, and 10 before disconnecting, its AP path is “1 2 1 10”. The length
of this AP path is three. If we assume that AP 1 and AP 2 are placed
in the same building (e.g., building “A”), which is different from the one in
which AP 10 is located (building “B”), then the corresponding building path
is “A B”. Figure 5.3 shows the mean and median of the maximum and mean
AP path and building path lengths over all users.
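The path definitions above can be sketched in a few lines. The sketch assumes that one session is given as the ordered list of APs the client associated with, and that path length counts transitions; the AP-to-building map is the hypothetical one from the example in the text.

```python
from itertools import groupby

def ap_path(associations):
    """Collapse consecutive repeats so each run of the same AP appears once."""
    return [ap for ap, _ in groupby(associations)]

def building_path(associations, building_of):
    """Map each AP to its building, then collapse consecutive repeats."""
    return [b for b, _ in groupby(building_of[ap] for ap in associations)]

def path_length(path):
    """Number of transitions in a path (one fewer than its number of entries)."""
    return max(len(path) - 1, 0)

associations = [1, 2, 1, 1, 10]          # APs visited before disconnecting
building_of = {1: "A", 2: "A", 10: "B"}   # assumed AP-to-building map

print(ap_path(associations))                     # [1, 2, 1, 10]
print(path_length(ap_path(associations)))        # 3
print(building_path(associations, building_of))  # ['A', 'B']
```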
Sessions are categorized according to client mobility and are classified as
stationary and mobile. Stationary sessions are composed of associations at
APs located in the same building. Mobile sessions can be further divided into
those with a transition between two buildings (“one-edge”) and all the others
with transitions to several pairs of buildings (“multiple-edge”). As the number
of edges increases, the mobility of a client is considered to increase.
Fig. 5.3. Statistics for the path length of all active clients. [Bar chart of path length for AP paths and building paths; groups: mean of max, median of max, mean of mean, median of mean.]
5.2.1 Session duration
A session reflects an episode of continuous wireless access of a client at an
infrastructure. During a session, the infrastructure needs to be capable of
supporting its clients, and thus, we were interested in understanding the session duration. We found that 56.4% of the sessions lasted less than 30 minutes,
68.9% less than one hour, and only 16.2% less than one minute. The vast majority of the stationary sessions last 1.5 hours or less while the medians of stationary, one-edge and multiple-edge session duration are 9, 18, and 34 minutes,
respectively. To compare the duration of different types of sessions, we employed the concept of stochastic order [105]. A random variable X is stochastically
larger than another random variable Y if
• P(X > t) ≥ P(Y > t) for every t, and
• P(X > t) > P(Y > t) for some t.
As becomes apparent from their complementary cumulative distribution functions (CCDFs), the
duration of mobile sessions is stochastically larger than the duration of stationary sessions. As the session mobility increases, the session duration also
increases stochastically.
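The two conditions of the stochastic order can be checked on empirical CCDFs, as in the following minimal sketch. Because an empirical CCDF is a step function that changes only at observed values, it is enough to compare the two CCDFs at the pooled sample points. The session durations below are hypothetical and serve only to exercise the check.

```python
def ccdf(sample, t):
    """Empirical P(X > t)."""
    return sum(1 for x in sample if x > t) / len(sample)

def stochastically_larger(xs, ys):
    """True if the empirical CCDF of xs is >= that of ys at every pooled
    sample point and strictly larger at one or more of them."""
    points = sorted(set(xs) | set(ys))
    diffs = [ccdf(xs, t) - ccdf(ys, t) for t in points]
    return all(d >= 0 for d in diffs) and any(d > 0 for d in diffs)

stationary = [5, 9, 12, 30]   # hypothetical durations in minutes
mobile = [9, 18, 34, 60]      # hypothetical durations in minutes

print(stochastically_larger(mobile, stationary))  # True
print(stochastically_larger(stationary, mobile))  # False
```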
The CCDF of the stationary session duration has two nearly linear regimes.
This led us to propose to model the stationary session duration using a BiPareto distribution.2 A BiPareto distribution’s CCDF is given by
$$\left(\frac{x}{k}\right)^{-\alpha} \left(\frac{x/k + c}{1 + c}\right)^{\alpha-\beta}, \quad x \ge k.$$
A BiPareto distribution has four parameters (α, β, c, k), that can be estimated
via maximum likelihood. The scale parameter k (k > 0) is the minimum value
of the BiPareto random variable. The CCDF initially decays as a power law
with exponent α > 0. Then, in the vicinity of the breakpoint kc (with c > 0),
the decay exponent gradually changes to β > 0. Notice that on log-log plots,
a CCDF of the form $x^{-\alpha}$ would appear as a straight line with slope −α. A
BiPareto distribution with c = 0 corresponds to a Pareto with parameters
(β, k).
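The BiPareto CCDF is easy to evaluate directly, and a short sketch makes the role of each parameter concrete. The function name is an assumption made for illustration; the last two lines check the two properties stated above (the CCDF equals 1 at x = k, and c = 0 reduces to a Pareto with parameters (β, k)).

```python
def bipareto_ccdf(x, alpha, beta, c, k):
    """CCDF of the BiPareto distribution: P(X > x) for x >= k, and 1 below k."""
    if x < k:
        return 1.0
    r = x / k
    return r ** (-alpha) * ((r + c) / (1.0 + c)) ** (alpha - beta)

# Parameter values reported in the text for the stationary session durations.
alpha, beta, c, k = 0.05, 0.76, 867.64, 1.0
for x in (1.0, 10.0, 100.0, 1000.0, 10000.0):
    print(x, bipareto_ccdf(x, alpha, beta, c, k))

# Sanity checks of the properties stated in the text.
print(bipareto_ccdf(k, alpha, beta, c, k) == 1.0)
print(abs(bipareto_ccdf(10.0, 0.5, 1.5, 0.0, 2.0) - (10.0 / 2.0) ** -1.5) < 1e-12)
```

Plotting these values on log-log axes would show the slope changing from about −α to about −β around the breakpoint kc.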
Fig. 5.4. Stationary session duration (empirical and model).
The BiPareto distribution was fitted to the stationary session duration,
and the parameters were estimated to be (0.05, 0.76, 867.64, 1) using the
maximum likelihood method. Figure 5.4 compares the empirical log-log CCDF
with the theoretical log-log CCDF of the fitted BiPareto distribution (the two
linear regions also indicated). The two CCDFs closely follow each other with a
coefficient of determination of 0.99. The major difference appears in the tails,
which concern only 1% of the sessions. One possible explanation for this
discrepancy is censoring caused by our data collection period: we could
not observe any stationary sessions that are longer than the
collection period. Had such long sessions been observed, they might have brought the empirical tail
closer to the BiPareto tail.
2 More details on the BiPareto distribution and its estimation method can be found
in [278].
Fig. 5.5. Mobile session duration (empirical and model).
Several other common parametric distributions were also examined, such
as the Lognormal, Weibull, and Gamma (see Appendix) but the BiPareto
gave the best fit. In fact, the fit became even better by aggregating the durations into minute resolution level, which could be fitted with a BiPareto with
parameters (0.34, 1.37, 258.94, 1).
The log-log CCDFs for mobile sessions also exhibit two linear regions up
to three hours (Figure 5.5). We truncated the mobile session durations at
three hours and modeled them using a truncated BiPareto distribution. The
truncation percentage is about 9% and the fitted parameters for the mobile
session durations are (0.02, 1.42, 1633.42, 1).
5.2.2 Transient sessions
To further characterize the access patterns, the distribution of the duration
of visits within a session was examined: Are most of the sessions composed
of relatively short visits (at APs)? Are the durations of visits in the same
session “statistically similar”? Does the first visit differ statistically from the
last? Based on the duration of visits at each building involved in a session,
a transient session is defined as a session that does not have any visits to a
building that last more than a certain number of minutes. Figure 5.6 illustrates
the fraction of transient sessions for different thresholds varying from
one to thirty minutes. As the threshold increases, the fraction of transient
sessions also increases. However, for low thresholds (e.g., one or five minutes),
the mobile sessions tend to be less transient than the rest. More than 20%
of the clients have at least 90% of sessions in which all their visits last 30
minutes or less (as shown in Figure 5.7).
Fig. 5.6. Fraction of transient sessions (i.e., all visits to a building in the building
path last less than w).
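The definition of a transient session translates into a one-line test, sketched below. It assumes that each session is represented by the list of its per-building visit durations in minutes and that a visit lasting exactly w minutes still counts as short; the sample sessions are hypothetical.

```python
def is_transient(visit_minutes, w):
    """A session is transient if none of its visits lasts more than w minutes."""
    return all(d <= w for d in visit_minutes)

def transient_fraction(sessions, w):
    """Fraction of sessions that are transient for threshold w."""
    return sum(is_transient(s, w) for s in sessions) / len(sessions)

# Hypothetical sessions: per-building visit durations in minutes.
sessions = [[2, 3], [45], [10, 1, 4], [31, 2]]
for w in (1, 5, 30):
    print(w, transient_fraction(sessions, w))
# 1 0.0
# 5 0.25
# 30 0.5
```

A session with many visits needs all of them to be short, which is why the fraction can only grow as w increases.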
To further analyze how the session time is distributed among its visits, the
percentage of visits that have a duration close to the median visit duration of
that session was computed. Interestingly, 50% of the sessions have less than
10% of their visits within ±10% of the median visit duration of their session.
Mobile sessions tend to have a small percentage of long visits and a large
percentage of short visits. As a result, sessions with high mobility are less
transient, as it is less likely for all visits to fall below a certain threshold.
This is consistent with the lower transience of mobile sessions observed at low thresholds. Figure 5.8
does not include the stationary sessions, since by definition, they have only
one visit and their similarity index is 100%. “Mobile Sessions” is a subset of
“All Sessions”, including only those sessions with visits to two or more APs
located in different buildings.
Fig. 5.7. Fraction of clients that have a certain fraction of their sessions transient.
5.2.3 Revisits
Wireless users may revisit the APs of an infrastructure multiple times.
Caching data at an AP can mask delays that roaming users experience during
an association process. To get an insight into how long a user’s data (e.g.,
profile, cache) should be stored in an AP, we estimated how likely it is for
that client to revisit an AP within a certain time interval.
For each client, we used its state history, in which a timestamp indicates when
the client visited each state. The probability that a client revisits a state within a given time interval (i.e., the revisit
probability of this client) is defined as the fraction
of times the client visits that state within that interval since its last visit
to the same state, with at least one visit to some other non-zero state
between these two consecutive visits. Furthermore, the revisit
probability for an AP is defined as the fraction of all visits that are revisits
at that AP, considering the entire client population.
Fig. 5.8. Percentage of visits in a session with duration within an interval of +/-10%
from the median visit of that session. The similarity index of a session was defined
as the percentage of visits that are within a certain interval of their median visit
duration, such as +/-10% from the median duration of the visits in that session.
The mean revisit probability for a one-hour interval is 20% for clients and
40% for APs. The revisit probability varies drastically among APs and clients,
ranging from 0% to 95% among APs and from 0% to 99% among clients.
Figure 5.9 (a) shows the probability for each AP that a visit at that AP was
a revisit, while Figure 5.9 (b) shows the probability that a visit was a revisit
for each client. The median revisit probability was 40% and 6% for APs and
clients, respectively. Therefore, a cache with a lifetime of one hour at each AP
would be beneficial.
Fig. 5.9. Revisit probability: (a) for each AP; (b) for each client.
5.3 Roaming across APs
While a client roams in a wireless infrastructure, it may associate with multiple
APs. During an association process, a client requests access from an AP and
that AP accepts or declines the access. The association overhead, in addition
to end-to-end delays, could be prohibitive for several real-time applications
running on roaming clients. By profiling a client and predicting its next associations, these delays could be masked. Specifically, a roaming controller could
keep track of a client and its application access. It could predict the next association of a client and prefetch data on its behalf. Furthermore, APs can use
the next associations of clients and their traffic demand to perform load balancing
and better utilize their buffers and wireless bandwidth. The association
protocol could also be enhanced by advising a client to avoid extremely busy
APs. These issues motivate our next-AP prediction algorithms.
Based on a client’s state history, an algorithm that predicts the n-th state
of the client can be developed. Our prediction is based on a Markov-chain
model and uses the current state to predict the next one. For each client, a
first-order Markov chain based on the client’s state history is generated. Each
state of the Markov chain corresponds to a state as defined in Section 4.4.
Let us denote the set of all the states as S. The transition probability from
state j to state k is the relative frequency of the sequence of states $s_j s_k$ in the
client’s state history ($s_j, s_k \in S$). This corresponds to the (j, k) entry of the
transition probability matrix $P^{(1)}$. The prediction model can be extended by
using the previous as well as the current state to predict the next one.3 In this
prediction model, we computed the relative frequencies of $s_i s_j s_k$ ($s_i, s_j, s_k \in S$). This corresponds to the (i, j, k) entry in the three-dimensional matrix
$P^{(2)}$. We evaluated the following variations of such prediction algorithms using
different amounts of history:
One-state history: This model is the one-state history model discussed above.
The first n − 1 states are used to build the matrix $P^{(1)}$. Given that the current
state is $s_j$, we predict the next state to be the state
$$\arg\max_{s_k} \{ P^{(1)}(j, k), \ \forall s_k \in S \}. \qquad (5.1)$$
The error in making this single prediction of the next state n is $\varepsilon_n = 1 - P^{(1)}(j, k)$.
One-state window: If the (n − 1)-th state occurs at time t, this model uses
the sequence of states that occur between t − 24 hours and t to build its
probability matrix. Then, it predicts the n-th state in the same way as the
one-state history model.
Max of one-state window and history: This model compares the probabilities
reported by the one-state window and history algorithms and returns the state that is
more probable.
Two-state history: This model is the two-state history model discussed
above. The first n − 1 states are used to build $P^{(2)}$. Let the (n − 2)-th state
be $s_i$ and the (n − 1)-th state be $s_j$. The next state is predicted to be the
state $s_k$ such that k maximizes $P^{(2)}(i, j, k)$. The error in the single prediction
of the next state n is $\varepsilon_n = 1 - P^{(2)}(i, j, k)$.
3
There are some storage considerations. For example, a very mobile client can visit
half of the total number of APs. Storing a single client’s three-dimensional matrix
M for 128 APs, for a single day, requires about 8 MB of memory, while a four
dimensional matrix would require about 1 GB of memory.
Two-state window: If the (n − 1)-th state occurs at time t, this model uses
the sequence of states that occur between t − 24 hours and t. It then predicts
the n-th state in the same way as the two-state history model predicts the
next state.
Max of two-state window and history: This model compares the probability
of the two-state window and history algorithms and reports the state that is
more probable.
To evaluate the performance of these prediction algorithms, we employed
the following metrics:
• the correct prediction percentage, which is the percentage of times that the
next state was predicted correctly
• the prediction error after predicting n states, which is defined as the mean
of the error of all predictions made
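The one-state history predictor and the correct prediction percentage can be sketched as follows. This is an illustrative reading of the description above and not the implementation used in the study: states are arbitrary hashable labels, transition counts stand in for the matrix of relative frequencies (the arg max is the same for both), ties are broken by first occurrence, and the `training` parameter and the toy history are assumptions.

```python
from collections import defaultdict

def predict_next(counts, current):
    """Return the most frequent successor of `current`, or None if unseen."""
    successors = counts.get(current)
    if not successors:
        return None
    return max(successors, key=successors.get)

def correct_prediction_percentage(history, training=25):
    """Predict each state after the first `training` entries from the
    transitions observed so far, and report the percentage predicted correctly."""
    counts = defaultdict(lambda: defaultdict(int))
    correct = made = 0
    for n in range(1, len(history)):
        prev, actual = history[n - 1], history[n]
        if n >= training:
            made += 1
            if predict_next(counts, prev) == actual:
                correct += 1
        counts[prev][actual] += 1   # learn only after predicting entry n
    return 100.0 * correct / made if made else None

# Toy state history: the client alternates between two APs, then moves to a third.
history = [1, 2, 1, 2, 1, 2, 1, 3]
print(correct_prediction_percentage(history, training=4))  # 75.0
```

A window variant would build `counts` only from the entries of the last 24 hours, and a two-state variant would key `counts` on the pair (previous state, current state).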
The training set of each client consists of its first 25 syslog entries. The
mean correct prediction percentages for predicting state 8,012 were 81.36%,
82.16%, and 84.85% for the one-state history, one-state window, and max of
one-state window and history models, respectively. For the two-state history,
two-state window, and max of both two-state window and history model,
the mean correct prediction percentages were 83.68%, 83.19%, and 85.59%,
respectively. Figure 5.10 (a) illustrates the percentage of correct predictions
after each entry. The one-state history and the one-state window models
have similar performance, while the two-state models perform slightly better
than their one-state counterparts. The one-state history, one-state window,
and max of one-state history and window had prediction errors of 0.26, 0.23,
and 0.21, respectively, and their two-state counterparts had prediction errors
of 0.23, 0.22, and 0.20, respectively. The standard deviations for the correct
prediction percentages are less than 0.19 for the one-state models and less
than 0.18 for the two-state models. The standard deviations for the prediction
error are less than 0.23 for the one-state models and less than 0.21 for the
two-state models. Note that by maintaining information about the last 2,000
entries, the max of two-state history and window achieves a correct prediction
percentage of at least 82.17%. This suggests that if storage space is a concern,
the model can be implemented in a slightly different manner that uses only
a certain number of entries, so that it is space efficient and still has a high
correct prediction percentage.
The top five percent of clients—in terms of total number of inter-building
transitions, which also have 8,012 events—have a correct prediction percentage of 79% for predicting state 8,012. Figure 5.10 (b) illustrates the correct
prediction percentage after 80,000 instead of 8,000 entries and Figure 5.11
shows the corresponding prediction errors. Figure 5.11 (a) focuses on the top
five percent of clients that exhibit the highest degree of mobility (considering
their number of inter-building transitions). The max of one-state history and
window algorithm performs reasonably well, is simple, and does not have high
memory requirements.
Fig. 5.10. Next-state prediction algorithms. Correct prediction percentage after
each entry: (a) all clients (up to 8,000 entries); (b) all vs. only mobile clients (up to 80,000 entries). Initially, there were over 3,900 clients. Only 31 clients participated in
the prediction of the last entry.
Fig. 5.11. Next-state prediction. Prediction error after each entry: (a) only the top 5% mobile users; (b) all clients. Initially, there
were over 3,900 total clients and 300 mobile clients. Only two clients in total (one of them
mobile) participated in the prediction of the last entry.
We also incorporated a time component into the sequence of states, as
described in [83]. This method produces additional states by polling for a
client’s state at regular time intervals and thereby creates a state history
based on both movement and time. For clients disconnected for long time
intervals, this polling process introduces long sequences of 0, resulting in an
overestimation of the performance of the predictor. Thus, this movement and
time model can be used to predict the next state only for connected clients.
The mean correct prediction percentage, using the max of two-state window
and history model was 87% at the last entry, for which there were more than
30 clients.
5.4 Arrivals of wireless clients at APs
The client associations can be studied either from the perspective of the client,
as transitions between APs, or from the perspective of an AP, as arrival processes. Section 5.3 presented a Markov-based model for the client transitions
to APs. Here, we describe a novel methodology for modeling the arrival process of clients at APs as a time-varying Poisson process with different arrival
rates. Poisson models have already been used to characterize the “arrivals” of
humans to the Internet, i.e., the times at which humans access the Internet
to perform a task conform to a memoryless process with an arrival rate that
can be constant over time intervals of many minutes to perhaps an hour.
A quantile plot with simulation envelope is used for testing the goodness-of-fit. Furthermore, we investigated the impact of the type of building (i.e.,
its functionality) in which the AP is located on the arrival rate and clustered
these visit arrival models based on the building type. In addition to each AP’s
unique IP address, we maintained information about the building the AP is
located in, its type, and its coordinates. The possible building types in our
study are the following:
• academic
• administrative
• athletic
• business
• clinical
• dining
• library
• residential and letter society
• student stores
• health affairs
• playing fields, performing arts, and theater
This study focused on the visit arrivals at the sixteen hotspot APs of the
UNC wireless infrastructure. An AP is defined as a hotspot when it belongs
to the intersection of two sets, namely the top 5% of APs based on the total
maximum traffic and the top 5% of APs based on the maximum hourly traffic.
The distribution of the selected hotspot APs per building type is as follows:
academic (8 APs), library (3 APs), residential (3 APs), and theater (2 APs).4
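The hotspot definition is a set intersection, as the following minimal sketch shows. It assumes that each traffic metric is available as a dictionary from AP identifier to value and that "top 5%" is rounded to the nearest whole number of APs; the AP names and traffic values are hypothetical.

```python
def top_fraction(metric, fraction=0.05):
    """Set of APs ranked in the top `fraction` by the given metric."""
    n = max(1, round(fraction * len(metric)))
    ranked = sorted(metric, key=metric.get, reverse=True)
    return set(ranked[:n])

def hotspots(total_traffic, max_hourly_traffic, fraction=0.05):
    """APs that are in the top fraction according to both metrics."""
    return top_fraction(total_traffic, fraction) & top_fraction(max_hourly_traffic, fraction)

# Hypothetical infrastructure of 40 APs, so the top 5% holds two APs per metric.
total = {f"ap{i}": i for i in range(40)}
hourly = dict(total)
hourly["ap38"] = 0     # high total traffic but no pronounced hourly peak
hourly["ap5"] = 100    # a single very busy hour but low total traffic

print(sorted(hotspots(total, hourly)))  # ['ap39']
```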
5.4.1 Time-varying Poisson process
In this section, the focus shifts from a client’s perspective to an AP’s:
we explore the wireless access by modeling the arrivals of clients at an AP
as a time-varying Poisson process. For this purpose, let us first introduce the definition
of a time-varying Poisson process and then construct a test for such a process.
Let {Λ(t) : t ≥ 0} be a stochastic point process, which counts the number
of events (or arrivals) in the interval [0, t]. Sometimes, {Λ(t)} is referred to
as the arrival process of the events of interest. For example, let {Λ(t)} be the
arrival process of client visits at a particular AP. {Λ(t)} is a Poisson process
if it satisfies the following two properties:
1. The numbers of arrivals in disjoint intervals are independent.
2. For some finite λ > 0, $P(\Lambda(t) = j) = e^{-\lambda t} (\lambda t)^j / j!$, j = 0, 1, . . .
Thus, for each t, Λ(t) is a Poisson random variable with mean λt, the
product of the arrival rate λ and the interval length t. Note that a Poisson
process is a renewal process, where the inter-arrival times are independent
exponential random variables [320]. It is well-known that such a process results
from the following behavior: there exist many potential, statistically identical
arrivals; there is a very small—yet non-negligible—probability for each of them
arriving at any given time; and arrivals happen independently of each other.
One closely related variation is a time-varying Poisson process, where the
arrival rate is a function of time t, say, λ(t). Such a process is the result of
time-varying probabilities of event arrivals, and it is completely characterized
by its arrival rate function. A smooth variation of λ(t) is usual in both theory
and practice in a wide variety of contexts, and seems reasonable for modeling
client visits to an AP.
Construction of a statistical test
We constructed a test for the null hypothesis that an arrival process is a time-varying Poisson process with a slowly varying arrival rate. In statistics, a null
hypothesis is a hypothesis set up to be nullified or refuted in order to support
an alternative hypothesis. When it is used, the null hypothesis is presumed
true until statistical evidence, in the form of a hypothesis test, indicates
otherwise [33].
4 In our analysis, the hotspot APs with unknown building type and location were
excluded. We had only limited information about the exact functionality of the
rooms in which the hotspots were located. For example, APs in academic buildings
could be found in classrooms, offices for advising students, lounges, and meeting
rooms. Hotspot APs in residential buildings were found in lounges and labs.
To begin with, we broke up the interval of a day into relatively short blocks
of time. For convenience, blocks of equal length, L, were used, resulting in a
total of I blocks; however, this equality assumption can be relaxed. For the
analysis in Section 5.4.2, L was set to be six minutes.
Let $T_{ij}$ denote the j-th ordered arrival time in the i-th block, i = 1, . . . , I.
Thus $T_{i1} \le \ldots \le T_{iJ(i)}$, where J(i) denotes the total number of arrivals in
the i-th block. Define the variables $T_{i0} = 0$ and, for j = 1, ..., J(i),
$$R_{ij} = (J(i) + 1 - j) \left( -\log \frac{L - T_{ij}}{L - T_{i,j-1}} \right). \qquad (5.2)$$
Under the null hypothesis that the arrival rate is constant within each time
interval, the $\{R_{ij}\}$ will be independent standard exponential random variables,
as is proved below.
Let $U_{ij}$ denote the j-th (unordered) arrival time in the i-th block. Then,
the assumption of a constant Poisson arrival rate within this block implies
that, conditioned on J(i), the unordered arrival times are independent and
uniformly distributed between 0 and L. If we define $V_{ij} = \log \frac{L}{L - U_{ij}}$, then it follows
that the $V_{ij}$ are independent standard exponential random variables. Indeed,
notice that $T_{ij} = U_{i(j)}$, and thus
$$V_{i(j)} = \log \frac{L}{L - U_{i(j)}} = \log \frac{L}{L - T_{ij}}.$$
Here, (j) indicates the j-th order statistic.5 Evidently, $R_{ij} = (J(i) + 1 - j) \left( V_{i(j)} - V_{i(j-1)} \right)$. Then, the exponentiality of $R_{ij}$ results from the following
lemma.
Lemma: Suppose that $X_1, \ldots, X_n$ are independent standard exponential
random variables. Then $Y_i = (n - i + 1)[X_{(i)} - X_{(i-1)}]$, i = 2, . . . , n, are also
independent standard exponential random variables.
Any common test for
the exponential distribution can then be applied to $R_{ij}$ for testing the null
hypothesis. For convenience, the familiar Kolmogorov-Smirnov test [125] was
used. This nonparametric test is based on the maximum deviation between
the empirical CDF of the data and the hypothesized theoretical CDF. The
exponentiality can also be tested using a graphical tool, such as an exponential
quantile plot.
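The construction of the test can be illustrated with a short simulation. The sketch below computes the $R_{ij}$ of Eq. (5.2) per block and the Kolmogorov-Smirnov statistic against the standard exponential CDF, using only the standard library. The simulated arrival rates, the number of blocks, and the large-sample 5% threshold of about 1.36/√n are assumptions made for illustration and are not taken from the UNC analysis.

```python
import math
import random

def block_statistics(arrivals, L):
    """R_ij of Eq. (5.2) for one block; arrival times are measured from the
    start of the block and must lie in [0, L)."""
    t = sorted(arrivals)
    J = len(t)
    r, prev = [], 0.0
    for j, tj in enumerate(t, start=1):
        r.append((J + 1 - j) * -math.log((L - tj) / (L - prev)))
        prev = tj
    return r

def ks_statistic_exponential(values):
    """Maximum deviation between the empirical CDF and 1 - exp(-x)."""
    xs = sorted(values)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = 1.0 - math.exp(-x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Simulate a Poisson process whose rate is constant within each six-minute
# block but changes from block to block (hypothetical rates, in arrivals/s).
random.seed(1)
L = 360.0
pooled = []
for i in range(200):
    rate = 0.02 + 0.01 * (i % 5)
    t, arrivals = random.expovariate(rate), []
    while t < L:
        arrivals.append(t)
        t += random.expovariate(rate)
    pooled.extend(block_statistics(arrivals, L))

d = ks_statistic_exponential(pooled)
print(len(pooled), d, d < 1.36 / math.sqrt(len(pooled)))
```

Because the rate is constant within each block, the pooled values should look standard exponential even though the rate differs across blocks; a rate that changes markedly within a block would tend to inflate the statistic.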
5.4.2 Arrival process of visits at wireless hotspots
As an illustration, this section now analyzes the arrival process of client visits
at the hotspot AP 222, which is located in an academic building. The analysis
has been also validated using APs in other building types.
5 For example, consider a sample of n numbers: $U_1, \ldots, U_n$. The term $U_{(j)}$ indicates
the j-th smallest one.
Exploratory data analysis
In an exploratory data analysis, we employed the SIgnificant ZERo crossing
of the derivatives (SiZer) map [110], a powerful visualization method which
enables statistical inference for discovery of meaningful structures within the
data. It identifies important underlying structures, and not artifacts due to
sampling noise. SiZer is based on scale-space techniques used in computer
vision [253]. Scale-space is a family of locally linear smoothed data curves
indexed by the scale, which is the smoothing parameter or the bandwidth h
used for the local linear smoothing [145]. The bandwidth controls the level
of smoothing, and can be treated approximately as the window size used for
computing local averages in order to smooth the data. SiZer considers a wide
range of bandwidths to derive “smoothed versions of the underlying curve”
at various resolution levels. This approach then visualizes all the information
available in the data at each given scale. A detailed introduction to SiZer,
along with more examples and software, can be found in [110]. An illustration
of the use of SiZer to analyze flow arrival processes is available in [258].
This analysis indicated that the arrival rate appears to be time-varying
at all scales. At coarse scales (or large bandwidths), there is an overall daily
increasing trend; at medium scales, the rate function first decreases between
early morning and 14:00, and starts increasing until 18:00, before decreasing
again. More features appear at fine scales, suggesting that the arrival rate has
several ups and downs during this period.
In order to better examine these features, we focused on the hour between
17:30 and 18:30, which has the largest arrival rate and consists of 2143 visits. First, we calculated the inter-arrival times between every two consecutive
sorted visits. The visual inspection of the corresponding SiZer maps indicates
that the inter-arrival times may take only a finite number of discrete values. Furthermore, there is an artifact caused by the rounding of visit arrival
times to nearest whole seconds. To compensate for this rounding effect, we
“unrounded” the data by adding independent uniform (-0.5,0.5) noise to each
visit’s start time before calculating the inter-arrival times. The distribution
of the inter-arrival times is analyzed in Figures 5.12 and 5.13. Note that if
the arrival process were a homogeneous (constant-rate) Poisson process, the
inter-arrival times would be exponentially distributed.
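The "unrounding" step can be illustrated with a short Python sketch. The function name and the sample start times are hypothetical; only the Uniform(-0.5, 0.5) noise and the ordering of the steps come from the text.

```python
import random


def unrounded_interarrivals(start_times, rng):
    """Add independent Uniform(-0.5, 0.5) noise, sort, and take differences."""
    jittered = sorted(t + rng.uniform(-0.5, 0.5) for t in start_times)
    return [later - earlier for earlier, later in zip(jittered, jittered[1:])]


rng = random.Random(0)
rounded = [10, 10, 10, 11, 12, 12, 15]  # hypothetical start times, whole seconds
gaps = unrounded_interarrivals(rounded, rng)
print(gaps)
```

Because the noise is added before sorting, visits that share the same rounded second no longer produce zero-length gaps, which removes the discrete artifact from the inter-arrival distribution.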
An exponential quantile plot, which can be used as a graphical method
for assessing the goodness-of-fit of the exponential distribution to the
inter-arrival times, shows that the exponential distribution does not fit our
data well (“Data” vs. “Theoretical” in Figure 5.12 (a)). The wider, dark-grey
curve (marked as “Data”) is the main quantile plot, which plots the actual
data quantiles (based on our traces) against the corresponding theoretical
exponential quantiles; the brighter, thinner diagonal line (marked as
“Theoretical”) is the reference. The parameter of the theoretical distribution
is estimated using maximum likelihood. When the theoretical distribution is a
good fit, the “Data” curve should closely follow the diagonal “Theoretical”
line. To account for possible sampling variability, an envelope of 100 overlaid
curves is constructed. Each of these curves is a similar quantile plot, where
the corresponding data are simulated from the theoretical exponential
distribution. This envelope provides a simple visual accounting for the
sampling variability. When the theoretical exponential distribution fits the
inter-arrival times well, the “Data” curve should lie mostly within the
envelope. The observed substantial departure of the curve from the envelope in
Figure 5.12 (a) strongly suggests that the inter-arrival times are not
exponentially distributed.
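As a rough illustration of how such an envelope can be built, the following Python sketch fits the exponential mean by maximum likelihood, simulates 100 samples of the same size, and reports the fraction of data quantiles that fall inside the pointwise envelope. The function names, the plotting positions (k − 0.5)/n, and the test data are assumptions made for the example, not details taken from the study.

```python
import math
import random


def exp_qq(sample):
    """Return (theoretical quantiles, sorted data, fitted mean)."""
    xs = sorted(sample)
    n = len(xs)
    mean = sum(xs) / n  # maximum likelihood estimate of the exponential mean
    theo = [-mean * math.log(1.0 - (k - 0.5) / n) for k in range(1, n + 1)]
    return theo, xs, mean


def simulation_envelope(n, mean, rng, n_curves=100):
    """Pointwise min and max over n_curves simulated sorted samples of size n."""
    curves = [sorted(rng.expovariate(1.0 / mean) for _ in range(n))
              for _ in range(n_curves)]
    lower = [min(c[k] for c in curves) for k in range(n)]
    upper = [max(c[k] for c in curves) for k in range(n)]
    return lower, upper


def fraction_inside(xs, lower, upper):
    """Share of data quantiles lying within the envelope."""
    inside = sum(lo <= x <= hi for x, lo, hi in zip(xs, lower, upper))
    return inside / len(xs)


rng = random.Random(1)
sample = [1.0] * 200  # hypothetical, deliberately non-exponential gaps
theo, data_q, mean = exp_qq(sample)
lower, upper = simulation_envelope(len(sample), mean, rng)
print(fraction_inside(data_q, lower, upper))
```

For data that really are exponential, most quantiles stay inside the envelope; the constant sample above leaves it at both ends, which is the kind of departure the text describes.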
Figure 5.12 (b) shows a Weibull quantile plot for the inter-arrival times.
The two parameters of the Weibull distribution are fitted by matching the 90th
and 99th percentiles of the data and the theoretical distribution. The plot
indicates that the inter-arrival times follow approximately a Weibull distribution.
In addition, the inter-arrival times in our data are not independent, as shown
in the corresponding auto-correlation plot in [290]. All the auto-correlations
are significantly positive. The strong auto-correlation of the inter-arrival times
suggests that the visit arrival process cannot be modeled as a renewal process
with independent Weibull inter-arrival times. A more appropriate model would
combine Weibull inter-arrival times with a suitable dependence structure, as
proposed in [120]. Generating or simulating such a dependent process is much
more complicated, since one has to specify the dependence structure. We decided
to model our data using a time-varying Poisson process, as it has a
straightforward practical interpretation and is easier to simulate.
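Percentile matching for the Weibull distribution has a closed form, sketched below in Python. The sketch assumes the standard two-parameter Weibull with CDF 1 − exp(−(x/scale)^shape); the function names are hypothetical and the parameter values in the round-trip check are arbitrary, not the values fitted in the study.

```python
import math


def weibull_quantile(p, shape, scale):
    """Quantile function of the two-parameter Weibull distribution."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def fit_weibull_by_percentiles(q1, q2, p1=0.90, p2=0.99):
    """Solve for (shape, scale) so that the p1 and p2 quantiles equal q1 and q2."""
    a1 = math.log(-math.log(1.0 - p1))
    a2 = math.log(-math.log(1.0 - p2))
    shape = (a2 - a1) / (math.log(q2) - math.log(q1))
    scale = q1 / (-math.log(1.0 - p1)) ** (1.0 / shape)
    return shape, scale


# Round-trip check with arbitrary parameters: recover them from two quantiles.
true_shape, true_scale = 0.7, 2.0
q90 = weibull_quantile(0.90, true_shape, true_scale)
q99 = weibull_quantile(0.99, true_shape, true_scale)
shape, scale = fit_weibull_by_percentiles(q90, q99)
print(shape, scale)
```

Matching two upper percentiles forces the fitted curve through the tail of the data, which is where a maximum likelihood fit dominated by the body of the distribution may be less accurate.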
Time-varying Poisson process for visit arrivals
We applied the statistical test proposed in Section 5.4.1 and drew an exponential
quantile plot to show that the arrival process of client visits at AP 222
is consistent with a Poisson process with a time-varying arrival rate. The
analysis was carried out in detail for the process between 17:30 and 18:30 only.
We broke the hour into ten six-minute intervals, and calculated the Rij according
to Eq. (5.2) by setting L to be six minutes. The corresponding Kolmogorov-Smirnov
test statistic is 0.0188, and has a p-value of 0.15 with 2143 observations, which
means that the null hypothesis cannot be rejected.
Figure 5.13 shows the exponential quantile plot for the Rij , which clearly
suggests that they are exponential. The maximum likelihood estimate for the
exponential parameter is 1.0024, which is very close to 1, and this agrees with
the claim that the Rij are standard exponential random variables. The corresponding
auto-correlation plot suggests that the Rij are approximately independent [290].
Thus, the null hypothesis that the visit arrival process within
an hour is a time-varying Poisson process is supported both by the formal test
and graphically. There are well-developed methods for simulating time-varying
Poisson processes, such as the thinning method described in [245, 353]. Along
with models for visit durations, we can also generate synthetic traces.
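A minimal Python sketch of the thinning idea follows: candidate arrivals are generated from a constant-rate Poisson process whose rate bounds the target rate, and each candidate is kept with probability equal to the ratio of the target rate to the bound. The rate function and all names are hypothetical and chosen only to make the example runnable.

```python
import math
import random


def simulate_by_thinning(rate_fn, rate_max, t_end, rng):
    """Arrival times on [0, t_end) of a Poisson process with rate rate_fn(t).

    Requires 0 <= rate_fn(t) <= rate_max for all t in [0, t_end).
    """
    arrivals = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)  # next candidate of the rate_max process
        if t >= t_end:
            return arrivals
        if rng.random() * rate_max < rate_fn(t):  # keep w.p. rate_fn(t)/rate_max
            arrivals.append(t)


def example_rate(t):
    """Hypothetical smooth arrival rate (visits per second) over one hour."""
    return 0.6 + 0.3 * math.sin(2.0 * math.pi * t / 3600.0)


rng = random.Random(3)
visits = simulate_by_thinning(example_rate, 0.9, 3600.0, rng)
print(len(visits))
```

The bound rate_max only affects efficiency: a tighter bound rejects fewer candidates, while any valid bound yields the same arrival process.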
[Figure 5.12: (a) exponential quantile plot and (b) Weibull quantile plot of the inter-arrival times (data quantile vs. theoretical quantile, axes scaled by 10^-3), each showing the “Data” curve, the “Theoretical” line, and the “Simulation” envelope; panel (b) marks the 0.9, 0.99, and 0.999 quantiles.]
Fig. 5.12. Quantile plots for visit inter-arrival times between 17:30 and 18:30.
[Figure 5.13: exponential quantile plot (data quantile vs. exponential quantile) showing the “Data” curve, the “Theoretical” line, and the “Simulation” envelope.]
Fig. 5.13. Exponential quantile plot for Rij between 17:30 and 18:30 (AP 222).
An interesting question is whether or not the functionality of an area
affects the arrival rate of client visits at APs located in that area. In general,
we have limited information about the exact activities, schedules, and usage of
the areas around APs, some of which are also used for diverse activities. However,
we do observe clusters of hotspot APs with similar arrival patterns according
to their functionality.
Three clusters of APs can be distinguished: the first cluster contains APs
placed in libraries, the second one in lounges of residential buildings, and the
third one in meeting rooms and lounges in theaters (recreational centers).
For each AP, we also calculated the 25th percentile, median, and standard
deviation. We found a similarity among APs located in meeting rooms and
lounges of theater/recreational building types. APs placed in lounges or labs
in dorm/residential areas have similar patterns which differ from the ones in
residential buildings. Similarly, APs located in classrooms have similar patterns,
which differ from the visit arrival pattern in a classroom where
lectures for a middle school take place and from that in an area with offices
for advising students. We also observed a similarity in the arrival patterns at
the three library APs.
5.5 Methodology for modeling user demand
There is a consensus in the network community that traffic modeling should
not address elements that are dominated by overly specific network-side
characteristics or conditions. Otherwise, simulations and experiments using the
respective models can never study changes in these conditions or new network
mechanisms that shape them. For example, in the context of WLANs, the precise
sequence of associations and disassociations inside sessions
is network-specific and non-deterministic, since small changes in the network
layout, physical environment, radio propagation, and network/client equipment
can dramatically change the association/disassociation dynamics. A newly
proposed algorithm for AP selection may also change association dynamics.
Therefore, the simulation model should not impose a priori a certain sequence
of associations and disassociations. This requirement is satisfied when sessions
are the subject of modeling. The simulated session may end up having completely
different association dynamics, but the corresponding workload (i.e.,
traffic generated during a time period) is preserved.
As mentioned in Section 5.1, we distinguished four important dimensions
in modeling wireless networks, namely, user traffic workload, user mobility,
network topology, and link conditions. In a performance analysis study, one
can integrate the proposed user workload models (or corresponding traces)
with the appropriate user mobility, network topology, and channel condition
models. This approach enables us to superimpose models for the demand on
a given topology and focus on the right level of detail for each study.
5.5.1 Sessions and flows
In our approach, sessions represent the highest-level unit of wireless network
traffic load, including all the packets sent and received by the APs due to the
client’s communication with one or more Internet hosts. Working with flows,
such as TCP connections and UDP conversations, is in line with the approach
taken in [278, 261, 294] and the principles of network-independent modeling
described in [295]. Network flows are well-separated collections of packets
between a pair of Internet hosts, sharing the same transport-layer “5-tuple”.
Simulating the user demand consists of simulating the sessions and flows initiated inside them, while leaving packet-level and association dynamics to
underlying mechanisms that are independent of our model. User demand can
be simulated at both the client association and flow levels by using models
of the compound process of sessions and flows. As shown here, sessions have
a well-behaved arrival process, which can be accurately described using a
time-varying Poisson process. As previously discussed, the Poisson process is
a parsimonious model that has been used widely to model the arrival process
of events initiated by humans (e.g., in a telephone network or in the Internet).
The session arrival process provides the seeds of a cluster process, in which the
arrivals of sessions imply the arrivals of correlated sets of flows. The following
parameters are modeled:
• session arrivals
• number of flows within a session
• flow inter-arrival times within a session
• flow sizes
For each parameter to be modeled, the distribution that best fits our
empirical traces is selected. Several distributions were considered, such as the
Pareto, Lognormal, Poisson, Exponential, BiPareto, and Generalized Extreme
Value. Maximum likelihood was used for the parameter fitting, while the
goodness-of-fit of the various distributions was evaluated using formal
and visual statistical analysis methods and tools, such as quantile plots
with simulation envelopes. The following distributions model the user demand
well:
• a time-varying Poisson process models well the session arrivals at various APs in the infrastructure
• the BiPareto models well the flow size and number of flows within a session
• the Lognormal is a good candidate for the flow inter-arrivals within a session
The parameters of the distributions are based on empirical data that may
correspond to different spatial and temporal scales. In increasing order of spatial
scale, empirical data may be data collected from a specific AP, groups of APs
located at the same building, groups of APs located at buildings of the same
building-type, or all the APs in the infrastructure. The default time scale
was the entire tracing period, but finer time scales, such as daily and
hourly, were also explored. The aggregation in the spatial and temporal
domain trades the accuracy of the models for higher scalability and tractability.
We first verified that the same distributions do persist across the
aforementioned spatial and temporal scales. We then evaluated the tradeoffs between
accuracy and scalability.
5.5.2 Models of user demand
This section illustrates our modeling approach considering the most aggregate
spatio-temporal level, namely, the system-wide level. To fit the parameters of
the proposed models, it employs the entire trace collected from all APs of the
infrastructure (i.e., system-wide approach).
System-wide approach
Although session arrivals vary widely, some expected patterns are apparent.
Firstly, there is a clear diurnal periodicity, which is related to the substantial
decrease of the network activity during the nights. Secondly, the activity of
network clients decreases during the weekend. These temporal patterns appear
to be common throughout the AP population, although some APs are more
likely to be used at night than others.
[Figure 5.14: top panel, exponential quantile plot of the Rij (data quantile vs. exponential quantile) with the “Data” curve, the “Theoretical” line, and the “Simulation” envelope, annotated σ = 0.9372; bottom panel, sample autocorrelation function of the Rij for lags 0 to 20.]
Fig. 5.14. The Rij s are independent and exponentially distributed. Only one hourly
block is shown here, but the results are consistent across the entire dataset.
The session arrival process is modeled as a time-varying Poisson process.
We tested the validity of our modeling assumption with the statistical test
described in Section 5.4.2. For the model to be valid, the variables Rij , which
are defined in Eq. (5.2) as functions of the ordered session arrival times, must
be uncorrelated and exponentially distributed with a mean equal to unity.
The top part of Figure 5.14 shows an exponential quantile plot of the Rij
during one randomly chosen hour.
We set the block length L = 0.1 hours in calculating the Rij . The quantile
plot follows the diagonal line closely and remains well within the simulation
envelope. This suggests that the exponential fit is appropriate. The
maximum likelihood estimate of the exponential parameter is 0.9372, which
is close to unity and agrees with the claim that the Rij are standard
exponential. The bottom part of the figure plots the autocorrelations of the
Rij up to 20 lags. The sample autocorrelations are always within the
confidence intervals, so the Rij do not exhibit any significant correlations. Similar
results were obtained when repeating the same analysis for other one-hour
intervals of the 8-day dataset.
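The autocorrelation check can be sketched as follows in Python. The sketch uses the standard sample autocorrelation estimator and the usual approximate 95% bound of ±1.96/√n for an uncorrelated series; whether the study computed its confidence intervals in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def sample_acf(xs, max_lag):
    """Sample autocorrelations of xs for lags 0..max_lag."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def white_noise_bound(n, z=1.96):
    """Approximate 95% bound for the autocorrelations of an uncorrelated series."""
    return z / math.sqrt(n)


# A strongly (negatively) correlated toy series: alternating +1 and -1.
toy = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
acf = sample_acf(toy, 20)
bound = white_noise_bound(len(toy))
significant = [k for k in range(1, 21) if abs(acf[k]) > bound]
print(significant)
```

For the Rij of a time-varying Poisson process, the list of significant lags should be empty or nearly so; the toy series above is flagged at every lag.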
At the next modeling level, the arrival of a session triggers the arrival
of a group of flows, initiated between the client and one or more Internet
hosts. It is therefore natural to describe flow arrivals as a cluster process [278]
rather than a point process in which flow arrivals are described in isolation.
Since session arrival counts are (time-varying) Poisson distributed, flow
arrivals form a cluster Poisson process. The flow-level traffic variables that need
to be modeled with this approach are the number of flows associated with each
session-cluster, and the inter-arrivals of flows within sessions. Our analysis
showed that the BiPareto distribution yields the best fit for the number of
flows per session. Figure 5.15 plots the complementary cumulative distribution
function (CCDF) of the fitted distribution against the empirical data in a
logarithmic scale. The circles are an equidistant set of samples from a BiPareto
distribution with parameters α = 0.06, β = 1.72, c = 284.79 and k = 1. The empirical
distribution of the number of flows matches our model well for probabilities
between 0 and 0.995. The fit is worse at the tail due to sampling artifacts.
Overall, the BiPareto model fits the empirical distribution
very well. We also studied how the distribution of the in-session number of
flows varies per day. The distributions are very similar, with the vast
majority of the sessions having between 1 and 1000 flows. The distributions for
the weekends have slightly heavier tails. The number of flows per session goes as
far as 10,000 for 0.1% of the sessions. This striking consistency of the curves
strongly indicates that it is feasible to use parametric models for the traffic
variables [217].
Fig. 5.15. CCDF for number of flows per session.
[Figure 5.16: Lognormal quantile plot (log(data) quantile vs. normal quantile) showing the “Data” curve and the “Theoretical” line, annotated µ = −1.3674 and σ = 2.785.]
Fig. 5.16. Lognormal distribution for modeling flow interarrival.
The second component of our cluster model is the distribution of the flow
inter-arrivals within sessions. We showed that the Lognormal distribution
provides the best fit, although the distribution is rather complex. The Lognormal
distribution appears to have a similar shape to power law distributions. In a
log-log plot of its CCDF, its behavior will appear to be nearly a straight line for
a large part of the body of the distribution, especially when the variance of
the corresponding normal distribution is large [266]. However, in contrast to
a power law distribution under natural parameters, a Lognormal distribution
has finite mean and variance. The Lognormal quantile plot for the empirical
data is shown in Figure 5.16; the parameters are estimated to be µ = −1.3674
and σ = 2.785 using maximum likelihood. The quantile plot follows the
diagonal line closely for most of the quantiles. The simulation envelope is very narrow
in this case, and shows that some deviations from the Lognormal model in
the upper part are significant. While more complex models, e.g., an ON/OFF
model, may provide a better approximation, our Lognormal fit
provides a reasonable description of the data using only two parameters.
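For the Lognormal distribution, the maximum likelihood estimates have a closed form: µ is the mean of the logarithms of the data, and σ is their standard deviation (with divisor n). The Python sketch below shows this together with the finite mean exp(µ + σ²/2); the function names and the two-point sample are hypothetical.

```python
import math


def fit_lognormal_mle(sample):
    """Maximum likelihood estimates (mu, sigma) of a Lognormal distribution."""
    logs = [math.log(x) for x in sample]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    return mu, sigma


def lognormal_mean(mu, sigma):
    """Mean of a Lognormal(mu, sigma) variable; finite for every finite sigma."""
    return math.exp(mu + 0.5 * sigma ** 2)


# Toy sample whose logarithms are -1 and +1.
mu_hat, sigma_hat = fit_lognormal_mle([math.exp(-1.0), math.exp(1.0)])
print(mu_hat, sigma_hat, lognormal_mean(mu_hat, sigma_hat))
```

Applying lognormal_mean to the fitted values reported in the text gives the mean flow inter-arrival time implied by the model.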
We have also studied the stationarity of the flow inter-arrivals within
sessions and found that the flow inter-arrivals during each day are very consistent
with each other [179]. To enable generation of traffic load in a manner
suitable for experimentation, it is necessary to describe not only the flow arrival
process but also the flow sizes in terms of the number of bytes they transfer. Our
statistical analysis reveals that flow sizes can be accurately described using
a BiPareto distribution with parameters α = 0.00, β = 0.91, c = 5.20 and
k = 179. Figure 5.17 plots the BiPareto fit to the empirical data. The fit
is excellent for most of the distribution, with the BiPareto clearly capturing
the transition in the slope between the body and the heavy tail of the
empirical distribution. The approximation appears heavier than the empirical
data at the end of the tail, which could motivate further refinements of the
fit. We have also examined the stationarity of the flow size distributions over
different days and found consistent tails considering all days in our tracing
period, suggesting that weekly periodicities are not critical for modeling the
flow sizes.
Fig. 5.17. BiPareto distribution for modeling flow size.
5.6 Syntrig: a synthetic traffic generator
[Figure 5.18: schematic titled “Implement models / Generate traces”, with timelines labeled “Session” and “time” and parameter labels (a,b,c,k), N, (μ,σ), and (a',b',c',k').]
Fig. 5.18. Syntrig is a synthetic traffic generator that can produce synthetic traffic based on our models for various spatio-temporal scales, application-mixes, and
workload.
Syntrig is a flexible synthetic traffic generator that takes as input a
set of distributions for the session arrivals, number of in-session flows, flow
interarrivals, and flow sizes. Based on these, it produces synthetic traces as
shown in Figure 5.18. Specifically, it first produces a time series for the session
arrival process, then samples the distribution of the number of flows to decide
the number of flows of each session. Next, for these flows it selects
the inter-arrivals (based on the corresponding distribution) to generate the
within-session flow arrival time series. Finally, it assigns a size to each flow
based on the flow size distribution.
An emulation or simulation testbed (such as [368, 331]) can employ the
generated synthetic traces as its wireless user workload (see footnote 6).
Syntrig’s input is a set of tunable parameters that are closely associated with
the models of the session arrival, flow size, number of flows within a session,
and flow interarrival times within a session. These parameters correspond to
various conditions of the traffic load, application mix, and session profiles.
By tuning Syntrig’s input, the produced synthetic traces “reflect” these conditions.
Each entry of the Syntrig output trace corresponds to a session and its
associated flows. Specifically, it provides the following information:
• the session arrival timestamp
• the AP at which the corresponding session started
• the arrival of each in-session flow and its flow size
For fitting the parameters of the models, empirical traces (possibly at
different spatio-temporal scales) are used. For example, for the generation of a
synthetic trace at the AP-level, the empirical trace collected from that AP
is used. Similarly, a synthetic trace at the “system-wide” (i.e.,
“network-wide”) level is based on the empirical trace collected from all APs in the
infrastructure.
To produce the synthetic traces based on input models, the following steps
were carried out:
1. For each hour of the corresponding empirical trace, the session arrival rate
was estimated.
2. Synthetic session arrival times were produced at specific APs during that
hour using the sessions arrival model.
3. For each session, its number of flows, flow inter-arrivals, and flow size
values were generated based on the corresponding models.
4. Depending on the scale, the parameters of the input models were fitted
using the corresponding empirical traces.
By tuning the spatio-temporal scales, application mixes, session profiles and
rate, flow size, and flow interarrivals, Syntrig can produce various workload
types. Such different workload types are useful in the context of capacity
planning, admission control, and AP selection.
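The generation steps can be sketched in Python as below. This is not Syntrig itself: all names are hypothetical, the BiPareto CCDF is the commonly used four-parameter form and is an assumption here (the text does not restate it), the first flow is assumed to start at the session arrival time, and the hourly rates and the cap on flows per session exist only to keep the example small. The distribution parameters are the system-wide values quoted in Section 5.5.2.

```python
import math
import random


def bipareto_ccdf(x, alpha, beta, c, k):
    """Assumed BiPareto CCDF for x >= k (this form is not restated in the text)."""
    r = x / k
    return r ** (-alpha) * ((r + c) / (1.0 + c)) ** (alpha - beta)


def sample_bipareto(rng, alpha, beta, c, k):
    """Draw one value by inverting the CCDF with bisection on a log scale."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    lo, hi = k, 2.0 * k
    while bipareto_ccdf(hi, alpha, beta, c, k) > u:
        lo, hi = hi, 2.0 * hi
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        if bipareto_ccdf(mid, alpha, beta, c, k) > u:
            lo = mid
        else:
            hi = mid
    return hi


def generate_trace(hourly_rates, ap_id, flows_par, gap_par, size_par, rng,
                   max_flows=1000):
    """Return sessions as (arrival_time, ap_id, [(flow_time, flow_size), ...])."""
    trace = []
    for hour, rate in enumerate(hourly_rates):  # rate: sessions per hour
        if rate <= 0:
            continue
        t, end = hour * 3600.0, (hour + 1) * 3600.0
        while True:
            t += rng.expovariate(rate / 3600.0)  # constant rate within the hour
            if t >= end:
                break
            n_flows = max(1, round(sample_bipareto(rng, *flows_par)))
            n_flows = min(n_flows, max_flows)  # illustrative cap only
            flows, flow_time = [], t
            for i in range(n_flows):
                if i > 0:
                    flow_time += rng.lognormvariate(*gap_par)
                flows.append((flow_time, sample_bipareto(rng, *size_par)))
            trace.append((t, ap_id, flows))
    return trace


rng = random.Random(42)
trace = generate_trace(
    hourly_rates=[5.0, 0.0, 12.0],           # hypothetical sessions per hour
    ap_id="AP-demo",                         # hypothetical AP identifier
    flows_par=(0.06, 1.72, 284.79, 1.0),     # number of flows per session
    gap_par=(-1.3674, 2.785),                # Lognormal flow inter-arrivals
    size_par=(0.00, 0.91, 5.20, 179.0),      # flow sizes
    rng=rng,
    max_flows=200,
)
print(len(trace), "sessions")
```

Restarting the exponential clock at every hour boundary is valid because the exponential distribution is memoryless, so a piecewise-constant hourly rate yields a time-varying Poisson session arrival process.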
In general, Syntrig can be integrated with any type of model for the session
arrivals, flow sizes, flow interarrivals, and number of flows (e.g., traces for the
models described in Table 5.3 were produced using Syntrig). To produce
synthetic traces based on the proposed models, the input included a time-varying
Poisson session arrival model, a BiPareto distribution for the in-session
number of flows and flow sizes, and a Lognormal for in-session interarrivals. Apart
from the session arrival parameter that was always estimated based on the
hourly building-specific empirical data, the parameters of all the other models
were fitted using the empirical traces in the specified spatio-temporal scale.
Footnote 6: Packet-level details are left to the underlying protocols and are beyond the scope
of this modeling effort (as explained earlier).
Note that a simple transformation of the mean of the original Lognormal
distribution for the flow interarrivals can produce a desirable new
distribution. The increase of the flow size has an insignificant impact on the per-flow
throughput. By multiplying the values of a BiPareto distribution by a factor,
the resulting distribution is also BiPareto, with a scale parameter equal to
the scale parameter of the previous one multiplied by that factor. Using this
transformation, Syntrig can tune the number of flows and flow sizes.
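Both transformations can be written out explicitly. The following is a short worked derivation for a scaling factor a > 0; it assumes the commonly used four-parameter BiPareto CCDF, which the text does not restate.

```latex
% Lognormal: scaling shifts the location parameter and leaves sigma unchanged.
\[
X \sim \mathrm{Lognormal}(\mu,\sigma)
\;\Longrightarrow\;
aX \sim \mathrm{Lognormal}(\mu + \ln a,\ \sigma),
\qquad
\mathbb{E}[aX] = a\, e^{\mu + \sigma^{2}/2}.
\]

% BiPareto (assumed CCDF form): scaling only changes the scale parameter k.
\[
\Pr(X > x) = \left(\frac{x}{k}\right)^{-\alpha}
             \left(\frac{x/k + c}{1 + c}\right)^{\alpha-\beta},
\quad x \ge k,
\]
\[
\Pr(aX > x) = \Pr\!\left(X > \frac{x}{a}\right)
            = \left(\frac{x}{ak}\right)^{-\alpha}
              \left(\frac{x/(ak) + c}{1 + c}\right)^{\alpha-\beta},
\quad x \ge ak .
\]
```

Under this form, aX is BiPareto with parameters (α, β, c, ak), so the two tail exponents and the breakpoint parameter c are unaffected by the scaling.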
5.7 Scalability and reusability in user demand models
Scalability and reusability, properties particularly desirable in modeling,
further complicate the modeling process. Previous modeling studies have
either attempted to model traffic demand over hourly intervals at the level of
individual APs [261] or studied the problem at the system-level, deriving
models for the aggregate network-wide traffic demand, as in Section 5.5.2 [179].
Clearly, both approaches have their strong and weak points. The second
approach results in datasets that are amenable to statistical analysis and
provides a concise summary of the traffic demand at the system-level. However, it
fails to capture the variation at a finer spatial detail that may be required for
the evaluation of network functions with an emphasis on the AP-level (e.g.,
load balancing). The AP-level approach captures this finer detail, but it
fails in other respects. For example, it does not scale for large wireless
infrastructures and the data do not always lend themselves to statistical analysis.
Moreover, the modeling results are highly sensitive to the specific AP layout
of a particular network and the short-term variations of the radio propagation
conditions.
The above challenges have motivated us to address the scalability and
reusability tradeoffs. Our methodological choices attempt to strike a good
trade-off between the two extreme approaches in traffic modeling that were
outlined earlier:
• AP-level modeling
• infrastructure-wide modeling
To highlight the spatial dimension of the variation, we used buildings as
basic entities for traffic demand modeling. Major features of user activity,
such as traffic and roaming patterns, are studied at the building level, i.e.,
for the group of APs located in the same building.
[Figure 5.19: six panels of hourly session arrivals vs. time (hours, 0 to 200): Bank Of America (BUSINESS), Hinton James (RESIDENTIAL), Phillips (ACADEMIC), ITS (ADMINISTRATIVE), Ehringhaus (RESIDENTIAL), and McColl (ACADEMIC).]
Fig. 5.19. Hourly session arrival rates for representative building types in UNC.
The spatial scale may increase from AP “ap” to building
“bldg” to building type “bldgtype” to “network”, and the temporal scale from
day “day” to the entire tracing period “trace”. We use the
notation “A(B)” to denote a modeling approach with spatial scale “A”
and temporal scale “B”. The required number of sampling distributions for
modeling each campus building under the “bldg(trace)” approach would be
4N , where N is the number of campus buildings. Repeating this procedure for
each single day of the trace (“bldg(day)”) would increase this number by a
factor of D, which denotes the number of days. When all buildings of the same
type are modeled by a common set of distributions for flow-related variables,
their number is reduced to N + 3M , where M is the number of building types.
Smaller values of M can make the difference more dramatic and vice-versa.
Thus, M acts as a tuning knob that can trade computing requirements for
model accuracy and determines the complexity of the simulator. The type
of building, the population of clients that access the network, the patterns of
usage, and the environment are a non-exhaustive list of factors that contribute
to the spatial and temporal variation of traffic demand. The following sections
discuss how the modeled traffic variables vary across various time (hour, day,
week) and spatial scales (building, building-type).
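The counts above can be checked with a small Python helper. The function name and the example values of N and M are hypothetical; D = 8 corresponds to the 8-day dataset mentioned earlier. The factor 4 reflects the four modeled variables (session arrivals, number of flows, flow inter-arrivals, and flow sizes), of which the three flow-related ones are shared per building type in the “bldgtype(trace)” case.

```python
def distributions_needed(scale, n_buildings, n_types=0, n_days=1):
    """Number of sampling distributions required under a given modeling scale."""
    if scale == "bldg(trace)":
        return 4 * n_buildings
    if scale == "bldg(day)":
        return 4 * n_buildings * n_days
    if scale == "bldgtype(trace)":
        # one session arrival model per building, three shared per building type
        return n_buildings + 3 * n_types
    raise ValueError("unknown scale: " + scale)


# Hypothetical campus: N = 100 buildings, M = 5 building types, D = 8 days.
for scale in ("bldg(trace)", "bldg(day)", "bldgtype(trace)"):
    print(scale, distributions_needed(scale, 100, n_types=5, n_days=8))
```

With these illustrative numbers the building-type approach needs 115 distributions instead of 400, and the saving grows as M shrinks relative to N.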
5.7.1 Variation of the session arrival rate within a day
Figure 5.19 plots the hourly session arrivals over the whole 2006 trace duration (192 hours) for some representative campus buildings. Although the
absolute numbers of session arrivals and their exact variation are specific to
each building, these profiles exhibit clear patterns that are, to a large extent,
intuitive and closely related to the building type and usage. For example:
• Administrative and business buildings present clearly similar daily and
weekly patterns in their profiles. The activity window is quite narrow
during weekdays (6-8 hours long), in agreement with the working hours,
whereas the activity during the weekend is almost zero.
• Residential buildings show distinctly different patterns. The number of
session arrivals is more uniformly distributed across the week and hours
within the day. The activity is also significant during the evening hours,
often resulting in a daily or weekly peak.
• Academic buildings lie somewhere in between these two patterns. The daily
window of activity is clearly broader than the administrative and business
buildings, since they host WLAN clients for longer time intervals during
the day. Weekends see fewer session arrivals and shorter windows of activity
when compared with residential buildings, but traffic is non-negligible.
5.7.2 Variation of the session-level flow-related variables
The variation of traffic demand is also evident in the session-level variables.
Their empirical distribution functions at the building-type level reflect this
variation. Figure 5.20 (top) shows the broad variation of the per-building-type
distribution tails of the in-session number of flows. The number of flows
in sessions of residential buildings has a strikingly heavier tail, largely
related to the more active web browsing behavior of residential users. The plots
also suggest that the BiPareto distribution can be applied to model the per
building-type in-session number of flows.
[Figure 5.20: top panel, Pr(number of flows > k) vs. number of flows k; bottom panel, Pr(mean flow inter-arrival <= f) vs. mean flow inter-arrival f (s); both on logarithmic axes.]
Fig. 5.20. Behavior of modeled session attributes across different types of campus
buildings.
The behavior of flow inter-arrival times across different building types is
presented in Figure 5.20 (bottom). Again, the plots of mean in-session flow
inter-arrivals suggest that the variables could potentially be modeled by the
same type of distribution for all building types, though with different parameters. The mean flow sizes across different building types are more similar [217].
The building type is an intuitive heuristic attribute for grouping buildings, providing a base for a unified treatment of the spatial dimension of the
modeling task. The actual utility of this base is evaluated in the following
section.
5.8 Evaluation of user demand models
This section evaluates our models in different spatio-temporal scales to highlight their accuracy and also addresses the accuracy and scalability tradeoffs
using statistical-based and system-based metrics. These metrics were not explicitly addressed by our models. The time-varying Poisson session arrivals
are always modeled using the hourly building-specific data. The main focus
of this analysis is on the flow related parameters.
5.8.1 Statistical-based evaluation
The following statistics-based metrics are used:
• the flow arrival count process
• the flow inter-arrival time-series
The impact of the various scales on the accuracy of our models is clearly illustrated in Figure 5.21. As either the spatial scale or the temporal scale increases
(from building “bldg” to building type “bldgtype” to “network”, or from
day “day” to the entire tracing period “trace”, respectively), the synthetic
traces based on our models diverge from the empirical ones. A first view of the
“noise” introduced by the aggregation is reflected in the deviation between
the curves as the spatial scale increases (e.g., “empirical” compared to
“bldgtype(trace)”). The “empirical” curve corresponds to the empirical trace
collected from all APs in a busy building during the entire tracing period.
Staying at the building-type level does not result in a significant loss of
accuracy compared to the building level. Despite its simplicity, the network-wide aggregation (“network(trace)”) does not result in a substantially higher loss of
information. Interestingly, the aggregation in the spatial scale may cancel out
the impact of the fine temporal scale (e.g., the performance of “bldgtype(day)” compared to “network(trace)”).
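The two statistics-based metrics, and the CCDF curves used to compare them, can be computed from a list of flow arrival times as in the sketch below. The helper names and the sample timestamps are illustrative assumptions.

```python
def hourly_counts(arrival_times, n_hours):
    """Number of flow arrivals in each hour (the flow arrival count process)."""
    counts = [0] * n_hours
    for t in arrival_times:
        h = int(t // 3600)
        if 0 <= h < n_hours:
            counts[h] += 1
    return counts

def interarrivals(arrival_times):
    """Gaps between consecutive flow arrivals (the inter-arrival time-series)."""
    ts = sorted(arrival_times)
    return [b - a for a, b in zip(ts, ts[1:])]

def ccdf(samples):
    """Return (x, P[X > x]) pairs for the distinct sample values."""
    xs = sorted(samples)
    n = len(xs)
    points = []
    for i, x in enumerate(xs):
        if i + 1 == n or xs[i + 1] != x:  # last occurrence of the value x
            points.append((x, (n - i - 1) / n))
    return points

flows = [10.0, 20.0, 3700.0, 3800.0, 3900.0, 7300.0]  # made-up arrival times (s)
counts = hourly_counts(flows, 3)
gaps = interarrivals(flows)
curve = ccdf(counts)
```

Applying `ccdf` to the counts (or gaps) of an empirical trace and of a synthetic trace gives two curves that can be overlaid for comparison.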
[Figure 5.21: two CCDF panels, one for the flow arrival count and one for the flow interarrivals; curves: EMPIRICAL, BLDG(DAY), BLDG(TRACE), BLDGTYPE(DAY), BLDGTYPE(TRACE), NETWORK(TRACE).]
Fig. 5.21. Count of flow arrivals in an hour and flow interarrivals for different
spatio-temporal scales.
Further improvement would be obtained by modeling the flow-related variables over shorter time periods than over the full monitoring period or over a
day. In fact, the standard practice is to focus on modeling short-time windows
where the building activity experiences its peak (busy hour).
5.8.2 Systems-based evaluation
When the statistical metrics show a deviation of the models from the empirical data, the systems-based metrics can be used to evaluate the impact of
this difference on the performance of that system (or protocol). This section
analyzes the performance of a hotspot AP under real-traffic conditions. As
input for the user demand, it uses the empirical, real-life traces from UNC
(empirical) and synthetic traces based on our models. Furthermore, it performs a comparative analysis of several models. The following metrics
were employed to characterize the performance of the IEEE 802.11 AP under
real-life network conditions:
• hourly aggregate throughput
• per-flow delay, jitter, throughput, and goodput
Unlike throughput, which takes into account all the data transferred in the
transport layer, goodput only considers the number of bytes delivered from
the transport layer to the application layer. The delay per flow is the mean
delay of the packets in the flow, where the delay of a packet is the difference
between the time it is delivered at the receiver and the time it was enqueued at
the sender. The jitter expresses the delay variability experienced by a receiver.
The reported jitter value for a flow corresponds to the cumulative absolute
difference between the delays of consecutively received packets.
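The per-flow delay and jitter definitions above translate directly into a few lines of code. The sketch below assumes that each packet is described by a hypothetical (enqueue time, delivery time) pair, listed in sending order; the sample values are made up.

```python
def flow_delay_and_jitter(packets):
    """packets: list of (enqueue_time, delivery_time) pairs in sending order."""
    delays = [rx - tx for tx, rx in packets]
    # Delay per flow: mean of the per-packet delays.
    mean_delay = sum(delays) / len(delays)
    # Jitter: cumulative absolute difference between consecutive packet delays.
    jitter = sum(abs(b - a) for a, b in zip(delays, delays[1:]))
    return mean_delay, jitter

# Three packets with delays of 3, 4, and 2 time units.
mean_delay, jitter = flow_delay_and_jitter([(0.0, 3.0), (10.0, 14.0), (20.0, 22.0)])
```

Here the mean delay is 3 and the jitter is |4 - 3| + |2 - 4| = 3.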
It is expected that per-flow statistics will behave differently from the hourly
aggregate statistics, given that most flows last less than one hour. Average
per-flow statistics are more sensitive to network dynamics than aggregate
hourly flow statistics. The latter are less dependent on localized and transient
phenomena, and can be useful to mechanisms (such as capacity planning, load balancing, and admission control) that require knowledge of the user demand
at larger time scales.
The main objectives of this analysis are threefold:
• demonstrate the accuracy of our models using systems-based criteria
• highlight the impact of flow arrival and flow size on the throughput, goodput, jitter, and delay measured in a wireless LAN
• provide a comparative analysis study of various traffic models that simulate real-traffic demand conditions
Comparative analysis of various models
To illustrate the importance of accurate modeling of flow sizes and flow interarrivals in simulation studies, and also to highlight the parameter with the greatest impact on the performance, we derived several additional models, summarized in Table 5.3. The following notation was used: “x-y” indicates that
the flow size follows the “x” distribution and the flow interarrival the “y”.

Model                  Size       Interarrival  Arrival
bipareto-lognormal     BiPareto   Lognormal     -
bipareto-lognormal-ap  BiPareto   Lognormal     -
pareto-empirical       Pareto     empirical     empirical
pareto-uniform         Pareto     -             Uniform
fixed-empirical        fixed      empirical     empirical
empirical-fixed        empirical  fixed         -
fixed-uniform          fixed      -             Uniform
lognormal-weibull      Lognormal  Weibull       -
fixed-fixed            fixed      fixed         -

Table 5.3. Generation of synthetic traces based on various models for the flow-based
parameters. In some models (e.g., pareto-uniform), the flow arrival is modeled
while in others (e.g., bipareto-lognormal), the flow inter-arrival. The fixed flow
size is equal to the mean flow size in the empirical trace. The “empirical” in a
parameter indicates an exact match of the values in the corresponding field of the
synthetic and empirical traces.

Based on these models, synthetic traces were generated and replayed in the
simulations. For fitting the parameters of these models, the empirical trace
of a hotspot AP was used. Some of these models kept either the flow size
or the flow inter-arrival identical to the corresponding data in the empirical trace (e.g., pareto-empirical and fixed-empirical). We experimented
with flow arrivals that follow the uniform distribution and with flow sizes
derived from a Pareto distribution. Both distributions are popular choices for modeling,
respectively, the arrival process of flows and the size of files downloaded via peer-to-peer
applications, FTP, and HTTP [266]. The pareto-empirical and fixed-empirical synthetic traces differ from the empirical trace
only in the size of each flow. In particular, the pareto-empirical synthetic
trace is based on flow size values derived from a Pareto distribution, while
the fixed-empirical synthetic traces have flow size values that are fixed and
equal to the mean flow size of the empirical trace. Notice that the total aggregate traffic of the fixed-empirical trace is the same as that of the empirical
trace.
The flow arrival times of the pareto-uniform and fixed-uniform synthetic traces are derived from a uniform distribution on the interval [0, T],
where T is the duration of the empirical trace. The flow sizes in the pareto-uniform trace were derived using a Pareto distribution, while the fixed-uniform trace includes flow sizes that are fixed and equal to the mean of the flow
sizes in the empirical trace. The proposed models are bipareto-lognormal
and bipareto-lognormal-ap. They are the only ones using the session abstraction, and they model the number of flows per session with a BiPareto distribution.
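The construction of the pareto-uniform and fixed-uniform traces can be sketched as follows. The Pareto shape, the minimum flow size, the sample sizes, and the seed are illustrative assumptions; in the study, the parameters were fitted to the empirical trace, and the traces were produced by a separate generator.

```python
import random

def uniform_arrivals(n_flows, duration, rng):
    """Flow arrival times drawn uniformly on [0, duration], in time order."""
    return sorted(rng.uniform(0.0, duration) for _ in range(n_flows))

def pareto_sizes(n_flows, shape, min_size, rng):
    """Pareto flow sizes; paretovariate() returns values >= 1, scaled by min_size."""
    return [min_size * rng.paretovariate(shape) for _ in range(n_flows)]

def fixed_sizes(empirical_sizes):
    """Every flow gets the mean flow size of the empirical trace."""
    mean = sum(empirical_sizes) / len(empirical_sizes)
    return [mean] * len(empirical_sizes)

rng = random.Random(7)
empirical_sizes = [400, 1200, 800, 5600]  # bytes, made-up values
T = 3600.0                                # duration of the empirical trace (s)
arrivals = uniform_arrivals(len(empirical_sizes), T, rng)
fixed_uniform = list(zip(arrivals, fixed_sizes(empirical_sizes)))
pareto_uniform = list(zip(arrivals, pareto_sizes(len(empirical_sizes), 1.2, 100.0, rng)))
```

Because the fixed size is the empirical mean, the fixed-size trace carries the same total number of bytes as the empirical one.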
The lognormal-weibull model (proposed by Meng et al. [261]) is composed of flow inter-arrival times that follow a Weibull distribution on an hourly
basis and flow sizes that are based on a Lognormal distribution. The parameters of the Weibull distribution were determined using maximum likelihood
estimation for each hour-of-day of the empirical trace. To fit the parameters of
the Lognormal distribution for flow size, all flows of the empirical trace were
used. In addition, synthetic traces based on naive models (e.g., fixed-fixed)
were generated. In the fixed-fixed model, flow sizes are equal to the mean
flow size of the empirical trace and flow interarrivals are equal to the mean
duration of the mean in-session flow interarrival. Such models have been used
extensively in performance analysis studies of wireless networking protocols.
All the synthetic traces were generated via Syntrig. To fit the parameters of all models except bipareto-lognormal, the empirical trace corresponding to a hotspot AP was used. For the synthetic trace of bipareto-lognormal, the empirical trace of the entire wireless infrastructure was employed.
Fig. 5.22. The simulation/emulation testbed for analyzing the performance of a
wireless LAN. The wired clients act as traffic “sources” (senders), while the wireless
clients as “sinks” (receivers).
[Figure 5.23: two CCDF panels, one for the per-flow throughput (Kbps) and one for the per-flow goodput (Kbps); curves: EMPIRICAL, PARETO-EMPIRICAL, PARETO-UNIFORM, FIXED-EMPIRICAL, FIXED-UNIFORM, BIPARETO-LOGNORMAL-AP, BIPARETO-LOGNORMAL, LOGNORMAL-WEIBULL.]
Fig. 5.23. Throughput and goodput per flow in a wireless hotspot AP simulated
with real-traffic demand conditions.
[Figure 5.24: two CCDF panels, one for the per-flow jitter (ms) and one for the per-flow delay (ms); curves: EMPIRICAL, PARETO-EMPIRICAL, PARETO-UNIFORM, FIXED-EMPIRICAL, FIXED-UNIFORM, BIPARETO-LOGNORMAL-AP, BIPARETO-LOGNORMAL, LOGNORMAL-WEIBULL.]
Fig. 5.24. Delay and jitter per flow in a wireless hotspot AP simulated with real-traffic demand conditions. The empirical curve is very close to the bipareto-lognormal and bipareto-lognormal-ap.
[Figure 5.25: CCDF of the aggregate hourly throughput (Kbps); curves: EMPIRICAL, PARETO-EMPIRICAL, PARETO-UNIFORM, FIXED-EMPIRICAL, FIXED-UNIFORM, BIPARETO-LOGNORMAL-AP, BIPARETO-LOGNORMAL, LOGNORMAL-WEIBULL.]
Fig. 5.25. Aggregate hourly throughput in a wireless hotspot AP.
TCP-based experiments via simulations
The ns-2 testbed simulates a wireless LAN with three wireless clients associated with the same AP, and four wired clients connected via a router to the
Internet (as shown in Figure 5.22). The link between the wired devices and
the router has a speed of 100 Mbps. All links are duplex with a propagation
delay of 2 ms, except for the one connecting the router to the AP, which has a
1 ms delay and a FIFO-scheduled, drop-on-overflow buffer with a default
size of 40 packets. The wired devices act as traffic sources, running an FTP application and sending data to the wireless clients (traffic sinks). The wireless
clients use TCP Reno to download traffic from the Internet. Various synthetic
and empirical traces were “replayed” in the simulation testbed. The session
id “determines” the sink, and each in-session flow is assigned to a source in a
round-robin fashion.
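The mapping of replayed flows to testbed endpoints can be illustrated with a short sketch. It assumes that the sink is chosen as the session id modulo the number of wireless clients and that the round-robin over sources restarts for every session; both choices, as well as the node names, are hypothetical readings of the description above.

```python
from collections import defaultdict

def assign_endpoints(flows, sources, sinks):
    """flows: list of (session_id, flow_id) pairs in arrival order."""
    next_source = defaultdict(int)  # per-session round-robin position
    plan = []
    for session_id, flow_id in flows:
        sink = sinks[session_id % len(sinks)]  # assumed session-to-sink mapping
        source = sources[next_source[session_id] % len(sources)]
        next_source[session_id] += 1
        plan.append((flow_id, source, sink))
    return plan

plan = assign_endpoints(
    [(0, "a"), (0, "b"), (1, "c"), (0, "d")],
    sources=["s1", "s2"],
    sinks=["w1", "w2", "w3"],
)
```

All flows of session 0 reach the same wireless client, while their sources alternate between the wired clients.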
A consistent trend for all benchmarks is that the bipareto-lognormal-ap model produces synthetic traces which, when replayed in ns-2, result in
a performance almost identical to the empirical ones (as shown in Figures 5.23, 5.24, and 5.25). The next best model, resulting in a performance
close to the empirical traces, is the bipareto-lognormal. The lognormal-weibull performs reasonably well. It should be noted that the lognormal-weibull trace was generated using empirical data collected for the corresponding AP and specific hour of day, and thus is less scalable than the
bipareto-lognormal one. Moreover, unlike the bipareto-lognormal and
bipareto-lognormal-ap, it strongly underestimates the flow sizes.
The rapid drop in the throughput and delay per flow (in Figures 5.23 and 5.24)
is due to the large percentage of flow sizes equal to the maximum segment size
(MSS). These TCP flows correspond to transfers that carry less than 1 KB, and
in ns-2, all the payload is packed in one MSS. In all of the empirical, bipareto-lognormal, bipareto-lognormal-ap, and lognormal-weibull traces, a large
percentage of flows with size of 1 KB or less was found.
The fixed-fixed model exhibits the worst performance among all these
models. The departure of the pareto-empirical and fixed-empirical from
the empirical traces is prominent in both the hourly throughput and per-flow
statistics and demonstrates the impact of the flow size. Note that although
the fixed-empirical carries the same amount of total workload as the empirical trace, its performance deviates substantially from the empirical.
Furthermore, the flow interarrival models have a prominent impact on the
hourly throughput.
For the per-flow throughput and goodput, the flow inter-arrival exhibits
a stronger impact than the flow size. For example, the fixed-uniform and
the pareto-uniform models have similar performance. Likewise, the fixed-empirical and pareto-empirical models exhibit similar performance characteristics. When the distribution of the flow size remains the same while the
flow interarrival distribution changes (e.g., pareto-empirical compared to
pareto-uniform and fixed-empirical compared to fixed-uniform), their
performance deviates prominently.
[Figure 5.26: CCDF of the average per-flow throughput (kbps) for four hours: Hour1 [15kbps], Hour167 [15kbps], Hour98 [500kbps], Hour100 [500kbps].]
Fig. 5.26. Comparing per-flow statistics for hours that have produced the same
aggregate download traffic.
To demonstrate that the per-flow statistics can carry useful information for
the performance of the network, we selected several hours from this hotspot
AP with very close mean hourly throughput statistics and found that their
per-flow throughput and delay statistics may differ substantially (Figure 5.26).
The modeling study was repeated for hotspot APs with different application
mixes, namely, an AP with 85% web traffic, a second one with 50% web and
40% peer-to-peer, and a third one with 80% peer-to-peer. The application-based classification was performed utilizing BLINC [300]. Using empirical traces
from these APs, we fitted the parameters of our proposed models and produced the corresponding synthetic traces. Then, these traces were replayed
in the simulation testbed. Figures 5.27 and 5.28 show clearly that for the
first two APs with a large percentage of web traffic (50% or more), synthetic
traces based on our models perform very similarly to the empirical traces.
However, for the AP dominated by peer-to-peer traffic, the performance of our
models is less satisfying. Thus, the application mix can have a dominant impact on the accuracy of our models. Modeling the peer-to-peer traffic is not
an easy task, especially due to the increased number, diversity, and complexity
of peer-to-peer applications and the unpredictability of user interaction with them.
In a preliminary analysis of our models under “heavy-traffic” conditions
at an AP, we focused on hours within which the total amount of wireless
traffic accessed by that AP was above the 90th percentile. We found that the
proposed distributions approximate reasonably well the traffic collected during
these heavy-traffic hours (as shown in Figure 5.29). Further analysis is required,
not only using more such intervals, but also determining the various network
conditions that impact the performance of applications (“user satisfaction”
and quality of service) and evaluating our models under those conditions.
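The selection of heavy-traffic hours and the quantile comparison behind a Q-Q plot can be sketched as follows. The nearest-rank percentile definition, the number of quantile points, and the sample data are assumptions made for illustration.

```python
import math

def percentile(values, q):
    """Nearest-rank percentile, with q in (0, 100]."""
    xs = sorted(values)
    rank = max(1, math.ceil(q * len(xs) / 100.0))
    return xs[rank - 1]

def heavy_hours(hourly_bytes, q=90):
    """Indices of the hours whose total traffic exceeds the q-th percentile."""
    threshold = percentile(hourly_bytes, q)
    return [h for h, b in enumerate(hourly_bytes) if b > threshold]

def qq_pairs(original, synthetic, n_points=5):
    """(synthetic quantile, original quantile) pairs at evenly spaced levels."""
    pairs = []
    for i in range(1, n_points + 1):
        q = 100.0 * i / n_points
        pairs.append((percentile(synthetic, q), percentile(original, q)))
    return pairs

hourly_bytes = list(range(1, 21))  # made-up hourly traffic totals
busy = heavy_hours(hourly_bytes)
pairs = qq_pairs([3, 1, 4, 1, 5, 9, 2, 6], [3, 1, 4, 1, 5, 9, 2, 6])
```

When the synthetic and original samples follow the same distribution, the pairs lie close to the diagonal, which is the visual check a Q-Q plot provides.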
Emulations on TCP
Often simulations fail to capture all the interactions and dependencies across
the different layers. Emulations can be used to provide a more thorough
look at the controlled experiments and to further validate the performance
results. For this purpose, we repeated the study on a small testbed using
Harpoon [331] (see footnote 7). The emulation testbed consists of three stationary wireless devices operating as network sinks, a desktop PC running all three
corresponding Harpoon servers, and a Cisco Aironet 1200 AP operating in
IEEE 802.11b. The servers replayed our empirical and synthetic traces. The
wireless clients and the server were synchronized using NTP. Packet transfers
on the transport layer were monitored by tcpdump, running on all four computers. Using these packet header traces, we measured the performance of the
AP based on the same benchmarks as in the simulations. As in the simulations, the
synthetic traces based on our models resulted in a performance very close to
that obtained when empirical traces were used.

Footnote 7: Harpoon can be used in an emulation testbed to generate flows with certain flow
arrival and flow size values provided as input distributions.
[Figure 5.27: three CCDF panels of per-flow throughput (Kbps), each with curves EMPIRICAL, BIPARETO-LOGNORMAL-AP, BIPARETO-LOGNORMAL: (a) hotspot with 85% web traffic, (b) hotspot with 50% web and 40% peer-to-peer traffic, (c) hotspot with 80% peer-to-peer traffic.]
Fig. 5.27. Impact of the application mix on per-flow throughput (selected hotspots
with different application mixes).
[Figure 5.28: three CCDF panels of per-flow delay (ms), each with curves EMPIRICAL, BIPARETO-LOGNORMAL-AP, BIPARETO-LOGNORMAL: (a) hotspot with 85% web traffic, (b) hotspot with 50% web and 40% peer-to-peer traffic, (c) hotspot with 80% peer-to-peer traffic.]
Fig. 5.28. Impact of the application mix on per-flow delay (selected hotspots with
different application mixes).
[Figure 5.29: Q-Q plots of original data quantiles versus synthetic data quantiles for (a) flow size, (b) in-session flow interarrival, and (c) number of flows per session.]
Fig. 5.29. Our models persist for traffic generated during busy periods. The empirical trace used here corresponds to an hour of a hotspot AP with heavy workload
conditions.
UDP-based experiments via simulations
To further evaluate the models, UDP-based scenarios were performed. The bipareto-lognormal-ap, bipareto-lognormal,
empirical, and CBR-based traffic (popular in simulation studies) were compared with respect to hourly throughput. In the simulation experiments, the
number of communicating pairs (wired senders and wireless receivers) was
drawn from a uniform distribution in the range of [2,10]. The amount of
data replayed in each set of experiments was equal to the aggregate traffic
in the original trace. A small random delay was introduced in the arrival
time of each flow to avoid the concurrent start of all flows. Each session of
the bipareto-lognormal-ap, bipareto-lognormal, and empirical
traces is “assigned” to a pair (wired sender and wireless receiver). For each flow, a UDP
transmission of size equal to the defined flow size is then initiated at the specified flow arrival time. In CBR scenarios, each source transmits
a persistent UDP flow with size equal to that of the original trace divided by
the total number of pairs. For the CBR simulations, the mean computed from
a total of ten runs was reported (Table 5.4). The bipareto-lognormal-ap, bipareto-lognormal, and empirical-based scenarios perform similarly,
deviating from the CBR-based ones.
CBR rate    Median throughput
10 Kbps     9.7 Kbps
25 Kbps     15.2 Kbps
50 Kbps     30.5 Kbps
100 Kbps    57.7 Kbps

Table 5.4. Median hourly throughput in scenarios using CBR sources.
Table 5.4 shows statistics on the hourly throughput for various CBR transmission rates. When wired clients transmit at 25 Kbps, the mean hourly
throughput is 13.2 Kbps. The mean hourly throughput in the empirical
trace is 1.7 Kbps, while in the bipareto-lognormal-ap and bipareto-lognormal, it is 10.3 Kbps and 9.7 Kbps, respectively. The median hourly
throughput in the empirical trace is 1.4 Kbps, while in the bipareto-lognormal-ap and bipareto-lognormal, it is 2.4 Kbps and 2.6 Kbps, respectively. In
the 25 Kbps CBR scenario, on the other hand, the median hourly throughput is 15.2 Kbps. Thus, the traffic
based on the CBR models results in an hourly throughput distribution that
differs substantially from the one produced using the empirical traces.
5.9 Singular spectrum analysis of traffic at APs
For quality of service provision, capacity planning, load balancing, and network monitoring, it is critical to understand the traffic characteristics. For this
purpose, the analysis of the traffic load time-series at APs can be important.
In some earlier studies, we modeled the traffic load at APs using variants of
the Moving Average and Autoregressive Moving Average models [287, 282].
Due to the complicated structure of these traffic load series, traditional
algorithms of non-linear analysis may not result in reliable estimates. However,
after filtering out a high-frequency component, which can be considered as
a noisy part, we could expect to obtain a more accurate estimation of the
embedding dimension of the underlying process. Motivated by this observation,
we analyzed traffic series by decomposing them into two components, namely
a low-frequency and a high-frequency one, using Singular Spectrum Analysis
(SSA).
SSA [162] belongs to the general category of PCA methods [208]. The
SSA method is very effective in the analysis of time-series corresponding to
an arbitrary process. In a recent work [63], SSA was used to analyze the
dynamics of traffic obtained in an intermediate-scale wired LAN. To the best
of our knowledge, our study in [342] is the first one that applies SSA to the
analysis of traffic from a WLAN.
SSA allows us to explore the intrinsic dimensionality and structure of
the time-series corresponding to the traffic load at a given AP, using data
collected from a campus-wide WLAN infrastructure. To investigate the nature
of this dimensionality, we introduce the notion of eigenloads. Derived from the
application of SSA to a given traffic load series, an eigenload is a time-series that captures a particular source of temporal variability. Each traffic
load series can be expressed as a weighted sum of eigenloads, where the weights
are proportional to the extent to which each eigenload is present in the given
traffic load series.
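The basic SSA steps (embedding, decomposition, and diagonal averaging) can be sketched for the leading component only. The window length, the use of power iteration in place of a full singular value decomposition, and the function names are assumptions made for illustration; the eigenloads of [342] may be defined and normalized differently.

```python
import math

def trajectory_matrix(series, window):
    """Embed the series into an L x K Hankel matrix, with L = window."""
    k = len(series) - window + 1
    return [[series[i + j] for j in range(k)] for i in range(window)]

def leading_left_vector(x, iters=500):
    """Dominant eigenvector of X X^T, obtained by power iteration."""
    rows = len(x)
    c = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(rows)]
         for i in range(rows)]
    u = [1.0 + i / rows for i in range(rows)]
    norm = math.sqrt(sum(t * t for t in u))
    u = [t / norm for t in u]
    for _ in range(iters):
        v = [sum(c[i][j] * u[j] for j in range(rows)) for i in range(rows)]
        norm = math.sqrt(sum(t * t for t in v))
        if norm == 0.0:
            break  # all-zero series: keep the current unit vector
        u = [t / norm for t in v]
    return u

def leading_component(series, window):
    """Series reconstructed from the first SSA component only."""
    x = trajectory_matrix(series, window)
    u = leading_left_vector(x)
    k = len(x[0])
    proj = [sum(u[i] * x[i][j] for i in range(window)) for j in range(k)]
    total = [0.0] * len(series)
    count = [0] * len(series)
    for i in range(window):  # diagonal averaging of the rank-one matrix
        for j in range(k):
            total[i + j] += u[i] * proj[j]
            count[i + j] += 1
    return [t / n for t, n in zip(total, count)]

flat = leading_component([5.0] * 20, window=4)
smooth = leading_component([10.0 + (-1) ** t for t in range(40)], window=2)
```

For the alternating series, the leading component stays near the level of 10 and the fast oscillation is left in the residual, which mirrors the split into a slow-varying part and a noisy part.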
We show that traffic eigenloads in a WLAN fall into two natural classes:
• deterministic eigenloads, which capture the slow-varying trends in the traffic load series
• noise eigenloads, which account for traffic fluctuations appearing to have
relatively time-invariant properties
By categorizing eigenloads in this manner, we can obtain a significant insight
into the intrinsic properties of the traffic load series. Our main findings can
be summarized as follows:
• Each time-series can be well approximated by only a small number of
eigenloads, which constitute its “feature set”.
• These features vary in a predictable way as a function of the amount of
traffic carried in the time-series.
• The largest traffic load series, i.e., the series with the highest mean traffic
load, are primarily deterministic. On the other hand, traffic load series of
moderate size are generally composed of noisy features.
Motivated by the observation that the deterministic part of a traffic load series
varies slowly in time and carries the main part of the information
content, we designed a predictor that performs trend forecasting at time scales larger
than an hour. This forecasting algorithm models
the traffic series using a linear model of order p, whose coefficients (weights)
are estimated using the Normalized Least Mean Squares approach.
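A linear predictor of order p trained with the Normalized Least Mean Squares (NLMS) update can be sketched as follows. The step size, the regularization constant, and the zero initial weights are illustrative assumptions rather than the settings of our predictor.

```python
def nlms_predict(series, order, step=0.5, eps=1e-8):
    """One-step-ahead predictions of a series with an NLMS-adapted linear model."""
    weights = [0.0] * order
    predictions, errors = [], []
    for t in range(order, len(series)):
        window = series[t - order:t][::-1]  # most recent sample first
        y_hat = sum(w * x for w, x in zip(weights, window))
        err = series[t] - y_hat
        # NLMS: the gradient step is normalized by the input energy.
        energy = sum(x * x for x in window) + eps
        weights = [w + step * err * x / energy for w, x in zip(weights, window)]
        predictions.append(y_hat)
        errors.append(err)
    return predictions, errors, weights

predictions, errors, weights = nlms_predict([2.0] * 60, order=3)
```

On a constant series the prediction error roughly halves at every step with this step size, so the predictor locks onto the level within a few samples.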
In future work, we will complete the design of the proposed predictor by
taking into account not only the deterministic, but also the noisy component
of a given traffic series. For this purpose, an optimal radial basis function will
be trained for the prediction of the noisy part [244].
Another interesting problem is the detection of dynamic changes in the
future traffic load values. In particular, the accurate detection of transitions
from a normal to an abnormal state, due to a hardware or software failure or to an attack, may improve diagnosis and treatment. The multiscale decomposition given by the SSA approach could be combined with the
conceptually simple and computationally very fast concept of permutation
entropy [103] to detect dynamical changes in the subset of noisy eigenloads,
which are responsible for the transient behavior of the traffic load series.
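Permutation entropy is computed from the relative frequencies of the ordinal patterns found in short windows of the series. The sketch below uses the standard definition, normalized to [0, 1]; the pattern order and the example series are illustrative choices.

```python
import math
from collections import Counter

def permutation_entropy(series, order=3):
    """Normalized permutation entropy of a series, in [0, 1]."""
    patterns = Counter()
    for i in range(len(series) - order + 1):
        window = series[i:i + order]
        # Ordinal pattern: the indices of the window sorted by value.
        pattern = tuple(sorted(range(order), key=lambda k: window[k]))
        patterns[pattern] += 1
    total = sum(patterns.values())
    h = -sum((c / total) * math.log(c / total) for c in patterns.values())
    return h / math.log(math.factorial(order))

pe_ramp = permutation_entropy([1, 2, 3, 4, 5, 6, 7, 8], order=3)
pe_mixed = permutation_entropy([4, 7, 9, 10, 6, 11, 3], order=2)
```

A monotone ramp has a single pattern and zero entropy, while the mixed series has four rising and two falling pairs. Evaluating the entropy over a sliding window of a noisy component would give a series whose shifts can flag dynamical changes.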
5.10 Related work
A large body of literature has developed concepts and techniques for modeling
Internet traffic, especially in terms of statistical properties (e.g., heavy-tail,
self-similarity). For example, heavy-tailed distributions appear in the sizes
of files stored on web servers [124], data files transferred through the Internet [294], and files stored in general-purpose Unix filesystems, suggesting the
prevalence and importance of these distributions. Self-similarity characteristics exist in Internet traffic. In a pioneering work, Leland et al. showed that
LAN traffic exhibits a self-similar nature [243]. Evidence of self-similarity was
also found in WAN traffic [296]. In that work, Paxson and Floyd demonstrated
that self-similar processes capture the statistical characteristics of the WAN
packet arrival more accurately than Poisson arrival processes, which are quite
limited in their burstiness, especially when multiplexed to a high degree. Self-similar traffic does not exhibit a natural length for its “bursts”. Its traffic
bursts appear in various time scales [243]. The relation of the self-similarity
and heavy-tailed behavior in wired LAN and WAN traffic was analyzed by
Willinger et al. [355]. On the other hand, Poisson processes can be used to
model the arrival of user sessions (e.g., Telnet connections and FTP control
connections). However, modeling packet arrivals in Telnet connections by a
Poisson process may result in inaccurate delay characteristics, since packet
arrivals are strongly affected by network dynamics and protocol characteristics.
Web traffic also exhibits self-similarity characteristics. Crovella and Bestavros showed evidence of this and attempted to explain it in terms of file
system characteristics (e.g., distribution of web file size, user preference in file
transfer, effects of caching), user behavior (e.g., “think time” when accessing a web
page), and the aggregation of many such flows in a LAN [123]. The majority
of web flows in wired networks are below 10 KB, while a small percentage of
very large flows accounts for 90% of the total traffic. They employed power laws to describe web flow sizes. We also observed similar phenomena in the
campus-wide wireless traffic. A nice discussion of the use of power law and
lognormal distributions in other fields can be found in [266].
Peer-to-peer applications evolve rapidly, dominating the traffic mix in several cases. As recent studies have indicated, peer-to-peer and web traffic differ
significantly (e.g., unlike in the web, where clients may download a popular web page multiple times, the immutability of Kazaa’s multimedia objects
leads clients to fetch objects at most once) [168]. However, due to the increasing number of peer-to-peer applications, the differences in their communication patterns, and the difficulty
of classifying them accurately, modeling peer-to-peer traffic is challenging.
Two general approaches for traffic generation are the packet-level replay
and source-level generation. The packet-level replay is an exact reproduction
of a collected trace both in terms of packet arrival times, size, source and destination, and content type. To analyze a system under various traffic conditions,
researchers need to employ the appropriate packet-level trace that exhibits the
required traffic conditions. However, collecting the appropriate empirical data
is a non-trivial task. Specifically, reproducing the intended packet arrival process can be complex due to the arbitrary delays introduced at the various network components by various interrupts, service mechanisms, and scheduling
processes. Closed-loop or feedback-loop characteristics manifest the reactions
of the source and destination of a flow to network conditions, triggering further
changes (e.g., TCP’s congestion avoidance mechanism). However, packet-level
replays cannot reflect such feedback-loop characteristics.
Adopting a different approach, source-level generation models the sources of traffic
(e.g., the applications running on the source and destination). These sources
are used as building blocks, along with the various network components that
can be modeled or simulated, allowing the analysis of a system under various
conditions. The generation of packet-level data can be based on some statistical properties that characterize the empirical data, and thus ensure that the
synthetic data are “realistic enough”. However, it is important to note that
the realism of a trace depends tightly on the system to be studied. The selection of statistical properties that are general enough but also tunable to
express different traffic conditions/profiles is a non-trivial task and depends
on the characteristics of the system to be studied. The source-level approach,
advocated by Paxson and Floyd [149], allows the underlying network, protocol, and application layer to specify and control the packet arrival process. The
infinite source model is one of the simplest and most popular source-level models.
It has no parameters and is used to model very large network flows. However,
the infinite source model represents the traffic poorly, since the majority of Internet traffic is relatively light, with bidirectional flows, and of small packet
size [107, 211, 150]. An enlightening discussion of these approaches is included
in [178].
Our approach is inspired by the source-level (or network independent)
modeling. The main assumption is that session arrivals, which are initiated by humans,
are to a large extent not affected by the underlying network technology. Furthermore, given the relatively low percentage of packet loss at the network
layer, we assumed that the in-session flow size and flow arrivals can approximate the intended user traffic demand. The proposed user workload traces can
then be integrated with a channel, packet generation, and network topology
model to simulate/emulate certain conditions in the context of a performance
analysis study.
Traffic generation is an important aspect of the network modeling and simulations. Several studies have addressed the challenges and provided guidelines
on generating realistic synthetic traffic in wired networks [296, 149, 178]. In
general, traffic generators may either use mathematical models (e.g., a Poisson
process) or empirical data (e.g., Swing [349]). Swing focuses on characterizing
and mimicking packet inter-arrival rate, packet size distribution, destination
IP address, and port distribution in a wired network. Unlike Swing, which aims
to produce synthetic traces that capture the network conditions, our objective
is the generation of user demand traces based on accurate models of intended
traffic demand, independent of specific network characteristics.
While there is rich literature on traffic characterization in wired networks
(e.g., [354, 80, 120, 102, 278]), there is significantly less work of the same
depth for WLANs. Hierarchical approaches to modeling the wireless demand
and its spatial and temporal phenomena have received little attention from
our community. In fact, the only relevant study we are aware of is the flow-level modeling study by Meng et al. [261]. The authors used the available
Dartmouth traces, which include syslog messages and tcpdump data from 31
APs in five buildings. They proposed a two-tier (Weibull regression) model
for the arrival of flows at APs and a Weibull model for flow residing times,
and they also observed high spatial similarity within the same building. The
authors also studied the modeling of flow size and suggested that a Lognormal
model provides the best approximation.
Minkyong et al. [223] clustered APs based on their peak hours and analyzed the distribution of arrivals for each cluster, using the aggregate client
arrivals and departures at APs. Similar clusters based on registration patterns were also reported by Ravi Jain et al. in their modeling study of user
registration at APs [201].
5.11 Conclusions
We introduced a novel methodology for modeling the wireless access and traffic
demand by providing a multilevel perspective. In particular, we modeled the
arrival and size of sessions and flows considering various spatio-temporal scales
and explored their statistical properties, dependencies and inter-relations.
Time-varying Poisson processes provide a suitable tool for modeling the arrival processes of clients at APs. We validated these results by modeling the
visit arrival rates at different time intervals and APs. In addition, we proposed
a clustering of the APs based on their visit arrival and the functionality of the
area in which they are located. The models have been validated using empirical data from different time periods (an entire week in April 2005 and another
one in April 2006), different time scales (week, day, hour), different spatial
scales (AP, group of APs located within the same building, set of APs located
within buildings of the same functionality, and entire wireless infrastructure),
and various workload conditions (with respect to the application-mixes and
amount of traffic load). The BiPareto distribution is a good fit for the flow sizes of
the Dartmouth trace, collected from its campus-wide wireless infrastructure.8
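As an illustration of the time-varying Poisson model for client arrivals, the following minimal sketch draws arrival times from a nonhomogeneous Poisson process using the standard thinning method. The function name and the hourly rates are hypothetical choices made for illustration; they are not parameters fitted in this study.

```python
import random

def sample_arrivals(hourly_rates, seed=None):
    """Draw arrival times (in hours) from a Poisson process whose rate is
    piecewise constant per hour, using thinning.

    hourly_rates: hypothetical arrivals-per-hour values, one per hour.
    """
    rng = random.Random(seed)
    horizon = len(hourly_rates)          # total duration in hours
    rate_max = max(hourly_rates)         # dominating constant rate
    arrivals = []
    if rate_max <= 0:
        return arrivals
    t = 0.0
    while True:
        # Candidate arrival from a homogeneous process with rate rate_max.
        t += rng.expovariate(rate_max)
        if t >= horizon:
            break
        # Accept the candidate with probability rate(t) / rate_max.
        if rng.random() * rate_max < hourly_rates[int(t)]:
            arrivals.append(t)
    return arrivals

# Hypothetical daily profile: quiet night, busy working hours, calmer evening.
rates = [1.0] * 8 + [20.0] * 10 + [5.0] * 6
times = sample_arrivals(rates, seed=1)
```

Using one rate vector per AP, or per group of APs with the same functionality, would mirror the different spatial scales at which the models were validated.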
Although the absolute numbers of session arrivals and their exact variation are specific to each building, these profiles exhibit clear patterns that
are, to a large extent, intuitive and closely related to the building type and
usage [217]. Also, the mean in-session flow inter-arrivals across different buildings and building types suggest that the variables could potentially be modeled
by the same type of distribution for all building types, though with different
parameters. The mean flow sizes across different building types are very similar [217]. Furthermore, the empirical traces collected from APs at which web traffic accounts for 50% or
more of the load are fitted well by the proposed models. However, for
workloads dominated by peer-to-peer traffic, the fit deteriorates significantly.
Syntrig generates synthetic traces based on a set of tunable parameters
(i.e., its input). These parameters are tightly associated with the proposed
models and can reflect various conditions, such as flow sizes, flow interarrival
times, session arrivals, application mixes, and session profiles. The obtained
synthetic traces can then be “replayed” in emulation or simulation testbeds
in the context of a performance analysis study. Synthetic traces based on our
models yield a performance very close to that obtained when empirical traces
are used as input. In contrast, synthetic traces based on popular models that are
frequently employed in simulations exhibit large deviations from the empirical traces. The trade-offs between accuracy and scalability of our models were
evaluated using statistics-based and systems-based benchmarks.
Different synthetic traces can be generated for various application mixes,
traffic loads, and user profiles. Such traces can be used in the performance
analysis of algorithms for capacity planning, dimensioning, and admission control under different traffic load, user profiles, and application mix conditions.
The flexibility in defining different profiles is desirable, especially given the fact
that the user traffic demand cannot be easily determined: new applications
and services gain popularity, and new user behaviors, device types, and access
patterns emerge. Thus, a natural next objective is to derive client profiles, a
more intuitive abstraction than session profiles. Understanding this part of
the workload will make simulations more intuitive, in the sense that the input
could be the number of clients and perhaps some parametric description of
their long-term access patterns. Ideally, these client profiles would be based
on the proposed session and flow distributions and tunable parameters.
8 Given that session-related information could not be generated using the available
Dartmouth traces, only the flow size models were validated.
It would be interesting to explore different user workload profiles, utilizing traces from emerging wireless environments. As more traces from various wireless network environments become available, it is critical to develop
methodologies and tools for searching for “law-like” relationships across these
different traces that can be generalized to a wide range of different conditions.
6 Conclusions and future work
6.1 Conclusions
The advances in wireless communications and the adoption of mobile computing devices have further impelled the evolution of pervasive computing spaces.
We envision users with wirelessly-enabled devices, interacting with such pervasive computing spaces to access, generate and share information, forming new
social networks and networking paradigms. In such networking environments,
self-organizing, autonomous devices interact with each other to enhance information access. Their autonomy, self-organization, and cooperation led us
to explore the peer-to-peer paradigm.
Our research was driven by several questions: Given their frequent disconnections, how can wireless devices exploit their increasing storage and
processing power to enhance the information access? How fast does information diffuse in such mobile networks? Does wireless access exhibit high spatial
locality of information? What is the interplay between device cooperation and
data availability? What are the gains if devices act as miniature mobile caches?
How do clients access wireless networks and what is their traffic demand?
6.1.1 Mobile peer-to-peer computing
We proposed 7DS, a novel mechanism that enables wireless devices to share
resources in a self-organizing manner, without the need of an infrastructure.
In information sharing, peers query, discover, and disseminate information,
while for message relaying, hosts forward messages to the Internet on behalf
of other hosts when they gain Internet access. The percentage of hosts that
acquire the data object as a function of time and their average delay were
measured. We found that the density of the cooperative hosts, their mobility, and the transmission power have the most pronounced impact on data
dissemination. Synchronizing the periods during which the network interfaces
of peers are powered on and reducing the frequency of querying can save
energy. In the case of FIS with a low density of hosts, the query interval
can be set as large as three minutes without affecting the speed of data
dissemination. Similar results hold in the case of P-P.
The performance of data dissemination remains the same when the area
is expanded but the density of the cooperative hosts and the transmission
power are kept fixed. Also, for a fixed wireless coverage density, the larger the
density of cooperative hosts, the better the performance. In S-C, this implies
that for the same wireless coverage density, it is more efficient to have a
larger number of cooperative hosts with lower transmission power than fewer
with a higher transmission power. We also presented an analytical model for
FIS using theory from random walks and environments and the kinetics of
diffusion-controlled processes.
The spatial locality of information was the driving force behind 7DS. To
evaluate the degree of spatial locality in a real environment, we analyzed web
requests collected from a large-scale wireless network. Although the web is not
primarily a location-dependent or collaborative application, its prevalence motivated this analysis. The spatial locality can be computed for various spatial
granularities. We mostly concentrated on the AP-, building-, and infrastructure-wide levels. Specifically, we measured how likely it is for two peers co-resident
within an AP to be interested in the same data, and how likely it is for a
client to request a data item that is already stored in the AP-, building-, or
infrastructure-wide level cache. The building-level cache is an aggregation of
all the caches of APs located in that building, while the infrastructure-wide
cache is an aggregation of all the caches of all the APs in the infrastructure.
The following caching paradigms were analyzed:
• user cache
• cache attached to an AP
• peer-to-peer cache, in which peers are devices associated with the same AP
• campus-wide cache
The overall ideal hit ratios of the user cache, cache attached to an AP, and
peer-to-peer caching are 51%, 55%, and 23%, respectively. The ideal hit ratio
across APs varies and was found to be as high as 73%. For such APs, a local
AP cache can be beneficial. In general, the spatial locality of the wireless
web access varies across APs. Wireless web access also exhibits high temporal
locality. Each client frequently requests objects that it has requested within
the past hour, and occasionally, requests objects that have been requested by
other nearby users within the past hour.
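The ideal hit ratio of a caching paradigm can be computed by replaying a request trace against a cache of unlimited size. The sketch below assumes a hypothetical trace format of (client, AP, URL) tuples in chronological order and ignores object expiration and cacheability, so it illustrates the idea rather than reproducing the exact methodology of the study.

```python
from collections import defaultdict

def ideal_hit_ratio(trace, scope):
    """Fraction of requests already seen within the same cache scope.

    trace: chronological list of (client, ap, url) tuples (hypothetical format).
    scope: function mapping a request to its cache key, e.g. the client for
           a user cache or the AP for a cache attached to an AP.
    """
    seen = defaultdict(set)   # cache key -> URLs stored so far (unbounded)
    hits = 0
    for request in trace:
        url = request[2]
        cache = seen[scope(request)]
        if url in cache:
            hits += 1
        else:
            cache.add(url)
    return hits / len(trace) if trace else 0.0

# Toy trace with made-up clients, APs, and object names.
trace = [
    ("c1", "ap1", "a"), ("c2", "ap1", "a"), ("c1", "ap1", "a"),
    ("c3", "ap2", "a"), ("c3", "ap2", "b"), ("c1", "ap1", "b"),
]
user_ratio = ideal_hit_ratio(trace, scope=lambda r: r[0])
ap_ratio = ideal_hit_ratio(trace, scope=lambda r: r[1])
campus_ratio = ideal_hit_ratio(trace, scope=lambda r: "campus")
```

Changing only the `scope` function switches between the user, AP, and campus-wide paradigms, which makes the aggregation relationship between the caches explicit.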
We also applied the peer-to-peer paradigm to positioning for mobile computing devices. Our proposed system, CLS, positions wirelessly-enabled devices using the existing wireless communication infrastructure adaptively
without the need of specialized hardware or training. To improve its accuracy, CLS enables hosts to cooperate and share positioning information and
also allows the integration of external information, such as maps, popular
routes, and user mobility patterns.
6.1.2 Wireless measurements and modeling
In general, networks are extremely complex and the interaction of different
layers and technologies creates many situations that cannot be foreseen during
the design and testing stages of technology development. This is especially
true for wireless networks, which are used for many different purposes, and
which are based on a shared medium that is inherently more vulnerable than
its wired counterpart. One of the lessons learned during this research was
that it is critical to perform measurement-based studies, in order to uncover
deficiencies and identify possible optimizations for better utilizing the scarce
resources in wireless systems. As mentioned earlier, a typical evolution of a
technology consists of the following steps:
1. simple simulations
2. advanced and more realistic simulations
3. emulations and tests in small-scale testbeds
4. tests in large-scale testbeds
5. adoption and use in real-life environments
The existence of testbeds, tools, benchmarks, and models is of tremendous
importance and can be a catalyst for further performance analysis and simulations.
Wireless networks have their own distinct characteristics and challenges
due to the radio propagation characteristics and mobility. Some typical assumptions in performance analysis studies on wireless networks are the following [100, 363]:
• models and analysis of wired networks are valid for wireless networks
• wireless links are symmetric
• link conditions are static
• the density of devices in an area is uniform
• the traffic demand and access patterns are fixed
• the communication pairs (i.e., source and destination devices) are fixed
• users move based on a random-walk model
In most cases, these assumptions are unrealistic and incorrect. For
instance, it is known that, in general, the spatial distribution of network nodes
moving according to the random waypoint model is nonuniform (e.g., [82]).
Moreover, wireless channels can be highly asymmetric and highly time-varying.
Unfortunately, there are not many traces of actual data access patterns or
realistic models available for wireless users, especially for mobile peer-to-peer
settings (e.g., [203]). Often academics are reluctant to expend the time and
energy required to “sanitize” the data sets. Similarly, companies are not eager
to disclose information they consider proprietary. The development of realistic,
but also general, tractable and elegant models is a non-trivial task.
In contrast to traditional wired-network topologies that reflect the physical hardwired connection of routers, wireless network topologies are more
dynamic and have a stochastic element due to the radio propagation conditions, the user mobility, and client-AP association process. Modeling wireless
network topologies opens up new research directions. In current traffic modeling tools, the application mixture and traffic models are quite simplistic.
One of the problems is that mobility modeling and topology modeling are rich
sub-fields, each requiring its own expertise. There should be tools and methods that allow others
to use models from these sub-fields effectively and easily in standard simulators. The scaling properties of simulators are very important and have not
been fully addressed. For example, it is not clear that a simple 20-node simulation can be “stretched” to a 10,000-node simulation by a “copy-and-paste”
methodology.
A wide range of traffic load is observed in wireless campus-wide infrastructures. In general the traffic load is light, though there are long tails. Furthermore, APs in campus-wide infrastructures exhibit a dichotomy with respect
to their upload and download traffic: there are APs dominated by uploaders and APs dominated by downloaders. The most popular applications are
web browsing and peer-to-peer, accounting for approximately 81% of the total
traffic, and the traffic of most users is also dominated by these two applications.
Rich sets of empirical traces, collected from large-scale wireless infrastructures, impelled us to model the user and access demand, and thus, enable
more meaningful performance analysis studies. We distinguished the following important dimensions in wireless network modeling:
• user demand
• access patterns
• network topology
• channel conditions
This distinction enabled us to superimpose models for the demand on a given
topology and focus on the right level of detail. This monograph focused on
user demand and access patterns, modeling session and flow parameters. Sessions capture the interaction between the clients and the network, while flows
model the above-packet-level traffic activity masking the underlying network
dependencies. The wireless access of a client is modeled as an alternation between sessions and disconnections. An access pattern is characterized by an
arrival process at certain APs and a sequence of transitions between APs.
Important parameters in access patterns are the arrival process at an AP,
session and visit duration, transitions between APs, and predictability of the
next AP association.
The majority of the sessions last less than one hour. Wireless clients exhibited relatively low mobility, spending a large percentage of their wireless
life at the same AP. In general, mobile sessions tend to have a small percentage of long visits and a large percentage of short visits at APs. Markov-chain
models can be used to characterize transitions of clients between APs and
accurately predict the next AP with which a client will associate. These predictions can be further enhanced by incorporating networking and physical
topological data as well as temporal information, such as time, day of the
week, and visit duration. Time-varying Poisson processes can model client
arrivals at APs well. Predicting client arrivals at APs can improve the buffering, caching, load balancing, and prefetching at APs in order to mask the
end-to-end delay, particularly in the case of regular clients. APs may not only
predict client arrivals but also traffic demand. Based on these predictions,
neighboring APs can advise newly arrived clients to avoid hotspots, suggest
alternative APs, and better balance their load and channel utilization.
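To make the Markov-chain idea concrete, the following sketch estimates first-order transition counts from AP association sequences and predicts the most likely next AP. It is a minimal illustration under simplifying assumptions: the AP identifiers and histories are made up, and the temporal and topological enhancements mentioned above are not included.

```python
from collections import Counter, defaultdict

def fit_transitions(sequences):
    """Count AP-to-AP transitions in a list of association sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current_ap, next_ap in zip(seq, seq[1:]):
            counts[current_ap][next_ap] += 1
    return counts

def predict_next(counts, current_ap):
    """Return (most likely next AP, estimated probability), or None if the
    current AP was never observed as the origin of a transition."""
    successors = counts.get(current_ap)
    if not successors:
        return None
    next_ap, n = successors.most_common(1)[0]
    return next_ap, n / sum(successors.values())

# Hypothetical association histories of one client.
history = [["lib", "cafe", "lab"], ["lib", "cafe", "lib"],
           ["lib", "lab"], ["cafe", "lab"]]
model = fit_transitions(history)
prediction = predict_next(model, "lib")
```

A higher-order variant would key the counts on the last two or more APs instead of only the current one, at the cost of needing longer histories per client.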
Highlighting the ability of empirically-based models to capture the characteristics of the user workload and providing a flexible framework for using them
in performance analysis studies was another contribution of this research.
Specifically, a multi-level modeling of the wireless demand in IEEE 802.11
campus-wide infrastructures was presented. A methodology for the statistical modeling of wireless network traffic demand was proposed relying on
robust statistical methods to study large-scale phenomena. Furthermore, we
contributed intuitive system-wide and AP-level models of traffic demand that
capture the network-independent characteristics of the traffic workload. The
parameters and the proposed statistical models appear in Table 6.1.
Parameter                          Model
AP visit duration                  BiPareto
Session arrival                    Time-varying Poisson
Client arrival                     Time-varying Poisson
Flow inter-arrival/session         Lognormal
AP of first association/session    Lognormal
Flow number/session                BiPareto
Flow size                          BiPareto
Session duration                   BiPareto
Transitions between APs            Markov-chain
Table 6.1. Proposed models for wireless access and traffic demand.
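For readers who wish to experiment with the BiPareto entries of Table 6.1, the sketch below samples from a BiPareto distribution by numerically inverting its complementary CDF. It assumes the commonly used parameterization P[X > x] = (x/k)^(-alpha) ((x/k + c)/(1 + c))^(alpha - beta) for x >= k; the function names are hypothetical, and any parameter values passed to them are the reader's choice rather than the fitted values of this study.

```python
import math
import random

def bipareto_ccdf(x, alpha, beta, c, k):
    """P[X > x] under the assumed BiPareto parameterization (x >= k)."""
    y = x / k
    return y ** (-alpha) * ((y + c) / (1.0 + c)) ** (alpha - beta)

def sample_bipareto(alpha, beta, c, k, rng=random):
    """Inverse-transform sampling with bisection on a log scale.

    rng: any object with a random() method returning a float in [0, 1).
    """
    u = 1.0 - rng.random()               # uniform in (0, 1]
    lo, hi = k, 2.0 * k
    while bipareto_ccdf(hi, alpha, beta, c, k) > u:
        lo, hi = hi, 2.0 * hi            # expand until the root is bracketed
    for _ in range(100):
        mid = math.sqrt(lo * hi)         # geometric midpoint
        if bipareto_ccdf(mid, alpha, beta, c, k) > u:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

Bisection is used because the assumed CCDF has no closed-form inverse in general; it relies only on the CCDF being strictly decreasing, which holds for positive alpha and beta.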
The session- and flow-related models are well-behaved, robust, and reusable.
We validated these models using different spatial scales (e.g., AP-level, network-wide, groups of APs located at the same building) and different periods and
found that the same distributions apply for modeling at finer spatial scales.
At each spatio-temporal scale, the models for sessions and flows remain the
same with only their parameter values differing.
By selecting the appropriate spatio-temporal granularities of the models,
the right balance between reusability and accuracy can be struck. For example, when hourly periods and the AP scale are used, the models maintain sufficient spatial detail at the cost of lower scalability and amenability. When
a network-wide scale is used, we gain simplicity at the cost of a higher loss
of detail. The evaluation of the models was performed using statistics- and
systems-based metrics. When the statistics-based metrics showed a deviation
of the models from the empirical data, the systems-based metrics were used
to evaluate the impact of this difference on the performance of that system.
The systems-based evaluation focused on the performance of a hotspot AP,
employing various metrics, such as the hourly aggregate throughput, per-flow
delay and throughput, and goodput. We generated synthetic traces based on
various models and spatio-temporal scales. Emulation- and simulation-based
scenarios were performed using synthetic and original traces—generated from
a real-life wireless infrastructure—as input for the user workload. The proposed models exhibit a performance which is very close to the one obtained
when the original traces are used (“ground-truth” of the AP performance).
On the other hand, naive models result in a performance that deviates substantially from the one reported when the original traces are used.
6.2 Directions for future research
Pervasive computing spaces involve autonomous networked heterogeneous systems operating with minimum human intervention. They should be capable
of detecting impending violations of the service requirements, reconfiguring
themselves, and isolating the failed or malicious components. To do this, it
is necessary to provide dynamic adaptation mechanisms that perform the following tasks:
• monitor the environment
• relate low-level information about resource availability and network conditions to higher-level functional or performance specifications
• select the appropriate network interface, channel, AP, transmission power, and bitrate
Wireless networks exhibit vulnerabilities that can be classified into the following three main types:
• connectivity
• performance
• security
Connectivity problems reflect the lack of sufficient wireless coverage. Performance problems arise when an end-user observes degraded performance, such as a low throughput or a high
latency, due to various reasons related to the wired or wireless parts of the
network, congestion in several networking components, or slow servers.
Security problems involve the presence of rogue APs and malicious clients.
In mobile wireless networks, it is easier to disseminate worms, viruses, and
false information or eavesdrop, deploy rogue or malicious software or hardware, attack, or behave in a selfish or malicious manner. Attacks may occur at
different layers, aiming to exhaust the resources, while instances of selfish behavior include promising falsely to relay packets or not responding to requests
for service. Given the vulnerabilities of wireless networks, security provision
needs to become a research target in its own right instead of being simply an
add-on component, investigated in isolation from quality of service.
Our ultimate technological goal is to develop intelligent and robust wireless
networks, which can be defined as networks of devices that adapt in a self-organizing and autonomous manner based on their resources to enhance their
quality of service. Examples of important issues that need to be addressed
are the following: efficient monitoring of networks, identifying the appropriate parameters to be measured that reflect accurately the network conditions,
understanding the impact of these conditions on the performance of an application, and facilitating various mechanisms that enable wireless devices to
select the appropriate network interface or channel.
6.2.1 Increasing capacity
To increase the network capacity, improvements in all protocol layers have
been proposed. At the physical layer, advanced radio technologies, such as
reconfigurable and frequency-agile radios, multi-channel and multi-radio systems, and directional and smart antennas have been proposed to increase
capacity and mitigate impairments caused by fading and co-channel interference. Multipath fading can be caused by phase cancellation between
different propagation paths, which reduces the signal power relative to noise. These mechanisms need to be integrated with MAC and routing protocols.
Efficient spectrum utilization is an issue of primary importance. Studies
have shown that there are frequency bands in the spectrum that are largely
unoccupied most of the time while others are heavily used. Cognitive radios
have been proposed to enable a device to access a spectrum band that is unoccupied by others at that location and time [265]. Cognitive radio is defined
as an intelligent wireless communication system that is aware of the environment and adapts to changes, aiming to achieve both reliable communication
whenever needed and efficient utilization of the radio spectrum [175, 265]. The
commercialization of such technologies has not yet been fully realized, as most
of them are still in research and development phases and face cost, complexity,
and compatibility issues.
Other improvements target the MAC layer. To achieve a higher throughput
and energy-efficient access, devices may use multiple channels instead of only
one fixed channel [60]. Depending on the number of radios and transceivers,
the following approaches can be distinguished:
• Single-radio MAC:
  – Multi-channel single-transceiver MAC: one transceiver is available in the network device, and therefore only one channel is active at a time in each device.
  – Multi-channel multi-transceiver MAC: the network device includes multiple RF front-end chips and baseband processing modules to support several simultaneous channels. A single MAC layer controls and coordinates the access to these multiple channels.
• Multi-radio MAC: the network device has multiple radios, each with its own MAC and physical layer.
Researchers have proposed modifications to the IEEE 802.11 MAC to use multiple channels. These approaches can be classified into different categories depending on the channel assignment and availability of multiple transceivers.
For example, one approach dedicates a channel to the control packets and uses
the remaining channels for data packets, whereas another approach utilizes
all channels identically. Two main trends appear with respect to transceivers:
the use of multiple transceivers, with one transceiver per channel, and
the use of a common transceiver for all channels. Unlike the multi-transceiver
case, a common transceiver operates on a single channel at any given point
in time. Manufacturers, such as Engim and D-Link, have launched APs that
use multiple channels simultaneously and claim to provide high-bandwidth
wireless networks.
6.2.2 Capacity planning
Unlike device adaptation, which takes place dynamically, capacity planning determines the placement, configuration, and administration of APs in an
off-line proactive manner. The configuration of an AP includes the determination of its transmission power, frequency, and orientation. The determination
of the transmission power is a trade-off between energy conservation and network connectivity. Reducing transmission power lowers the interference, which,
in turn, reduces the number of collisions and packet retransmissions. At the
same time, it also results in a smaller number of communication links and
lower connectivity. Another issue is related to the conservative configuration
of the default carrier-sense threshold. An increase in the carrier-sense threshold of a device also results in an increase of the delay to transmit. A dynamic
configuration of this threshold that takes into consideration the interference
range of the potential receivers and the transmission power may enable a
larger number of devices in proximity to transmit, improving the per-flow
and aggregate throughput [345]. Capacity planning aims to provide sufficient
coverage and satisfy demand, considering the spatio-temporal evolution of the
demand. Typical objectives include:
• the minimization of interference
• the maximization of the coverage area and overall signal quality
• the minimization of the number of APs used to provide sufficient coverage
Capacity planning is an important research direction and has been the
focus of several research efforts (e.g., [240, 309, 213, 306, 59]). Several capacity planning systems assume predefined positions of the APs and aim
to reduce the number of APs used based on administrative criteria. Power
management—an integral component of capacity planning—aims to control
spectrum spatial reuse, connectivity, and interference. An objective of power
control could be to adjust the transmit power of devices, such that their
signal-to-interference-plus-noise ratio (SINR) meets a certain threshold required
for an acceptable performance (e.g., [247, 251, 250, 249, 96, 269, 154, 225]).
The non-deterministic nature of the environment due to exogenous parameters, mobility, and radio propagation characteristics impacts the performance
of the network, making capacity planning challenging and further motivating
the need for dynamic network adaptation.
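One classical way to drive each link toward a target SINR is the distributed iteration of Foschini and Miljanic, in which every transmitter scales its power by the ratio of the target to its currently measured SINR. The sketch below illustrates that general technique and is not the method of any specific work cited above; the gain matrix, noise power, and target value are hypothetical.

```python
def sinr(powers, gains, noise):
    """SINR per link; gains[i][j] is the gain from transmitter j to receiver i."""
    result = []
    for i, row in enumerate(gains):
        interference = sum(row[j] * powers[j]
                           for j in range(len(powers)) if j != i)
        result.append(row[i] * powers[i] / (interference + noise))
    return result

def power_control(gains, noise, target, steps=50, p_init=1.0):
    """Iterate p_i <- p_i * target / SINR_i for a fixed number of steps."""
    powers = [p_init] * len(gains)
    for _ in range(steps):
        current = sinr(powers, gains, noise)
        powers = [p * target / s for p, s in zip(powers, current)]
    return powers

# Hypothetical two-link scenario with weak cross-link interference.
gains = [[1.0, 0.1], [0.1, 1.0]]
powers = power_control(gains, noise=0.01, target=2.0)
achieved = sinr(powers, gains, noise=0.01)
```

The iteration settles only when the target is feasible for the given gains; with strong cross-link interference or an overly ambitious target, the powers grow without bound, which is one reason practical schemes also enforce a maximum transmit power.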
6.2.3 Network interface and channel selection
The problem of channel assignment has been studied in the context of cellular
networks. The spectrum is divided into a number of non-interfering disjoint
channels using different techniques, such as:
• frequency division, in which the spectrum is divided into disjoint frequency bands
• time division, in which the channel usage is allocated into time slots
• code division, in which different users are modulated by spreading codes
• space division, in which users can access the channel at the same time and the same frequency by exploiting the spatial separation of the individual user. Multibeam (directional) antennas are used to separate radio signals by pointing them along different directions
The channel or network interface selection can be static or dynamic. The decision of which channel or network interface to select can be based on various
criteria, such as the AP capacity, channel quality, application requirements,
registration cost, and admission control. In current infrastructure networks,
a common criterion for selecting an AP is based on received signal-strength
values, which indicate the quality of the wireless link of a client to an AP and
affect the client transmission rate. Although signal strength does impact the
packet delivery probability, signal-strength measurements are not an optimal
metric for AP selection due to the asymmetry and highly time-varying characteristics of link conditions. Other criteria combine link quality and traffic
load estimations, including the number of active clients, average amount of
time an AP spends to serve its users, beacon delays, packet error rate, and
round-trip-time estimations [347, 336]. Note that both the traffic load and
link conditions can impact these parameters, so it is important to collect sufficient measurements in appropriate temporal scales and layers to obtain a
clear picture of the network conditions.
Typical IEEE 802.11b devices reduce their bit-rate when repeated unsuccessful frame transmissions are detected. Furthermore, their performance is
considerably degraded in the presence of a host with a reduced bit-rate. In
general, in a wireless infrastructure, the client’s bit-rate, use of the RTS/CTS
mechanism, and frame size can impact its performance. Rate adaptation enables wireless devices to select the best transmission rate and dynamically
adapt their decision to the time-varying channel quality. Typical metrics for
estimating the channel quality include the signal-to-noise ratio (SNR) and the
delivery probability of probing packets.1 Various bit-rate adaptation mechanisms have been proposed in the literature (e.g., [214, 238, 186, 325, 305, 358,
172, 173, 86, 304, 222, 59]).
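The loss-triggered behavior described above can be sketched as a simple counter-based controller in the spirit of Auto Rate Fallback. The IEEE 802.11b rate set is standard, but the class name and the thresholds (two consecutive failures to step down, ten consecutive successes to step up) are assumptions chosen for illustration, not values taken from the standard or from any cited mechanism.

```python
RATES_MBPS = [1.0, 2.0, 5.5, 11.0]   # IEEE 802.11b transmission rates

class RateController:
    """Step the rate down after repeated failures, up after repeated successes."""

    def __init__(self, down_after=2, up_after=10):
        self.down_after = down_after   # assumed threshold (illustrative)
        self.up_after = up_after       # assumed threshold (illustrative)
        self.index = len(RATES_MBPS) - 1
        self.failures = 0
        self.successes = 0

    @property
    def rate(self):
        return RATES_MBPS[self.index]

    def report(self, delivered):
        """Update the controller with the outcome of one frame transmission."""
        if delivered:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.up_after:
                self.index = min(self.index + 1, len(RATES_MBPS) - 1)
                self.successes = 0
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.down_after:
                self.index = max(self.index - 1, 0)
                self.failures = 0
        return self.rate
```

A controller of this kind reacts only to frame losses, so it cannot distinguish collisions from poor channel quality; SNR-based or probing-based schemes address that limitation at the cost of extra measurements.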
APs in proximity that are configured on the same or overlapping channels may
interfere with each other, dramatically affecting user performance. To
alleviate the interference, APs and clients may dynamically switch channels or
adapt their transmission power. Channel selection algorithms need to address
several issues, such as:
• fast discovery of devices across channels
• fairness across active flows and participants
• accurate measurements of varying channel conditions
Several studies on channel switching mechanisms have appeared recently,
e.g., [121, 330, 81, 143, 273, 246, 200, 197, 111, 235]. Rate adaptation and
channel and network interface selection face a fundamental challenge: in order to be effective they require an accurate estimation on-the-fly of channel
conditions in the presence of various dynamics caused by fading, mobility, and
hidden terminals. This involves distributed and collaborative monitoring and
analysis of the collected measurements. Their realization in an energy-efficient
manner is a non-trivial task.
6.2.4 Monitoring
Depending on the type of conditions that need to be measured, monitoring
needs to be performed at certain layers and spatio-temporal granularities.
Monitoring tools are not without flaws and several issues arise when they are
used in parallel for thousands devices of different types and manufacturers.
These issues are related to:
• fine-grain data sampling
• time synchronization
• incomplete information
• data consistency
1 Rate adaptation, a link-layer mechanism, is left unspecified by ieee802.11 standards. The current specification mandates multiple transmission rates at the physical layer that use different modulation and coding schemes. For example, ieee802.11b supports four transmission rates (1-11 Mbps), ieee802.11a eight rates (6-54 Mbps), and ieee802.11g twelve (1-54 Mbps).
Often monitoring tools are limited in their capabilities because they cannot
capture all the relevant information due to either hardware limitations, the
proprietary nature of hardware and software, or hidden terminals. Furthermore, there are many protocol features of ieee802.11, such as those related to
the rate adaptation and transmission power control, whose implementations
are vendor-specific and whose details are not publicly available.
Extensive monitoring and collection of data in fine spatio-temporal detail
can improve the accuracy of the performance estimates, but also increase the
energy consumption and detection delay, as the network interfaces need to
monitor the channel over longer time periods and then exchange this information with other devices. Four important aspects that need to be addressed
are:
• identification of the dominant parameters through sensitivity analysis studies
• strategic placement of monitors at routers, APs, clients, and other devices
• automation of the monitoring process to reduce human intervention in managing the monitors and collecting data
• aggregation of data collected from distributed monitors to improve the accuracy, while maintaining a low communication and energy overhead
To provide a more complete picture of the network conditions, cross-layer
measurements—collected data spanning from the physical layer up to the application layer—are required. This further complicates the monitoring and
analysis process. To interpret their dependencies and identify the relevant explanatory variables, cross-correlation functions on this data can be employed
in an iterative approach. The wireless domain gives many opportunities for the
use of a rich set of statistical and visualization techniques, such as feature extraction, multidimensional clustering, and forecasting. Also, the identification
of the impact of various benchmarks is critical for the support of intelligent
and robust wireless networks.
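As an illustration of how cross-correlation functions can relate measurements from different layers, the following sketch computes the Pearson correlation between two equally sampled time series at several lags and reports the lag with the strongest dependence. The series names (a physical-layer SNR series and an application-layer throughput series) and the restriction to non-negative lags are assumptions chosen for the example.

```python
# Illustrative sketch: lagged cross-correlation between two measurement series.
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def cross_correlation(x, y, max_lag):
    """Map each lag in 0..max_lag to the correlation of x[t] with y[t + lag].

    Both series must have the same length and sampling interval.
    """
    result = {}
    for lag in range(max_lag + 1):
        xs = x[:len(x) - lag]
        ys = y[lag:]
        result[lag] = pearson(xs, ys)
    return result


def best_lag(x, y, max_lag):
    """Return the lag with the largest absolute correlation."""
    cc = cross_correlation(x, y, max_lag)
    return max(cc, key=lambda lag: abs(cc[lag]))


if __name__ == "__main__":
    snr = [10, 12, 9, 15, 11, 14, 8, 13]
    throughput = [5, 5, 10, 12, 9, 15, 11, 14]  # follows snr two samples later
    print(best_lag(snr, throughput, 3))
```

Repeating such an analysis over pairs of variables, and discarding those with weak dependence, is one way to realize the iterative approach mentioned above.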
Benchmarks can be derived from theoretical models, by analyzing real-life data, or by combining the two at different temporal, spatial, and network scales.
They may also reflect different perspectives (e.g., user, client, AP, group of
APs in certain regions, entire infrastructure). In general, the availability of
benchmarks can play a dramatic role in comparative performance analysis
and validation studies through repeatability. Examples of such benchmarks
can be combinations of the following non-exhaustive general metrics:
• application characteristics
• device mobility
• robustness and fault-tolerance criteria
• network conditions
• network topologies
An application can be characterized based on its requirements (e.g., in terms
of throughput, delay, jitter, packet losses, resolution, and media quality), interactivity model, usage pattern, and traffic demand. Depending on the environment, the device mobility could be
• group or individual
• spontaneous or controlled
• pedestrian or vehicular
• known a priori or dynamic
Examples of robustness and fault-tolerance criteria include the number of active neighboring devices, the degree of vulnerability under the loss of valuable
links or APs, and the impact of induced failures on the performance. Network
conditions can be characterized by link quality criteria (e.g., packet losses, delays, signal-to-noise ratio), the spatio-temporal distributions of traffic demand
and application mix, and the distributions of regions of weak connectivity or
no signal. Network topologies can be described based on their connectivity
and link characteristics, distribution and density of peers, degree of clustering, co-residency time, inter-contact time, duration of disconnection from the
Internet, and interaction patterns.
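Some of these topology metrics can be computed directly from contact traces. The sketch below derives contact durations and inter-contact times for a single pair of devices from a list of contact intervals. The working definitions used here (an inter-contact time is the gap between the end of one contact and the start of the next, with overlapping contacts merged) and the trace format are assumptions made for the example.

```python
# Illustrative sketch: contact durations and inter-contact times for one
# device pair, from (start, end) contact intervals in arbitrary time units.


def contact_durations(contacts):
    """Return the duration of each contact, ordered by start time."""
    return [end - start for start, end in sorted(contacts)]


def inter_contact_times(contacts):
    """Return the gaps between successive contacts.

    Overlapping or adjacent contacts are treated as one continuous contact.
    """
    gaps = []
    prev_end = None
    for start, end in sorted(contacts):
        if prev_end is not None and start > prev_end:
            gaps.append(start - prev_end)
        prev_end = end if prev_end is None else max(prev_end, end)
    return gaps


if __name__ == "__main__":
    trace = [(0, 10), (25, 30), (28, 40), (100, 105)]
    print(contact_durations(trace))
    print(inter_contact_times(trace))
```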
6.3 Bio-inspired computing networks
Computing spaces with wirelessly-enabled devices monitoring the environment, and processing and communicating the acquired information are becoming more and more pervasive. In several situations, specialized devices
communicate with each other to aggregate their information and deliver it to
the user in the appropriate modality and format. In other situations, miniature networked devices need to collaborate and form a network that presents
intelligent and robust behavior.
Depending on the degrees of collaboration, caching and network paradigms,
wirelessly-enabled devices in pervasive computing spaces interact, sharing information and other resources. In this resource sharing, better routes, APs,
servers, and caches can be selected, based on various criteria, such as energy efficiency, response delay, throughput, network lifetime, robustness, fault tolerance, security, scalability, and user interruption. Similarly, devices compete for, or allocate, resources to optimize these criteria.
Several social systems in nature, composed of simple individuals, exhibit
an intelligent collective behavior. Researchers have been drawing parallels
between biological and computer systems and applying biologically-inspired
models to achieve more efficient computing paradigms. In a particularly interesting work, Weitz et al. [352] draw several analogies among different disciplines that study the evolution of networks; mathematicians and physicists
focus on the network structure as it changes over time, while biologists investigate how selection and fitness act to optimize the performance of a biological
network. Examples of biological networks are the metabolic, regulatory, and
protein networks. Like biologists, computer scientists seek to apply energy-efficient adaptation mechanisms to optimize networking environments.
Biologists have been studying in depth the structure and behavior of
organisms such as C. elegans, the first multicellular animal to have a
fully sequenced genome and a major model organism used for biomedical research. Two interesting questions about biological networks are the following:
• Do biological networks reflect hidden organizational and structural principles?
• How do these principles contribute to the adaptation, fault-tolerance, and energy-efficiency of the biological organisms?
Some researchers have suggested that biological networks are organized through
scale-free random evolution while others have claimed that they exhibit statistically significant patterns. Watts and Strogatz showed that metabolic and C. elegans
networks have a high degree of clustering and a short average path length.
Barabasi studied the network of protein interactions in yeast and found that
the most highly-connected proteins are the most important for the survival
of a cell. Scale-free networks have been found to be resistant to random failures but vulnerable to attacks against their “key” nodes (e.g., hubs, nodes
with high degree of connectivity). Could we build more adaptive, robust and
energy-efficient pervasive computing systems by applying the analogies drawn
from these structural properties of biological networks?
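The contrast between random failures and targeted attacks on hubs can be explored with a small experiment: remove a set of nodes and measure the size of the largest connected component that remains. The sketch below does this for an adjacency-set representation. The helper names and the toy hub-and-spoke graph in the usage example are assumptions for illustration; the toy graph is not a scale-free network, only a minimal case in which hubs hold the topology together.

```python
# Illustrative sketch: effect of node removal on the largest connected component.
from collections import deque


def largest_component(adj, removed=()):
    """Size of the largest connected component after removing some nodes.

    adj maps each node to the set of its neighbors (undirected graph).
    """
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for source in adj:
        if source in seen:
            continue
        seen.add(source)
        queue = deque([source])
        size = 0
        while queue:  # breadth-first search from source
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def top_degree_nodes(adj, k):
    """Return the k nodes with the highest degree (ties broken by node id)."""
    return sorted(adj, key=lambda n: (-len(adj[n]), n))[:k]


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (0, 3), (0, 4), (0, 5),
             (5, 6), (5, 7), (5, 8), (5, 9)]
    adj = {n: set() for n in range(10)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    print(largest_component(adj, top_degree_nodes(adj, 2)))  # attack on hubs
    print(largest_component(adj, [1, 6]))                    # two leaf failures
```

In this toy case, losing the two hubs fragments the graph completely, while losing two leaves barely affects connectivity.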
Diffusion has been studied in biology and similarities can be drawn between
the propagation of pathogens, such as viruses and worms, or other types of information in computer networks, and the proliferation of pathogens in cellular
organisms [160]. Chemotaxis is the kind of taxis in which cells, bacteria, and
other single-cell or multicellular organisms direct their movements according
to certain chemicals in their environment, critical to their development and
normal function. In different spatio-temporal scales, the information dissemination in pervasive computing environments plays a similar role. Could we
improve the information dissemination in pervasive computing environments
by applying chemotaxis-inspired mechanisms?
Other examples are sensory networks, in which biologists explore strategies
used by cells to function reliably in the presence of noise. Similarly, routing algorithms have been inspired by models related to ant colonies and the
notion of stigmergy, that is, the indirect communication in a self-organizing
emergent system whose individuals communicate with one another through
modifications induced in their local environment. Real ants have been shown
to find intelligent solutions to problems, such as discovering shortest paths using only the pheromone trail deposited by other ants, prioritizing food sources
based on their distance and ease of access, carrying large items, and forming
bridges.
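The way a pheromone trail can single out a shorter path may be illustrated with a deterministic, averaged model: in every round the ants split across the alternative paths in proportion to the current pheromone levels, each ant deposits an amount inversely proportional to the path length, and a fraction of the pheromone evaporates. The parameter values and the averaging itself are assumptions for this sketch; it is not one of the ant-inspired routing algorithms from the literature.

```python
# Illustrative sketch: averaged pheromone dynamics over alternative paths.


def simulate_pheromone(lengths, rounds=50, evaporation=0.1, ants=100):
    """Return the pheromone level on each path after the given rounds.

    lengths lists the length of each alternative path between nest and food.
    """
    pheromone = [1.0] * len(lengths)  # all paths start out equally attractive
    for _ in range(rounds):
        total = sum(pheromone)
        shares = [p / total for p in pheromone]  # fraction of ants per path
        pheromone = [
            (1.0 - evaporation) * p + ants * share / length
            for p, share, length in zip(pheromone, shares, lengths)
        ]
    return pheromone


if __name__ == "__main__":
    levels = simulate_pheromone([2.0, 5.0])
    share_short = levels[0] / sum(levels)
    print(f"share of pheromone on the shorter path: {share_short:.3f}")
```

Because the shorter path receives more pheromone per ant, it attracts a growing share of the ants, and this positive feedback is the indirect communication through the environment described above.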
Positioning and orientation is another area for cross-disciplinary research.
Let us take as an example birds and their ability to navigate and orient
themselves when displaced. This ability is a complex phenomenon, which may
include both endogenous programs as well as learning. Studies have shown that
birds use several mechanisms, such as landmarks, solar cues (“sun compass”),
stellar cues, and geomagnetic cues. There is also some evidence that odors and
sounds may provide additional cues. More recent research has found a neural
connection between the eye and “Cluster N”, the part of the forebrain that is
active during migrational orientation, suggesting that birds may actually be
able to see the magnetic field of the earth. The transfer of knowledge is realized
in both directions; for example, radiotelemetry has been used extensively in
ornithology and marine biology to monitor animals and their habitat.
Cross-disciplinary research is emerging to explore how computer scientists can use properties from biological systems in building efficient pervasive
computing spaces and how biologists can experiment with simulations from
large-scale computer networks to better understand their own biological networks.
6.4 New horizons in cross-disciplinary research
Computer science has offered new paradigms, technologies and tools for communication and interaction that were catalysts not only in other sciences
but also in society. On-line collaboration has been enriched with new applications and tools for storing, sharing, and experimenting with multimedia
data. Mobile peer-to-peer computing may enhance the formation of on-line
communities of mobile users and create new socio-technological paradigms.
In a recent study, the World Bank computed the time elapsed between the
invention of various technologies across the last centuries and their widespread
adoption. While telephones reached 80% country coverage in 100 years, radio in 65 years, and Internet use in 22 years, mobile phones required just 16
years. It remains to be seen if the mobile peer-to-peer paradigm will trigger the
formation of new on-line communities and have a greater social penetration.
According to Internet World Stats (2007), Internet penetration in North
America is 69.7% of the population, compared to 3.6% for Africa and 10.7% for
Asia. There are already several discussions, research proposals, market initiatives, and political actions on communication technologies and infrastructures
for developing regions. Wireless networking and mobile peer-to-peer computing would be two candidates for bridging the digital divide.
The mobile peer-to-peer paradigm with its distinct feature of cooperation
can be applied to facilitate the information access and sharing among devices
for the support of context-aware services. An underlying objective of these services is the recognition and characterization of the users’ context without interrupting them from their main tasks. This involves research in domains that
span from networking and systems to contextual information representation
and reasoning, and graphics. Thus mobile peer-to-peer computing, combined
with context-aware computing, opens up exciting challenges in computer science, demanding interdisciplinary research and innovative paradigms.
The new technologies and the rapid growth and distribution of data raise
new ethical and social questions encompassing issues spanning from privacy
and security to medical and legal considerations. To highlight the variety of
new issues involved in mobile access, let us focus on a specific topic: mobile electronic identity. Wirelessly-enabled devices that would support such
mobile electronic identification mechanisms are vulnerable to different types
of threats, such as impersonation, eavesdropping on personal data, and dissemination of false information, viruses, and spam. These vulnerabilities and
constraints make the provisioning of privacy, confidentiality, and security a
challenging task.
Not only technological and legislative, but also environmental issues arise.
Several environmental reports call attention to the hazardous materials used
in the phones and batteries including arsenic, antimony, beryllium, cadmium,
and lead. Disposing of such materials in our soil and water creates an enormous
amount of toxic waste, which urgently demands efficient recycling programs. It is
the responsibility of our community to raise relevant questions and encourage
the investigation of those issues.
Researchers predict that electronic tags, such as rftags, will be pervasive, not only as electronic identification but also as implantable chips in
humans, raising even more questions about security, privacy, confidentiality,
legislation and ethics. Science-fiction scenarios speculate about the “seventh
sense”, the technologically-enhanced ability of humans to observe and understand the environment. This crossing of mobile computing, wireless technologies, and multi-modal interfaces (e.g., tactile and haptic displays, tagging and
sensing technologies) creates even more networking paradigms. Extended by
augmented reality and brain/user-interface technologies, this interdisciplinary
research creates new fertile realms in education, medicine, entertainment, assistive technology, psychology, law, art, ethics, and urban design. The deployment of wireless and augmented reality technologies will raise new issues and
challenges in how to create environments for maximum human development
that can guarantee everybody the best possible development under conditions
of freedom and safety. Pervasive computing spaces intertwined with urban environments should not prevent people from developing a harmonious contact
with others and with nature.
Wireless technology—and in general, computer science—is playing a dramatic role in our lives not only by assisting other sciences, but by reshaping
society and the way we think and sense.
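A Appendix

Model                 PDF
Normal                $p(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \quad x \in (-\infty, \infty)$
Lognormal             $p(x) = \frac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\ln x-\mu)^2/(2\sigma^2)}, \quad x \in (0, \infty)$
Exponential           $p(x) = \frac{1}{\mu}\, e^{-x/\mu}, \quad x \in [0, \infty)$
Rayleigh              $p(x) = \frac{x}{b^2}\, e^{-x^2/(2b^2)}, \quad x \in [0, \infty)$
Generalized Gaussian  $p(x) = \frac{b}{2\alpha\,\Gamma(1/b)}\, e^{-(|x|/\alpha)^b}, \quad x \in (-\infty, \infty)$
Pareto                $p(x; k, x_m) = \frac{k\, x_m^k}{x^{k+1}}, \quad x \ge x_m$
Weibull               $p(x) = b\,\alpha^{-b}\, x^{b-1}\, e^{-(x/\alpha)^b}, \quad x \in [0, \infty)$
Gamma                 $p(x) = \frac{1}{b^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1}\, e^{-x/b}, \quad x \in [0, \infty)$
BiPareto              $p(x) = k^{b}(1+c)^{b-\alpha}\, x^{-\alpha-1}\,(x+kc)^{\alpha-b-1}\,(bx+\alpha k c), \quad x > \alpha$

Table A.1. Models used in demand analysis.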
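A few of the densities in Table A.1 can be written down and sanity-checked numerically, as in the following sketch. It covers only the Exponential, Rayleigh, Weibull, and Pareto models; the function names, the parameter values in the usage example, and the simple trapezoidal integrator are assumptions made for illustration. The checks rely on two standard identities: a Weibull density with b = 1 is an Exponential density with µ = α, and with b = 2 it is a Rayleigh density whose scale is α/√2.

```python
# Illustrative sketch: four of the densities of Table A.1, with numeric checks.
from math import exp


def exponential_pdf(x, mu):
    return exp(-x / mu) / mu if x >= 0 else 0.0


def rayleigh_pdf(x, b):
    return (x / b ** 2) * exp(-x ** 2 / (2 * b ** 2)) if x >= 0 else 0.0


def weibull_pdf(x, alpha, b):
    # For b < 1 the density diverges at x = 0, so evaluate it only for x > 0.
    if x < 0:
        return 0.0
    return b * alpha ** (-b) * x ** (b - 1) * exp(-((x / alpha) ** b))


def pareto_pdf(x, k, x_m):
    return k * x_m ** k / x ** (k + 1) if x >= x_m else 0.0


def integrate(pdf, lo, hi, steps=100000):
    """Composite trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.5 * (pdf(lo) + pdf(hi))
    for i in range(1, steps):
        total += pdf(lo + i * h)
    return total * h


if __name__ == "__main__":
    # The exponential density should integrate to (almost) one.
    print(integrate(lambda x: exponential_pdf(x, 2.0), 0.0, 60.0))
    # Weibull with b = 1 versus Exponential with the same scale.
    print(weibull_pdf(1.5, 2.0, 1), exponential_pdf(1.5, 2.0))
```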
B Wireless measurement-based data repositories
Measurement-based data collected from diverse wireless networking environments, such as metropolitan areas, vehicular networks, houses, academic environments,
research labs, and conference sites, have been made available in various data
repositories. One of the largest collections of publicly available wireless
traces is CRAWDAD [10], which hosts traces from different wireless environments. Tables B.1, B.2, and B.3 summarize the types of wireless traces available
in CRAWDAD. Other archives include:
• the UCSD wireless topology discovery trace [56, 260]
• the MIT Roofnet [43]
• the MobiLib [30]
• wireless LAN traces from the ACM Sigcomm’01 [55]
• vehicular network traces [99, 153, 156, 95]
Empirical studies focusing on metropolitan area-based wireless networks have
recently taken place:
• in Cambridge, UK, with users carrying iMotes [241]
• in Toronto, with Bluetooth-enabled PDA users walking in the subway and malls to test if a worm outbreak is possible in practice [335]
• at MIT, with one hundred smart phones that use both short-range (such as Bluetooth) and long-range (GSM) networks, logging users’ location, communication, and device usage behavior information [133]
• in Cambridge, US, with users of Roofnet, an experimental ieee802.11b/g mesh network that provides broadband Internet access, developed at MIT CSAIL [43]
• in a grid of six nodes placed within three different houses that produced wireless measurements to characterize connectivity and udp and tcp throughput [291, 361]
• in Oulu, Finland, where the panOULU network provides wireless broadband Internet access in libraries, schools, sports facilities, hospitals, and the market area [40]
The Roofnet measurements focused on the link-level of ieee802.11, finding
high-throughput routes in the face of lossy links, adaptive bit-rate selection,
and developing new protocols which take advantage of wireless communications’ unique properties [85, 87].
Empirical measurement studies were also performed at several conferences,
such as the 2005 IETF meeting [205], the 2004 ACM Sigcomm [317], and the 2001
ACM Sigcomm [55], in which snmp and syslog traces were acquired from
the deployed ieee802.11 APs.
Traces from large-scale academic deployments of ieee802.11 APs include
UNC [51], Dartmouth [10], and USC [30]; traces from smaller-scale ieee802.11 AP networks in research labs or institutes include FORTH [51] and IBM [75, 76].
Sensor-based testbeds include the one at Columbia University using TinyOS
on Mica2 motes for testing a mac protocol [135, 136].
Vehicular-based networking environments have also been explored [153,
95]. For example, [153] includes traces of short-range communications
between vehicle and roadside traffic, and [95] includes traces from an ieee802.11-enabled bus
on a campus and in the surrounding county at UMASS. Traces from a cdma
1x EV-DO network are also available [152]. Finally, a very large collection of
data tailored to measuring Internet traffic and performance can be found at
the CAIDA site [11].
Table B.1. IEEE802.11 Infrastructure Network

Traces                  Scale             Type                              Time period            Time granularity
                        # Devices  # APs
UNC                     9881       574    syslog                            29/9/2004-30/11/2004
                                          snmp                              29/9/2004-26/6/2005    snmp: 300 s
                                          tcp & udp headers, http requests  13/4/2005-20/4/2005
FORTH                   1          12     signal strength                   12/12/2007             snmp: 300 s; signal strength: 150 s
FORTH & Crete Aquarium  1          8      signal strength                   15/11/2007-16/11/2007  snmp: 300 s; signal strength: 150 s
Dartmouth               2500       566    syslog, tcp & udp headers         11/4/2001-4/10/2005    snmp: 300 s
ACM Sigcomm 2001        195        4      tcp & udp headers                 27/8/2001-31/8/2001    snmp: 60 s
IBM/Watson              1366       177    –                                 20/7/2002-18/8/2002    snmp: 300 s
Stanford/Gates          74         12     syslog, tcp & udp headers         20/9/1999-12/12/1999   snmp: 120 s
Table B.2. IEEE802.11 Ad-hoc Network (Controlled experiments, mostly on routing)
Table B.3. Ad-hoc Network Type, wireless technology (Controlled experiments, mostly on throughput & connectivity)
[Tables B.2 and B.3: the row layout is not recoverable from the source; the surviving cell values are listed by column.]
Traces: Dartmouth/outdoor; Rutgers/noise; UCSB/meshnet; Intel; Synusb/Mobisteer; Gatech; Microsoft
Scale (# Devices, # APs): 33, N/A; 64; 20; 6, N/A; 6, N/A; 1, 11; 2; 2
Type: IEEE802.11; udp, ip & mac headers; mac frames, signal strength; expected transmission time; mac frames, gps; application level, mac frames, gps; tcp & udp throughput data; gps, bytes received, signal strength
Time period: 17/10/2003; 15/10/2005; 1/4/2006-7/4/2006; N/A; 29/6/2005-3/7/2006; 28/9/2006-29/11/2006; 22/1/2001-16/1/2001; 24/1/2005-23/4/2005
Sampling: 6 or 10 s; 30 s; 10 s; 60 s; 1 s; 1 s; 1 s
User position: gps; predefined; predefined; predefined; gps; gps; gps
Mobility: vehicular; stationary; stationary; stationary; vehicular; vehicular; vehicular
Tool (Table B.3 only): netperf; iperf; N/A
Area: outdoors (Athletic field); indoors (ORBIT wireless testbed); indoors (UCSB Meshnet); outdoors (Microsoft campus); outdoors (Parking lot, apartment complex); outdoors (Highway); indoors (Houses)
References
1. (2007) Wi-Fi chipsets shipments reach 200 million in 2006-report.
http://www.telecomseurope.net/article.php?id_article=3549/.
2. Ajax. http://en.wikipedia.org/wiki/AJAX.
3. Ajax: A new approach to web applications.
http://www.adaptivepath.com/publications/essays/archives/000385.php.
4. AMBIENTE, Division at Fraunhofer IPSI, Darmstadt, Germany.
http://www.ipsi.fhg.de/ambiente.
5. America’s Most Connected Campuses.
http://forbes.com/home/lists/2004/10/20/04conncampland.html.
6. Aura at Carnegie Mellon. http://www.cs.cmu.edu/~aura/.
7. Barnsley telehealth service monitors heart failure patients in the home.
http://www.mtbeurope.info/news/2006/603025.htm.
8. The bat ultrasonic location system.
http://www.cl.cam.ac.uk/research/dtg/attarchive/bat/.
9. Bluetooth biosensing wristwatch monitors heart rate, activity and emotions.
http://www.mtbeurope.info/news/2006/607027.htm.
10. CRAWDAD a community resource for archiving wireless data at Dartmouth.
http://crawdad.cs.dartmouth.edu/.
11. Data collection at CAIDA. http://www.caida.org/data/.
12. Delay tolerant networking research group. http://www.dtnrg.org/.
13. Ekahau v.3.1. (http://www.ekahau.com).
14. eTForecasts report on worldwide PDA markets.
http://www.etforecasts.com/pr/pr0603.htm.
15. Ethical Implications of Emerging Technologies: A Survey. Prepared by Mary
Rundle and Chris Conley. United Nations Educational, Scientific and Cultural
Organization. http://unesdoc.unesco.org/images/0014/001499/149992E.pdf.
16. Free real-time traffic maps, alerts, and Jam Factor reports for the routes you
drive. http://www.traffic.com/.
17. Free riding on gnutella. http://www.firstmonday.org/issues/issue5_10/adar/.
18. Fuego-Helsinki Institute for Information Technology Future Mobile and Ubiquitous Computing Research Program. http://www.hiit.fi/fuego.
19. Future Computing Environment Group.
http://www.cc.gatech.edu/fce/index.html.
20. Future computing environments. http://www.cc.gatech.edu/fce/smartfloor/.
21. GE to distribute MP4 remote foetal and maternal monitoring system for PDA.
http://www.mtbeurope.info/news/2006/602010.htm.
22. Group of User Interface Research. http://guir.berkeley.edu.
23. IBM Pervasive Computing.
http://www-306.ibm.com/software/info1/websphere/index.jsp?tab=products/mobilespeech.
24. LimeWire homepage. http://www.limewire.org.
25. Ling Liu. Mobile web and location-based services: Opportunities and challenges.
http://www.cc.gatech.edu/~lingliu/keynotes/.
26. Ling Liu. Security and trust in peer-to-peer systems: Risks and countermeasures.
http://www.cc.gatech.edu/~lingliu/keynotes/.
27. Location-based services. http://www.lbsinsight.com.
28. Medical Technology Business Europe: Patient Monitoring.
http://www.mtbeurope.info/patientmonitoring/index.htm.
29. Microsoft EasyLiving. http://research.microsoft.com/easyliving/.
30. Mobilib: Community-wide library of mobility and wireless networks measurements. http://nile.usc.edu/MobiLib/.
31. Movies on the Move - First GPS-enabled Movie Guide in the U.S.
http://www.lbsinsight.com/?id=658.
32. NAVTEQ, digital map data. http://www.navteq.com/.
33. Null hypothesis, from Wikipedia. (http://en.wikipedia.org/wiki/Null_hypothesis).
34. NYC wireless. http://www.nycwireless.net.
35. Onstar. http://www.onstar.com.
36. OpenFT is a file sharing protocol developed by the giFT project.
http://en.wikipedia.org/wiki/OpenFT.
37. Peer-to-peer: Harnessing the power of disruptive technologies.
http://www.freehaven.net/doc/oreilly/accountability-ch16.html.
38. Portolano: An expedition into invisible computing.
http://portolano.cs.washington.edu/.
39. Precise Real-time Location. http://www.ubisense.net/.
40. Public access network oulu. http://www.panoulu.net/.
41. RANDOM GRAPHS and COMPLEX NETWORKS (class offered by David
Aldous). http://www.stat.berkeley.edu/users/aldous/Networks/.
42. Research Programme on Proactive Computing.
http://www.aka.fi/index.asp?id=d16162a4696042d2821b6291f0732a8e.
43. Roofnet is an experimental 802.11b/g mesh network in development at MIT.
http://pdos.csail.mit.edu/roofnet/doku.php.
44. RSA Laboratories. http://www.rsa.com/rsalabs/node.asp?id=2002.
45. Self-organizing neighborhood wireless mesh networks.
http://research.microsoft.com/mesh/.
46. Silicon architectures for wireless systems by Prof. Rabaey.
http://bwrc.eecs.berkeley.edu/People/Faculty/jan/presentations/Lausanne/lecture4.pdf.
47. Smart Spaces at NIST. http://www.nist.gov/smartspace/.
48. Sony Computer Science Labs. http://www.csl.sony.co.jp/.
49. The Freenet project. http://freenetproject.org/.
50. Ubiquitous Computing at TECO. http://ubicomp.teco.edu/index2.html.
51. UNC/FORTH archive of wireless traces, models, and tools.
http://www.ics.forth.gr/mobile/.
52. Verizon Wireless. http://www.verizonwireless.com.
53. Wavemarket location intelligence. http://www.wavemarket.com/.
54. Wireless and mobility extensions to ns-2.
http://www.monarch.cs.cmu.edu/cmu-ns.html.
55. Wireless LAN traces from ACM SIGCOMM’01.
http://sysnet.ucsd.edu/pawn/sigcomm-trace/.
56. Wireless topology discovery at ucsd. http://sysnet.ucsd.edu/wtd/.
57. Daniel Aguayo, John Bicket, Sanjit Biswas, Glenn Judd, and Robert Morris.
Link-level measurements from an 802.11b Mesh network. In ACM Symposium
on Communications Architectures and Protocols (SigComm), Portland, OR,
USA, August 2004.
58. Aditya Akella, Glenn Judd, Srinivasan Seshan, and Peter Steenkiste. Self-management in chaotic wireless deployments. In ACM International Conference on Mobile Computing and Networking (MobiCom), pages 185–199,
Cologne, Germany, August 2005.
59. Aditya Akella, Glenn Judd, Srinivasan Seshan, and Peter Steenkiste. Self-management in chaotic wireless deployments. In ACM International Conference
on Mobile Computing and Networking (MobiCom), Cologne, Germany, August
2005.
60. Ian Akyildiz and Xudong Wang. A survey on wireless mesh networks. IEEE
Radio Communications, 43(9):S23–S30, September 2005.
61. Reka Albert and Albert-Laszlo Barabasi. Statistical mechanics of complex
networks. arXiv report cond-mat/0106096, June 2001.
62. F. Anjum, M. Elaoud, D. Famolari, A. Ghosh, R. Vaidyanathan, A. Dutta,
P. Agrawa, T. Kodama, and Y. Katsube. Voice performance in WLAN
networks-an experimental study. In IEEE Conference on Global Communications (GLOBECOM), San Francisco, December 2003.
63. I. Antoniou, V. V. Ivanov, Valery V. Ivanov, and P. V. Zrelov. Principal
Component Analysis of Network Traffic Measurements: the “Caterpillar”-SSA
approach. In Int. Workshop on Advanced Computing and Analysis Techniques
in Physics Research, ACAT’2002, Moscow, Russia, June 2002.
64. Somil Asthana and Dimitris Kalofonos. The problem of bluetooth pollution
and accelerating connectivity in bluetooth ad-hoc networks. In Proceedings of
IEEE International Conference on Pervasive Computing and Communications
(Percom), New York, NY, USA, March 2005.
65. R. Atkinson. IP encapsulating security payload. RFC 1827, August 1995.
66. R. Atkinson. Security architecture for the Internet protocol. RFC 1825, August
1995.
67. A. Auvinen, M. Vapa, M. Weber, N. Kotilainen, and J. Vuori. Chedar: Peer-to-peer middleware. In Proceedings of the 19th IEEE International Parallel &
Distributed Processing Symposium (IPDPS 2006), Rhodes Island, Greece, July
2006.
68. François Baccelli, Sridhar Machiraju, Darryl Veitch, and Jean Bolot. The role
of PASTA in network measurement. In ACM Symposium on Communications
Architectures and Protocols (SigComm), pages 231–242, Pisa, Italy, September
2006.
69. Adam Back. Hashcash - a denial of service counter-measure. Technical report,
http://www.cypherspace.org/~adam/hashcash/, August 2002.
70. P. Bahl, R. Chandra, and J. Dunagan. Ssch: Slotted seeded channel hopping
for capacity improvement in IEEE802.11 ad-hoc wireless networks. In ACM
International Conference on Mobile Computing and Networking (MobiCom),
Philadelphia, PA, USA, September 2004.
71. Paramvir Bahl and Venkata Padmanabhan. Radar: An in-building RF-based
user location and tracking system. In IEEE Conference on Computer Communications (InfoCom), Tel Aviv, Israel, March 2000.
72. Paramvir Bahl, Venkata N. Padmanabhan, and Anand Balachandran. Enhancements to the radar user location and tracking system. Technical report,
Microsoft Research, Redmond, WA, February 2000.
73. Norman T. Bailey. The mathematical theory of infectious diseases and its
applications. Hafner, 1975.
74. Anand Balachandran, Geoffrey Voelker, Paramvir Bahl, and Venkat Rangan.
Characterizing user behavior and network performance in a public wireless lan.
In ACM Sigmetrics Conference on Measurement and Modeling of Computer
Systems, California, CA, USA, June 2002.
75. Magdalena Balazinska and Paul Castro. Characterizing mobility and network
usage in a corporate wireless local-area network. In First International Conference on Mobile Systems, Applications, and Services (MobiSys), San Francisco,
USA, May 2003.
76. Magdalena Balazinska and Paul Castro. CRAWDAD data set ibm/watson (v.
2003-02-19).
Downloaded from http://crawdad.cs.dartmouth.edu/ibm/watson, February
2003.
77. Udana Bandara, Mikio Hasegawa, Masugi Inoue, Hiroyuki Morikawa, and
Tomonori Aoyama. Design and implementation of a bluetooth signal strength
based location sensing system. In IEEE Radio and Wireless Conference (RAWCON), Atlanta, GA, USA, September 2004.
78. Nikhil Bansal and Zhen Liu. Capacity, delay and mobility in wireless ad-hoc
networks. In IEEE Conference on Computer Communications (InfoCom), San
Francisco, California, September 2003.
79. Daniel Barbara and Tomasz Imielinski. Sleepers and workaholics: Caching
strategies in mobile environments. In ACM SIGMOD International Conference
on Management of Data, pages 1–12, Minneapolis, Minnesota, June 1994.
80. Paul Barford and Mark E. Crovella. Generating representative Web workloads
for network and server performance evaluation. In ACM Sigmetrics Conference
on Measurement and Modeling of Computer Systems, pages 151–160, Madison,
Wisconsin, June 1998.
81. Mohammed Benaissa, Vincent Lecuire, F. Leage, and A. Shaff. Analysing end-to-end packet delay and loss in mobile ad hoc networks for interactive audio
applications. In Workshop on Mobile Ad Hoc Networking and Computing,
pages 27–33, Sophia-Antipolis, France, March 2003.
82. Christian Bettstetter, Giovanni Resta, and Paolo Santi. The node distribution
of the random waypoint mobility model for wireless ad hoc networks. IEEE
Transactions on Mobile Computing, 2(3):257–269, July 2003.
83. Amiya Bhattacharya and Sajal K. Das. LeZi-update: an information-theoretic
approach to track mobile users in PCS networks. In Proceedings of the Annual
ACM/IEEE International Conference on Mobile Computing and Networking,
pages 1–12, Seattle, Washington, USA, August 1999.
84. Giuseppe Bianchi, Antonio Di Stefano, Costantino Giaconia, Luca Scalia, Giovanni Terrazzino, and Ilenia Tinnirello. Experimental assessment of the backoff
behavior of commercial IEEE802.11b network cards. In IEEE Conference on
Computer Communications (InfoCom), pages 1181–1189, Anchorage, Alaska,
USA, May 2007.
85. John Bicket, Daniel Aguayo, Sanjit Biswas, and Robert Morris. Architecture
and evaluation of an unplanned 802.11b mesh network. In ACM International
Conference on Mobile Computing and Networking (MobiCom), Cologne, Germany, August 2005.
86. John C. Bicket. Bit-rate selection in wireless networks. Master’s thesis, Massachusetts Institute of Technology, February 2005.
87. Sanjit Biswas and Robert Morris. Opportunistic routing in multi-hop wireless
networks. In ACM Symposium on Communications Architectures and Protocols
(SigComm), Philadelphia, PA, August 2005.
88. Matt Blaze, John Ioannidis, and Angelos D. Keromytis. Offline micropayments
without trusted hardware. In Proceedings of Financial Cryptography, Cayman
Islands, British West Indies, February 2001.
89. Rajendra V. Boppana and Satyadeva P. Konduru. An adaptive distance vector routing algorithm for mobile, ad hoc networks. In IEEE Conference on
Computer Communications (InfoCom), Anchorage, Alaska, April 2001.
90. Sem C. Borst and Nidhi Hegde. Integration of streaming and elastic traffic in
wireless networks. In INFOCOM, pages 1884–1892, Anchorage, Alaska, USA,
May 2007.
91. Lee Breslau, Pei Cao, Li Fan, Graham Phillips, and Scott Shenker. Web caching
and Zipf-like distributions: Evidence and implications. In IEEE Conference on
Computer Communications (InfoCom), New York, NY, March 1999.
92. Josh Broch, David Maltz, David Johnson, Yih-Chun Hu, and Jorjeta Jetcheva.
A performance comparison of multi-hop wireless ad hoc network routing protocols. In ACM International Conference on Mobile Computing and Networking
(MobiCom), Dallas, Texas, October 1998.
93. Ioannis Broustis, Konstantina Papagiannaki, Srikanth V. Krishnamurthy,
Michalis Faloutsos, and Vivek Mhatre. MDG: measurement-driven guidelines
for 802.11 WLAN design. In ACM International Conference on Mobile Computing and Networking (MobiCom), pages 254–265, Montreal, Quebec, Canada,
September 2007.
94. Raffaele Bruno and Franca Delmastro. Design and analysis of a Bluetooth-based indoor localization system. Technical report, Institute for Informatics
and Telematics, Pisa, Italy, 1999.
95. J. Burgess and B. N. Levine. CRAWDAD data set umass/diesel (v. 2006-01-17). Downloaded from http://crawdad.cs.dartmouth.edu/umass/diesel, January 2006.
96. Martin Burkhart, Pascal von Rickenbach, Roger Wattenhofer, and Aaron
Zollinger. Does topology control reduce interference? In ACM International
Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), Roppongi Hills, Tokyo, Japan, May 2004.
97. Levente Buttyan and Jean-Pierre Hubaux. Nuglets: a virtual currency to stimulate cooperation in self-organized mobile ad-hoc networks. Technical Report
DSC/2001/001, Swiss Federal Institute of Technology, Lausanne, January 2001.
98. Levente Buttyan and Jean-Pierre Hubaux. Security and Cooperation in Wireless Networks. Cambridge University Press, November 2007.
99. Vladimir Bychkovsky, Bret Hull, Allen K. Miu, Hari Balakrishnan, and Samuel
Madden. A measurement study of vehicular internet access using in situ Wi-Fi networks. In ACM International Conference on Mobile Computing and
Networking (MobiCom), Los Angeles, CA, September 2006.
100. Tracy Camp, Jeff Boleng, and Vanessa Davies. A survey of mobility models
for ad hoc network research. Wireless Communication & Mobile Computing
(WCMC): Special issue on Mobile Ad Hoc Networking: Research, Trends and
Applications, 2(5):483–502, September 2002.
101. Juan-Carlos Cano and Pietro Manzoni. A performance comparison of energy
consumption for mobile ad hoc network routing protocols. San Francisco,
California, USA, August 2000.
102. Jin Cao, William S. Cleveland, Dong Lin, and Don X. Sun. On the nonstationarity of Internet traffic. In ACM Sigmetrics Conference on Measurement and
Modeling of Computer Systems, pages 102–112, Cambridge, MA, USA, June
2001.
103. Yinhe Cao, Wen-wen Tung, Jiafeng Gao, Vladimir A. Protopopescu, and
Lee M. Hively. Detecting dynamical changes in time series using the permutation entropy. Physical Review E, 70(4), October 2004.
104. Srdjan Capkun, Maher Hamdi, and Jean-Pierre Hubaux. GPS-Free Positioning
in Mobile Ad-Hoc Networks. In Proceedings of Hawaii International Conference
On System Sciences, Hawaii, January 2001.
105. George Casella and Roger L. Berger. Statistical Inference, Second edition.
Duxbury, June 2001.
106. Paul Castro, Benjamin Greenstein, Richard Muntz, Parviz Kermani, Chatschik
Bisdikian, and Maria Papadopouli. Locating application data across service
discovery domains. In ACM International Conference on Mobile Computing
and Networking (MobiCom), pages 28–42, Rome, Italy, July 2001.
107. Ramón Cáceres, Peter Danzig, Sugih Jamin, and Danny Mitzel. Characteristics
of wide-area TCP/IP conversations. In ACM Symposium on Communications
Architectures and Protocols (SigComm), Zurich, Switzerland, September 1991.
ACM.
108. Augustin Chaintreau, Pan Hui, Jon Crowcroft, Christophe Diot, Richard Gass,
and James Scott. Impact of human mobility on the design of opportunistic
forwarding algorithms. In IEEE Conference on Computer Communications
(InfoCom), Barcelona, Spain, April 2006.
109. Chris Chambers, Wuchang Feng, Sambit Sahu, and Debanjan Saha.
Measurement-based characterization of a collection of on-line games. In ACM
Internet Measurement Conference (Sigcomm), Philadelphia, PA, USA, August
2005.
110. Probal Chaudhuri and J. S. Marron. SiZer for exploration of structures in
curves. Journal of the American Statistical Association, 94(447):807–823,
September 1999.
111. Santashil Pal Chaudhuri, Rajnish Kumar, and Amit Kumar Saha. A MAC
protocol for multi-frequency physical layer. Technical report, Rice University,
Houston, TX, USA, January 2003.
112. Kameswari Chebrolu, Bhaskaran Raman, and Sayandeep Sen. Long-distance
802.11b links: performance measurements and experience. In ACM International Conference on Mobile Computing and Networking (MobiCom), pages
74–85, Los Angeles, California, USA, September 2006.
113. Minghua Chen and Avideh Zakhor. Flow control over wireless network and
application layer implementation. In IEEE Conference on Computer Communications (InfoCom), Barcelona, Spain, April 2006.
114. Chun-cheng Chen, Eunsoo Seo, Hwangnam Kim, and Haiyun Luo. Self-learning
collision avoidance for wireless networks. In IEEE Conference on Computer
Communications (InfoCom), Barcelona, Spain, April 2006.
115. Francisco Chinchilla, Mark Lindsey, and Maria Papadopouli. Analysis of wireless information locality and association patterns in a campus. In IEEE Conference on Computer Communications (InfoCom), Hong Kong, March 2004.
116. Krishna Chintalapudi, Ramesh Govindan, Gaurav Sukhatme, and Amit Dhariwal. Ad-hoc localization using ranging and sectoring. In IEEE Conference on
Computer Communications (InfoCom), Hong Kong, March 2004.
117. Sunwoong Choi, Kihong Park, and Chong-kwon Kim. On the performance
characteristics of WLANs: revisited. In ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, pages 97–108, Banff, Alberta,
Canada, June 2005.
118. Chun-Ting Chou, S. N. Shankar, and Kang G. Shin. Achieving per-stream
QoS with distributed airtime allocation and admission control in IEEE 802.11e
wireless LANs. In IEEE Conference on Computer Communications (InfoCom),
pages 1584–1595, Miami, Florida, USA, March 2005.
119. Bent Guldbjerg Christensen. Lightpeers: A lightweight mobile p2p platform. In
Proceedings of the Fifth IEEE International Conference on Pervasive Computing and Communications Workshops (PERCOMW ’07), pages 132–136, White
Plains, NY, March 2007.
120. William S. Cleveland, Dong Lin, and Don X. Sun. IP packet generation: statistical models for TCP start times based on connection-rate superposition.
In ACM Sigmetrics Conference on Measurement and Modeling of Computer
Systems, pages 166–177, Santa Clara, CA, United States, June 2000.
121. IEEE Computer Society LAN MAN Standards Committee. Wireless LAN Medium
Access Control (MAC) and Physical Layer (PHY) Specifications. IEEE Standard 802.11-1999, New York, NY, USA, 1999.
122. Douglas S. J. De Couto, Daniel Aguayo, John Bicket, and Robert Morris. A
high-throughput path metric for multi-hop wireless routing. In ACM International Conference on Mobile Computing and Networking (MobiCom), San
Diego, CA, September 2003.
123. Mark Crovella and Azer Bestavros. Self-similarity in world-wide-web traffic:
Evidence and possible causes. In Proceedings of SIGMETRICS ’96, 1996.
124. Mark E. Crovella and Azer Bestavros. Self-similarity in world-wide-web traffic: Evidence and possible causes. IEEE/ACM Transactions on Networking,
5(6):835–846, December 1997.
125. Ralph B. D’Agostino and Michael A. Stephens. Goodness-of-Fit Techniques.
Marcel Dekker, 1986.
126. Samir Das, Charles Perkins, and Elizabeth Royer. Performance comparison of
two on-demand routing protocols for ad-hoc networks. In IEEE Conference on
Computer Communications (InfoCom), Tel Aviv, Israel, March 2000.
127. James Davis, Andy Fagg, and Brian Neil Levine. Wearable computers as packet
transport mechanisms in highly partitioned ad-hoc networks. In Proc. International Symposium on Wearable Computers (ISWC), Zurich, October 2001.
128. Whitfield Diffie, Paul C. van Oorschot, and Michael J. Wiener. Authentication
and authenticated key exchanges. Designs, Codes and Cryptography, 2(2):107–
125, 1992.
129. R. Draves, J. Padhye, and B. Zill. Routing in multi-radio, multi-hop wireless
mesh networks. In ACM International Conference on Mobile Computing and
Networking (MobiCom), Philadelphia, PA, USA, September 2004.
130. Richard Draves, Jitendra Padhye, and Brian Zill. Comparison of routing metrics for static multi-hop wireless networks. In ACM Symposium on Communications Architectures and Protocols (SigComm), Portland, OR, USA, August
2004.
131. Richard Durrett. Lecture notes on particle systems and percolation. Pacific
Grove, CA, 1988.
132. Bradley M. Duska, David Marwood, and Michael J. Feeley. The measured
access characteristics of world-wide-web client proxy caches. In USENIX Symposium on Internet Technologies and Systems, 1997, Monterey, CA, December
1997.
133. Nathan Eagle and Alex (Sandy) Pentland. CRAWDAD data set mit/reality (v.
2005-07-01). Downloaded from http://crawdad.cs.dartmouth.edu/mit/reality,
July 2005.
134. David Eckhardt and Peter Steenkiste. Measurement and analysis of the error
characteristics of an in building wireless network. ACM Computer Communication Review, 26(4):243–254, October 1996.
135. Shane Eisenman. CRAWDAD data set columbia/ecsma (v. 2006-11-17).
Downloaded from http://crawdad.cs.dartmouth.edu/columbia/ecsma, November 2006.
136. Shane Eisenman and Andrew Campbell. E-CSMA: Supporting enhanced
CSMA performance in experimental sensor networks using per-neighbor transmission probability thresholds. In Proceedings of the 26th IEEE International
Conference on Computer Communications (INFOCOM), Anchorage, AK, May
2007.
137. Chip Elliott. Building the Wireless Internet. IEEE Spectrum, 38(1), January
2001.
138. Mike Esler, Jeffrey Hightower, Tom Anderson, and Gaetano Borriello. Next
century challenges: data-centric networking for invisible computing – the Portolano Project at the University of Washington. In Proceedings of the Annual
ACM/IEEE International Conference on Mobile Computing and Networking,
pages 256–262, Seattle, Washington, August 1999.
139. Deborah Estrin, Ramesh Govindan, John Heidemann, and Satish Kumar. Next
century challenges: Scalable coordination in sensor networks. In ACM, editor,
Proceedings of the Annual ACM/IEEE International Conference on Mobile
Computing and Networking, pages 263–270, Seattle, Washington, August 1999.
140. Patrick T. Eugster, Rachid Guerraoui, Anne-Marie Kermarrec, and Laurent
Massoulie. Epidemic information dissemination in distributed systems. IEEE
Computer, 37(5):60–67, 2004.
141. Frederic Evennou and Francois Marx. Improving positioning capabilities for indoor environments with WiFi. In EUSIPCO 2005, Antalya, Turkey, September
2005. IST.
142. Frederic Evennou and Francois Marx. Advanced integration of WiFi and inertial navigation systems for indoor mobile positioning. In EURASIP Journal
on Applied Signal Processing, pages 1–11, 2006.
143. Johannes Färber. Network game traffic modelling. In NetGames ’02: Proceedings of
the 1st workshop on Network and system support for games, pages 53–57, New
York, NY, USA, 2002.
144. Kevin Fall and Kannan Varadhan. ns: Notes and Documentation. Technical
report, University of California at Berkeley, LBL, USC/ISI, and Xerox PARC,
October 1998.
145. Jianqing Fan and Irene Gijbels. Local Polynomial Modelling and Its Applications. Chapman and Hall, London, 1996.
146. Lei Fang, Wenliang Du, and Peng Ning. A beacon-less location discovery
scheme for wireless sensor networks. In IEEE Conference on Computer Communications (InfoCom), pages 161–171, Miami, Florida, March 2005.
147. Laura Marie Feeney. Investigating the energy consumption of an IEEE 802.11
network interface. Technical Report SICS-T 99/11-SE, Swedish Institute of
Computer Science, December 1999.
148. Silke Feldmann, Kyandoghere Kyamakya, Ana Zapater, and Zighuo Lue. An indoor Bluetooth-based positioning system: concept, implementation and experimental evaluation. In International Conference on Wireless Networks, pages
109–113, Las Vegas, Nevada, USA, 2003.
149. S. Floyd and V. Paxson. Difficulties in simulating the Internet. IEEE/ACM
Transactions on Networking, 9(4):392–403, August 2001.
150. Sally Floyd and Van Jacobson. Random early detection gateways for congestion
avoidance. IEEE/ACM Transactions on Networking, 1(4):397–413, August
1993.
151. Charalampos Fretzagias and Maria Papadopouli. Cooperative Location Sensing for Wireless Networks. In Second IEEE International conference on Pervasive Computing and Communications, Orlando, Florida, March 2004.
152. Traces from CDMA 1x EV-DO Network.
http://networks.cnu.ac.kr/measurement/cdma-1x-evdo/.
153. Richard M. Fujimoto, Randall Guensler, Michael P. Hunter, Hao Wu, Mahesh
Palekar, Jaesup Lee, and Joonho Ko. CRAWDAD data set gatech/vehicular
(v. 2006-03-15). Downloaded from
http://crawdad.cs.dartmouth.edu/gatech/vehicular, March 2006.
154. Yan Gao, Jennifer Hou, and Hoang Nguyen. Topology Control for Maintaining Network Connectivity and Maximizing Network Capacity under the Physical Model. In IEEE Conference on Computer Communications (InfoCom),
Phoenix, Arizona, USA, April 2008.
155. Michele Garetto, Jingpu Shi, and Edward W. Knightly. Modeling media access in embedded two-flow topologies of multi-hop wireless networks. In ACM
International Conference on Mobile Computing and Networking (MobiCom),
pages 200–214, Cologne, Germany, August 2005.
156. Richard Gass, James Scott, and Christophe Diot. CRAWDAD data set
cambridge/inmotion (v. 2005-10-01). Downloaded from
http://crawdad.cs.dartmouth.edu/cambridge/inmotion, October 2005.
157. Mario Gerla and Jack Tsai. Multicluster, mobile, multimedia radio network.
Journal of Wireless Networks, 1(3):255–265, 1995.
158. Christos Gkantsidis and Pablo Rodriguez. Network coding for large-scale content distribution. In IEEE Conference on Computer Communications (InfoCom), Miami, FL, USA, March 2005.
159. Gnutella. http://gnutella.wego.com.
160. Sanjay Goel and Stephen Bush. Biological models of security for virus propagation in computer networks. Login, 29(6), December 2004.
161. Tom Goff, Nael B. Abu-Ghazaleh, Dhananjay S. Phatak, and Ridvan Kahvecioglu. Preemptive routing in ad hoc networks. In ACM International Conference on Mobile Computing and Networking (MobiCom), pages 43–52, Rome,
Italy, July 2001.
162. Nina Golyandina, Vladimir Nekrutkin, and Anatoly Zhigljavsky. Analysis of
Time Series Structure: SSA and Related Techniques. Chapman & Hall/CRC,
2001.
163. Amal Graafstra. Hands on: How radio-frequency identification and I got personal. IEEE Spectrum, March 2007.
164. Boris Grondahl. Wireless world meets to lick its wounds.
http://www.thestandard.com/article/display/0,1151,22322,00.html?nl=dnt.
165. Bjorn Gronvall, Assar Westerlund, and Stephen Pink. The design of a
multicast-based distributed file system. In Third Symposium on Operating
Systems Design and Implementation, New Orleans, LA, USA, February 1999.
166. Matthias Grossglauser and Patrick Thiran. Networks out of control: models
and methods for random networks. Technical report, School of Computer and
Communication Sciences, Ecole Polytechnique Federale de Lausanne (EPFL),
2005.
167. Matthias Grossglauser and David Tse. Mobility increases the capacity of mobile
ad-hoc wireless networks. In IEEE Conference on Computer Communications
(InfoCom), Anchorage, Alaska, April 2001.
168. Krishna P. Gummadi, Richard J. Dunn, Stefan Saroiu, Steven D. Gribble,
Henry M. Levy, and John Zahorjan. Measurement, modeling, and analysis of
peer-to-peer file sharing workload. In ACM Symposium on Operating Systems
Principles, October 2003.
169. Lei Guo, Songqing Chen, Zhen Xiao, Enhua Tan, Xiaoning Ding, and Xiaodong
Zhang. Measurements, analysis, and modeling of BitTorrent-like systems. In
ACM Internet Measurement Conference (Sigcomm), Philadelphia, PA, USA,
August 2005.
170. Piyush Gupta and P. R. Kumar. The capacity of wireless networks. IEEE Transactions on Information Theory, 46(2):388–404, March 2000.
171. Youngjune Gwon, Ravi Jain, and Toshiro Kawahara. Robust indoor location
estimation of stationary and mobile users. In IEEE Conference on Computer
Communications (InfoCom), March 2004.
172. Ivaylo Haratcherev, Koen Langendoen, Reginald Lagendijk, and Henk Sips.
Hybrid rate control for IEEE 802.11. In Proceedings of the second international workshop on Mobility management & wireless access protocols (MobiWac), pages 10–18, New York, NY, USA, September 2004.
173. Ivaylo Haratcherev, Jacco Taal, Koen Langendoen, Reginald Lagendijk, and
Henk Sips. Fast 802.11 link adaptation for real-time video streaming by cross-layer signalling. In IEEE International Symposium on Circuits and Systems (ISCAS), Kobe,
May 2005.
174. Shlomo Havlin, Menachem Dishon, James E. Kiefer, and George H. Weiss.
Trapping of random walk in two and three dimensions. Physical Review Letters,
53(5):407–410, July 1984.
175. Simon Haykin. Cognitive radio: Brain-empowered wireless communications.
IEEE Journal on Selected Areas in Communications, 23(2):201–220, February
2005.
176. Tian He, Chengdu Huang, Brian M. Blum, John A. Stankovic, and Tarek Abdelzaher. Range-free localization schemes for large-scale sensor networks. In
ACM International Conference on Mobile Computing and Networking (MobiCom), San Diego, CA, USA, September 2003.
177. Tristan Henderson, David Kotz, and Ilya Abyzov. The changing usage of
a mature campuswide wireless network. In ACM International Conference
on Mobile Computing and Networking (MobiCom), Philadelphia, PA, USA,
September 2004.
178. Félix Hernández-Campos. Generation and Validation of Empirically-Derived
TCP Application Workloads. PhD thesis, University of North Carolina at
Chapel Hill, 2006.
179. Félix Hernández-Campos, Merkourios Karaliopoulos, Maria Papadopouli, and
Haipeng Shen. Spatio-temporal modeling of traffic workload in a campus
WLAN. In Second annual international Wireless Internet Conference, Boston,
USA, August 2006.
180. Félix Hernández-Campos and Maria Papadopouli. Assessing The Real Impact
of 802.11 WLANs: A Large-Scale Comparison of Wired and Wireless Traffic.
In 14th IEEE Workshop on Local and Metropolitan Area Networks, Chania,
Greece, September 2005.
181. Félix Hernández-Campos and Maria Papadopouli. A comparative measurement study of the workload of wireless access points in campus networks. In
16th Annual IEEE International Symposium on Personal Indoor and Mobile
Radio Communications, Berlin, Germany, September 2005.
182. Félix Hernández-Campos and Maria Papadopouli. A comparative measurement study of the workload of wireless access points in campus networks.
Technical Report 353, ICS-FORTH, Heraklion, Greece, March 2005.
183. Jeffrey Hightower and Gaetano Borriello. A Survey and Taxonomy of Location
Sensing Systems for Ubiquitous Computing. Technical Report, University of
Washington, Department of Computer Science and Engineering, UW CSE 01-08-03, Seattle, WA, August 2001.
184. Jeffrey Hightower and Gaetano Borriello. Particle Filters for Location Estimation in Ubiquitous Computing: A Case Study. In Proceedings of the Sixth
International Conference on Ubiquitous Computing (Ubicomp), Nottingham,
England, September 2004.
185. Jeffrey Hightower, Roy Want, and Gaetano Borriello. SpotON: An indoor 3D
location sensing technology based on RF signal strength. UW CSE tech report
2000-02-02, University of Washington, Seattle, WA, February 2000.
186. Gavin Holland, Nitin Vaidya, and Paramvir Bahl. A rate-adaptive MAC protocol for multi-hop wireless networks. In ACM International Conference on
Mobile Computing and Networking (MobiCom), pages 236–251, Rome, Italy,
July 2001.
187. T. Horozov, A. Grama, V. Vasudevan, and S. Landis. Moby — a mobile peer-to-peer service and data network. In Proceedings of International Conference
on Parallel Processing, pages 437–444, Washington, DC, USA, August 2002.
188. A. Howard, S. Siddiqi, and G. Sukhatme. An experimental study of localization using wireless ethernet. In International Conference on Field and Service
Robotics, Yamanaka, Japan, July 2003.
189. Wei-Jen Hsu, Thrasyvoulos Spyropoulos, Konstantinos Psounis, and Ahmed
Helmy. Modeling time-variant user mobility in wireless mobile networks. Anchorage, Alaska, USA, May 2007.
190. Chunyu Hu and Jennifer C. Hou. A novel approach to contention control in
IEEE 802.11e-operated WLANs. In IEEE Conference on Computer Communications (InfoCom), pages 1190–1198, Anchorage, Alaska, USA, May 2007.
191. Lingxuan Hu and David Evans. Localization for mobile sensor networks. In
ACM International Conference on Mobile Computing and Networking (MobiCom), 2004.
192. Yih-Chun Hu and David B. Johnson. Caching strategies in on-demand routing
protocols for wireless ad hoc networks. In ACM International Conference on
Mobile Computing and Networking (MobiCom), pages 231–242, Boston, MA,
USA, August 2000.
193. Yih-Chun Hu and David B. Johnson. Caching strategies in on-demand routing
protocols for wireless ad hoc networks. In ACM International Conference on
Mobile Computing and Networking (MobiCom), pages 231–242, Boston, MA,
USA, August 2000.
194. Yih-Chun Hu and David B. Johnson. Implicit source routes for on-demand
ad-hoc network routing. In ACM International Symposium on Mobile Ad Hoc
Networking and Computing (MobiHoc), pages 1–10, New York, NY, USA, October 2001.
195. Jean-Pierre Hubaux, Levente Buttyan, and Srdjan Capkun. The quest for security in mobile ad-hoc networks. In ACM International Symposium on Mobile
Ad Hoc Networking and Computing (MobiHoc), pages 146–155, Long Beach,
CA, October 2001.
196. Barry D. Hughes. Random Walks and Random Environments. Oxford Science
Publications, 1995.
197. Wing-Chung Hung, K.L. Eddie Law, and A. Leon-Garcia. A dynamic multichannel MAC for ad-hoc LAN. In Proceedings of the 21st Biennial Symposium
on Communications, Kingston, Canada, June 2002.
198. Tomasz Imielinski, S. Viswanathan, and B. R. Badrinath. Energy efficient
indexing on air. In ACM SIGMOD International conference on Management
of Data, Minneapolis, MN, USA, May 1994.
199. Mikel Izal, Guillaume Urvoy-Keller, Ernst Biersack, Pascal Felber, Anwar
Hamra, and Luis Garces-Erice. Dissecting BitTorrent: Five months in a torrent’s lifetime. In 5th Passive and Active Measurement Workshop, Antibes
Juan-les-Pins, France, April 2004.
200. Nitin Jain and Samir Das. A multichannel CSMA MAC protocol with receiver-based channel selection for multihop wireless networks. In Proceedings of the
9th Int. Conf. On Computer Communications and Networks (IC3N), Phoenix,
USA, October 2001.
201. Ravi Jain, Dan Lelescu, and Mahadevan Balakrishnan. Model T: an empirical model for user registration patterns in a campus wireless LAN. In ACM
International Conference on Mobile Computing and Networking (MobiCom),
Cologne, Germany, August 2005.
202. Sushant Jain, Kevin Fall, and Rabin Patra. Routing in a delay-tolerant network. In ACM Symposium on Communications Architectures and Protocols
(SigComm), Portland, OR, USA, August 2004.
203. Amit Jardosh, Elizabeth M. Belding-Royer, Kevin C. Almeroth, and Subhash
Suri. Towards realistic mobility models for mobile ad-hoc networks. In ACM
International Conference on Mobile Computing and Networking (MobiCom),
San Diego, CA, September 2003.
204. Amit Jardosh, Krishna Ramachandran, Kevin Almeroth, and Elizabeth M.
Belding-Royer. Understanding congestion in IEEE 802.11b wireless networks. In Proceedings of the Internet Measurement Conference, Berkeley, CA,
USA, October 2005.
205. Amit Jardosh, Krishna N. Ramachandran, Kevin C. Almeroth, and Elizabeth
Belding. CRAWDAD data set ucsb/ietf2005 (v. 2005-10-19). Downloaded from
http://crawdad.cs.dartmouth.edu/ucsb/ietf2005, October 2005.
206. Amit P. Jardosh, Kimaya Mittal, Krishna N. Ramachandran, Elizabeth M.
Belding, and Kevin C. Almeroth. IQU: practical queue-based user association
management for WLANs. In ACM International Conference on Mobile Computing and Networking (MobiCom), pages 158–169, Los Angeles, California,
USA, September 2006.
207. Per Johansson, Tony Larsson, Nicklas Hedman, Bartosz Mielczarek, and Mikael
Degermark. Scenario-based performance analysis of routing protocols for mobile ad-hoc networks. In ACM International Conference on Mobile Computing
and Networking (MobiCom), pages 195–206, Seattle, Washington, USA, August 1999.
208. Ian T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York,
1986.
209. Julien Jomier. Note sharing application, 2003.
210. Evan P. C. Jones, Lily Li, and Paul A. S. Ward. Practical routing in delay-tolerant networks. In ACM Symposium on Communications Architectures and
Protocols (SigComm), pages 237–243, Philadelphia, Pennsylvania, USA, August 2005.
211. Youngmi Joo, Vinay Ribeiro, Anja Feldmann, Anna C. Gilbert, and Walter
Willinger. TCP/IP traffic dynamics and network performance: a lesson in workload modeling, flow control, and trace-driven simulations. ACM Computer
Communication Review, (2), 2001.
212. Julie Letchner, Dieter Fox, and Anthony LaMarca. Large-Scale Localization from Wireless Signal Strength. In Proceedings of the Twentieth National
Conference on Artificial Intelligence (AAAI), Pittsburgh, Pennsylvania, USA,
July 2005.
213. M. Kamenetsky and M. Unbehaun. Coverage Planning for Outdoor Wireless
LAN Systems. In IEEE International Zurich Seminar on Broadband Communications, Zurich, Switzerland, February 2002.
214. Ad Kamerman and Leo Monteban. WaveLAN-II: a high-performance wireless
LAN for the unlicensed band. Bell Labs Technical Journal, 2(3):118–133, 1997.
215. Thomas Karagiannis, Andre Broido, Michalis Faloutsos, and Kc Claffy. Transport layer identification of P2P traffic. In ACM Sigcomm Internet Measurement
Conference, San Diego, CA, USA, October 2004.
216. Thomas Karagiannis, Konstantina Papagiannaki, and Michalis Faloutsos.
BLINC: Multi-level Traffic Classification in the Dark. In ACM Symposium
on Communications Architectures and Protocols (SigComm), Philadelphia, PA,
USA, August 2005.
217. Merkourios Karaliopoulos, Maria Papadopouli, Elias Raftopoulos, and Haipeng
Shen. On scalable measurement-driven modelling of traffic demand in large
WLANs. In IEEE Workshop on Local and Metropolitan Area Networks, Princeton, NJ, USA, June 2007.
218. Brad Karp and H. T. Kung. GPSR: greedy perimeter stateless routing for
wireless networks. In ACM International Conference on Mobile Computing
and Networking (MobiCom), pages 243–254, Boston, MA, USA, 2000.
219. Anand Kashyap, Samrat Ganguly, and Samir R. Das. A measurement-based
approach to modeling link capacity in 802.11-based wireless networks. In ACM
International Conference on Mobile Computing and Networking (MobiCom),
pages 242–253, Montreal, Quebec, Canada, September 2007.
220. Anastasia Katranidou and Maria Papadopouli. Location-sensing Using the
IEEE 802.11 Infrastructure and the Peer-to-Peer Paradigm for Mobile Computing Applications. Master’s thesis, University of Crete, Heraklion, Greece,
February 2006.
221. Harry Kesten and Vladas Sidoravicius. A shape theorem for the spread of an
infection. Preprint: math.PR/0312511 at arXiv.org. Math.
222. Jongseok Kim, Seongkwan Kim, Sunghyun Choi, and Daji Qiao. CARA:
Collision-aware rate adaptation for IEEE 802.11 WLANs. In IEEE Conference
on Computer Communications (InfoCom), Barcelona, Spain, April 2006.
223. Minkyong Kim and David Kotz. Modeling users’ mobility among WiFi access points. In Proceedings of the International Workshop on Wireless Traffic
Measurements and Modeling, Seattle, WA, June 2005. USENIX Association.
224. Minkyong Kim and David Kotz. Periodic properties of user mobility and AP
popularity. Journal of Personal and Ubiquitous Computing, 11(6), August
2007. Special Issue of papers from LoCA 2005.
225. Tae-Suk Kim, Hyuk Lim, and Jennifer Hou. Improving spatial reuse through
tuning transmit power, carrier sense threshold, and data rate in multihop wireless networks. In ACM International Conference on Mobile Computing and
Networking (MobiCom), Los Angeles, California, USA, September 2006.
226. James J. Kistler and M. Satyanarayanan. Disconnected operation in the Coda
file system. Proc. ACM Symposium on Operating Systems Principles, 8(2):213–
225, October 1991.
227. Jon Kleinberg. The wireless epidemic. Nature (News and Views), 449:287–288,
2007.
228. Young-Bae Ko and Nitin H. Vaidya. Location-aided routing (LAR) in mobile
ad hoc networks. In ACM International Conference on Mobile Computing and
Networking (MobiCom), Dallas, Texas, October 1998.
229. Can Emre Koksal, Kyle Jamieson, Emre Telatar, and Patrick Thiran. Impacts
of channel variability on link-level throughput in wireless networks. In ACM
Sigmetrics Conference on Measurement and Modeling of Computer Systems,
pages 51–62, Saint Malo, France, June 2006.
230. Gerd Kortuem. Proem: a middleware platform for mobile peer-to-peer computing. SIGMOBILE Mobile Computing and Communications Review, 6(4):62–64,
October 2002.
231. Niko Kotilainen and Maria Papadopouli. You’ve got photos! the design and
evaluation of a location-based media-sharing application. In 4th International
Mobile Multimedia Communications Conference (Mobimedia), Oulu, Finland,
July 2008.
232. Niko Kotilainen, Matthieu Weber, Mikko Vapa, and Jarkko Vuori. Mobile Chedar - a peer-to-peer middleware for mobile devices. In Proceedings of the Second International Workshop on Mobile Peer-to-Peer Computing
(MP2P’05), pages 86–90, Kauai Island, Hawaii, March 2005.
233. David Kotz and Kobby Essien. Analysis of a campus-wide wireless network.
Technical Report TR2002-432, Dept. of Computer Science, Dartmouth College,
September 2002.
234. David Kotz and Kobby Essien. Analysis of a campus-wide wireless network.
In Proceedings of the Eighth Annual International Conference on Mobile Computing and Networking (MobiCom), pages 107–118, September 2002. Revised
and corrected as Dartmouth CS Technical Report TR2002-432.
235. Iordanis Koutsopoulos and Leandros Tassiulas. Joint optimal access point selection and channel assignment in wireless networks. IEEE/ACM Transactions
on Networking, (3), June 2007.
236. John Krumm, Steve Harris, Brian Meyers, Barry Brumitt, Michael Hale, and
Steve Shafer. Multi-Camera Multi-Person Tracking for EasyLiving. In Proceedings of the Third IEEE International Workshop on Visual Surveillance,
Dublin, Ireland, July 2000.
237. Anurag Kumar, Eitan Altman, Daniele Miorandi, and Munish Goyal. New
insights from a fixed point analysis of single cell IEEE 802.11 WLANs. In
IEEE Conference on Computer Communications (InfoCom), pages 1550–1561,
Miami, Florida, USA, March 2005.
238. Mathieu Lacage, Mohammad Hossein Manshaei, and Thierry Turletti. IEEE
802.11 rate adaptation: a practical approach. In The Seventh ACM International Symposium on Modeling, Analysis and Simulation of Wireless and
Mobile Systems (MSWiM), pages 126–134, Venice, Italy, October 2004.
239. A. M. Ladd, K.E. Bekris, A. Rudys, G. Marceau, L. E. Kavraki, and D.S.
Wallach. Robotics-Based Location Sensing using Wireless Ethernet. In ACM
International Conference on Mobile Computing and Networking (MobiCom),
Atlanta, GA, USA, September 2002.
240. Youngseok Lee, Kyoungae Kim, and Yanghee Choi. Optimization of AP placement and channel assignment in wireless LANs. In IEEE Conference on Local
Computer Networks, Florida, FL, USA, November 2002.
241. Jeremie Leguay, Anders Lindgren, James Scott, Timur Friedman, Jon
Crowcroft, and Pan Hui. CRAWDAD data set upmc/content (v. 2006-11-17).
Downloaded from http://crawdad.cs.dartmouth.edu/upmc/content, November
2006.
242. Hui Lei and Dan Duchamp. An analytical approach to file prefetching. In
USENIX Annual Technical Conference, Anaheim, CA, January 1997.
243. Will E. Leland, Murad S. Taqqu, Walter Willinger, and Daniel V. Wilson. On
the self-similar nature of Ethernet traffic. In Deepinder P. Sidhu, editor, ACM
Symposium on Communications Architectures and Protocols (SigComm), pages
183–193, San Francisco, California, September 1993. ACM. Also in Computer
Communication Review 23 (4), Oct. 1993.
244. H. Leung, T. Lo, and S. Wang. Prediction of noisy chaotic time series using
an optimal radial basis function neural network. IEEE Transactions on Neural
Networks, 12(5):1163–1172, September 2001.
245. Peter A.W. Lewis and Gerald S. Shedler. Simulation of nonhomogeneous Poisson processes by thinning. Naval Research Logistics Quarterly, 26:403–413, 1979.
246. Jiandong Li, Zygmunt J. Haas, Min Sheng, and Yanhui Chen. Performance
evaluation of modified IEEE 802.11 MAC for multi-channel multi-hop ad-hoc
network. In 17th International Conference on Advanced Information Networking and Applications (AINA), Xi’an, China, March 2003.
247. Li Li, Joseph Halpern, Paramvir Bahl, Yi-Min Wang, and Roger Wattenhofer.
Analysis of a cone-based distributed topology control algorithm for wireless
multi-hop networks. In PODC ’01, Newport, Rhode Island, August 2001.
248. Lun Li, David Alderson, John Doyle, and Walter Willinger. Towards a theory
of scale-free graphs: definition, properties, and implications. Internet Mathematics.
249. Ning Li and Jennifer Hou. FLSS: A Fault-Tolerant Topology Control. In ACM
International Conference on Mobile Computing and Networking (MobiCom),
Philadelphia, Pennsylvania, September 2004.
250. Ning Li and Jennifer Hou. Localized topology control algorithms for heterogeneous wireless networks. IEEE/ACM Transactions On Networking, 13(6),
December 2005.
251. Ning Li, Jennifer Hou, and Lui Sha. Design and analysis of an MST-based
topology control algorithm. IEEE Transactions on Wireless Communications,
4(3), May 2005.
252. Qun Li and Daniela Rus. Sending messages to mobile users in disconnected ad-hoc wireless networks. In ACM International Conference on Mobile Computing
and Networking (MobiCom), pages 44–55, Boston, MA, USA, August 2000.
253. T. Lindeberg. Scale Space Theory in Computer Vision. Kluwer, Boston, 1994.
254. Christoph Lindemann and Oliver P. Waldhorst. Modeling epidemic information dissemination on mobile devices with finite buffers. In ACM Sigmetrics
Conference on Measurement and Modeling of Computer Systems, Banff, Alberta, Canada, June 2005.
255. Mark Lindsey. Correlations among nearby mobile web users. Master’s thesis,
Department of Computer Science, University of North Carolina at Chapel Hill,
Chapel Hill, NC, May 2003.
256. The Jakarta Project Lucene. http://jakarta.apache.org/lucene/docs/index.html.
257. Henrik Lundgren, Krishna Ramachandran, Elizabeth Belding-Royer, Kevin
Almeroth, Michael Benny, Andrew Hewatt, Alexander Touma, and Amit Jardosh. Experiences from the design, deployment, and usage of the UCSB MeshNet testbed. IEEE Wireless Communications, 44(4):18–29, April 2006.
258. J. S. Marron, Félix Hernández-Campos, and F. D. Smith. A SiZer analysis of IP
flow start times. Institute of Mathematical Statistics Lecture Notes-Monograph
Series, J. Rojo and V. Perez-Abreu (Eds), 44:87–105, 2004.
259. Sergio Marti, T. J. Giuli, Kevin Lai, and Mary Baker. Mitigating routing
misbehavior in mobile ad-hoc networks. In ACM International Conference on
Mobile Computing and Networking (MobiCom), pages 255–265, Boston, MA,
USA, August 2000.
260. Marvin McNett and Geoffrey M. Voelker. Access and mobility of wireless PDA
users. Mobile Computing Communications Review, 9(2):40–55, April 2005.
261. Xiaoqiao Meng, Starsky Wong, Yuan Yuan, and Songwu Lu. Characterizing
flows in large wireless data networks. In ACM International Conference on
Mobile Computing and Networking (MobiCom), pages 174–186, Philadelphia,
PA, September 2004.
262. Dejan Milojicic, Vana Kalogeraki, Rajan Lukose, Kiran Nagaraja, Jim Pruyne,
Bruno Richard, Sami Rollins, and Zhichen Xu. Peer-to-peer computing. Technical Report HPL-2002-57, HP Laboratories, Palo Alto, USA, March 2002.
263. Arunesh Mishra, Vladimir Brik, Suman Banerjee, Aravind Srinivasan, and
William Arbaugh. A client-driven approach for channel management in wireless LANs. In IEEE Conference on Computer Communications (InfoCom),
Barcelona, Spain, April 2006.
264. Arunesh Mishra, Minho Shin, and William A. Arbaugh. An empirical analysis
of the IEEE 802.11 MAC layer handoff process. ACM SIGCOMM Computer
Communication Review, pages 93–102, April 2003.
265. Joseph Mitola. Cognitive radio: An integrated agent architecture for software
defined radio. PhD thesis, Royal Institute of Technology (KTH), Stockholm,
Sweden, May 2000.
266. Michael Mitzenmacher. A brief history of generative models for power law and
lognormal distributions. Internet Mathematics, 2003.
267. Allen K. L. Miu, Hari Balakrishnan, and Can Emre Koksal. Improving loss
resilience with multi-radio diversity in wireless networks. In ACM International
Conference on Mobile Computing and Networking (MobiCom), pages 16–30,
Cologne, Germany, August 2005.
268. A. Moore and K. Papagiannaki. Toward the Accurate Identification of Network
Applications. In Passive and Active Measurement Workshop, Boston, MA,
USA, March 2005.
269. Thomas Moscibroda, Roger Wattenhofer, and Aaron Zollinger. Topology Control Meets SINR: The Scheduling Complexity of Arbitrary Topologies. In ACM
International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), Florence, Italy, May 2006.
270. Mirco Musolesi and Cecilia Mascolo. A community based mobility model for
ad hoc network research. In Second International Workshop on Multi-hop Ad
Hoc Networks (REALMAN), May 2006.
271. Tamer Nadeem, Lusheng Ji, Ashok K. Agrawala, and Jonathan R. Agre. Location enhancement to IEEE802.11 DCF. In IEEE Conference on Computer
Communications (InfoCom), pages 651–663, Miami, Florida, USA, March
2005.
272. Napster. http://www.napster.com.
273. Asis Nasipuri and Samir R. Das. A multichannel CSMA MAC protocol for
mobile multihop networks. In Proceedings of the IEEE Wireless Communications and Networking Conference (WCNC), New Orleans, LA, USA, September
1999.
274. Huu Quynh Nguyen, François Baccelli, and Daniel Kofman. A stochastic geometry analysis of dense IEEE802.11 networks. In IEEE Conference on Computer
Communications (InfoCom), pages 1199–1207, Anchorage, Alaska, USA, May
2007.
275. Dragos Niculescu and Badri Nath. Ad Hoc Positioning System (APS). In
IEEE Conference on Global Communications (GLOBECOM), San Antonio,
TX, November 2001.
276. Dragos Niculescu and Badri Nath. Ad Hoc Positioning System (APS) using
AoA. In IEEE Conference on Computer Communications (InfoCom), San
Francisco, CA, April 2003.
277. Brian D. Noble and Mahadev Satyanarayanan. Experience with adaptive mobile applications in Odyssey. Mobile Networks and Applications, 4(4):245–254,
December 1999.
278. Carl Nuzman, Iraj Saniee, Wim Sweldens, and Alan Weiss. A compound model
for TCP connection arrivals for LAN and WAN applications. Computer Networks, 40(3):319–337, October 2002.
279. The New York Times on the Web.
http://www.nytimes.com/adinfo/wireless audience.html.
280. A. Ovchinnikov, S. Timashev, and A. Belyy. Kinetics of Diffusion controlled
chemical processes. Nova Science Publishers, 1989.
281. Thomas W. Page, Richard G. Guy, John S. Heidemann, David Ratner, Peter
Reiher, Ashish Goel, Geoffrey H. Kuenning, and Gerald J. Popek. Perspectives on optimistically replicated peer-to-peer filing. Software—Practice and
Experience, 28(2):155–180, February 1998.
282. Maria Papadopouli, Elias Raftopoulos, and Haipeng Shen. Evaluation of short-term traffic forecasting algorithms in wireless networks. In 2nd Conference on
Next Generation Internet Design and Engineering, Valencia, Spain, April 2006.
283. Maria Papadopouli and Henning Schulzrinne. Seven degrees of separation
in mobile ad hoc networks. In IEEE Conference on Global Communications
(GLOBECOM), San Francisco, CA, November 2000.
284. Maria Papadopouli and Henning Schulzrinne. Effects of power conservation,
wireless coverage and cooperation on data dissemination among mobile devices. In ACM International Symposium on Mobile Ad Hoc Networking and
Computing (Mobihoc), Long Beach, CA, October 2001.
285. Maria Papadopouli and Henning Schulzrinne. A performance analysis of 7DS
a peer-to-peer data dissemination and prefetching tool for mobile users. In Advances in wired and wireless communications, IEEE Sarnoff Symposium Digest,
Ewing, NJ, March 2001.
286. Maria Papadopouli and Henning Schulzrinne. Performance of data dissemination and message relaying in mobile ad hoc networks. Technical Report
CUCS-004-02, Dept. of Computer Science, Columbia University, New York,
NY, February 2002.
287. Maria Papadopouli, Haipeng Shen, Elias Raftopoulos, Manolis Ploumidis, and
Félix Hernández-Campos. Short-term traffic forecasting in a campus-wide wireless network. In 16th Annual IEEE International Symposium on Personal Indoor and Mobile Radio Communications, Berlin, Germany, September 2005.
288. Maria Papadopouli, Haipeng Shen, and Manolis Spanakis. Characterizing the
duration and association patterns of wireless access in a campus. In 11th
European Wireless Conference, Nicosia, Cyprus, April 2005.
289. Maria Papadopouli, Haipeng Shen, and Manolis Spanakis. Modeling client arrivals at access points in wireless campus-wide networks. In 14th IEEE Workshop on Local and Metropolitan Area Networks, Chania, Greece, September
2005.
290. Maria Papadopouli, Haipeng Shen, and Manolis Spanakis. Modeling client
arrivals at access points in wireless campus-wide networks. Technical Report
357, FORTH-ICS, Heraklion, Greece, May 2005.
291. Konstantina Papagiannaki, Mark Yarvis, and W. Steven Conner.
CRAWDAD data set intel/home (v. 2006-04-16).
Downloaded from
http://crawdad.cs.dartmouth.edu/intel/home, April 2006.
292. Konstantina Papagiannaki, Mark D. Yarvis, and W. Steven Conner. Experimental characterization of home wireless networks and design implications. In
IEEE Conference on Computer Communications (InfoCom), Barcelona, Spain,
April 2006.
293. Theodore Patkos, Antonis Mpikakis, Grigoris Antoniou, Maria Papadopouli,
and Dimitris Plexousakis. A semantic-based framework for context-aware
pedestrian guiding services. In Second International Workshop on Semantic
Web Technology For Ubiquitous and Mobile Applications, Trentino, Italy, 2006.
294. Vern Paxson. Empirically-derived analytic models of wide-area TCP connections. IEEE/ACM Transactions on Networking, 2(4):316–336, August 1994.
295. Vern Paxson and Sally Floyd. Wide-area traffic: the failure of Poisson modeling. In ACM Symposium on Communications Architectures and Protocols
(SigComm), pages 257–268, London, United Kingdom, August 1994.
296. Vern Paxson and Sally Floyd. Wide-area traffic: The failure of Poisson modeling. IEEE/ACM Transactions on Networking, 3(3), June 1995.
297. Charles Perkins and Pravin Bhagwat. Highly dynamic destination-sequenced
distance-vector routing (DSDV) for mobile computers. In ACM Conference on
Communications Architectures, Protocols and Applications, volume 24, pages
234–244, October 1994.
298. Charles E. Perkins, Elizabeth M. Royer, Samir R. Das, and Mahesh K. Marina. Performance comparison of two on-demand routing protocols for ad-hoc
networks. IEEE Personal Communications Magazine, 8(1), February 2001.
299. Roman Pichna, Tero Ojanpera, Harri Posti, and Jouni Karppinen. Wireless
Internet - IMT-2000/wireless LAN Interworking. Journal of Communications
and Networking, 2(1):46–57, March 2000.
300. Manolis Ploumidis, Maria Papadopouli, and Thomas Karagiannis. Multi-level
application-based traffic characterization in a large-scale wireless network. In
International Symposium on a World of Wireless, Mobile and Multimedia Networks (WoWMoM), Helsinki, Finland, June 2007.
301. Nissanka B. Priyantha, Anit Chakraborty, and Hari Balakrishnan. The Cricket
location-support system. In ACM International Conference on Mobile Computing and Networking (MobiCom), pages 32–43, Boston, MA, USA, August
2000.
302. Nissanka B. Priyantha, Allen K. L. Miu, Hari Balakrishnan, and Seth Teller.
The Cricket compass for context-aware mobile applications. In ACM International Conference on Mobile Computing and Networking (MobiCom), pages
1–14, Rome, Italy, July 2001.
303. DATAMAN project. http://www.cs.rutgers.edu/dataman/, 2001.
304. D. Qiao and S. Choi. Fast-responsive link adaptation for IEEE 802.11 WLANs.
In IEEE International Conference on Communications (ICC), Seoul, Korea,
May 2005.
305. Daji Qiao, Sunghyun Choi, and Kang G. Shin. Goodput analysis and link
adaptation for IEEE 802.11a wireless LANs. IEEE Transactions on Mobile
Computing, 1(4):278–292, 2002.
306. R. Chandra, Lili Qiu, K. Jain, and M. Mahdian. Optimizing the placement of
Internet taps in wireless neighborhood networks. In International Conference
on Network Protocols (ICNP), Berlin, Germany, October 2004.
307. R. Harle, A. Ward, and A. Hopper. Single reflection spatial voting. In Proceedings of the First International Conference on Mobile Systems, Applications,
and Services, San Francisco, May 2003.
308. Krishna Ramachandran, Elizabeth Belding, Kevin Almeroth, and Milind Buddhikot. Interference-aware channel assignment in multi-radio wireless mesh
networks. In IEEE Conference on Computer Communications (InfoCom),
Barcelona, Spain, April 2006.
309. Ram Ramanathan. A unified framework and algorithm for channel assignment
in wireless networks. Wireless Networks, 5(2):81–94, March 1999.
310. Ishwar Ramani and Stefan Savage. SyncScan: Practical fast handoff for 802.11
infrastructure networks. In IEEE Conference on Computer Communications
(InfoCom), Miami, FL, USA, March 2005.
311. Theodore S. Rappaport. Wireless Communications: Principles and Practice.
IEEE Press, New York, 1996.
312. Krishnamurthi Ravishankar and Suresh Singh. Broadcasting on [0,L]. Discrete
Applied Mathematics, 53(1–3):299–320, 1994.
313. Krishnamurthi Ravishankar and Suresh Singh. Asymptotically optimal gossiping on [0, l]. Discrete Applied Mathematics, 1995.
314. Krishnamurthi Ravishankar and Suresh Singh. Central limit theorem for time
to broadcast on [0, l]. Probability in the Applied and Informational Sciences,
9:201–209, 1995.
315. RIM. http://www.goamerica.net/coverage/cingular.html.
316. John Risson and Tim Moors. Survey of research towards robust peer-to-peer
networks: Search methods. Technical report UNSW-EE-P2P-1-1, University of
New South Wales, Sydney, Australia, September 2004.
317. Maya Rodrig, Charles Reis, Ratul Mahajan, David Wetherall, John Zahorjan,
and Ed Lazowska. CRAWDAD data set uw/sigcomm2004 (v. 2006-10-17).
http://crawdad.cs.dartmouth.edu/uw/sigcomm2004, October 2006.
318. Miguel Rodriguez, Juan P. Pece, and Carlos J. Escudero. In-building location
using bluetooth. In International Workshop on Wireless Ad-hoc Networks,
Coruna, Spain, May 2005.
319. Sheldon Ross. Introduction to probability models. Academic Press, London,
1993.
320. Sheldon Ross. Stochastic Processes. John Wiley and Sons, New York, 1996.
321. Sheldon M. Ross. Stochastic Processes. John Wiley and Sons, New York, New
York, 1983.
322. Abhishek Roy, Archan Misra, and Sajal K. Das. An information theoretic
framework for optimal location tracking in multi-system 4G wireless networks.
In IEEE Conference on Computer Communications (InfoCom), Hong Kong,
March 2004.
323. Roy Want, Andy Hopper, Veronica Falcao, and Jon Gibbons. The
Active Badge Location System. ACM Transactions on Information Systems,
10(1):91–102, January 1992.
324. Elizabeth M. Royer and Charles E. Perkins. Multicast operation of the ad-hoc
on-demand distance vector routing protocol. In ACM International Conference on Mobile Computing and Networking (MobiCom), pages
207–218, Seattle, Washington, August 1999.
325. Bahareh Sadeghi, Vikram Kanodia, Ashutosh Sabharwal, and Edward
Knightly. Opportunistic media access for multirate ad-hoc networks. In ACM
Proceedings of the 8th annual international conference on Mobile computing
and networking (MobiCom), pages 24–35, New York, NY, USA, 2002.
326. Prince Samar, Marc R. Pearlman, and Zygmunt J. Haas. Hybrid routing: the
pursuit of an adaptable and scalable routing framework for ad-hoc networks.
In The handbook of ad hoc wireless networks, pages 245–262. CRC Press, Inc.,
Boca Raton, FL, USA, December 2003.
327. Chris Savarese, Jan Rabaey, and Koen Langendoen. Robust positioning algorithms for distributed ad-hoc wireless sensor networks. In Proceedings of
Usenix Annual Technical Conference, Monterey, CA, June 2002.
328. Bruce Schneier. Applied Cryptography. John Wiley and Sons, 1995.
329. Vinay Seshadri, Gergely V. Zaruba, and Manfred Huber. A Bayesian sampling approach to indoor localization of wireless devices using received signal strength indication. In Proceedings of IEEE International Conference on
Pervasive Computing and Communications (Percom), Kauai Island, Hawaii,
March 2005.
330. Jungmin So and Nitin H. Vaidya. Multi-channel MAC for ad-hoc networks: handling multi-channel hidden terminals using a single transceiver. In Proceedings
of the 5th ACM international symposium on Mobile ad hoc networking and
computing (MobiHoc), pages 222–233, New York, NY, USA, 2004.
331. Joel Sommers, Hyungsuk Kim, and Paul Barford. Harpoon: a flow-level traffic
generator for router and network tests. In ACM SIGMETRICS poster session,
New York, NY, USA, 2004. ACM.
332. M. Spreitzer, M. Theimer, K. Petersen, A. Demers, and D. Terry. Dealing
with server corruption in weakly consistent, replicated data systems. In ACM
International Conference on Mobile Computing and Networking (MobiCom),
pages 234–240, Budapest, Hungary, September 1997.
333. Sprint Applied Research Group.
http://ipmon.sprintlabs.com/packstat/packetoverview.php.
334. Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for Internet applications. In ACM Symposium on Communications Architectures and Protocols
(SigComm), San Diego, CA, August 2001.
335. Jing Su and Stefan Saroiu. CRAWDAD data set toronto/bluetooth (v. 2006-08-29). Downloaded from http://crawdad.cs.dartmouth.edu/toronto/bluetooth,
August 2006.
336. Karthikeyan Sundaresan and Konstantina Papagiannaki. The need for cross-layer information in access point selection algorithms. In Internet Measurement
Conference, New York, NY, USA, 2006. ACM.
337. Carl Tait, Hui Lei, Swarup Acharya, and Henry Chang. Intelligent file hoarding
for mobile computers. In ACM International Conference on Mobile Computing
and Networking (MobiCom), Berkeley, CA, November 1995.
338. Diane Tang and Mary Baker. Analysis of a local-area wireless network. In ACM
International Conference on Mobile Computing and Networking (MobiCom),
pages 1–10, Boston, MA, USA, August 2000.
339. S. Thrun, D. Fox, W. Burgard, and F. Dellaert. Robust Monte Carlo Localization for Mobile Robots. Artificial Intelligence, 128(1-2):99–141, 2000.
340. Cristian Tuduce and Thomas Gross. A mobility model based on WLAN traces and
its validation. In IEEE Conference on Computer Communications (InfoCom),
Miami, FL, USA, March 2005.
341. Kurt Tutschku. A measurement-based traffic profile of the eDonkey filesharing
service. In Passive and Active Measurement Workshop, Antibes Juan-les-Pins,
France, April 2004.
342. George Tzagkarakis, Maria Papadopouli, and Panagiotis Tsakalides. Singular
spectrum analysis of traffic workload in a large-scale wireless LAN. In 10th
ACM/IEEE International Symposium on Modeling, Analysis and Simulation
of Wireless and Mobile Systems, Chania, Crete, Greece, October 2007.
343. United Villages. http://www.unitedvillages.com.
344. Konstantinos Vandikas, Lito Kriara, Tonia Papakonstantinou, Anastasia Katranidou, Haris Baltzakis, and Maria Papadopouli. Empirical-based analysis of
a cooperative location-sensing system. In ACM First International Conference
on Autonomic Computing and Communication Systems (Autonomics), Rome,
Italy, October 2007.
345. A. Vasan, R. Ramjee, and T. Woo. Echos-enhanced capacity 802.11 hotspots.
In IEEE Conference on Computer Communications (InfoCom), Miami, FL,
USA, March 2005.
346. Arunchandar Vasan, Ramachandran Ramjee, and Thomas Y. C. Woo. ECHOS
- enhanced capacity 802.11 hotspots. In IEEE Conference on Computer Communications (InfoCom), pages 1562–1572, Miami, Florida, USA, March 2005.
347. S. Vasudevan, K. Papagiannaki, C. Diot, J. Kurose, and D. Towsley. Facilitating access point selection in IEEE 802.11 wireless networks. In Internet
Measurement Conference, Berkeley, CA, USA, 2005. USENIX Association.
348. Vindigo. http://www.vindigo.com/learn more.html.
349. Kashi Venkatesh Vishwanath and Amin Vahdat. Realistic and responsive network traffic generation. In ACM Symposium on Communications Architectures
and Protocols (SigComm), Pisa, Italy, September 2006.
350. Roy Want and Andy Hopper. Active badges and personal interactive computing objects. Technical Report ORL 92-2, Olivetti Research, Cambridge,
England, February 1992. also in IEEE Transactions on Consumer Electronics,
Feb. 1992.
351. M. Weiser. The computer for the 21st century. Scientific American, September
1991.
352. Joshua Weitz, Philip Benfey, and Ned Wingreen. Evolution, interactions, and
biological networks. PLoS Biology, 5(1), January 2007.
353. K. P. White. Simulating a nonstationary Poisson process using bivariate thinning: The case of typical weekday arrivals at a consumer electronics store.
Proceedings of the 31st conference on Winter simulation: Simulation—a bridge
to the future, 1:458–461, 1999.
354. W. Willinger, M. S. Taqqu, R. Sherman, and D. V. Wilson. Self-similarity
through high-variability: Statistical analysis of Ethernet LAN traffic at the
source level. ACM Computer Communication Review, 25(4):100–113, October
1995.
355. W. Willinger, V. Paxson, and M. Taqqu. Self-similarity and heavy tails: Structural modeling of network traffic. In R. Adler, R. Feldman, and M. Taqqu,
editors, A Practical Guide to Heavy Tails: Statistical Techniques and Applications, pages 27–53. Birkhauser, 1998.
356. Alec Wolman, Geoffrey M. Voelker, Nitin Sharma, Neal Cardwell, Anna Karlin,
and Henry M. Levy. On the scale and performance of cooperative Web proxy
caching. In Proc. ACM Symposium on Operating Systems Principles, Kiawah
Island, SC, December 1999.
357. Starsky H. Y. Wong, Hao Yang, Songwu Lu, and Vaduvur Bharghavan. Robust
rate adaptation for 802.11 wireless networks. In ACM International Conference on Mobile Computing and Networking (MobiCom), pages 146–157, Los
Angeles, California, USA, September 2006.
358. Starsky H.Y. Wong, Hao Yang, Songwu Lu, and Vaduvur Bharghavan. Robust
rate adaptation for 802.11 wireless networks. In ACM International Conference
on Mobile Computing and Networking (MobiCom), Los Angeles, CA, September 2006.
359. Ya Xu, John Heidemann, and Deborah Estrin. Geography-informed energy
conservation for ad-hoc routing. In ACM International Conference on Mobile
Computing and Networking (MobiCom), Rome, Italy, August 2001.
360. J. Yang, C.-K. Lee, Y. Chen, and M. Ammar. Ferry replacement protocols in
sparse MANET message ferrying systems. In IEEE Wireless Communications
and Networking (WCNC), New Orleans, LA, March 2005.
361. Mark Yarvis, Konstantina Papagiannaki, and W. Steven Conner. Characterization of 802.11 wireless networks in the home. In Proceedings of the First Workshop on Wireless Network Measurements (WiNMee), Trentino, Italy, April
2005.
362. Tao Ye, H.-Arno Jacobsen, and Randy Katz. Mobile awareness in a wide area
wireless network of info-stations. In ACM International Conference on Mobile
Computing and Networking (MobiCom), Dallas, Texas, October 1998.
363. J. Yoon, M. Liu, and B. Noble. Random waypoint considered harmful. In
IEEE Conference on Computer Communications (InfoCom), San Francisco,
CA, September 2003.
364. Moustafa Youssef and Ashok Agrawala. The Horus WLAN location determination system. In International Conference on Mobile Systems, Applications and
Services (MobiSys), Seattle, USA, June 2005.
365. Moustafa Youssef, Adel Youssef, Chuck Rieger, Udaya Shankar, and Ashok
Agrawala. PinPoint: An asynchronous time-based location determination system. In International Conference on Mobile Systems, Applications and Services
(MobiSys), pages 165–176, Uppsala, Sweden, June 2006.
366. Wenrui Zhao, Mostafa Ammar, and Ellen Zegura. A message ferrying approach
for data delivery in sparse mobile ad-hoc networks. In IEEE Conference on
Computer Communications (InfoCom), Hong Kong, March 2004.
367. Wenrui Zhao, Yang Chen, Mostafa Ammar, Mark D. Corner, Brian Neil Levine,
and Ellen Zegura. Capacity Enhancement using Throwboxes in DTNs. In IEEE
International Conference on Mobile Ad hoc and Sensor Systems, Vancouver,
Canada, October 2006.
368. Junlan Zhou, Zhengrong Ji, and Rajive Bagrodia. TWINE: A hybrid emulation testbed for wireless networks and applications. In IEEE Conference on
Computer Communications (InfoCom), Barcelona, Spain, April 2006.
369. Lidong Zhou and Zygmunt J. Haas. Securing ad-hoc networks. IEEE Network,
13(6), November 1999.
370. Jing Zhu, Benjamin Metzler, Xingang Guo, and York Liu. Adaptive CSMA
for scalable network capacity in high-density WLAN: A hardware prototyping approach. In IEEE Conference on Computer Communications (InfoCom),
Barcelona, Spain, April 2006.
Index
blinc, 88
ftp, 92
http traces, 81
snmp, 79, 81, 88
syslog, 79, 82
syslog messages, 81, 109
tcp, 79
7DS, 12, 17
7DS host, 7DS node, 48
7DS node, 13
7DS peer, 13
7DS prototype, 38
7DS simulation parameters, 51
Active Badge, 37
Active Bat, 29, 37
active querying, 20, 49
ad-hoc, 9
advertisement, 48
advertisements, 20
AP, 5, 78
AP path, 109
AP-coresident client repeated request, 97
AP-coresident-client repeated requests, 96
application type, 88
application-based classification, 88
archives, 181
arrival process of client visits, 124
arrivals of clients, 123
association process, 116
asynchronous, 4
asynchronous mode, 14, 22, 48, 51
BiPareto distribution, 112
BitTorrent, 8, 9
building path, 109
building types, 122, 142
cache at each AP, 99
cache attached to an AP, 96
cache hierarchy, 99
caching paradigms, 96
campus-wide deployments, 182
campus-wide repeated requests, 96
campus-wide testbeds, 182
CBR, 106
CCDF, 111
channel selection, 171
channel switching, 172
chemotaxis, 175
classify flows, 88
client, 78
closed-loop, 158
CLS, 14, 17
cognitive radio, 169
collaborative location-sensing system, 14
complementary cumulative distribution, 111
connectivity problems, 168
cooperation, 24
CRAWDAD, 181
Cricket, 29
DAG, 80
data dissemination, 68
data repositories, 181
data sharing (DS), 49
data sharing and forwarding enabled (DS+FW), 49
default querying mechanism, 20
Default values in 7DS simulations, 52
delay-tolerant, 12
delay-tolerant networks, 15
device mobility, 106, 174
diffusion, 175
diffusion process, 69
disconnections to the Internet, 12
disruption-tolerant network, 182
distance-prediction based, 29
dominant application, 9, 92
dominated, 87
downloaders, 87
drop-in client, 84
DTN, 15, 182
dynamic port, 88
e-checks, 25
empirical studies, 78, 181
epidemic models, 68
exponential quantile plot, 125
exponentiality, 124
File transfer, 92
filesystems, 2
FIS, 49, 70
fixed information server, 49
flow arrival count process, 142
flow inter-arrival time, 142
flow inter-arrivals, 142
forwarding (FW), 49
forwarding is enabled, 49
Freenet, 8
Gnutella, 8
handoffs, 103
hit ratios, 99
hoarding, 4
homeAP, 109
hotspot, 122
hybrid S-C, 49
information dissemination, 175
information locality, 102
information providers, 11
infostation, 44, 72
infrastructure-wide modeling, 137
inter-AP transition, 83
inter-building transition, 84
inter-building transitions, 109
known-port limitation, 88
Kolmogorov-Smirnov test, 124
landmarks, 31
link-level measurements, 182
Location-aware services, 2
location-based services, 1
Location-sensing systems, 29
Lognormal distribution, 133
measurements, 77
mesh network, 181
mesh networks, 5, 9, 78
micropayment, 25
MIS, 49
misclassified traffic, 88
mobile client, 84
mobile information access, 4
mobile information server, 49
mobile peer-to-peer, 9
mobile peer-to-peer computing paradigm, 12
mobile session, 84, 113
mobile sessions, 110
monitoring, 79, 88, 172
monitoring tools, 79
moving-average model, 156
multi channel, 169
multi-radio, 169
multimedia traveling journal, 41
Napster, 8
network capacity, 169
Network management, 92
network-independent modeling, 129
notesharing, 39
ns-2, 51
null hypothesis, 123
on-line collaborations, 17
one-state history, 118, 119
P-P, 13, 48
P-P variations, 49
particle filter, 35
passive querying, 20, 48
peer-to-peer, 7, 9, 88, 92
peer-to-peer caching, 96
popular applications, 2, 92
positioning systems, 29
power law, 74
power management, 171
prefetching, 21
quantile plot, 130
queries, 20
query, 49
query interval, 51
querying, 23
Radar, 29
random waypoint mobility, 50
report, 20
reusability, 137
revisit, 115
RF tags, 6
roaming client, 84
roaming session, 84
Rosenstock’s trapping model, 69
S-C, 13, 48
same-AP repeated requests, 96, 97
same-building repeated requests, 96
same-client repeated request, 96
same-client repeated requests, 96
sampling variability, 126
scalability, 137
scanning activity, 92
security problems, 168
self-similar traffic, 157
sensors, 2
session, 83, 111
session arrivals, 142
session duration, 111
sessions, 110, 129
signature based, 29
simple epidemic model, 68
social network analysis, 17
source-level modeling, 159
spatial locality of information, 10, 94
state, 83
state history, 83
stationary session, 83, 111
stationary sessions, 110
stochastic order, 111
straight S-C, 49
streaming, 2, 88
streaming audio, 2
subsequent request, 96
synchronous access, 4
synchronous mode, 14, 22, 48
temporal locality, 96
testbeds, 78
testbeds in conferences, 182
testbeds in research labs, 182
time-varying Poisson process, 122, 123
token-based approaches, 25
topology modeling, 166
traffic load, 87
traffic of APs, 84
traffic share across applications, 92
traffic variation, 140
transient phenomena, 79
transient sessions, 114
trapping model, 69
Ubisense, 29
UNC, 78
United Villages project, 9, 43
uploaders, 87
user cache, 96
vehicle-based services, 2
vehicular-based testbed, 182
video, 2
visit, 79, 83
web, 9, 94
web browsing, 92
web requests, 96
whiteboard, 39
WiFi-enabled Kiosks, 9
wireless infrastructure, 79
wireless Internet, 11
wireless measurements, 79
Wireless PANs, 6
Wireless WANs, 5
wireless workload of APs, 101