device-fingerprinting

device-fingerprinting
1
On Passive Wireless Device Fingerprinting using
Infinite Hidden Markov Random Field
Feng Chen, Qiben Yan, Chowdhury Shahriar, Chang-Tien Lu, Wenjing Lou, and T.
Charles Clancy
{chenf, qbyan, cshahria, ctlu, wjlou, tcc}@vt.edu
Virginia Tech, Falls Church, VA, USA
Abstract
This paper presents a new concept of device fingerprinting (or profiling) to enhance wireless security using
Infinite Hidden Markov Random Field (iHMRF). Wireless device fingerprinting is an emerging approach for
detecting spoofing attacks in wireless network. Existing methods utilize either time-independent features or
time-dependent features, but not both concurrently due to the complexity of different dynamic patterns. In
this paper, we present a unified approach to fingerprinting based on iHMRF. The proposed approach is able to
model both time-independent and time-dependent features, and to automatically detect the number of devices
that is dynamically varying. We propose the first iHMRF-based online classification algorithm for wireless
environment using variational incremental inference, micro-clustering techniques, and batch updates. Extensive
simulation evaluations demonstrate the effectiveness and efficiency of this new approach.
Index Terms
Passive Device Fingerprinting, Hidden Markov Random Field, Nonparametric Bayesian Methods
I. I NTRODUCTION
Nowadays, the proliferation of mobile devices moves the wireless networks toward “anytimeanywhere” mobile service model. However, the open nature of wireless networks renders them
susceptible to various types of spoofing attacks. For example, the adversaries can collect nodes’
identity information by passively monitoring the network traffic, and then masquerade as legitimate
nodes to disrupt network operations. Various attacks can be launched, such as packet injection
[1], Sybil attack [2], masquerade attack [3], etc. These identify-based attacks may hinder normal
2
communication and result in privacy leakage, which will lead to a huge outbreak of cybercrimes. As
a result, how to detect the presence of identity spoofing becomes a critical issue.
Two categories of existing solutions exist to detect identity spoofing attacks, namely, active detection
and passive detection. Active detection allows additional messages to be injected into the network,
such as challenges and responses used in cryptographic-based schemes for user authentication. In the
case that the entire node being compromised such that the cryptographic keys are exposed, location
related information can be used to facilitate node authentication. For example, in [4], specific chipset,
firmware or the driver of an 802.11 wireless device can be identified by watching its responses
to a crafted malformed 802.11 frames. However, the downside of active detection methods lies in
its requirement on extra message exchanges, which will accelerate the energy usage and consume
available bandwidth. In addition, the responses can also be spoofed, if they are device dependent.
In contrast, passive detection methods extract device specific features from message transmissions,
which can be categorized as time-independent and time-dependent features. The main strength is
that these features are device dependent and hence can be used as an unique pattern to fingerprint a
specific device. Particularly, time-independent features include clock skew (observed from message
time stamps), sequence number anomalies (in MAC frames), timing (of probe frames for channel
scanning), and various RF parameters (transient phases at the onset of transmissions, frequency offsets,
phase offsets, I/Q offsets, etc.) [5]. Time-dependent features include radio signal strength (RSS), angle
of arrival, time of arrival, differential received signal strength, frequency difference of arrival, etc.
Note that, time-independent features refer to the signal measurements that have constant mean values
and are only randomized by white noises across the time. Time-dependent features refer to the signal
measurements whose mean values are time varying due to the essential dynamical nature.
For the fingerprinting methods based on time-independent features [3], [5]–[9], though with a variety
of implementations, basically it is assumed that the features form a cluster for each device, which
can be regarded as the unique fingerprinting pattern to identify the device. Two most recent works
are conducted by Brik et al. [6] and Nguyen et al. [5]. Brik et al. [6] proposed the Passive RAdiometric Device Identification System (PARADIS) utilizing modulation domain radio-metrics, such as
carrier frequency error, I/Q offset, etc. Nguyen et al. [5] further proposed an unsupervised clustering
method based on non-parametric Bayesian method and infinite Gaussian mixture model, which can
automatically determine the number of clusters. To summarize, time-independent features can be
regarded as accurate and robust wireless signatures for particular devices. However, the fingerprinting
3
methods using time-independent features also have some limitations. For example, these features are
much harder to extract. Usually, some high-end measurement devices are required to perform feature
extraction. Moreover, the accuracy of these feature relies on the precision of the measurement devices.
Therefore, although time-dependent features are accurate wireless signatures, the extracted features
might include some errors due to the limitation of wireless measurements.
For time-dependent features, the most popular family of methods for device identification is RSSbased. In [10], a geographic location based identification against masquerading threats was employed,
where two approaches are proposed: distance ratio test (DRT), which utilizes the received signal
strength (RSS) of a device, and distance difference test (DDT), which relies on the received signal’s
relative phase difference when the signal is received at different devices. Zhao et al. [11] proposed a
radio environment map (REM) which is a comprehensive database of geographical features, available
services, spectral regulations, locations, and activities of radio devices and policies. Identification of
cognitive radio (CR) node through an analysis of the transmitted signal is investigated in [12] where
wavelet transform is utilized to identify the transmitter fingerprint. However, the RSS measurements
are time varying and only provide coarse spatial resolution. Therefore, due to the dynamic nature,
time-dependent features, such as RSS, cannot be regarded as an accurate and reliable signature alone.
The goal of this paper is to improve existing detection methods by considering additional features
that could potentially help improve the fingerprinting performance. Studies have been shown that both
time-independent features (e.g., frequency difference and phase shift difference) and time-dependent
features (e.g., RSS and time difference of arrival) can be used to do spoofing detection [3], [5]–[9],
[13], [14]. In this paper, we propose to concurrently model all the useful features in a unified statistical
framework, based on infinite hidden Markov random field (iHMRF). All the device dependent features
can be categorized into time-independent and time-dependent features. The autocorrelation on timedependent features is captured by using the so-called Markov Property in iHMRF, in which data points
that are similar on time-dependent features tend to have consistent cluster labels. The time-independent
features are captured through embedded Gaussian mxitures in iHMRF. The main contributions of this
work can be summarized as follows:
1) Design of a unified fingerprinting framework. To the best of our knowledge, this is the first
statistical approach to model both time-dependent and time-independent features in a systematic
framework for device fingerprinting.
2) Formulation of the fingerprinting problem via iHMRF modeling. We propose a novel
4
application of the iHMRF model to the device fingerprinting problem that captures correlations
on time-dependent features using the Markov property, and correlations on time-independent
features using an embedded Gaussian mixture model.
3) Design of an online learning algorithm. We propose a new online classification algorithm
for the fingerprinting problem based on variational incremental inference, micro-clustering
techniques, and batch updates.
4) Comprehensive empirical validations. We conducted extensive simulations on a variety of
scenarios to validate the effectiveness and efficiency of our proposed techniques, competing
with existing state-of-the-art methods.
The rest of the article is organized as follows. Section 2 formalizes the fingerprinting problem based
on both time-dependent and time-independent features. Section 3 discusses theoretical preliminaries,
including Hidden Markov Random Field (HMRF) and infinite Gaussian Mixture Model (iGMM).
Section 4 formulates an infinite hidden Markov random field (iHMRF) model for the fingerprinting
problem, and Section 5 presents a new incremental inference algorithm for wireless streaming environment. Empirical validations of our proposed fingerprinting framework are presented in Section 6.
The paper concludes and discusses our future work in Section 8.
II. R ELATED WORK
A large body of literature has been dedicated to the issue of wireless device identification for
detecting spoofing attacks. In this section, we review the most relevant work in the literature. Based
on different types of features utilized, we classify these methods into two categories, including radiometric based methods, and radio signal strength (RSS) based methods.
A. Radio-metric Based Device Fingerprinting
In [6], Brik et al. proposed the Passive RAdio-metric Device Identification System (PARADIS)
utilizing modulation domain radio-metrics, such as carrier frequency error, I/Q offset, etc. The experimental results show that these device dependent radio-metrics can effectively differentiate devices.
However, this method requires a training phase to collect the fingerprints of legitimate nodes. Nguyen
et al. [5] further proposed an unsupervised clustering method based on non-parametric Bayesian
method and infinite Gaussian mixture model. Without knowing the number of devices, this method
can automatically identify different devices by clustering their emitted packets into different clusters.
5
Our method also builds upon a non-parametric Bayesian framework for unsupervised clustering.
However, our method not only considers device dependent radio-metrics, but also takes other device
independent features into consideration to greatly improve the device identification performance.
B. RSS Based Device Fingerprinting
Compared with radio-metric features, RSS feature is much easier to obtain, which makes RSS a
popular feature for device fingerprinting. Faria et al. [3] demonstrated strong correlations between RSS
signals and the physical location of devices, and proposed to use signalprint, a vector of RSS values
measured by surrounding Access Points (APs), to identify wireless devices for detecting spoofing
attacks. Sheng et al. [7] extended [3] and applied Gaussian mixture model to identify clusters of the
RSS readings. Chen et al. [8] used RSS and K-means cluster analysis. In both [7] and [8], the number
of clusters needs to be predefined. Later, Yang et al. [9] proposed two cluster-based mechanisms that
can automatically determine cluster numbers.
However, the aforementioned methods [3], [7]–[9] only work in a static network (e.g., each device
is fixed in a specific location) and may raise a large number of false alarms in a mobile network.
The RSS profiles may change over time due to the nature of wireless device mobility. To capture
the RSS time-dependent property, Yang et al. [13] proposed the DEMOTE system that partition the
RSS trace of a node identity into two separate RSS traces, in which one trace is related to a genuine
node, and the other is related to a potential attacker. If the correlation between the two traces is lower
than a threshold, an alarm is alerted. They focused on two-class situations where one genuine node
and one attacker share a single identity (e.g., MAC address). This solution may not be applicable to
situations with multiple attackers sharing the same identity. Zeng et al. [14] proposed a reciprocal
channel variation-based identification (RCVI) technique to detect spoofing attacks in mobile wireless
networks. RCVI applies location de-correlation and reciprocal channel variation to detect the original
devices of all packets. However, this method assumes a bidirectional communication between the
genuine and the victim nodes. Therefore, it is not a completely passive detection and requires senders
to send the RSS information, which may cause unnecessary network overheads.
Our paper also focuses on dynamic mobile networks. We observe that the RSS based solution for
mobile networks share two more limitations. First, they assume that wireless devices and access points
(AP) communicate periodically, and hence high sample rate location features (e.g., RSS, TDOA) could
be extracted. Second, they consider device identification (e.g., MAC address) into their fingerprinting
6
process. The use of forgeable user identity information (UII) may make the methods vulnerable to
advanced spoofing attacks. For example, an attacker may inject packets with randomly assigned device
MAC addresses into the wireless network. This attack will be hard to detect if these MAC-addresses
related victim devices are evaluated separately. On contrary, our method takes the low sampling rate
into consideration. In addition, we neglect the forgeable UII in our fingerprinting framework.
III. F EATURES
FOR
D EVICE F INGERPRINTING
Device fingerprinting means utilizing a set of unique features of devices that when exploited can
be used to differentiate wireless devices. Fingerprinting features can be classified in several ways. For
example, it can be categorize as time-dependent or time-independent features. Some of the features
varies over the time, whereas the others remain unchanged. There can be device dependent and deviceindependent features as well. There can be transmitter fingerprinting and receiver fingerprinting. The
transmitter fingerprints are different than receiver’s radio-metric parameters’ such as received power
and are unique to the transmitter and not altered by the channel condition and receiver structure.
In this section we briefly discuss about notable features that can be exploited for iHMRF based
device fingerprinting. Typically some common features of signal measurements/classifications are:
angle-of-arrival (AOA), received signal strength (RSS), time-of-arrival (TOA) and frequency-of-arrival
(FOA). However, sometimes difference measurement features are well suited for creating traces
for particular applications. For example, time-difference-of-arrival (TDOA), frequency-difference-ofarrival (FDOA), differential received signal strength (DRSS), phase shift difference (PSD) etc.
A. Time Measurement
The time required for a signal to travel from the transmitter (client or node) to the receiver (anchor
or access point) is directly proportional to the distance between them. The TOA and TDOA follows
this principle [15]. Propagation time measurement requires synchronization between transmitter and
receiver and knowledge of transmission and reception times at one position. On the other hand,
time difference measurements eliminates need for node to be synchronized to anchors, but requires
synchronization between anchors and doesn’t directly give the distance between transmitter and
receiver. The trilateration, conversion of the observations to distances, from TOA or TDOA is done
by d = cτ , where d is the distance, τ is the observed time of flight (transmit time - receive time),
7
and c is the propagation speed. The distance (from observations) related to positions
dm = k(x, y) − (xm , ym)k ,
m = 1, 2, 3.
(1)
where (x, y) is the client position, (x1 , y1 ), (x2 , y2 ), and (x3 , y3) are anchor positions, and k(x, y)k =
p
x2 + y 2. Here we have three non-linear equations with two unknowns, and it can be shown that
there is a single solution. Solving the equations requires more advanced algorithm unless linearization
technique applied. Using two observation points, TDOA can be calculated by
d = d1 − d2 = k(x, y) − (x1 , y1 )k − k(x, y) − (x2 , y2 )k .
The key sources of errors are: 1) synchronization error for imperfect reference clock, measurement
error to determine the signal’s exact time of arrival and signal fading (i.e., multipath), and environmental errors (e.g., non-line-of-sight (NLOS) propagation) that adds delay not related to distance.
B. Frequency Measurement
Measuring ∆f , the difference between the carrier frequency of the received signal and the one of
the transmitted signal, can provide estimation about the device’s whereabout. The frequency difference
is a strong feature since each wireless transmitter has its own oscillator, and each oscillator creates
a unique carrier frequency. Frequency shift of the received signal is related to the velocity vector
of the transmitter relative to the receiver. Note that this mobility of transmitter introduces Doppler
effect in the signal that smears signal frequency that can be measured. Frequency difference are more
commonly used and obtained from Cross Ambiguity Function
Z T
C (∆f, ∆t) =
x (t) x∗ (t + ∆t) e−jπ∆f t dt.
(2)
0
It differs from time dependent features in that the frequency/phase shift feature observation points
must be in relative motion with respect to each other and the source, and FDOA is calculated by
f = f1 − f2 =
v2
v1
cos θ1 − cos θ2 .
λ
λ
(3)
A major drawback of this feature is that a large amounts of data must be moved between observation points or to a central position to do the cross-correlation that is necessary to estimate the
frequency shift. Other common source of measurement errors are: 1) imperfect frequency reference,
2) measurement errors such as noise, multipath etc., and non-stationary nature of the frequency.
8
C. Phase Shift Difference Measurement
On top of aforementioned method, one can differentiate devices by looking into device’s I-Q phase
characteristic [5], [16]. Ideally the phase shift from one constellation to a neighbor one is 180◦ for
BPSK modulation and 90◦ for QPSK modulation. I-Q phase characteristics are different for I-phase
and Q-phase. The constellation may deviate from original position due to hardware variability, and
different devices have different constellations. Therefore, this feature can be measured and used as
classifier as well. Figure 1 shows an illustrative example of device signal constellations. In this example
we used QPSK as modulation of choice and considered feature extracted from the constellation of
QPSK. In QPSK, four symbols with different phases are transmitted where each symbol represents
two bits. Mathematically the transmitted symbol can be represented as
r
2Es
π
,
cos 2πfc t + (2n − 1)
si (t) =
T
4
(4)
where Es is transmission power, T is symbol period, fc is carrier frequency, and n is constellation
index. By changing n, we can vary the phases of signal, creating four phases π4 ,
3π 5π
, 4,
4
and
7π
.
4
In
ideal case, the phase shift from one symbol to its neighbor is 90◦ . However, the transmitter amplifiers
for I-phase and Q-phase might be different. Consequently, the degree shift can have variations.
Device 1
Device 2
Fig. 1: Illustration of phase shift difference for constellation of QPSK symbols of two transmitters
D. Angle of Arrival Measurement
The direction of the nodes (or devices) relative to the AP (or anchor) is equal to the observed
received angle-of-arrival (AOA or DOA), that can be used to create trace of device by calculating the
position of the nodes, or determining the angle of the position of node relative to the access point.
This process is called ‘triangulation’ where a minimum of two anchors (i.e., location (x1 , y1) and
(x2 , y2 )) and reference coordinate are needed and can be calculated by two linear equations
y = tan θ1 x + (y1 − tan θ1 x1 ) ,
y = tan θ2 x + (y2 − tan θ2 x2 ) ,
(5)
9
where θ are angle between device and anchor. Possible source of AOA errors are reference error
(what is east?), measurement error for thermal noise, environmental error (i.e., NLOS).
E. Radio Signal Strength (RSS) Measurement
In free space signal power decays exponentially with distance that can be roughly estimated by
received signal strength. Translation of RSS measurement to distance requires knowledge of the
transmit power (i.e., reference value) and and knowledge of the relationship between distance and
power decay (propagation model). The received signal power can be expressed as
!
d
+ Xσ ,
Pr (d) = P0 + 10n log10
d0
(6)
where P0 is the received power at reference distance d0 and Pr is the observed received power, d is
the distances, and n is the path loss exponent. The trilateration from RSS is done in the same way
as time measurement, except the conversion of the observations to distances is done by
d0 = d10
P0 −Pr
10n
.
(7)
Differential RSS measurements eliminate need for transmit power knowledge and can provide
improved performance in correlated shadowing. The key limitations of this feature are: 1) imperfect
knowledge of the transmit power or antenna gain, 2) measurement error such as signal fading
(i.e., multipath), interference, and thermal noise, and 3) environmental errors (e.g., non-line-of-sight
propagation) such as shadowing, biases the resulting distance estimate, and imperfect knowledge of
the propagation exponent (model error).
Interestingly, the channel gain can be used as trait as well. The amplitude of received signal is
proportional to the channel gain, Ap . The general consensus is that the signals transmitted from the
same device over a short duration tend to have similar amplitude or effect of channel, even though
the absolute value of the amplitude is generally unknown. If the channel is Rayleigh faded multipath
channel, the channel gain can be expressed as
Ap ∼
= d−β |h|,
(8)
where |h| is the fading component that is normally distributed with N (0, σh2 ), d is distance from a
transmitted device to the sensing device, β is the path loss exponent. Thus the received signal gain
10
Ap can be described by the distribution
Ap ∼ N (0, d−2β σh2 ).
(9)
A notable difference is that by looking into channel characteristics only does not infer the locations
of devices directly, rather Ap as one more feature for the identification. Note that the aforementioned
features are generic and there are other features that can be used for specific radio technologies. For
example, second-order cyclostationary feature of OFDM can be used for identification.
Features
Time Independent
Time Dependent
Device Dependent
Frequency-of-arrival (FOA)
I/Q Offset
Radio Signal Strength (RSS)
Signal Noise Ratio (SNR)
Time-Difference-Of-Arrival (TDOA)
Device Independent
Phase Shift Difference (FSD)
Carrier Frequency Offset (CFO)
Time-Of-Arrival (TOA)
Angle-Of-Arrival (AOA)
Frequency-Difference-Of-Arrival (FDOA)
TABLE I: Device Fingerprinting Features
IV. P ROBLEM F ORMULATION
Suppose we are given a sequence of N packet feature vectors {(x1 , s1 , t1 ), · · · , (xN , sN , tN )}, where
xi ∈ Rp , si ∈ Rd , p and d refer to the numbers of time-independent and time-dependent features,
respectively, and ti refers to the arrival time of the ith packet on an AP. The goal is to identity the
sequence of hidden states (device labels): z1 , · · · , zN , where zi ∈ {1, 2, · · · , C} refers to the hidden
state of the packet feature vector (xi , si , ti ), and C refers to the total number of hidden states. There
may exist some ti s, in which the time distance between ti and ti+1 is large, and the dependence
between si and si+1 may be highly degraded because of the low collection rate. The number C of
hidden states is unknown and will be estimated using nonparametric Bayesian techniques.
The process of feature extraction is shown in Figure 2. Suppose multiple access points (APs) are
deployed across the network environment, which collect and send traffic information to a centralized
server, called a wireless appliance (WA). Each AP reports the RSS measurement for each packet
received, as well as other device dependent features, such as frequency difference and phase shift
difference. WA receives all the information and creates a fingerprint feature vector for each packet.
Note that, there may be some duplicated features reported by APs, such as frequency differences of
the repeated packets received by different APs. We will randomly select and keep one version, since
for device dependent features all different versions should exhibit similar patterns.
11
Fig. 2: Features extraction from packets
Several assumptions and constraints are stated as follows:
1) There is no training data about the fingerprints of legitimate devices available. The problem
will be addressed in a completely unsupervised manner.
2) The collection rate of RSS measurements may be unstable. Sometimes the collection rate will
be low, e.g., some devices are in standby status and there are no communications between the
devices and access points. Sometimes the collection rate will be high, e.g., the device users are
using calling services, sending text messages, or serving internet.
3) The number of clients (devices) is unknown and dynamic. Current clients may leave the network
and new clients may join the network in any time.
4) A wireless network may have a large number of concurrent clients. We will need to evaluate
the impact of the number of concurrent clients on the fingerprinting performance.
5) It is not allowed to add any additional out-band message exchanges. The problem will be
addressed using passive detection strategies.
6) Attackers have the ability to adjust transmission powers to increase localization uncertainties.
7) Attackers have the ability to masquerade as a large number of clients. Hence, we will not trust
device identity information and only consider device dependent features for fingerprinting.
V. T HEORETICAL BACKGROUNDS
This section introduces two basic statistical models: Hidden Markov Random Field (HMRF) and
infinite Gaussian Mixture Model (iGMM). These two models provide theoretical fundamentals to Infinite Hidden Markov Random Field (iHMRF) that will be applied to do wireless device fingerprinting.
12
A. Hidden Markov Random Field
Suppose we have a set of observations {(x1 , s1 ), · · · , (xN , sN )}, where each observation (xi , si )
has p features (xi ∈ Rp ) and d spatial coordinates (si ∈ Rd ). Denote X = {x1 , · · · , xN } and
S = {s1 , · · · , sN }. The objective is to infer the latent variables Z = {z1 , · · · , zn } based on X and S,
where zi ∈ C, and C = {1, · · · , C} denotes the set of class labels.
Hidden Markov Random Field (HMRF) can be described as a two-layer hierarchical model,
including the latent layer Z and the observation layer X. For the latent layer Z, HMRF considers
spatial dependencies between the observations Z. Nearby variables will have higher correlations than
distant ones. The neighborhood relationship is decided based on their closeness on spatial coordinates
{s1 , · · · , sn }, such as by the K-nearest neighbors rule. This Markov property can be formulated as
!
X
1
exp −
Vc (zi = c, N(zi )|β) ,
(10)
p(zi = c|N(zi ); γ) =
Z(γ)
c∈C
i
where Z(β) refers to a normalization constant, β is called the inverse temperature of the model,
N(zi ) refers to the neighbors of zi , and Ci refers to the set of cliques, each of which contains zi as
a member. A clique c is defined as any set of variables such that all the variables in c are neighbors
to each other. Vc (·) is called clique potential, which is a measure of the consistence of the variables
in c. A clique potential Vc (Z|β) can be defined as
Vc (Z|γ) = β
Y
δ(zi − zj ).
(11)
i,j∈c
The joint distribution p(Z|β) of an HMRF model is
p(Z) =
Y
i
!
X
1
exp −
Vc (Z|β) ,
p(zi |N(zi ); γ) =
Z(γ)
c∈C
(12)
where Z(β) is a normalization constant.
For the observation layer, HMRF defines the conditional distribution p(X|Z) as
p(X|Z; Θ) =
N
Y
p(xi |zi ; Θzi ),
(13)
i=1
p(xi |zi ; Θzi ) = N (xi |µzi , Σzi ),
(14)
where each observation xi follows a Gaussian distribution conditioned on the latent variable zi . Each
class is related to a distinct Gaussian distribution, and we have totally C Gaussian mixtures. Denote
13
the parameters Θ = {Θc }C
c=1 , and Θc = {µc , Σc }.
B. Infinite Gaussian Mixture Model
Infinite Gaussian Mixture Model (iGMM), also named Dirichlet Process Gaussian Mixture Model
(DPGMM), is an extension of the traditional Gaussian Mixture Model (GMM) to support an finite
number of Gaussian mixtures. Denote X = {x1 , · · · , xN } as observations, and Z = z1 , · · · , zN as
latent class labels, where zi ∈ Ci = {1, · · · , C}. Note that, different from HMRF, spatial coordinates
(attributes) are not considered here. The iGMM model can be defined as
vc |α ∼ Beta(1, α), c = 1, · · · , ∞,
Θc |G0 ∼ G0 , c = 1, · · · , ∞,
xi |zi = c; Θc ∼ N (µc , Σc ),
zi |π(v) ∼ Multi(π(v)),
where πc (v) = vc
Qc−1
i=1 (1
(15)
(16)
(17)
(18)
− vi ). To interpret this model, we can look at its data generating process:
1) Draw vc |α ∼ Beta(1, α), c = {1, 2, · · · },
2) Draw Θc = {µc , Σc }|G0 ∼ G0 , c = {1, 2, · · · },
3) For the ith data point
a) Draw zi |{v1 , v2 , · · · } ∼ Multi(π(v)),
b) Draw xi |zi = c ∼ N (, µc , Σc ).
Particularly, step 1 samples a countably infinite set of random variables v from a beta distribution
Beta(1, α), where α is a hyper-parameter. The prior probabilities π(v) can then be calculated as
c−1
Y
πc (v) = vc (1 − vi ), c = 1, 2, · · ·.
(19)
i=1
Step 2 samples the model parameters Θc for each mixture c from a base distribution G0 , which is
Σc ∼ InverseW ishartυ0 (Λ0 ),
(20)
µc ∼ N (µ0 , Σc /K0 ),
(21)
where {υ, µ0, Λ0 } are the hyper-parameters. Steps 1 and 2 are called the stick-breaking construction of
a dirichlet process (DP). Given the prior probabilities π(v) and the Gaussian distribution parameters
14
{Θ1 , Θ2 , · · · }, the last step (Step 3) is to i.i.d. sample N observations {xi , zi }, i = 1, 2, · · · , N . For
each point i, step 3.1 samples its class label from Multi(π(v)), and step 3.2 samples its features xi
from the corresponding Gaussian distribution N (µc , Σc ).
VI. I NFINITE H IDDEN M ARKOV R ANDOM F IELD ( I HMRF)
Given the data set X = {x1 , · · · , xN }, S = {s1 , · · · , sN }, and T = {t1 , t2 , · · · , tN }, with the
unknown class labels Z = {z1 , · · · , zN }. The iHMRF model can be represented by a graphical model
as shown in Figure 3. Each node represents a random variable (or vector), and each dot represents
a hyper-parameter. The filled nodes refer to observations and blank nodes refer to latent variables.
Basically, we first use spatio-temporal features {(s1 , t1 ), · · · , (sN , tN )} to build a neighborhood graph
for the latent state variables {z1 , · · · , zN }, in which states zi and zj are connected by an undirected
edge if they are spatial temporal neighbors. Each latent state variable zi will emit an observation xi .
The iHMRF model is designed by this manner. According to the key property of a hidden Markov
random field, the hidden states should be consistent if they are neighbors to each other. However, two
neighbor nodes zi and zj could be assigned different cluster labels if their emission observations xi
and xj belong to two different Gaussian distributions.
The iHMRF model can be defined as follows:
Definition 1. Infinite Hidden Markov Random Field (iHMRF)
α|λ1 , λ2 ∼ Gamma(λ1 , λ2 )
βc |α ∼ Beta(1, α), c = 1, · · · , ∞,
Θc |G0 ∼ G0 , c = 1, · · · , ∞,
xi |zi = c; Θc ∼ N (µc , Σc ),
zi |π(β) ∼ Multi(π(β)),
p(Z) =
N
Y
p(zi |π(β), {zi , N(zi )}),
(22)
(23)
(24)
(25)
(26)
(27)
i=1
p(zi |π(β), zi , N(zi )) = p(zi = c|π(β))
×p(zi = c|zi , N(zi ); γ),
(28)
15
where Θc |G0 stands for:
Σc ∼ InverseW ishartυ0 (Λ0 ),
(29)
µc ∼ N (g0 , Σc /η0 ),
(30)
and
p(zi = c|N(zi ); γ) =
!
1
exp −
Vc (zi = c, N(zi ); γ) ,
Z(γ)
c∈C
X
i
(31)
where {λ1 , λ2 , γ, υ0, g0 , Λ0, η0 } are hyper-parameters.
Fig. 3: Graphical Model Representation of iHMRF
Compared with HMRF and iGMM, the iHMRF model has three major advantages: First, iHMRF
is able to capture Gaussian mixtures information and spatial dependencies between latent variables
{zi }N
i=1 concurrently, through Equations (25) and (28). As a result, iHMRF tends to decide the value
of zi both based on its neighbors and its closest Gaussian mixture. When conflicts occur, that means
the class labels of its spatial neighbors are not consistent with its closest Gaussian mixture, we can
adjust the inverse temperature parameter γ to decide the weight we put on each side. A smaller value
of γ implies that the model will favor more on the Gaussian mixtures information. In the extreme
when γ = 0, the model will degenerate and become equivalent to iGMM. Second, iHMRF is able to
automatically estimate the number of class labels (clusters), since Dirichlet Process (DP) is used as the
prior distribution for zi and xi . Third, iHMRF is robust to transmission power changes. When a device
changes its transmission power, it tends to increase the spatial entropy and makes its spatial trajectory
more highlighted than those of other devices. We observe that iHMRF inherits the advantages of both
HMRF and iGMM.
Based on the above iHMRF model specification, the fingerprinting problem can be reformulated
16
as a maximum-a-posterior (MAP) problem. It is to estimate the latent variables {z1 , · · · , zN }, such
that their joint posterior probability based on the observations {x1 , · · · , xN } can be maximized:
{z1 , · · · , zN } = argmin p(z1 , · · · , zN |x1 , · · · , xN ).
(32)
{z1 ,··· ,zN }
Because the wireless device environment under study is a streaming environment, it is more
appropriate to do incremental inference (or classification). We will introduce efficient incremental
techniques in the next section VII.
VII. I NCREMENTAL VARIATIONAL I NFERENCE
FOR THE
IHMRF M ODEL
Inference for the iHMRF model can be conducted based on variational inference, Markov chain
Monte Carlo (MCMC), and other methods. In this paper, we are focused on variational inference,
because it is computationally more scalable than MCMC techniques, and hence more applicable to
wireless streaming environment. Denote Φ = {Z, Θ, v} as the set of all latent random variables,
and θ = {γ, λ1 , λ2 , υ0 , g0 , Λ0 } as the set of hyper-parameters. The objective is to infer the latent Φ
given the observations X and hyper-parameters θ. Because it is intractable to calculate the posterior
p(Φ|X, θ), variance inference is applied to approximate the posterior with a parametric family of
factorized distributions q(Φ|X, θ) of the form
q(Φ|X, θ) = q(Z)q(α; λ1 , λ2 )
C−1
Y
q(βc ; ζc,1, ζc,2)
c=1
×
C
Y
q(µc , Σc ; υ̃c , η̃c , g̃c , Λ̃c ).
(33)
c=1
Denote the variational Free Energy functional as
Z
p(Φ|X, θ)
dΦ,
F (q; X, θ) = q(Φ; θ) log
q(Φ; θ)
(34)
which is a lower bound of the original log-evidence ln p(X|θ). The optimal solution based on the
parametric family can be obtained by maximizing the Free Energy functional:
minimize F (q(θ̃); X, θ),
(35)
θ̃
where the variational parameters to be estimated include θ̃ = {λ1 , λ2 , ζc,1, ζc,2, υc , ηc , gc , Λc , }C
c=1 .
These parameters can be optimized iteratively by coordinate accent until convergence to a local
optimum. The results have been derived by Chatzis et.al. [17].
17
In this section, we will focus on incremental inference, instead of the above offline inference
(35). Incremental inference is more suitable for a streaming environment as existing in our device
fingerprinting problem. Assume that we have a buffer bucket with a limited size (e.g., N) to store
the streaming observations. When the bucket is full, it will be processed and all the observations
in the bucket will be classified. Then the bucket is cleaned and is ready to accept new incoming
observations. We may consider multiple buckets in the process line, such that when one bucket
is being processed, other buckets are ready to store new incoming observations. Denote a bucket
(i)
(i)
(i)
(i)
(i)
(i)
data as B(i) = {(x1 , s1 , t1 ), · · · , (xN , sN , tN )}, where i refers to the bucket sequence number.
The incremental inference problem is to process the incoming buckets B(1) , B(2) , · · · incrementally.
We consider a similar strategy as used in iGMM [18], [19], and propose an incremental inference
framework for iHMRF. The key components are summarized as follows:
1) Compression Phase: When the observations have been classified to different clusters, each
cluster is separated into a number of microclusters that tend to have consistent cluster labels,
even when the clusters have been reformed due to the process of new bucket data. For each
microcluster, its sufficient statistics are stored and the data points inside are discarded to save
memory space and improve computational efficiency.
2) Model Building Phase: The incremental inference will be conducted based on microclusters,
instead of data points. Some microclusters are allowed to be isolated data points.
3) Incremental Batch Update Phase: The incremental model updates based on the new bucket and
previous buckets need not to start from scratch. The model information estimated based on
previous buckets will be considered to improve the incremental update efficiency.
The technical details of the above three components are discussed in Sections VII-A, VII-B, and
VII-C, respectively.
A. Model Building Phase
This phase assumes that the observations in the current buckets have already been grouped to a set of
microclusters. When this phase is first run (as the initialization step), each observation will be regarded
a microcluster. For later iterations, the microclusters are generated from the previous iterations (see
P
Section VII-B). Denote A as a specific microcluster, nA as the cluster size, and xA = n1A xi ∈A xi .
18
The model building phase is to solve the following constrained optimization problem
Z
p(W |X; θ)
q(Φ; θ) log
minimize
dΦ
q(Φ;θ)
q(Φ; θ)
(36)
subject to q(zi ) = q(zj ), if ∃A s.t. zi , zj ∈ A,
where q(Φ; θ) is a factorized parametric form as defined in 33. Notice the difference the above problem
(36) and the traditional offline problem (35). New constraints are defined such that the data points in
a same microcluster must have identical class labels. Because each microcluster is now summarized
by its sufficient statistics, the computational efficiency is greatly improved. The above problem can
be optimized iteratively by coordinate accent until convergence to a local optimum. The solution for
each iteration can be obtained as
X
ζc,1 = 1 +
nA q(A = c)
(37)
A
ζc,2 = hαi +
C
X
X
nA q(A = k)
(38)
k=c+1 A
wc =
X
nA q(A = c)
(39)
nA q(A = c)xA
wc
(40)
nA q(A = c)(xA − x̄c )(xA − x̄c )T
(41)
A
x̄c =
Ξ =
P
A
X
A
q(A = c) ∝ p(A = c|(N)(A); γ)π̃c (β)p̃(xA |Θc ),
(42)
where N(A) refers to the neighbors of the micro-cluster A, which are defined similar to those based
on data points. Here, we use the spatial center point of a microcluster to represent its spatial location,
P
with sA = n1A si ∈A si , and use the center time to represent its time domain location, with tA =
P
1
ti ∈A ti . Note that, only the solution components that are different from the traditional offline
nA
solution are presented above. Readers are referred to [17] for the estimation of the other model
parameters that have the same result as the offline iHMRF model, including ζ̃c , Λ̃c , υ̃c , η̃c , and g̃c .
B. Compression Phase
This phase focuses on the generation of microclusters. The microclusters will be generated such
that the data points in each microcluster tend to be located in a same cluster, even when the overall
clusters have been reformed due to the process of new bucket data. To address this challenge, a
19
straightforward strategy is to generate multiple candidate clusters from different ways and then look
for the micoclusters, each of which never overlaps with more than one candidate cluster concurrently.
However, this strategy has two potential deficiencies: First, it is computationally expensive since the
number of different groups increases exponentially with the data size; Second, it does not consider
the behavior of future data points. An optimized strategy is to predict up to ∆ future points based
on the empirical distribution estimated from existing data (x1 , x2 , · · · , xT ):
p̃(xT+1 , · · · , xT +∆ ) =
TY
+∆
T
1X
δ(xi − xt ).
T
t=1
i=T +1
(43)
We define a modified Free Energy functional by taking expectation on ∆ unobserved future points as
Z
F̃ (q; X, θ) =
dxT+1 , · · · , dxT +∆ F (q; X, θ)
·p̃(xT+1 , · · · , xT +∆ ).
(44)
The solution by maximizing the above modified Free Energy functional can be obtained as
ζc,1 = 1 + (1 +
ζc,2
∆ X
)
nA q(A = c)
T A
C
∆ X X
= hαi + (1 + )
nA q(A = k)
T k=c+1 A
∆ X
)
nA q(A = c)
T A
P
nA q(A = c)xA
∆
x̄c = (1 + ) A
T
wc
X
∆
Ξ = (1 + )
nA q(A = c)(xA − x̄c )
T A
wc = (1 +
(xA − x̄c )T
q(A = c) ∝ p(A = c|(N)(A); γ)π̃c (β)p̃(xA |Θc ),
(45)
(46)
(47)
(48)
(49)
(50)
To conduct the compression phase, we first apply the model building phase to generate clusters.
Then for each candidate cluster, we split it into two clusters along its principal component, and refine
the clusters based on the above update rules 45, until convergence. The gain on the free energy
function is denoted as ∆F̃ (q; X, θ). The cluster with the largest ∆F̃ (q; X, θ) will be selected as the
final splitting cluster. Iterate the process until convergence, e.g., the gain ∆F̃ (q; X, θ) is smaller than
20
a predefined threshold or the consumed memory is greater than the memory space limit.
C. Incremental Batch Update Phase
This phase assumes that all previous bucket data have been processed, and we have obtained the
estimate variational parameters {ηc , Λc , υc , gc , ζc,1:2, λ1:2 , wc , x̄c , Ξc }C
c=1 . Suppose a new bucket data
have been arrived, and it is necessary classify the new bucket data points and update all existing
(n)
(n)
(n)
(n)
(n)
(n)
clusters. Denote the new bucket data as {(x1 , s1 , t1 ), · · · , (xN , sN , tN )}. The incremental Batch
update phase can be described as
ζ̃c,1 = ζc,1 +
N
X
(n)
q(zi
= c)
(51)
i=1
ζ̃c,2 = ζc,2 +
C
N
X
X
(n)
q(zi
= k)
(52)
k=c+1 i=1
w̃c = wc +
N
X
(n)
q(zi
= c)
(53)
i=1
x̄c =
x̄c wc +
Ξ̃ = Ξ +
PN
N
X
(n)
q(zi
w̃c
i=1
(n)
q(zi
(n)
= c)xi
(n)
= c)(xi
(54)
− x̄c )
i=1
(n)
(xi
(n)
q(zi
− x̄c )T
(n)
= c) ∝ p(zi
(55)
(n)
= c|N(zi ))π̃c (β)p̃(xi |Θc ).
(56)
(n)
The basic idea is to apply Equation (56) to estimate q(zi ), and apply Equations (51) to (55) to
update the variational parameters ζ̃c,1:2 , w̃c , x̄c , and Ξ̃. The other parameters that are consistent with
the offline iHMRF model are then updated by the equations derived in [17].
VIII. S IMULATION R ESULT
This section presents an extensive simulation study to validate the effectiveness and efficiency of
our proposed techniques, compared with existing solutions, such as Gaussian Mixture Model (GMM)
and infinite Gaussian Mixture model (iGMM) [5]. For our fingerprinting framework, we studied the
performances of two inference algorithms, including the offline variational inference algorithm [17]
and our proposed online (incremental) inference algorithm.
21
A. Simulation Setup
The simulation data generator includes two components. The first component is the generation
of time-independent features. The same simulator design as used in [5] were applied to generate
time-independent features. Basically, a number of devices will be chosen randomly in an area of
40 × 40 in the time-independent feature space, with variances of the clusters chosen random in the
range from 0 to 1. We considered two time-independent features, such that the data can be easily
visualized. The second component is the generation of time dependent features. We considered RSS
features and assumed that the collected RSS features have been triangulated to three dimensional
spatial coordinates. This is appropriate for mobile devices, because for different time periods users
may travel to different spatial regions and different Access Points (AP) will be able to collect the
related RSS traces data. By converting the RSS features to spatial coordinates, we do not need to
consider the issue of missing values for different access points. We used UdelModels to generate
mobile device traces data, which is a widely used simulator for generating human trajectory data
[20]. Changes of transmission power were simulated by shifting a trace segment with a randomly
selected distance and direction.
We considered four major metrics to evaluate the effectiveness of our framework, including precision, recall, F-measure, and rand index (IR). These metrics are defined based on true positive rate
(TP), false positive rate (FP), false native rate (FN), and true negative rate (TN), as interpreted in
Table II. These metrics are defined as P recision = T P/(T P + F P ); Recall = T P/(T P + F N); F −
Measure =
P recision×Recall
;
P recision+Recall
and Rand − Index(RI) =
T P +T N
.
T P +T N +F N +F P
TABLE II: Definition of TP, FP, FN, and TN
Same Cluster Different Clusters
Same Class
Different Classes
TP
FP
FN
TN
We used UdelModels to generate twelve simulation datasets to cover a variety of scenarios,
including indoor and outdoor environments. The basic features of these data sets are summarized in
the following table III. For each setting, we generated five different realizations, in order to calculate
the uncertainty (standard deviation) of the classification performance. For all our comparison results,
we reported the mean and standard deviation values for each method, in order to mitigate potential
random effects.
22
Figure 4 shows the spatial distributions of two simulation datasets under different scenarios, with
two time-independent and two time-dependent features. For both datasets, the wireless device carriers
are all pedestrians, but the left one has a stable RSS sample rate, and the right one has an unstable
sample rate. Each dataset has 15 clusters (devices). For each device, we generated a sequence of
8000 time-stamped observations at 1-minute interval. For each observation, two time-independent
features and two time-dependent features were generated, as illustrated in the left and right plots in
Figure 4 (a). As shown from the plots, the clusters are not well separable and have overlaps in the
time-independent feature space (see the left plot), and also not well separable in the time-dependent
feature space (see the right plot). If these features are not clustered jointly, it can be seen that it is very
difficult to differentiate different clusters. However, as shown in our followup experiments, by jointly
considering all the features into the clustering (fingerprinting) process, our approach significantly
improves the accuracy of the clustering (fingerprinting) process. Note that, because there are no
enough colors that can be used to display 15 different clusters, in our visualization in Figure 4, we
randomly selected a color for each cluster. Therefore, it is observed that some clusters happened to
have the same color, but we demonstrate that the clusters are not well separable from each other
without any ambiguity.
TABLE III: Simulation Data Settings
Description
1 Building 10 Floors
Real City (Chicago9B1k)
# of Penetrations (Peds)
# of Cars
5, 10, 15
5, 10, 15
5, 10, 15
5, 10, 15
We compared our framework with two existing approaches, including GMM and iGMM. For our
framework, we employed two inference algorithms, including the offline variational inference algorithm for iHMRF [17], abbreviated as iHMRF-VI, and our proposed incremental inference algorithm,
abbreviated as Inc-iHMRF-VI. For GMM, it is required to predefine the number of clusters. In our
simulation study, we set the value as the true number of clusters (devices), in order to study the
best performance that a GMM model could achieve. iGMM is a nonparametric method. Although it
still needs to set the number of clusters, iGMM is able to automatically determine the number of
clusters. Therefore, we randomly set the initial cluster number. All the other hyperparameters were
set such that the corresponding parameters are uniform-distributed. Similar strategies were used for
the nonparametric methods iHMRF-VI and Inc-iHMRF-VI. One more setting in both iHMRF and
23
(a) Chicago9B1k Data with Only Pedestrians
(b) Chicago9B1k Data with Unstable RSS Rates
Fig. 4: Spatial Distribution of Simulation Data
Inc-iHMRF-VI is to define spatio-temporal neighborhood relationships. We defined neighbors as the
data points that are 5 nearest spatial neighbors to each other and have the time stamp distance smaller
than 50. These settings can be loosely decided and we observed that the resulting performance is not
rapidly varied. We set the memory bound and the bucket size of Inc-iHMRF-VI to 2000 and 2000,
respectively. We observed similar patterns based on different settings of these two parameters.
TABLE IV: Simulation Results Based on UdelModels with 1 Building 10 Floors
Methods
# of Devices
iHMRF-VI
5
10
15
0.97 (0.02) 0.91 (0.13) 0.93 (0.07)
0.72 (0.13) 0.81 (0.13) 0.76 (0.11)
0.73 (0.09) 0.82 (0.06) 0.77 (0.07)
0.96 (0.04)
0.93 (0.04)
0.96 (0.01)
Inc-iHMRF-VI
5
10
15
0.88 (0.10) 0.94 (0.05) 0.91 (0.05)
0.65 (0.28) 0.85 (0.14) 0.72 (0.23)
0.51 (0.13) 0.79 (0.08) 0.62 (0.12)
0.95 (0.02)
0.90 (0.09)
0.92 (0.02)
iGMM-VI
5
10
15
0.86 (0.09) 0.44 (0.15) 0.57 (0.15)
0.73 (0.11) 0.43 (0.10) 0.54 (0.10)
0.56 (0.01) 0.30 (0.07) 0.38 (0.06)
0.80 (0.09)
0.91 (0.03)
0.92 (0.01)
GMM-EM
5
10
15
0.91 (0.15) 0.85 (0.22) 0.86 (0.16)
0.72 (0.14) 0.83 (0.13) 0.77 (0.11)
0.64 (0.11) 0.77 (0.06) 0.70 (0.08)
0.90 (0.10)
0.93 (0.04)
0.94 (0.01)
Precision
Recall
F-Measure Relative Index (RI)
24
TABLE V: Simulation Results Based on UdelModels - Chicago9Blk - with Pedestrians and Cars
# of Devices
Methods
Precision
Recall
F-Measure Relative Index (RI)
iHMRF-VI
5 Peds, 5 Cars 0.99 (0.01) 0.98 (0.01) 0.99 (0.01)
10 Peds, 10 Cars 0.91 (0.10) 0.99 (0.10) 0.95 (0.05)
15 Peds, 15 Cars 0.90 (0.09) 0.97 (0.02) 0.94 (0.05)
0.99 (0.01)
0.99 (0.01)
0.99 (0.01)
Inc-iHMRF-VI
5 Peds, 5 Cars 0.98 (0.02) 1.00 (0.00) 0.99 (0.01)
10 Peds, 10 Cars 0.80 (0.13) 0.97 (0.04) 0.87 (0.08)
15 Peds, 15 Cars 0.57 (0.07) 0.92 (0.08) 0.70 (0.07)
0.99 (0.01)
0.96 (0.02)
0.93 (0.02)
iGMM-VI
5 Peds, 5 Cars 0.90 (0.12) 0.29 (0.05) 0.44 (0.07)
10 Peds, 10 Cars 0.67 (0.08) 0.31 (0.06) 0.42 (0.06)
15 Peds, 15 Cars 0.63 (0.06) 0.29 (0.06) 0.40 (0.06)
0.80 (0.06
0.89 (0.02)
0.92 (0.01)
GMM-EM
5 Peds, 5 Cars 0.92 (0.13) 0.89 (0.06) 0.89 (0.07)
10 Peds, 10 Cars 0.69 (0.08) 0.79 (0.11) 0.73 (0.09)
15 Peds, 15 Cars 0.69 (0.12) 0.78 (0.06) 0.72 (0.08)
0.95 (0.03)
0.93 (0.03)
0.95 (0.02)
TABLE VI: Simulation Results Based on UdelModels - Chicago9Blk - with Only Cars
Methods
# of Devices
iHMRF-VI
5 Cars
10 Cars
15 Cars
0.95 (0.03) 0.59 (0.08) 0.72 (0.06)
0.83 (0.09) 0.55 (0.05) 0.66 (0.05)
0.68 (0.08) 0.53 (0.09) 0.59 (0.08)
0.89 (0.02)
0.93 (0.01)
0.94 (0.01)
Inc-iHMRF-VI
5 Cars
10 Cars
15 Cars
0.89 (0.12) 0.98 (0.02) 0.93 (0.02)
0.73 (0.11) 0.77 (0.09) 0.75 (0.06)
0.56 (0.08) 0.83 (0.06) 0.66 (0.07)
0.97 (0.04)
0.93 (0.02)
0.93 (0.02)
iGMM-VI
5 Cars
10 Cars
15 Cars
0.82 (0.08) 0.30 (0.07) 0.44 (0.07)
0.65 (0.10) 0.32 (0.07) 0.43 (0.08)
0.55 (0.06) 0.29 (0.05) 0.38 (0.05)
0.82 (0.02)
0.89 (0.01)
0.92 (0.01)
GMM-EM
5 Cars
10 Cars
15 Cars
0.91 (0.12) 0.87 (0.13) 0.89 (0.12)
0.79 (0.07) 0.81 (0.09) 0.89 (0.08)
0.73 (0.04) 0.79 (0.09) 0.76 (0.06)
0.95 (0.05)
0.95 (0.02)
0.96 (0.01)
Precision
Recall
F-Measure Relative Index (RI)
B. Comparisons on Precision, Recall, and F-Measure
For the simulation data, we considered two scenarios, including indoors and outdoors. For indoors,
we generated simulation data with the number of devices 5, 10, and 15, and the sample rate one reading
every 20 seconds. The results are shown in Table IV. For outdoors, we simulated mobile traces of a
real downtown area in Chicago with 5, 10, and 15 penetrations. The results are shown in Table V.
The results on the scenarios with 5, 10, and 15 cars are shown in table VI. Table VII shows the results
with concurrent pedestrians and cars. From all these results, we observe that our framework based on
the iHMRF model outperformed GMM and iGMM in the majority of cases, especially compared with
25
iGMM. Recall that the GMM method used the true number of clusters (devices) as the initial setting.
Its according performance should represent the close-to-the-best performance of general clustering
algorithms based on time-independent features.
However, we did notice that as shown in table VI, when the mobile devices are vehicles, the
GMM’s performance was comparable to our methods. But our methods still outperformed iGMM.
This pattern is potentially related to the assumption of the iHMRF model. That is, data points that are
spatially and temporally close tend to have consistent class labels. Vehicles are moving mush faster
than pedestrians and tend to have lower sample rates and have more overlaps on their spatial traces.
When devices have more overlaps spatially and temporally, the overlapped spatial trace features can
not be well used to distinguish different mobile devices anymore. However, there still exist some
trace segments that are not overlapped together, which can be regarded as useful information for the
classification process. It potentially explains why iHMRF’s performance was degraded in this situation
but still performed better than iGMM.
TABLE VII: Simulation Results Based on UdelModels - Chicago9Blk - with Only Pedestrians
Methods
# of Devices
iHMRF-VI
5 Peds
10 Peds
15 Peds
0.98 (0.04) 0.83 (0.13) 0.90 (0.09)
0.92 (0.08) 0.80 (0.13) 0.85 (0.10)
0.91 (0.05) 0.86 (0.05) 0.88 (0.04)
0.96 (0.03)
0.97 (0.02)
0.98 (0.00)
Inc-iHMRF-VI
5 Peds
10 Peds
15 Peds
0.86 (0.10) 0.92 (0.08) 0.88 (0.07)
0.71 (0.08) 0.89 (0.07) 0.79 (0.05)
0.61 (0.08) 0.92 (0.02) 0.72 (0.06)
0.95 (0.03)
0.95 (0.03)
0.95 (0.01)
iGMM-VI
5 Peds
10 Peds
15 Peds
0.82 (0.12) 0.31 (0.05) 0.44 (0.06)
0.73 (0.11) 0.36 (0.08) 0.48 (0.10)
0.63 (0.05) 0.35 (0.06) 0.45 (0.05)
0.85 (0.01)
0.92 (0.01)
0.94 (0.00)
GMM-EM
5 Peds
10 Peds
15 Peds
0.73 (0.15) 0.90 (0.07) 0.80 (0.11)
0.69 (0.12) 0.84 (0.09) 0.75 (0.11)
0.68 (0.11) 0.86 (0.04) 0.75 (0.08)
0.91 (0.05)
0.94 (0.03)
0.96 (0.02)
Precision
Recall
F-Measure Relative Index (RI)
In overall all, both iHMRF-VI and Inc-iHMRF-VI achieved comparable accuracies, but iHMRF-VI
performed slightly better. This can be interpreted as the results of data compression by the use of
microclusters in Inc-iHMRF-VI. For all the simulation data sets, the average data size is around 8000
observations. In our implementation, we set the memory bound to 2000 observations. That means,
we compressed 8000 observations into 2000 microclusters, which greatly reduced the computational
cost and the required memory size, but with slight sacrifices of the accuracy.
26
C. Impacts of Instable RSS Collection Rates
We evaluated the impacts of instable RSS collection rates baesd on the ChicagoBlk pedestrians
data set. We randomly selected 50 percent of devices, segmented each selected device trace into eight
segments, and then randomly removed 50 percent of the segments. This process leads to discontinuous
RSS trace data. The classification results based on those modified data are shown in table VIII, and a
visualization of the generated simulation data is shown in Figure 4. We observe that iHMRF-VI and
Inc-iHMRF-VI performed the best in the majority of cases, which is consistent with our observations
in previous results. However, by comparing Table VIII and Table VIII, we observe that unstable RSS
rates slightly degraded the accuracies. This is potentially due to the reduction of samples size, since
we have removed 50 percent of observations from 50 percent randomly selected devices. However,
as long as each segment is still composed of spatial and temporally adjacent data points, the iHMRF
model can be applied to capture the corresponding autocorrelations.
D. Impacts of Transmission Power Changes
Studies have shown that attackers may hide their actual locations by periodically changing the
transmission powers of their mobile devices [21]. To simulate this behavior, we used the ChicagoBlk
pedestrians data set. Fifty percent of devices were selected, the trace of each selected device was
segmented into 8 same length, and fifty percent of these pieces were shifted to random directions
with random spatial distances. The corresponding classification results are shown in Table IX. We
observe that the changes of transmission power did not have significant impacts on the accuracies. One
potential interpretation is that the changes of transmission power will increase the spatial entropy,
making the devices’ corresponding traces more separated from other traces. This will reduce the
potential overlaps between device traces, and could even help improve the accuracies of iHMRF-VI
and Inc-iHMRF-VI.
E. Comparison on Time Costs
We evaluated the time costs of the four algorithms on three data sets, including ”1 Building 10
Floors” (7224 observations), ”Chicago9B1k with 10 Pedestrians and 10 Cars” (4525 observations),
and ”Chicago9B1k with 10 Pedestrians” (6000 observations). We set bucket size to 2000. That means,
the data will be processed bucket by bucket, 2000 observations each time. The results are summarized
in Figure 5. The X axis refers to the titles of the three data sets and the Y axis refers to running
27
TABLE VIII: Unstable RSS Rates (UdelModels - Chicago9Blk - with Only Pedestrians)
Methods
# of Devices
iHMRF-VI
5 Peds
10 Peds
15 Peds
0.91 (0.11) 0.77 (0.08) 0.83 (0.05)
0.96 (0.05) 0.82 (0.11) 0.88 (0.08)
0.84 (0.10) 0.83 (0.07) 0.83 (0.07)
0.93 (0.02)
0.98 (0.02)
0.98 (0.01)
Inc-iHMRF-VI
5 Peds
10 Peds
15 Peds
0.91 (0.17) 0.88 (0.15) 0.89 (0.15)
0.77 (0.13) 0.86 (0.09) 0.81 (0.11)
0.62 (0.10) 0.92 (0.02) 0.73 (0.07)
0.97 (0.04)
0.95 (0.03)
0.95 (0.02)
iGMM-VI
5 Peds
10 Peds
15 Peds
0.82 (0.13) 0.32 (0.07) 0.46 (0.09)
0.71 (0.10) 0.31 (0.05) 0.43 (0.07)
0.62 (0.09) 0.33 (0.06) 0.43 (0.06)
0.83 (0.02)
0.91 (0.01)
0.94 (0.01)
GMM-EM
5 Peds
10 Peds
15 Peds
0.75 (0.16) 0.90 (0.06) 0.81 (0.10)
0.67 (0.08) 0.81 (0.07) 0.73 (0.04)
0.71 (0.04) 0.82 (0.06) 0.76 (0.03)
0.90 (0.06)
0.93 (0.02)
0.96 (0.00)
Precision
Recall
F-Measure Relative Index (RI)
TABLE IX: Change of Transmission Power (UdelModels - Chicago9Blk - with Only Pedestrians)
Methods
# of Devices
iHMRF-VI
5 Peds
10 Peds
15 Peds
0.98 (0.02) 0.70 (0.07) 0.82 (0.06)
0.95 (0.06) 0.77 (0.06) 0.85 (0.06)
0.93 (0.04) 0.79 (0.05) 0.85 (0.02)
0.94 (0.02)
0.97 (0.01)
0.98 (0.00)
Inc-iHMRF-VI
5 Peds
10 Peds
15 Peds
0.76 (0.14) 0.98 (0.03) 0.85 (0.09)
0.74 (0.12) 0.88 (0.08) 0.80 (0.09)
0.58 (0.08) 0.86 (0.69) 0.69 (0.06)
0.93 (0.04)
0.96 (0.02)
0.95 (0.01)
iGMM-VI
5 Peds
10 Peds
15 Peds
0.83 (0.13) 0.31 (0.05) 0.45 (0.06)
0.72 (0.11) 0.35 (0.07) 0.47 (0.09)
0.65 (0.07) 0.35 (0.04) 0.45 (0.05)
0.85 (0.02)
0.92 (0.01)
0.94 (0.01)
GMM-EM
5 Peds
10 Peds
15 Peds
0.74 (0.11) 0.89 (0.04) 0.81 (0.07)
0.63 (0.13) 0.83 (0.08) 0.71 (0.11)
0.69 (0.04) 0.85 (0.04) 0.76 (0.02)
0.91 (0.04)
0.93 (0.03)
0.96 (0.00)
Precision
Recall
F-Measure Relative Index (RI)
duration (seconds). We can observe that our proposed incremental inference algorithm Inc-iHMRF-VI
is much more efficient than the offline inference algorithm iHMRF-VI. Our algorithm Inc-iHMRF-VI
is even faster than iGMM. This indicates an significant improvement on the computational efficiency.
The savings on time cost by Inc-iHMRF-VI will become greater when the data size increases. Note
that, GMM has the lowest time cost. However, since GMM does not need to automatically estimate
the number of clusters. Its time complexity should be much smaller than iGMM and iHMRF.
28
160
iHMRF−VI
iGMM
GMM
Inc−iHMRF−VI
140
120
100
80
60
40
20
0
1 Building 10 Floors
Chicago9B1k−Peds−Cars
Chicago9B1k−Peds
Fig. 5: Comparison on Time Costs (Seconds)
F. A Case Study on Detecting Masquerade Attacks
This section presents a case study on masquerade attacks detection, which is one of the most
dangerous attack types. A masquerade attack refers to the attacking behavior where an attacker
impersonates an authorized user of a system by using a faked identity (e.g., MAC address) in order
to gain access to unauthorized personal resources. In order to simulate this attack behavior, we used
the ChicagoBlk pedestrians data set and the 1-Building-10-Floors data set, and randomly selected
k clusters and set their cluster identities into an identical cluster identity. By using fingerprinting
techniques, this type of attackers can be identified if we discover that multiple clusters share the same
identity information. Here k refers to the number of masquerade devices. We considered different
settings of k, from 3 to 6, and evaluated the related detection rates based on different detection
methods. The results are summarized in Table ??. The results indicate that our framework (by either
iHMRF-VI or Inc-iHMRF-VI) achieved the highest detection rate in most cases. The GMM method
performed slightly than Inc-iHMRF-VI and iHMRF-VI. However, here we used the true number of
clusters as the initial setting for the GMM method. In real applications, where the actual number is
unknown, the GMM method will perform much worse.
IX. C ONCLUSION
AND
F UTURE WORKS
Device fingerprinting is a fundamental problem for wireless network security. Passive fingerprinting
techniques are effective since they are designed based on device-dependent features (e.g., RSS, AOD,
and TOA) that attackers can not manipulate. However, existing solutions can only support either timedependent or time-independent features, but no methods can handle both. This paper presents the first
29
Peds Cars Att. iHMRF Inc-iHMRF iGMM GMM
-VI
-VI
# # #
10
10
10
10
15
15
15
15
10
10
10
10
15
15
15
15
3
4
5
6
3
4
5
6
0.98
0.97
0.97
0.97
0.97
0.97
0.97
0.97
0.81
0.84
0.87
0.88
0.86
0.85
0.86
0.87
0.46
0.55
0.62
0.67
0.62
0.62
0.64
0.66
0.76
0.81
0.84
0.86
0.85
0.85
0.86
0.87
Peds Cars Att. iHMRF Inc-iHMRF iGMM GMM
-VI
-VI
# # #
10
10
10
10
15
15
15
15
0
0
0
0
0
0
0
0
3
4
5
6
3
4
5
6
0.71
0.85
0.89
0.93
0.80
0.85
0.88
0.95
0.86
0.89
0.91
0.98
0.60
0.86
0.86
0.91
0.44
0.72
0.73
0.87
0.36
0.56
0.65
0.78
0.76
0.94
0.88
0.95
0.68
0.87
0.88
0.93
TABLE X: Detection Rates for Masquerade TABLE XI: Detection Rates for Masquerade
Attacks Based on UdelModels - Chicago9B1k Attacks on UdelModels - Chicago9B1k - 1
- Pedestrains
Building 10 Floors
unified fingerprinting approach based on infinite hidden Markov random field (iHMRF). It is able to
model both time-independent and time-dependent features concurrently and is able to automatically
detect the number of devices. We present a novel incremental classification algorithm that is suitable
for a streaming environment with limited memory and computational resources. Extensive numerical
analysis further validated the effectiveness and efficiency of our proposed approach. For our future
work, we are planning to evaluate the performance of our proposed approach in real life devices.
We will also extend our approach to handle other related wireless security problems, such as the
identification of primary and secondary users to prevent dynamic spectrum access and malicious
behavior attacks in cognitive radio networks.
R EFERENCES
[1] T. M. Gil and M. Poletto, “Multops: a data-structure for bandwidth attack detection,” in Proceedings of the 10th conference on
USENIX Security Symposium - Volume 10, ser. SSYM’01. Berkeley, CA, USA: USENIX Association, 2001, pp. 3–3.
[2] J. R. Douceur, “The sybil attack,” in Revised Papers from the First International Workshop on Peer-to-Peer Systems, ser. IPTPS
’01.
London, UK: Springer-Verlag, 2002, pp. 251–260.
[3] D. B. Faria and D. R. Cheriton, “Detecting identity-based attacks in wireless network using signalprints,” in Proceedings of the
2006 ACM Workshop on Wireless Security (WiSe ’06).
ACM Press, September 2006, pp. 43–52.
[4] S. Bratus, C. Cornelius, D. Kotz, and D. Peebles, “Active behavioral fingerprinting of wireless devices,” in Proceedings of the
first ACM conference on Wireless network security, ser. WiSec ’08.
New York, NY, USA: ACM, 2008, pp. 56–61.
[5] N. T. Nguyen, G. Zheng, Z. Han, and R. Zheng, “Device fingerprinting to enhance wireless security using nonparametric bayesian
method.” in INFOCOM. IEEE, 2011, pp. 1404–1412.
[6] V. Brik, S. Banerjee, M. Gruteser, and S. Oh, “Wireless device identification with radiometric signatures,” in Mobicom, 2008.
[7] Y. Sheng, K. Tan, G. Chen, D. Kotz, and A. Campbell, “Detecting 802.11 mac layer spoofing using received signal strength.” in
INFOCOM. IEEE, 2008, pp. 1768–1776.
30
[8] Y. Chen, W. Trappe, and R. P. Martin, “Detecting and localizing wireless spoofing attacks,” in Proceedings of the Fourth Annual
IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks,2007, pp. 193–202.
[9] J. Yang, Y. Chen, W. Trappe, and J. Cheng, “Determining the number of attackers and localizing multiple adversaries in
wireless spoofing attacks.” in INFOCOM. IEEE, 2009, pp. 666–674.
[10] R. Chen and J. Park, “Ensuring trustworthy spectrum sensing in cognitive radio networks,” in 1st IEEE Workshop on Networking
Technologies for Software Defined Radio Networks, Sept. 2006, pp. 110–119.
[11] S. M. Y. Zhao, J. H. Reed and K. K. Bae, “Overhead analysis for radio environment map-enabled cognitive radio networks,” in
1st IEEE Workshop on Networking Technologies for Software Defined Radio Networks, Sept. 2006, pp. 18–25.
[12] X. J. L. H. Caidan Zhao, Liang Xie and Y. Yao, “A phy-layer authentication approach for transmitter identification in cognitive
radio networks,” in International Conference on Communications and Mobile Computing, vol. 2, Apr. 2010, pp. 154–158.
[13] J. Yang, Y. Chen, and W. Trappe, “Detecting spoofing attacks in mobile wireless environments,” in SECON, 2009, pp. 1–9.
[14] K. Zeng, K. Govindan, D. Wu, and P. Mohapatra, “Identity-based attack detection in mobile wireless networks.” in INFOCOM.
IEEE, 2011, pp. 1880–1888.
[15] S. Venkatesh, “The design and modeling of ultra-wideband position-location networks,” Ph.D. dissertation, Bradley Dept. of
Electrical and Computer Engineering, Virginia Tech, VA, USA, 2007.
[16] N. T. Nguyen, R. Zheng, and Z. Han, “On identifying primary user emulation attacks in cognitive radio systems using
nonparametric bayesian classification,” IEEE Transactions on Signal Processing, vol. 60, no. 3, pp. 1432–1445, 2012.
[17] S. P. Chatzis and G. Tsechpenakis, “The infinite hidden markov random field model,” Trans. Neur. Netw., vol. 21, pp.
1004–1014, June 2010.
[18] K. Kurihara, M. Welling, and N. A. Vlassis, “Accelerated variational dirichlet process mixtures.” in NIPS’06, 2006, pp. 761–768.
[19] R. Gomes, M. Welling, and P. Perona, “Incremental learning of nonparametric bayesian mixture models.” in CVPR.
IEEE
Computer Society, 2008.
[20] J. Kim, V. Sridhara, and S. Bohacek, “Realistic mobility simulation of urban mesh networks,” Ad Hoc Netw., vol. 7, pp.
411–430, March 2009.
[21] C. Karlof and D. Wagner, “Secure routing in wireless sensor networks: attacks and countermeasures,” Elsevier: Ad Hoc Networks,
vol. 1, pp. 293–315, 2003.
Was this manual useful for you? yes no
Thank you for your participation!

* Your assessment is very important for improving the work of artificial intelligence, which forms the content of this project

Download PDF

advertising