
Estimation sous contraintes de communication
THÈSE
Pour obtenir le grade de
DOCTEUR DE L’UNIVERSITÉ DE GRENOBLE
Spécialité : signal, image, parole, télécommunications (SIPT)
Arrêté ministériel : 7 août 2006
Présentée par
M. Rodrigo CABRAL FARIAS
Thèse dirigée par M. Jean-Marc BROSSIER
préparée au sein du
laboratoire Grenoble, images, parole, signal, automatique
(GIPSA-lab)
dans l’école doctorale d’électronique, électrotechnique,
automatique et traitement du signal (EEATS)
Estimation sous contraintes de communication : algorithmes et performances asymptotiques
Thèse soutenue publiquement le 17/07/2013,
devant le jury composé de :
M. Eric MOULINES
Professeur Télécom ParisTech, Rapporteur
M. Jean-Yves TOURNERET
Professeur ENSEEIHT, Rapporteur
M. Josep VIDAL
Professeur Universitat Politècnica de Catalunya, Examinateur
M. Jean-François BERCHER
Professeur associé ESIEE, Examinateur
M. Eric MOREAU
Professeur Université du Sud Toulon-Var, Examinateur
M. Jean-Marc BROSSIER
Professeur Grenoble-INP, Directeur de thèse
Acknowledgements
I would like to thank the Erasmus program Euro Brazilian Windows II for funding this thesis
and the director of the GIPSA-lab, Jean-Marc Thiriet, for welcoming me in his laboratory. I
would like to express my gratitude to my thesis director, Jean-Marc Brossier, for allowing me
to be a free researcher during my thesis, for carefully pointing out the mistakes in some of my strange
ideas and motivating me to look deeper when the ideas were not that strange. Special thanks
are extended to the professors Eric Moisan, Laurent Ros, Olivier Michel and Steeve Zozor,
who helped me countless times.
Also, I would like to thank the members of my jury Eric Moreau, Eric Moulines, Jean-Yves
Tourneret, Josep Vidal and Jean-François Bercher for their precious remarks on my work and
for all the insights on how I can extend it.
During these three years, I bothered a non-negligible number of PhD students in the
laboratory. I would like to acknowledge those who survived this torture for their
patience. Thanks to the survivors: Aude, Damien, Douglas, Gailene, Humberto, Jonathan,
Robin, Wei, Xuan Vu and Zhong Yang.
I am particularly grateful for the assistance given by Vio, who was able to encourage me
to move forward, even when I felt the pressure was excessive.
To my family, mãe, Dea e vó, I can only say that it was very difficult to stay so far away from you for such a long time.
Contents

Notations    11
Abbreviations and acronyms    13
Assumptions    15
Introduction    17

I  Estimation based on quantized measurements: algorithms and performance    25

1  Estimation of a constant parameter    31
   1.1  Measurement model    34
   1.2  Maximum likelihood, Cramér–Rao bound and Fisher information    37
   1.3  Binary quantization    44
   1.4  Multibit quantization    54
   1.5  Adaptive quantizers: the high complexity fusion center approach    60
   1.6  Chapter summary and directions    73

2  Estimation of a varying parameter    75
   2.1  Parameter and measurement model    77
   2.2  Optimal estimator    78
   2.3  Particle Filtering    81
   2.4  Evaluation of the estimation performance    87
   2.5  Quantized innovations    90
   2.6  Chapter summary and directions    102

3  Adaptive quantizers for estimation    105
   3.1  Parameter model and measurement model    108
   3.2  General estimation algorithm    111
   3.3  Estimation performance    113
   3.4  Optimal algorithm parameters and performance    125
   3.5  Simulations    135
   3.6  Adaptive quantizers for estimation: extensions    149
   3.7  Chapter summary and directions    164

Conclusions of Part I    169

II  Estimation based on quantized measurements: high-rate approximations    171

4  High-rate approximations of the FI    177
   4.1  Asymptotic approximation    180
   4.2  Bit allocation for scalar location parameter estimation    200
   4.3  Generalization with the f-divergence    207
   4.4  Chapter summary and directions    213

Conclusions of Part II    217

Conclusions    219
   Main conclusions    219
   Perspectives    220

A  Appendices    223
   A.1  Why? - Proofs    223
   A.2  More? - Further details    235
   A.3  How? - Algorithms and implementation issues    248

B  Résumé détaillé en français (extended abstract in French)    253
   B.1  Introduction    254
   B.2  Estimation et quantification : algorithmes et performances    261
   B.3  Estimation et quantification : approximations à haute résolution    286
   B.4  Conclusions    293

Bibliography    295
List of Figures

1     Estimation using a sensing system.    20
2     Scalar remote sensing problem.    21
3     Estimation based on quantized measurements.    21
1.1   Quantizer function Q (Yk) with NI quantization intervals and uniform threshold spacing with length ∆.    35
1.2   Scheme representing the general measurement/estimation system.    42
1.3   Quantity related to the CRB for quantized measurements B and its upper bound B̄.    45
1.4   Function M × δ².    47
1.5   PDF for the uniform/Gaussian distribution.    49
1.6   CRBq and simulated MLE MSE for uniform/Gaussian noise.    50
1.7   CRBq and simulated MLE MSE for GGD noise.    52
1.8   FI as a function of the normalized difference between the central threshold and the true parameter.    57
3.1   Scheme representing the adjustable quantizer.    110
3.2   Block representation of the estimation scheme.    111
3.3   ODE bias approximation and simulated bias for the estimation of a Wiener process with the adaptive algorithm.    118
3.4   Adaptive algorithm loss of estimation performance due to quantization of measurements.    138
3.5   Quantization loss of performance for GGD noise and NB ∈ {2, 3, 4, 5} when Xk is constant.    139
3.6   Quantization loss of performance for STD noise and NB ∈ {2, 3, 4, 5} when Xk is constant.    140
3.7   Simulated quantization performance loss for a Wiener process Xk with σw = 0.001.    141
3.8   Comparison of simulated and theoretical losses in the Gaussian and Cauchy noise cases when estimating a Wiener process with σw = 0.1 or σw = 0.001.    142
3.9   Comparison of simulated and theoretical losses in the Gaussian and Cauchy noise cases for estimating a Wiener process with constant mean drift.    143
3.10  Minimum CRB and simulated MSE for the adaptive algorithm with decreasing gain and for the adaptive algorithm based on the MLE.    144
3.11  Asymptotic MSE for the optimal estimator of a Wiener process with small σw and simulated MSE for the adaptive algorithm with constant gain and for the PF with dynamic central threshold.    146
3.12  Scheme representing the adjustable quantizer. The offset and gain are adjusted dynamically.    151
3.13  CRB for estimating a location parameter of Gaussian and Cauchy distributions based on quantized and continuous measurements and simulated MSE for the estimation of the location parameter with the adaptive location-scale parameter estimator.    156
3.14  Scheme representing the sensor network with a fusion center.    158
3.15  Cramér–Rao bound and simulated MSE for the adaptive algorithm in the fusion center approach with different numbers of sensors and 4 quantization intervals.    162
3.16  Cramér–Rao bound and simulated MSE for the adaptive algorithm with different numbers of sensors and fixed total number of bits.    163
4.1   Interval densities for the estimation of a GGD location parameter.    192
4.2   Simulated MSE for the adaptive algorithm with nonuniform thresholds considering Gaussian and Cauchy measurement distributions.    199
4.3   Water-filling solutions for multicarrier modulation power allocation and for rate constrained sensing system bit allocation.    206
A.1   Geometric scheme to show that the probability of the interval A0 + A1 is less than the probability of the exterior region of the left quarter circle C1.    225
A.2   Log-likelihood function for Cauchy noise distribution.    235
A.3   An iteration of the binary threshold update in a finite grid.    238
List of Tables

4.1  FI for the estimation of Gaussian and Cauchy location parameters based on quantized measurements.    198
4.2  Functions characterizing the GFD for different inference problems and interval densities maximizing the inference performance based on quantized measurements.    212
Notations

Vectors and sequences
  X (boldface)      Vector or matrix
  superscript ⊤     Transposition
  diag (X)          Diagonal matrix from X
  X1:N              Sequence X1, X2, ..., XN

Sets
  N                 Natural numbers
  R                 Real numbers
  subscript +       Set with only positive elements
  superscript ⋆     Set including zero

Probability
  X (uppercase)     Random variable
  x (lowercase)     Realization or parameter
  P ()              Probability measure
  P (·|·)           Conditional probability
  E (X)             Expectation
  E_{X|·} (X)       Conditional expectation
  Var (X)           Variance
  Var_{X|·} (X)     Conditional variance
  f () or p ()      Probability density function
  f (·|·)           Conditional density
  F ()              Cumulative distribution function
  f (·; x)          Parametrization by x

Whenever the random variables related to a function are not clear from the context, they are
indicated implicitly by the variable in the argument of the function.

Main variables and parameters
  X                 Unknown parameter
  Y                 Continuous measurement
  i                 Quantized measurement
  X̂                 Estimator
  V                 Measurement noise
  W                 Parameter variation
  L                 Likelihood
  S                 Score function
  I                 Fisher information
  CRB               Cramér–Rao bound
  BCRB              Bayesian Cramér–Rao bound
  MSE               Mean squared error
  σw                Increments variance
  NI                Number of quantization intervals
  δ                 Noise scale
  NB                Number of quantization bits
  ε                 Estimation error
  N                 Number of samples
                    Quantizer shift
  Ns                Number of sensors
  k                 Sample/time index
  τ                 Quantization threshold
  Lq                Loss due to quantization
  τ′                Threshold variation
  u                 Deterministic drift
  q                 Quantization interval
  γ (scalar)        Adaptive algorithm step
  ∆                 Interval length
  Γ (matrix)        Adaptive algorithm step
                    Quantizer input parameter
  η                 Adapt. algorithm correction
  λ ()              Interval density

The subscript k can be used either to indicate one specific sample of a sequence or to make
explicit that a quantity is a sequence.
Most quantities related to continuous measurements have a subscript c and most quantities
related to quantized measurements have a subscript q.

Functions
  ()−1              Inverse of a function
  sign ()           Sign function
  Γ ()              Gamma function
  γ (, )            Incomplete gamma function
  B                 Beta function
  I (, )            Incomplete beta function
  ()+               Ramp function
Abbreviations and acronyms

AUV     Autonomous underwater vehicle
BCRB    Bayesian Cramér–Rao bound
BI      Bayesian information
CRB     Cramér–Rao bound
CDF     Cumulative distribution function
DSP     Digital signal processing
FI      Fisher information
GFD     Generalized f-divergence
GGD     Generalized Gaussian distribution
i.i.d.  Independent and identically distributed
KLD     Kullback–Leibler divergence
MAP     Maximum a posteriori estimator
MLE     Maximum likelihood estimator
MSE     Mean squared error
MMSE    Minimum mean squared error
ODE     Ordinary differential equation
PF      Particle filter
PDF     Probability density function
r.v.    Random variable
RHS     Right-hand side
STD     Student's-t distribution
w.r.t.  With respect to
Assumptions

Assumptions (on the noise distribution):

AN1 The marginal CDF of the noise, denoted F, admits a PDF f w.r.t. the standard Lebesgue measure on (R, B(R)).

AN2 The PDF f(v) is a strictly positive even function and it strictly decreases w.r.t. |v|.

AN3 F is locally Lipschitz continuous.

Assumptions (on the quantizer):

AQ1 NI is considered to be an even natural number and the set I where ik is defined is

    I = {−NI/2, . . . , −1, 1, . . . , NI/2}.

AQ2 The quantizer is symmetric around the central threshold. This means that the vector of thresholds τ is given by

    τ = [τ_{−NI/2} = τ0 − τ′_{NI/2}, . . . , τ_{−1} = τ0 − τ′_1, τ0, τ_1 = τ0 + τ′_1, . . . , τ_{NI/2} = τ0 + τ′_{NI/2}]^⊤,

    with the threshold vector elements forming a strictly increasing sequence and the nonnegative vector of threshold variations w.r.t. the central threshold given by

    τ′ = [τ′_0 = 0, τ′_1, . . . , τ′_{NI/2} = +∞]^⊤.

AQ3 The quantizer output levels have odd symmetry w.r.t. i:

    ηi = −η−i,

    with ηi > 0 for i > 0.

Modified assumptions (on the quantizer):

AQ2' The quantizer is symmetric around the central threshold, which is equal to zero. This means that the vector of thresholds τ is given by the vector of threshold variations

    τ = [−τ′_{NI/2}, . . . , −τ′_1, 0, +τ′_1, . . . , +τ′_{NI/2}]^⊤,

    where the threshold variations τ′_i form an increasing sequence.

AQ3' The quantizer output levels ηx[i] are odd and the output levels ηδ[i] are even:

    ηx[i] = −ηx[−i],    ηδ[i] = ηδ[−i],

    with ηx[i] > 0 for i > 0 and ηδ[1] < 0.

Assumptions on Iq for the MLE update to have asymptotically optimal performance:

A1.MLE Iq(ε) is maximum for ε = 0.

A2.MLE Iq(ε) is locally decreasing around zero.

A3.MLE The function Iq(ε) has bounded Iq(0), dIq(ε)/dε|_{ε=0} = 0 and bounded d²Iq(ε)/dε²|_{ε=0}, therefore accepting a Taylor approximation around zero (for small ε′):

    Iq(ε′) = Iq(0) + (ε′²/2) d²Iq(ε)/dε²|_{ε=0} + ◦(ε′²),

    where the ◦(ε′²) here is equivalent to saying that the quantity ◦(ε′²)/ε′² tends to zero when ε′ tends to zero.
Introduction
Quantization: the stranger in the room
Open a book, any basic book on digital signal processing (DSP), and count the number
of pages dedicated to the sampling theorem and discrete-time signal processing: FFT, Z-transform, FIR and IIR filtering. Now, count the number of pages dedicated to quantization.
Even if half of the "digital world" comes from quantization, by reading some basic books on DSP, we have the feeling that it is a completely unimportant subject¹.
A curious person might ask: is it really unimportant? Maybe it is simply so difficult to
treat and explain in an easy way that most DSP books skip a detailed description
of quantization. We think this is the reason most texts presenting DSP
assume that signals are quantized with very high resolution, which lets them
explain quantization almost in a footnote. As a consequence, quantization seems to be
the stranger that comes to the "DSP party" and that almost nobody wants to speak with (even
if it is one of the party organizers). Some signal processing domains find it useful (and in some
circumstances they are not wrong) to refuse "contact" with quantization. Whenever they need
to address quantization issues, they refer to it in a derogatory way: "quantization noise".
In this thesis we expect to make one of the subjects at the signal processing party
"talk" with quantization in a polite way, without derogatory terms. The subject we chose is
estimation.
In the following, we will explain the motivation and the main points of their "conversation".
Sensor networks and quantization: the welcome guest
Although we do not explicitly design estimation algorithms using a sensor network architecture, this thesis is intended to contribute to the development of estimation techniques that
can be applied or extended to sensor networks.
Sensor network emergence. With the reduction in cost and size of electronic devices such
as sensors and transceivers, a whole new field emerged under the name Sensor Networks. This
term, in general, means any set of sensors capable of communication and processing used for
a specific task, e.g. estimation, detection, tracking, classification, etc.
¹Note that the real problem of digitizing a signal by considering sampling and quantization as a joint operation is simply a non-issue in the signal processing literature. We do not study this problem in this thesis either, but it is an interesting problem.

Sensor networks are attractive for many reasons [Akyildiz 2002], [Intanagonwiwat 2000], [Zhao 2004, pp. 7–8]:
• Fault tolerance and flexibility. By using multiple sensors to perform a sensing task, even if
one of them is unable to measure, the other sensors guarantee that the sensing system
keeps working. By proper design, the sensor network can reconfigure the way it operates,
so that if a failure occurs in a sensor or a small set of sensors, the performance of the
sensing system is not strongly affected.
• Easy deployment. The decreased cost of the sensors makes it possible to deploy large
quantities of sensors in a given area without detailed placement of the sensors. This
simplifies the deployment of sensing systems in difficult access and hostile environments.
• Risky environment sensing. By allowing the sensors to communicate wirelessly, remote
sensing can be done in areas where human activity is impossible or cannot be sustained
for long periods of time.
• No-maintenance sensing. The fault tolerance capabilities of sensor networks allow them to
be used in applications where maintenance of the sensing system is difficult.
• Multi-hop communication. By using the communication capabilities of the sensors to
allow multi-hop communication, the total energy used in communication for the sensing task may decrease, as the attenuation of transmitted signals is smaller for smaller
distances.
• Enhanced signal-to-noise ratio. In tracking or detection applications, the performance
of the task is normally dependent on the signal-to-noise ratio of the measurements.
If we consider that the signal we measure attenuates with distance, then in a sensor
network, as the density of sensors can be high, it is expected that at least a few sensors
will measure the signal with high signal-to-noise ratio, enhancing in this way the final
performance.
Sensor network applications. Based on the advantages of sensor networks presented
above, a plethora of applications can be developed in many different domains [Arampatzis 2005],
[Chong 2003], [Durisic 2012], [Puccinelli 2005]:
• Environmental monitoring. Habitat monitoring, bio-complexity mapping, weather forecasting and disaster prevention (volcanic eruptions, floods, earthquakes).
• Agricultural monitoring. Precision irrigation, fertilization and pest control.
• Civil engineering. Building automation, building emergency systems and structural
health monitoring.
• Urban monitoring. Pollution monitoring, video surveillance and traffic control.
• Health applications. Monitoring of human physiological data, tracking of doctors and
patients in a hospital.
• Commercial applications. Support for logistics, production surveillance and automation.
• Military applications. Self-healing landmines, soldier detection and tracking, shot origin
information, perimeter protection, chemical, biological and explosive vapor detection,
missile canister monitoring and blast localization.
The need for quantization. Even if progress in sensor and communication technologies
motivates the use of a large number of communicating sensors, practical considerations such
as the use of non-replenishable energy sources (sensors are self-powered with batteries) and
maximum size constraints impose three design constraints:
• Energy constraint: it comes directly from the choice that the sensors use a non-replenishable energy source.
• Rate constraint: this constraint is related to the fact that the communication channel
bandwidth must be shared by a large number of sensors and that the energy is also
constrained.
The energy spent in a sensor network can be divided mainly into three activities: sensing,
communication and processing. It is known that the largest energy consumer among these
activities is communication [Akyildiz 2002]. As bandwidth is constrained, the simplest
way to reduce energy consumption is to find a way to achieve the same or a
similar goal while communicating at a lower rate (number of bits per unit of time).
• Complexity constraint: although much less important in energy consumption, complexity
both in terms of processing and memory must be small to keep the cost and size of the
sensors small.
One way to address these problems is to consider that the sensors quantize their measurements
before any other operation is performed². This makes it possible to
• reduce complexity by using pre-stored tables for the computations and also by bounding
memory requirements.
• Reduce directly the rate by controlling the number of quantization intervals.
• Reduce energy requirements, as a consequence of the reduction in complexity and rate.
These are the main reasons for studying quantization in this thesis.

²We do not claim here that imposing quantization of the measurements is the optimal solution. In some cases, it can be shown that a completely analog scheme is optimal [Gastpar 2008].

Different objectives and the scope of the thesis

In a sensing system, the main task is to infer some information that is embedded in the
measurements. The two main classes of inference problems studied in signal processing are
detection and estimation. The literature on the joint subjects, detection based on quantized
measurements and estimation based on quantized measurements, is not extensive when compared
with the literature on the separate subjects; however, as a consequence of the emergence of
sensor networks, its size is increasing. Some references on these subjects are the following:
• Detection: [Benitz 1989], [Gupta 2003], [Kassam 1977], [Longo 1990], [Picinbono 1988],
[Poor 1977], [Poor 1988], [Tsitsiklis 1993], [Villard 2010], [Villard 2011].
• Estimation: [Aysal 2008], [Fang 2008], [Gubner 1993], [Luo 2005], [Marano 2007],
[Papadopoulos 2001], [Poor 1988], [Ribeiro 2006a], [Ribeiro 2006b], [Ribeiro 2006c],
[Wang 2010].
Estimation based on quantized measurements. As mentioned before, in this thesis we
will study the second of the subjects mentioned above, namely, estimation based on quantized
measurements. We will start by explaining the general estimation problem in a sensing system.
By making a sequence of simplifications in the general problem, we will get to the main scope
of this thesis.
In the general scheme, each sensor measures a continuous amplitude quantity X^(i), processes its measurement locally and sends it to the point where the estimate will be evaluated.
The point of evaluation can be either a fusion center, one of the sensors or all sensors. In the
last case, all sensors broadcast their processed measurements. This scheme is shown in Fig.
1. The quantity in this case can be a sequence of vectors, a sequence of scalars, a constant
vector or a constant scalar.
Figure 1: Estimation problem using a sensing system. Multiple sensors send preprocessed
information to the final estimator that must recover the quantities of interest.
The first simplification that we will make is to consider only one of the terminals (sensors)
in the sensing system; later, we will also consider the problem with multiple terminals, but
with the same quantity being measured by all sensors. We will also consider that the quantity
to be estimated is either a sequence of scalars or a single scalar. We will use the notation Xk for
the quantity to be estimated in both cases; k is the sample index and, in most cases, it will be
also the discrete-time index. When Xk is a scalar constant, we have Xk = x. The simplified
problem, which can also be called scalar remote sensing problem, is depicted in Fig. 2.
Figure 2: Scalar remote sensing problem. A scalar single terminal simplification of the problem
depicted in Fig. 1.
The parameter Xk is measured with continuous amplitude additive noise Vk . The continuous measurement will be denoted Yk = Xk + Vk .
The estimation problem we mainly deal with here is location estimation, as Xk in this case is a location parameter characterizing the measurement distribution. Other technical considerations about the noise sequence will be presented later. At some points in the thesis we will
a location parameter characterizing the measurement distribution. Other technical considerations about the noise sequence will be presented later. In some points of the thesis we will
not constrain Xk to be a location parameter and we will let it be a general parameter.
According to the previous discussion on the design constraints, the processing block is
replaced by a scalar quantizer. Thus, each noisy continuous measurement Yk will generate a
quantized measurement ik according to the quantizer function Q (). Each quantized measurement takes values in a finite alphabet, so the rate (number of possible values per measurement)
is fixed and known. We suppose that the rate in bits per unit of time is chosen
such that the transmission channel capacity is not exceeded, thus, by adding proper channel
coding in the transmission block, we can consider that the channel is perfect.
For each time k we are interested in estimating Xk based on the set of past measurements
i1 , i2 , · · · , ik . The problem is then depicted in Fig. 3.
Figure 3: Estimation based on quantized measurements. A parameter is measured with additive noise, the measurements are then quantized and transmitted through a perfect channel.
Based on the past quantized measurements, the objective is to estimate Xk for each time k
with the sequence of mappings g ().
As it is shown in Fig. 3, we also consider that the quantizer structure can depend on the
past quantized measurements.
What we want to study. We want to propose algorithms for estimating Xk based on ik .
The parameter Xk , which will be detailed later, can be either a deterministic constant or a
slowly varying random process.
After proposing the algorithms, we want to evaluate their performance. Given the algorithm performance, we want to study the effects of the quantizer function parameters (the
quantization thresholds) and of the quantizer resolution (the number of quantization intervals
or bits).
To assess how quantization impacts estimation, we will also compare the estimation
performance of the proposed algorithms with that of their corresponding continuous measurement versions.
The objective here is to estimate Xk based on the interval information (we know only in
which interval the measurement is) of a noisy version of it.
What we do not want to study (and we will not study). We do not want to reconstruct
the measurement Yk from the quantized measurements and then estimate Xk based on the
reconstructed measurements, as if they were continuous. By doing this, we would simply combine
two separately optimal solutions, which are well known.
We do not want to consider quantization as additive noise either. We want to consider the
problem in its true form, that is to study how to exploit the information contained in intervals
and not in continuous values.
What we want to study but will not study. To specify the scope of the thesis precisely,
we also have to state the problems we have consciously overlooked. Consciously
overlooked in this case means that, unlike the class of problems above, we wanted
to study them, but to keep the subject simple, they will be neglected. These subjects are:
vector parameters and vector quantization, presence of noisy channels (fading or additive)
and channel coding, fast varying signals, estimation of continuous time signals and Bayesian
estimation of a random constant.
Structure of the thesis and outline
This thesis is formed by a general introduction, two parts and a general conclusion. Each part
is divided into an introduction, chapters and a conclusion. The first part contains three chapters
and the second a single chapter. Each chapter is subdivided into three parts: an introduction
with the main contributions of the chapter, the main development, and a summary/conclusion
with some directions for future work. The conclusions in the order thesis–part–chapter increase
in level of detail. The thesis conclusion is a general overview, the part conclusion presents the
points that we think must be retained without explaining the technical details, and the chapter
summary is a detailed account of the points observed in the chapter.
The thesis outline is the following:
• Part I: a study of algorithms/performance for estimation based on quantized measurements.
– Chapter 1: the main details on the quantizer structure and noise are given. The
fundamental algorithms and performance for the estimation of a deterministic scalar
constant parameter are presented. Algorithms both for static quantization and
adaptive quantization are studied.
– Chapter 2: the time-varying parameter counterpart of Ch. 1 is presented. We
consider the parameter to be a slowly varying scalar Wiener process and we present
Bayesian algorithms for tracking the parameter.
– Chapter 3: Low complexity algorithms are proposed as alternatives to those presented in Ch. 1 and 2. We also study some extensions of the scalar location
problem: an extension that considers that the noise scale parameter is unknown
and an extension that considers multiple sensors.
• Part II: a high resolution (high-rate) approximate analytical expression for the estimation
performance.
– Chapter 4: an open problem from Part I is how to set completely the quantizer key
parameters so that estimation performance is maximized. In this chapter, we study
how to solve this problem approximately by considering high resolution approximations (small quantization intervals approximation). We give a practical solution
to obtain the optimal quantizer and the corresponding asymptotic estimation performance.
Each part will begin with an example, which can be seen as a background for the presentation of the problem. The examples serve only for presentation purposes and their specific
subjects (water management and deep-sea water mining) are not the main subject of this
work.
The appendices of this thesis are divided into three parts: one part for presenting proofs that
are not considered important to develop in the main text (Why? - App. A.1), another for
giving more details about a subject (More? - App. A.2) and one part for explaining some
implementation issues (How? - App. A.3).
When defining a new abbreviation or acronym, we write the expression in boldface with
the abbreviation in parentheses (). When citing a reference that was already cited similarly
elsewhere, we write the reference and the work where it was cited with (cited in ...).
Publications
During this thesis, three papers were presented at international conferences:
• Rodrigo C. Farias and Jean-Marc Brossier, Adaptive Estimation Based on Quantized Measurements, IEEE International Conference on Communications (ICC), 2013, Budapest, Hungary.
• Rodrigo C. Farias and Jean-Marc Brossier, Adjustable Quantizers for Joint Estimation of Location and Scale Parameters, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, Vancouver, Canada.
• Rodrigo C. Farias and Jean-Marc Brossier, Asymptotic Approximation of Optimal Quantizers for Estimation, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, Vancouver, Canada.
One paper was accepted for presentation at a French conference:
• Rodrigo C. Farias and Jean-Marc Brossier, "Quantification asymétrique optimale pour l'estimation d'un paramètre de centrage dans un bruit de loi symétrique", Colloque GRETSI, 2013, Brest, France,
and one journal article was published:
• Rodrigo C. Farias and Jean-Marc Brossier, Adaptive quantizers for estimation, Signal Processing, Elsevier, vol. 93, November 2013.
Part I

Estimation based on quantized measurements: algorithms and performance
"A word to the wise is enough" - popular saying.
Motivation
As a background to introduce this part, we start with an application example. A recent trend
of placing water as a key element in government strategic decisions (including possible future
military interventions) led to the choice of the motivational example.
Agriculture is responsible for 70% of freshwater withdrawals. Food production for satisfying the daily caloric needs of a person consumes 3000 liters of water, a very large quantity
when compared with the 2-5 liters used for drinking. Add to these ingredients the fact that
world population is growing and that a large part of the population is changing its diet, consuming more meat and vegetables and therefore even more water [Molden 2007] and we have
a possible recipe for future water scarcity.
One possible policy for preventing future water scarcity is to develop or improve irrigation
systems in underdeveloped countries, where water use efficiency is very low [Molden 2007]. To
do so, accurately measuring the soil moisture of crop fields is a key issue. Thus, as a
background scenario for introducing this chapter, we will consider the problem of estimating
the moisture level of crop fields.
Consider that multiple crop field areas will each have a set of sensors taking noisy measurements of
some quantities related to soil moisture. All the data will be transmitted to a central processor,
which after estimating the moisture levels, will decide which crops must be irrigated. As the
number of sensed areas can be large, for example, when the irrigation system is integrated for
an entire geographic region, quantization will be applied to respect communication constraints.
The solution to this problem can be simplified by assuming that the decision (control)
part of the problem can be decoupled from the estimation part. We will focus here only on
the estimation part. In a first approach, we can assume that the moisture levels are unknown
deterministic scalars, unrelated from one region to the other and that they are approximately
constant for a block of N independent measurements. If humidity sensors are used, from the
symmetry of the problem and the assumption that the moisture levels are not related, the
joint estimation problem of all levels decouples into many scalar estimation problems with
identical general form. This general form is the following:
(a) Estimate a constant scalar location parameter x, based on N
independent noisy measurements
Y1:N = {Y1 = x + V1 , · · · , YN = x + VN } ,
which are scalarly quantized with a quantizer function Q (to be
defined later)
i1:N = {i1 = Q (Y1 ) , · · · , iN = Q (YN )} .
A more detailed meaning for "estimate" is
(1) Give an analytical form or a procedure describing the parameter estimator X̂.
(2) Give the estimation performance or an approximation of the estimation performance as a
function of
• number of measurements;
• noise characteristics;
• the quantizer function.
After giving a solution for this problem, we may be interested in considering a more
complex model for x, for instance, instead of considering it as a constant, we can assume that
it varies randomly with time.
A simple dynamical model is
Xk = Xk−1 + Wk ,
where k is the discrete-time index, Wk is an independent and identically distributed
(i.i.d.) Gaussian process with zero mean and variance σw². Thus, if X0 is Gaussian, Xk
is a Gaussian process known as a discrete-time Wiener process or as a discrete-time
random walk process. This type of process is commonly used to describe slowly varying
parameters when their evolution is random but with unknown form. A reason to use this
model is that by constraining the increments to be Gaussian distributed, minimal quantity of
information is imposed for a given increment variance (in terms of information theory quantity
of information).
Now, suppose we have statistics about precipitation on the crop field region, for example
its average, that we also know the last quantities of water irrigated on the field, and that we
know how to relate both precipitation and irrigated water to the average increase in moisture
level, denoted uk. This will allow us to use a more precise dynamical model for Xk, using as
increments Gaussian random variables (r.v.) with mean uk. Consequently, our model will
become a discrete-time Wiener process with a deterministic drift.
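To make the model concrete, here is a minimal simulation sketch of such a trajectory, assuming Gaussian increments as above; the horizon, the value of σw and the constant drift sequence uk are arbitrary illustration choices, not values used in the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustration values (not taken from the thesis).
N = 1000                 # number of time steps k = 1, ..., N
sigma_w = 0.01           # standard deviation of the increments W_k
u = np.full(N, 0.005)    # deterministic drift u_k (here constant)

# Discrete-time Wiener process with drift: X_k = X_{k-1} + W_k, W_k ~ N(u_k, sigma_w^2)
W = u + sigma_w * rng.standard_normal(N)
X = np.cumsum(W)         # trajectory X_1, ..., X_N, starting from X_0 = 0
```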
The objective is the same as before: estimate Xk based on scalarly quantized Yk. However,
the relation between measurements can be exploited now. Instead of considering the static
estimation problem for separate blocks of measurements, we can now use all past measurements
in the estimation of the varying parameter, under the constraint that the parameter evolution
must follow the dynamical model.
Therefore, we are also interested in solving the following problem:
(b) Estimate a varying random parameter Xk at time k based on the
last and present scalarly quantized measurements i1:k .
This is a filtering problem, as estimates depend not only on past measurements but also
on the present measurement [Jazwinski 1970]. The problems of estimation based on past
measurements, i.e. prediction, and of estimation based on additional future measurements,
i.e. smoothing, will not be treated in this thesis.
Outline for this part
For the problem at hand, three types of model with increasing complexity can be considered. These
three models are related to the two estimation problems (a) and (b) as shown below:
• Constant model → (a) location parameter estimation
• Scalar Wiener process model → (b) filtering
• Scalar Wiener process model with drift → (b) filtering
Many other practical estimation problems rely on the models presented above and consequently can be cast as (a) or (b). We will look now for their solutions.
First, we will present algorithms and performance for the estimation of a constant location
parameter. We will study maximum likelihood estimators and their asymptotic performance
through the Cramér–Rao bound. We will see that estimation performance is sensitive to the
distance between the quantizer dynamic range and the parameter. For commonly used noise
models, we will see that estimation performance actually degrades when the dynamic range
is far from the parameter. As a solution, we will search for adaptive schemes that place the
dynamic range close to the parameter. We will show that in the binary case, the asymptotically
optimal adaptive algorithm is given in a simple recursive form.
After that, we will focus on filtering. A general solution using recursive integral expressions
will be given. As this solution is analytically intractable, an approximate solution based on
sequential Monte Carlo methods (particle filtering) will be considered. Its performance will
be assessed through a lower bound, the Bayesian Cramér–Rao bound. Then, by analyzing the
bound, we will see that a good estimation scheme can be obtained by quantizing the measurement prediction error, usually called the innovation. We will show that the asymptotically
optimal filter based on the quantized innovation is also given in a simple recursive form when
the parameter varies slowly.
Motivated by the recursive forms that are obtained asymptotically, both in the constant
and varying parameter cases, we will present a low complexity adaptive algorithm for estimation using quantized measurements. The estimation performance and the optimal algorithm
parameters will be obtained for constant and Wiener process models. Extensions of the algorithm for the cases when multiple sensors with a fusion center are used and when the noise
scale factor (a measure of its amplitude) is unknown will also be obtained.
At the end of this part some conclusions will be drawn on the overall aspects of estimation
based on quantized measurements.
Chapter 1

Estimation of a constant parameter: what is done and a little more
In this chapter we study the problem of estimation of a constant location parameter based on
quantized measurements. We start the chapter with the measurement model, which is mainly
the noise model and the definition of the quantizer. The first sections of the chapter deal
with a fixed quantizer structure (fixed quantization thresholds), while in the last sections, we
present estimation schemes with an adaptive quantizer structure.
In the part concerning a fixed quantizer structure, we start by giving a general estimation
algorithm based on the maximum likelihood method. Its performance is given in terms of
the Cramér–Rao bound. Then, we study the general effects of quantization on estimation
performance. This is done through the analysis of the Cramér–Rao bound, a quantity that
is directly related to the Fisher information. We also analyze the performance of binary and
multibit quantization as a function of the quantizer tuning parameter. We give a detailed
implementation of the maximum likelihood estimator for general noise distributions in the
binary case, while in the multibit case, the maximum likelihood estimator is detailed for a
more restricted class of noise distributions, more precisely, log-concave distributions.
As a main result of the performance analysis for the fixed threshold scheme, we will see
that, for commonly used noise models, the estimation performance degrades as the quantizer dynamic range moves away from the true parameter. This is used as a motivation to
study estimation schemes that adaptively place the quantizer dynamic range close to the true
parameter. We study two adaptive schemes: one based on a simple update of the quantizer
main parameter, but with the final estimate given by maximum likelihood estimation, and the
other based on the use of the last maximum likelihood estimate as the quantizer main parameter. Their performances are also given in terms of the Cramér–Rao bound. We will also
see that the estimator based on the maximum likelihood threshold update is asymptotically
equivalent to a low complexity recursive algorithm.
We finish this chapter with a summary of the main points that were studied and with the
directions for further research. The directions will point to further work that is presented in
other chapters or that will be studied in the future.
Contributions presented in this chapter:
• Global and local analysis for binary quantization. By reading the literature on the subject
carefully, one gets the impression that setting the quantization threshold at the true
parameter value is optimal for symmetric distributions [Wang 2010, p. 265]. But this
claim is actually false. We present here global and local conditions on the noise
distribution that guarantee that this threshold value (equal to the parameter value) is
indeed optimal.
• Asymmetric threshold case. Unlike in the literature, where only symmetric cases are shown,
we show some cases where the noise distribution is symmetric and the
optimal quantization threshold is not the median.
• Laplacian noise. In the literature, most of the analysis is focused on the Gaussian noise
case, where, as it is expected, quantization strictly decreases estimation performance.
Here, we study also the Laplacian case. The Laplacian case is easier to analyze and it
is a nice counterexample to the intuition that quantization strictly decreases estimation
performance (see p. 55).
• Adaptive binary quantization scheme in a finite grid. We present a method to obtain
the asymptotic threshold probabilities in the adaptive binary threshold scheme (see
(More? - App. A.2.4)). Differently from the method presented in [Fang 2008], where a
truncation approximation is used, in the method presented here, we define boundaries
on the possible threshold values so that the number of threshold values is finite and the
asymptotic probabilities can be evaluated analytically.
• Multibit adaptive scheme based on the maximum likelihood estimator and its convergence.
We extend the binary adaptive scheme presented in [Fang 2008] to the multibit case and
we also extend its proof of convergence to the general multibit non-Gaussian case.
• Asymptotic binary adaptive scheme based on the MLE. We give a less heuristic proof
that the adaptive quantization scheme based on the maximum likelihood estimator is
given asymptotically in a simple recursive form.
Contents

1.1  Measurement model    34
     1.1.1  Noise model    34
     1.1.2  Quantization model    35
1.2  Maximum likelihood, Cramér–Rao bound and Fisher information    37
     1.2.1  Maximum likelihood estimator    38
     1.2.2  Cramér–Rao bound and the Fisher information    39
     1.2.3  Quantization loss    41
1.3  Binary quantization    44
     1.3.1  The Gaussian case    44
     1.3.2  The Laplacian case    45
     1.3.3  The general case    46
     1.3.4  Asymmetric threshold: surprising cases    48
     1.3.5  Conclusions on binary quantization performance    52
     1.3.6  MLE for binary quantization    53
1.4  Multibit quantization    54
     1.4.1  The Laplacian case    55
     1.4.2  The Gaussian and Cauchy cases under uniform quantization    56
     1.4.3  Summary of the main points    58
     1.4.4  MLE for multibit quantization with fixed thresholds    58
1.5  Adaptive quantizers: the high complexity fusion center approach    60
     1.5.1  MLE for the adaptive binary scheme    61
     1.5.2  Performance for the adaptive binary scheme    62
     1.5.3  Adaptive scheme based on the MLE    66
     1.5.4  Performance for the adaptive multibit scheme based on the MLE    66
     1.5.5  Equivalent low complexity asymptotic scheme    70
1.6  Chapter summary and directions    73
1.1 Measurement model
We start by explaining the measurement model. The unknown deterministic scalar constant
parameter to be estimated is
x∈R
and it is measured N times, N ∈ N⋆ , with i.i.d. additive noise Vk . For k ∈ {1, · · · , N } the
continuous measurements are
Yk = x + Vk.    (1.1)

1.1.1 Noise model
The continuous sequences of r.v. Yk and Vk are defined on the probability space P = (Ω, F, P)
with values on (R, B (R)). For simplification purposes the following hypotheses on the noise
distribution will be considered:
Assumptions (on the noise distribution):
AN1 The marginal cumulative distribution function (CDF) of the noise, denoted F ,
admits a probability density function (PDF) f with respect to (w.r.t.) the
standard Lebesgue measure on (R, B (R)).
AN2 The PDF f (v) is a strictly positive even function and it strictly decreases w.r.t. |v|.
Assumption AN1 is a commonly used assumption that in practice will be used when the
derivative of F w.r.t. its arguments is needed. AN2 means that the noise distributions are
unimodal and symmetric around zero and it will be used for the following reasons:
1. The unimodal behavior of the noise will allow a general qualitative characterization of estimation performance as a function of the quantization parameters. More precisely,
it will be observed that for unimodal densities very poor estimation performance occurs
for quantizers having their dynamic range far away from x.
2. It will be used as a condition for the convergence of some new adaptive estimation
algorithms presented in this thesis.
3. In the absence of physical constraints (e.g. positivity), there should be no reason for the
components of the noise to be asymmetric. Thus, if we consider that the noise is a
normalized sum of an infinite number of symmetric i.i.d. r.v. (an infinite sum of small
perturbations), then it is known that the resulting noise r.v. distribution is a symmetric
stable distribution [Samorodnitsky 1994], which is unimodal.
Even if not all unimodal symmetric distributions are stable, the generalized central limit
theorem above serves as an additional motivation.
1.1.2 Quantization model
From the reasons presented in the Introduction and in the motivational example given above,
the measurements are quantized. We will consider that they are scalarly quantized, which
means that each measurement is quantized separately from the others. The quantizer output
can be written as
ik = Q (Yk),    (1.2)
where ik is a value from a finite set I of R with NI elements. Due to notation issues, we
denote both the quantized measurement random variable and its realization with lowercase i.
NI is the number of quantization intervals. A simple example of quantizer Q with uniform
threshold spacing is given in Fig. 1.1.
Figure 1.1: Quantizer function Q (Yk ) with NI quantization intervals and uniform threshold
spacing with length ∆. The number of quantization intervals NI is even, the quantizer is
symmetric around the central threshold τ0 and the output indexes are integers without the
zero.
Except for the uniform thresholds, Fig. 1.1 shows the main elements of the general quantization model that will be used:
• the number of quantization intervals NI will be an even number; this will lead to a
clearer presentation, as in each analysis we will not need to deal with the additional
central interval.
• The outputs of the quantizer will be defined on a set of integers from −NI/2 to NI/2, without
zero. This will simplify the notation of the algorithms that will be presented later. Note
that as we will consider that the output of the quantizer is obtained without additional
noise (it passes through a noiseless channel), the assignment of the output values ik is
not important, as long as the assigned values are different. For estimation purposes,
only a label is needed at quantization output. The estimator or parts of the estimation
procedure will carry out the role of the output quantization levels, as they are used in
standard quantization, by generating estimates (values) based on the information from
the intervals (indicated by the labels) where the continuous measurements lie.
Observe that if we introduce in the model a noisy communication channel and constraints
on transmission power, the assignment of the output values becomes important. As it
was stated in the Introduction, we are not going to consider this model in this thesis,
but we can keep this extended problem as a possibility for future work.
• The quantizer is defined by NI + 1 thresholds τi, which can be separated into three types:
one central threshold τ0, NI/2 − 1 thresholds that are larger than τ0 with an additional
threshold at +∞, and NI/2 − 1 thresholds that are smaller than τ0 with an additional
threshold at −∞. We will consider that the non-central thresholds are symmetric w.r.t. τ0;
thus, for example, the threshold τi is given by τ0 plus a variation τ′_i and the threshold τ−i is given by
τ0 minus the same variation. In the figure, the variations are integer multiples of ∆,
which corresponds to uniform quantization. In general, we will not impose uniform
quantization.
The assumption on the symmetry of the quantizer is difficult to justify at this point, but
the main idea is that, as it will be shown further for commonly used noise models, the
best central threshold for estimation purposes is exactly x, thus if we set τ0 = x, from
the assumption of noise symmetry, it seems reasonable to assume that the quantizer
(a good one) is symmetric. In Part II, it will be shown that for large NI the optimal
quantizer is indeed symmetric around τ0 for symmetric noise distributions.
The infinite thresholds for the extreme positive and negative thresholds are used to
have the same notation for the probabilities of the granular (region inside the quantizer
input dynamic range) and overload regions (region outside the quantizer input dynamic
range).
From Fig. 1.1 and the explanations above, the quantizer function can be described as
follows: if we have a measurement Yk ≥ τ0 that falls in the quantization interval qi = [τi−1 , τi ),
then its output will be i. Otherwise, if Yk ≤ τ0 and it falls in q−i = [τ−i , τ−i+1 ), then the
quantizer output will be −i.
As an example, consider that we have a uniform quantizer with 16 quantization levels,
τ0 = 0 and uniform quantization step-length ∆ = 1. Then, for the input

y1:10 = {−20, −8.5, −3.4, −5.6, −0.1, 0.7, 3.2, 10.7, 7.1, −2.3},

we obtain

y1 = −20 → i1 = −8,      y2 = −8.5 → i2 = −8,
y3 = −3.4 → i3 = −4,     y4 = −5.6 → i4 = −6,
y5 = −0.1 → i5 = −1,     y6 = 0.7 → i6 = 1,
y7 = 3.2 → i7 = 4,       y8 = 10.7 → i8 = 8,
y9 = 7.1 → i9 = 8,       y10 = −2.3 → i10 = −3.
Observe that by using the threshold variations τi′ , we can write the input–output relation
in a more compact way:
ik = i sign (Yk − τ0),   for |Yk − τ0| ∈ [τ′_{i−1}, τ′_i).    (1.3)
Note that the index k here is the time or sample index and it is not the particular value of i.
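To make the quantizer definition concrete, the following sketch implements Q through the compact relation (1.3), building the thresholds from the central threshold τ0 and the variations τ′_i; it reproduces the uniform 16-level example above. The function name and the NumPy-based implementation are illustrative choices, not part of the thesis.

```python
import numpy as np

def quantize(y, tau0, tau_var):
    """Symmetric quantizer of Fig. 1.1 via the compact relation (1.3).

    tau_var holds the threshold variations [tau'_0 = 0, tau'_1, ..., tau'_{NI/2} = +inf]
    (a strictly increasing sequence); output labels lie in {-NI/2, ..., -1, 1, ..., NI/2}.
    """
    y = np.asarray(y, dtype=float)
    # index i such that |y - tau0| falls in [tau'_{i-1}, tau'_i)
    i = np.searchsorted(tau_var, np.abs(y - tau0), side="right")
    s = np.where(y >= tau0, 1, -1)  # sign(y - tau0), with the convention sign(0) = +1
    return s * i

# Uniform example from the text: 16 intervals, tau0 = 0, step length Delta = 1.
NI, Delta, tau0 = 16, 1.0, 0.0
tau_var = np.concatenate([np.arange(NI // 2) * Delta, [np.inf]])  # 0, 1, ..., 7, +inf
y = [-20, -8.5, -3.4, -5.6, -0.1, 0.7, 3.2, 10.7, 7.1, -2.3]
print(quantize(y, tau0, tau_var))  # -> [-8 -8 -4 -6 -1  1  4  8  8 -3]
```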
Before proceeding, we will state explicitly the assumptions on the quantizer.
Assumptions (on the quantizer):

AQ1 NI is considered to be an even natural number and the set I where ik is defined is

    I = {−NI/2, . . . , −1, 1, . . . , NI/2}.

AQ2 The quantizer is symmetric around the central threshold. This means that the vector of thresholds τ is given by (⊤ is the transpose operator)

    τ = [τ_{−NI/2} = τ0 − τ′_{NI/2}, . . . , τ_{−1} = τ0 − τ′_1, τ0, τ_1 = τ0 + τ′_1, . . . , τ_{NI/2} = τ0 + τ′_{NI/2}]^⊤,

    with the threshold vector elements forming a strictly increasing sequence and the nonnegative vector of threshold variations w.r.t. the central threshold given by

    τ′ = [τ′_0 = 0, τ′_1, . . . , τ′_{NI/2} = +∞]^⊤.

1.2 Maximum likelihood, Cramér–Rao bound and Fisher information
We want to estimate x based on i1:N = {i1, · · · , iN} (problem (a)). For doing so, we will look for an estimator X̂ (i1:N) (a r.v., since it is a function of r.v.'s) that must be as close as possible to x. In our case, we choose the quantitative meaning of "as close as possible" to be minimum (or small) mean squared error (MSE):

    MSE = E[ ( X̂ − x )² ] ,    (1.4)

where E is the expectation w.r.t. the joint distribution of the noise. The MSE is a commonly used performance criterion for estimation problems. Although it is widely used, it has the inconvenience that, in general, it is impossible to find the X̂ minimizing it by direct analytical minimization [Van Trees 1968, p. 64].
1.2.1 Maximum likelihood estimator

A common solution to this problem is to suppose that N is large (in theory N must tend to infinity) and that X̂ is constrained to be unbiased, which means

    E[ X̂ ] = x ;

in this case, the optimal X̂ minimizing the MSE is known to be the maximum likelihood estimator (MLE) [Kay 1993, p. 160]. The MLE consists of maximizing the likelihood function, which is the joint distribution of the measurements considering that the measurements are fixed and that the parameter x is the variable¹. For the estimation problem considered here, the likelihood for an independent block of measurements i1:N is

    L (x; i1:N) = ∏_{k=1}^{N} P (ik ; x) ,    (1.5)
where P (ik ; x) is the probability of having a quantizer output ik at time k for a parameter x. This probability can be rewritten using the noise CDF and the thresholds:

    P (ik ; x) = P ( τ_{ik−1} ≤ Yk < τ_{ik} ) ,   if ik > 0,
    P (ik ; x) = P ( τ_{ik} ≤ Yk < τ_{ik+1} ) ,   if ik < 0;

using the definition Yk = x + Vk given by (1.1),

    P (ik ; x) = F ( τ_{ik} − x ) − F ( τ_{ik−1} − x ) ,   if ik > 0,
    P (ik ; x) = F ( τ_{ik+1} − x ) − F ( τ_{ik} − x ) ,   if ik < 0.    (1.6)

The MLE is the value of x maximizing L (x; i1:N) for a given i1:N:

    X̂ML,q = X̂ML (i1:N) = argmax_x L (x; i1:N) .    (1.7)

The subscript q is used to make explicit that the estimation is done with quantized measurements. As the logarithm is a strictly increasing function on R⋆+ and most used likelihood functions are given in exponential form, it is common to solve an equivalent maximization problem:

    X̂ML,q = argmax_x log L (x; i1:N) .

¹ Clearly, this is an inversion of roles from the modeling point of view and this is the main reason why we do not call the likelihood function simply the joint PDF.
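As a concrete illustration (a sketch not taken from the thesis; Gaussian noise and the illustrative threshold set below are assumptions), the log-likelihood (1.5)–(1.6) can be evaluated and maximized numerically:

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import minimize_scalar

    tau0 = 0.0
    tau_var = np.array([0.0, 1.0, 2.0, 3.0, np.inf])     # NI = 8 intervals, symmetric quantizer

    def log_likelihood(x, i_seq, F=norm.cdf):
        """log L(x; i_{1:N}) of eq. (1.5), with P(i; x) from eq. (1.6)."""
        mag = np.abs(i_seq)
        a = tau0 + np.sign(i_seq) * tau_var[mag - 1]
        b = tau0 + np.sign(i_seq) * tau_var[mag]
        lo, hi = np.minimum(a, b), np.maximum(a, b)
        return np.sum(np.log(F(hi - x) - F(lo - x)))

    # toy data: x = 0.4, standard Gaussian noise, same quantizer as above
    rng = np.random.default_rng(0)
    y = 0.4 + rng.standard_normal(2000)
    i_seq = np.where(y >= tau0, 1, -1) * np.minimum(np.floor(np.abs(y - tau0)).astype(int) + 1, 4)

    res = minimize_scalar(lambda x: -log_likelihood(x, i_seq), bounds=(-5, 5), method="bounded")
    print(res.x)   # close to the true value 0.4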
1.2.2 Cramér–Rao bound and the Fisher information
The MLE is the procedure to find the estimate; we still need its performance. Unfortunately, no finite sample (finite N) performance results are available for the MLE. We will therefore focus on asymptotic results, for which, as stated before, the MLE is in some sense optimal.

The MSE for the MLE can be written as

    E[ ( X̂ML,q − x )² ] = ( E[ X̂ML,q − x ] )² + Var( X̂ML,q ) = bias² + variance.

As it was stated, the MLE is asymptotically unbiased:

    E[ X̂ML,q ]  =  x   (N → ∞).    (1.8)
Therefore, it is characterized asymptotically only by its variance.
The Cramér–Rao bound (CRB) is a lower bound on the variance of any unbiased estimator [Kay 1993, p. 30] and the bound is valid even for finite N. Under some regularity conditions, the asymptotic variance of the MLE is known to be minimum and it attains the CRB [Kay 1993, p. 160]:

    Var( X̂ML,q )  ∼  CRBq   (N → ∞);    (1.9)

later, we will compare this CRBq with its corresponding version for continuous measurements, which we will denote CRBc. The symbol ∼ used here means that both quantities are asymptotically equivalent:

    lim_{N→∞}  Var( X̂ML,q ) / CRBq  =  1.

As the MLE is asymptotically unbiased and has asymptotically minimum variance, it is usually called an asymptotically efficient estimator in classical estimation terms.
Note that the optimality in asymptotic variance does not imply optimality in MSE sense,
as a biased estimator can attain a lower asymptotic MSE when compared with the MLE.
Also, it is important to stress that the variance of the MLE will tend to the CRB only if the
maximum of the likelihood can be achieved. This can be an issue when we need to evaluate the
maximum of the likelihood through a numerical method, in this case we have to ensure that
the numerical method will converge to the global maximum. In what follows, we will assume
that the MLE, either evaluated analytically or numerically, is always the global maximum of
the likelihood. For further discussion on the issues of finding the MLE see (More? - App.
A.2.1).
The CRB is the inverse of the Fisher information (FI) [Kay 1993, p. 30]. The FI is
given by the variance of the score function Sq . As the expected value of the score function
is zero [Kay 1993, p. 67], the FI is given by the second order moment of the score function.
Starting from the definition of the score function for N quantized measurements and going in
the direction of the asymptotic variance of the MLE, we have the following expressions:

    Sq,1:N = ∂ log L (x; i1:N) / ∂x                                    - score function,

    Iq,1:N = E[ S²q,1:N ] = E[ ( ∂ log L (x; i1:N) / ∂x )² ]            - FI,

    Var( X̂ML,q ) ∼ CRBq = 1 / Iq,1:N = 1 / E[ ( ∂ log L (x; i1:N) / ∂x )² ]   (N → ∞)   - variance and CRB.

The subscript 1:N indicates that these quantities are related to the N quantized measurements i1:N. Since the measurements are i.i.d., whenever we refer to the score function and FI for one measurement ik, we can drop the sample indexes, writing simply Sq and Iq. Under the assumption of independent measurements (independent noise), we have the following:
• The joint probability in the FI expression decomposes into a product of marginal probabilities.
• The logarithm of the product of marginal probabilities becomes the sum of the logarithms of each probability.
• After differentiating the sum of logarithms w.r.t. x, the square of the differentiated sum can be decomposed into a sum of squared terms and a sum of products between different terms.
• The expectation of the products between different terms is zero because the factors in the products are independent and have zero mean (they are score functions, thus having zero mean [Kay 1993, p. 67]).
• The expectation of each squared term is the FI for the corresponding individual measurement.
Therefore, as the measurements are also identically distributed, the FI for N quantized measurements is N times the FI for one measurement Iq:

    Var( X̂ML,q )  ∼  CRBq = 1 / (N Iq)   (N → ∞).    (1.10)

The score function for one measurement Sq is

    Sq = ∂ log L (x; ik) / ∂x = [ ∂P (ik ; x) / ∂x ] / P (ik ; x)    (1.11)

and the corresponding FI is

    Iq = E[ ( ∂ log L (x; ik) / ∂x )² ] = Σ_{ik ∈ I} { [ ∂P (ik ; x) / ∂x ]² / P² (ik ; x) } P (ik ; x)
       = Σ_{ik ∈ I} [ ∂P (ik ; x) / ∂x ]² / P (ik ; x) .    (1.12)
Defining the difference between the central threshold and the parameter as ε = τ0 − x, and using the CDF and PDF notations and the symmetry of the quantization thresholds, we have

    Iq = Σ_{ik=1}^{NI/2} { [ f (ε + τ′_{ik−1}) − f (ε + τ′_{ik}) ]² / [ F (ε + τ′_{ik}) − F (ε + τ′_{ik−1}) ]
                         + [ f (ε − τ′_{ik}) − f (ε − τ′_{ik−1}) ]² / [ F (ε − τ′_{ik−1}) − F (ε − τ′_{ik}) ] } .    (1.13)
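As a numerical check of (1.13) (an illustrative sketch, not from the thesis; the Gaussian noise model, the threshold values and the function name are assumptions), the sum can be evaluated directly for a given quantizer:

    import numpy as np
    from scipy.stats import norm

    def fisher_info_quantized(eps, tau_var, pdf=norm.pdf, cdf=norm.cdf):
        """Fisher information I_q(eps) of eq. (1.13) for a symmetric quantizer.
        tau_var = [tau'_0 = 0, tau'_1, ..., tau'_{NI/2} = +inf]."""
        lo, hi = tau_var[:-1], tau_var[1:]
        pos = (pdf(eps + lo) - pdf(eps + hi)) ** 2 / (cdf(eps + hi) - cdf(eps + lo))
        neg = (pdf(eps - hi) - pdf(eps - lo)) ** 2 / (cdf(eps - lo) - cdf(eps - hi))
        return np.sum(pos + neg)

    tau_var = np.array([0.0, 0.5, 1.0, 1.5, np.inf])   # NI = 8 intervals
    print(fisher_info_quantized(0.0, tau_var))          # somewhat below I_c = 1 for unit-variance Gaussian noise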
The solution to problem (a) (p. 27) given by the MLE is the following:

Solution to (a) - MLE for a fixed threshold set τ

(a1) 1) Estimator

    X̂ML,q = argmax_x L (x; i1:N)   or   X̂ML,q (i1:N) = argmax_x log L (x; i1:N) ,

with L (x; i1:N) given by (1.5):

    L (x; i1:N) = ∏_{k=1}^{N} P (ik ; x) .

2) Performance (asymptotic)

X̂ML,q is asymptotically unbiased,

    E[ X̂ML,q ] = x   (N → ∞),

and its asymptotic MSE or variance is given by

    Var( X̂ML,q ) ∼ CRBq = 1 / (N Iq)   (N → ∞),

with Iq given by (1.13).
The CRB given above is not only related to the MLE, but can be used to approximately
assess the performance of any good (close to optimal) estimator. In our case, it can be used to
characterize the performance of the measurement/estimation system (Fig. 1.2) independently
of the estimator.
1.2.3 Quantization loss
The solution given above does not contain any direct characterization of the estimation performance as a function of NI and/or τ . We are going to look into these details now.
Figure 1.2: Scheme representing the general measurement/estimation system. The continuous
measurements sequence Y1:N is scalarly quantized and the quantized sequence i1:N is used for
estimation. X̂q and X̂c are the estimators based on quantized or continuous measurements
and CRBq and CRBc are their respective CRB.
Loss with respect to the continuous measurement
We will start by analyzing the general effect of quantization on estimation. An approximate way of doing this (exact for N → ∞) is to study the quantized FI for one measurement, Iq, and its difference with respect to the continuous measurement FI, Ic. Iq was given in (1.13), while Ic is given by

    Ic = E[ S²c ] ,    (1.14)

where Sc is the score function for continuous measurements, given by

    Sc (y) = ∂ log f (y − x) / ∂x .    (1.15)

The difference between Ic and Iq can be obtained by evaluating the quantity E[ (Sc − Sq)² ] [Marano 2007]. Indeed,

    E[ (Sc − Sq)² ] = E[ S²c ] + E[ S²q ] − 2 E[ Sc Sq ] = Ic + Iq − 2 E[ Sc Sq ]

and it can be shown that E[ Sc Sq ] = E[ S²q ] (Why? - App. A.1.1). Thus, from the above, we have²

    Ic − Iq = E[ (Sc − Sq)² ] ≥ 0;    (1.16)

as the right-hand side (RHS) is the expectation of a squared function, the FI difference is nonnegative, meaning that the FI for quantized measurements is always less than or equal to its continuous measurement equivalent. Therefore, as the corresponding CRB will have larger or equal values, it is clear, as was already expected, that quantization of measurements reduces estimation performance (see Fig. 1.2 for the two estimation settings).

² Special attention must be given to the fact that, to obtain (1.16), the measurement PDF form f (y − x) is not used; in the proof in App. A.1.1 a general form f (y; x) is used, so the conclusion above is also valid for general parameter estimation problems, not only location parameter estimation.
Loss with respect to the number of quantization intervals
Even if the performance loss is positive or zero, nothing guarantees, until now, that estimation performance increases with increasing NI, as is intuitively expected. Suppose that we have a threshold set τ for NI quantization intervals and, for simplification, that ε = 0. We add one threshold τ′ between two thresholds τi−1 and τi (τi > τ′ > τi−1); i > 0 is assumed only to simplify notation. The sum terms defining Iq do not change, except for the term corresponding to the interval qi. The old and new FI restricted to this region are respectively

    I^τ_{q,i} = [ f (τi) − f (τi−1) ]² / [ F (τi) − F (τi−1) ] ,    (1.17)

    I^{τ∪{τ′}}_{q,i} = [ f (τi) − f (τ′) ]² / [ F (τi) − F (τ′) ] + [ f (τ′) − f (τi−1) ]² / [ F (τ′) − F (τi−1) ] .    (1.18)

We can expand (1.17) by adding and subtracting a term f (τ′) in the numerator, adding and subtracting F (τ′) in the denominator, and multiplying and dividing the resulting numerator terms by F (τi) − F (τ′) and F (τ′) − F (τi−1). This gives

    I^τ_{q,i} = ( { [f (τi) − f (τ′)] / [F (τi) − F (τ′)] } [F (τi) − F (τ′)] + { [f (τ′) − f (τi−1)] / [F (τ′) − F (τi−1)] } [F (τ′) − F (τi−1)] )²
                / ( [F (τi) − F (τ′)] + [F (τ′) − F (τi−1)] )²  ×  [ F (τi) − F (τ′) + F (τ′) − F (τi−1) ] .    (1.19)
Jensen's inequality tells us the following [Hardy 1988, p. 74]: for a sequence of values ai, positive weights bi and a convex function φ, we have

    φ ( Σi ai bi / Σi bi )  ≤  Σi bi φ (ai) / Σi bi .    (1.20)

Multiplying both sides of (1.20) by Σi bi, and identifying in (1.19) bi with F (τi) − F (τ′) and F (τ′) − F (τi−1), ai with [f (τi) − f (τ′)] / [F (τi) − F (τ′)] and [f (τ′) − f (τi−1)] / [F (τ′) − F (τi−1)], and φ (x) with x², we have the following:

    I^τ_{q,i}  ≤  I^{τ∪{τ′}}_{q,i} .    (1.21)
As it was expected, adding a threshold, or equivalently a quantizer interval, increases the FI
and, as consequence, it decreases the CRB, enhancing estimation performance. Note that
this is also true if we start with an optimal partition (a partition that maximizes the FI) and
we add a threshold arbitrarily, however, in this case, the final interval partition may not be
optimal within the class of quantizers with NI + 1 intervals, even if we try to optimize the
new threshold position.
As adding thresholds increases the FI and as Iq is bounded above by Ic , the FI tends to
a limit value when NI tends to infinity. An interesting point to be studied is to know if we
can make it converge to Ic . This will be done in Part II, where we will see that, under some
regularity assumptions on the quantizer intervals, Iq converges to Ic .
Now, to have a more precise characterization of the estimation performance as a function
of NI , we must first describe how it is influenced by τ . For the optimal τ , we will be able to
obtain the dependence of the estimation performance only on NI and the noise characteristics.
1.3 Binary quantization

We begin the analysis with the binary case, NI = 2. For binary observations (τ−1 = −∞ and τ1 = +∞), the CRB for N measurements can be written by using (1.13) in the CRB. As f (ε + τ′1) = 0, f (ε + τ′−1) = 0, 1 − F (ε + τ′1) = 0 and F (ε + τ′−1) = 0 by assumption AN2, we obtain

    CRB^B_q = F (ε) [1 − F (ε)] / [ N f² (ε) ] .    (1.22)

The analysis of the performance in this case reduces to the analysis of the function

    B (ε) = N CRB^B_q = F (ε) [1 − F (ε)] / f² (ε) .    (1.23)

1.3.1 The Gaussian case
This function was studied in the Gaussian noise case in [Papadopoulos 2001] and revisited in [Ribeiro 2006a]. In this case,

    f (ε) = ( 1 / (√π δ) ) exp[ − (ε/δ)² ] ,    (1.24)

where δ is the noise scale factor, which can be linearly related to the standard deviation σ (δ = √2 σ). By plotting B as a function of ε (see Fig. 1.3), it was noted in [Papadopoulos 2001] that the minimum value B⋆ is attained for ε = 0 and that B (ε) increases when |ε| increases. Thus, the optimal threshold τ⋆0 must be equal to x and the minimum value of B (ε) is B⋆ = 1 / [4 f² (0)] = π δ² / 4. We can compare the CRB for one continuous measurement, Bc = 1/Ic, with B⋆ to have an idea of the loss of performance. Using (1.14), (1.15) and the expression for the PDF of the Gaussian distribution (1.24), we have

    Bc = 1 / Ic = 1 / E[ ( ∂ log f (y − x) / ∂x )² ] = δ² / 2 = (2/π) B⋆ ,

or equivalently

    B⋆ = (π/2) Bc ≈ 1.57 Bc .

The performance loss due to binary quantization is surprisingly small. However, note that this requires τ0 = x, which is impossible to achieve in practice as x is the unknown parameter to be estimated. For increasing |ε| > 0, we can observe that B increases in a rather sensitive way.
An upper bound on B was given in [Ribeiro 2006a] by noting that the product in the numerator can be bounded by the following exponential (Why? - App. A.1.2):

    F (ε) [1 − F (ε)]  ≤  (1/4) exp[ − (ε/δ)² ] .    (1.25)

This bound can be used in (1.23) with (1.24) to obtain

    B (ε)  ≤  B̄ (ε) = (π δ² / 4) exp[ + (ε/δ)² ] ,    (1.26)
which is a function that increases exponentially with ε. To confirm that the bound is tight,
at least for moderate ε, we plot the function B̄ also in Fig. 1.3.
Figure 1.3: Quantity related to the CRB for quantized measurements, B, as a function of the normalized difference ε/δ between threshold and parameter. B̄ is its upper bound, which has an exponential form. The noise distribution is Gaussian and the normalizing factor δ is the Gaussian noise scale parameter. The normalizations on both axes are done to obtain a plot independent of δ.
Therefore, for the Gaussian case, we can conclude that the estimation performance loss
for the binary case is relatively small if we set the threshold at the true parameter value, but
it increases rapidly when we quantize far from it.
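A short numerical check (an illustrative sketch, not from the thesis; it assumes the Gaussian model above with σ = 1, so δ = √2) of B (ε) and of the exponential bound B̄ (ε):

    import numpy as np
    from scipy.stats import norm

    sigma = 1.0
    delta = np.sqrt(2.0) * sigma          # Gaussian noise scale factor in the thesis notation

    def B(eps):
        """B(eps) = F(eps)[1 - F(eps)] / f(eps)^2 of eq. (1.23), Gaussian noise."""
        F, f = norm.cdf(eps, scale=sigma), norm.pdf(eps, scale=sigma)
        return F * (1.0 - F) / f**2

    def B_bar(eps):
        """Exponential upper bound of eq. (1.26)."""
        return np.pi * delta**2 / 4.0 * np.exp((eps / delta) ** 2)

    eps = np.linspace(-1.5, 1.5, 7) * delta
    print(B(0.0), np.pi * delta**2 / 4)    # both equal pi*delta^2/4 ~ 1.5708
    print(np.all(B(eps) <= B_bar(eps)))    # True: the bound holds on this grid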
1.3.2 The Laplacian case

We can look at another symmetric unimodal distribution to see if the same happens. For example, we can consider the Laplacian distribution, whose PDF and CDF are

    f (ε) = ( 1 / (2δ) ) exp( − |ε| / δ ) ,    (1.27)

    F (ε) = 1/2 + [ sign (ε) / 2 ] [ 1 − exp( − |ε| / δ ) ] ,    (1.28)

where sign (ε) is the sign function: sign (ε) = 1 if ε > 0, 0 if ε = 0 and −1 if ε < 0.
Applying (1.27) and (1.28) to (1.23), we get

    B (ε) = F (ε) [1 − F (ε)] / f² (ε)
          = { 1/4 − (1/4) [ 1 − exp(−|ε|/δ) ]² } / { ( 1/(4δ²) ) exp(−2|ε|/δ) }
          = δ² [ 2 exp( |ε|/δ ) − 1 ]    (1.29)
and we can see that B, and consequently the CRB, is minimized for τ0 = x and that it is sensitive to ε, growing exponentially when we increase |ε| = |τ0 − x|.
1.3.3 The general case
We can try to verify if the increasing behavior of B (ε) w.r.t. |ε| will be observed in the general
case, when the noise PDF is unimodal and symmetric.
Attempt of global analysis: dead end
For unimodal symmetric distributions we have f (ε) = f (−ε) and F (ε) = 1 − F (−ε). Therefore, as was observed for the specific Gaussian and Laplacian cases, B (ε) is a symmetric function. To analyze whether the increasing behavior is true in general, we can concentrate the analysis on the first derivative of B w.r.t. ε, for ε > 0. The derivative is

    dB/dε = { f² (ε) [1 − 2F (ε)] − 2 F (ε) [1 − F (ε)] f^(1) (ε) } / f³ (ε) ,    (1.30)

where f^(1) (ε) is the first derivative of the PDF w.r.t. ε, supposed to exist³. Observe that if the distribution is symmetric, we have 1 − 2F (0) = 0 and only the second term in the numerator can be nonzero for ε = 0. Adding the condition that f^(1) (0) = 0 makes ε = 0 a local extremum of B, a candidate point to be a local minimum.
In a first attempt to verify whether ε = 0 is a global minimum, we can calculate the second derivative and check whether its sign is positive for all ε. If we calculate the second derivative we get

    d²B/dε² = { −3 f² (ε) f^(1) (ε) [1 − 2F (ε)] + F (ε) [1 − F (ε)] [ 6 f^(1)² (ε) − 2 f (ε) f^(2) (ε) ] } / f⁴ (ε)  −  2 ,    (1.31)
with f^(2) (ε) the second derivative, supposed to exist³. Even using the assumptions on the noise distribution, we cannot reach any conclusion on the sign of the second derivative. Thus, we can go back to the first derivative and analyze its sign. Using the symmetry of B, a sufficient condition for ε = 0 to be a global minimum is that dB/dε > 0 for ε > 0. The derivative dB/dε has the same sign as the numerator in the RHS of (1.30); therefore, we obtain the condition

    − f^(1) (ε)  >  f² (ε) [2F (ε) − 1] / ( 2 F (ε) [1 − F (ε)] ) ;

using the fact that the density is monotonically decreasing (f^(1) (ε) < 0 for ε > 0) and symmetric ([2F (ε) − 1] > 0 for ε > 0), we can write

    | f^(1) (ε) |  >  f² (ε) [2F (ε) − 1] / ( 2 F (ε) [1 − F (ε)] ) .    (1.32)
Unfortunately, by using the assumptions on the distribution we cannot go further. But, at least, we can use the condition above (1.32) to verify empirically, for commonly used noise models, the increasing behavior of B (ε) with |ε|.

³ This rules out the evaluation of this quantity for ε = 0 in the Laplacian case, which is not a problematic case, as we know analytically that B is strictly increasing with |ε| in this case.

For doing so, we (re)tested the Gaussian and Laplacian distributions with (1.32); we also added a heavy-tailed distribution⁴ to see if the conclusions change in this case. The heavy-tailed distribution is the Cauchy distribution, with PDF and CDF given respectively by
    f (ε) = ( 1 / (πδ) ) · 1 / [ 1 + (ε/δ)² ] ,    (1.33)

    F (ε) = 1/2 + (1/π) arctan ( ε/δ ) .    (1.34)
For the three distributions (Gaussian, Laplacian and Cauchy), we calculated the quantity M = | f^(1) (ε) | − f² (ε) [2F (ε) − 1] / ( 2 F (ε) [1 − F (ε)] ), which must be positive for B to be monotonically increasing w.r.t. |ε|. The result is displayed in Fig. 1.4, where we observe that this is indeed the case.
Figure 1.4: M × δ² as a function of ε/δ > 0. The plot is given for three types of noise distribution: Gaussian, Cauchy and Laplacian. All distributions have a noise scale parameter denoted δ. The function must be positive for the optimal threshold in binary quantization to be placed exactly at the true parameter, τ⋆0 = x. The normalizations on both axes are done to obtain a plot independent of δ.
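Condition (1.32) is easy to test numerically for a given noise model. The sketch below (illustrative only, not from the thesis) evaluates M with scipy's standard Gaussian, Laplacian and Cauchy distributions (unit scale parameters) and a finite-difference derivative of the PDF:

    import numpy as np
    from scipy.stats import norm, laplace, cauchy

    def M(eps, dist, h=1e-5):
        """M(eps) = |f'(eps)| - f(eps)^2 [2F(eps)-1] / (2 F(eps)[1-F(eps)]), cf. (1.32)."""
        f, F = dist.pdf(eps), dist.cdf(eps)
        f_prime = (dist.pdf(eps + h) - dist.pdf(eps - h)) / (2 * h)   # numerical derivative
        return np.abs(f_prime) - f**2 * (2 * F - 1) / (2 * F * (1 - F))

    eps = np.linspace(0.1, 5.0, 50)
    for dist in (norm, laplace, cauchy):
        print(dist.name, np.all(M(eps, dist) > 0))   # True for all three: B is increasing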
Local analysis
As condition (1.32) is difficult to verify in general, we can try to analyze the local behavior of B (ε) around ε = 0. Even if the results will be weaker, as they are only local results, we can expect that the conditions for ε = 0 to be a local minimum of B (ε) will be easy to verify. We saw above that if f^(1) (0) = 0, then we have an extremum of B (ε) at ε = 0. If we use one more time the assumption f^(1) (0) = 0 in the second derivative at zero, together with the symmetry (F (0) [1 − F (0)] = 1/4), we get

    d²B/dε² |_{ε=0}  =  − (1/2) f^(2) (0) / f³ (0)  −  2 .
⁴ A heavy-tailed distribution is a distribution whose ratio between 1 − F (x + y) and 1 − F (x) is equal to one when x tends to infinity [Sigman 1999]. A subclass of this family is the class of all sub-exponential distributions, where the Student-t distributions (for which the Cauchy distribution is a special case) and Paretian distributions are included.
For ε = 0 to be a local minimum of B (ε), we have the condition d²B/dε² |_{ε=0} > 0. When we apply this condition to the expression above, we obtain the following condition on the noise PDF and its second derivative:

    − f^(2) (0)  >  4 f³ (0) .    (1.35)
For the Gaussian distribution this condition is satisfied, as we have

    − f^(2) (0) = (1/δ³) (2/√π)  >  4 f³ (0) = (1/δ³) ( 4/π^{3/2} ) ,

and also for the Cauchy distribution:

    − f^(2) (0) = (1/δ³) (2/π)  >  4 f³ (0) = (1/δ³) (4/π³) .

1.3.4 Asymmetric threshold: surprising cases

Surprisingly, we can find symmetric distributions, and even a class of unimodal symmetric distributions, for which condition (1.35) is not satisfied; as a consequence, for these distributions, ε = 0 can be a local maximum instead of a local minimum.
The uniform/Gaussian case
A simple way to define a symmetric distribution that does not satisfy (1.35) is to set the values of the PDF around zero to a nonzero constant; in this way f (0) > 0 and f^(2) (0) = 0. This makes the second derivative of B at ε = 0 negative, leading to a local maximum of B (ε) at that point.
As an example, we can consider a noise PDF that is uniform in the interval [−α/2, α/2], where α ∈ R+, and that decreases as a Gaussian distribution with a standard deviation parameter σ outside this interval. We call this noise distribution the uniform/Gaussian distribution and the analytic expression for its PDF is

    f (ε) = fGL (ε) = (1/C) ( 1/(√(2π) σ) ) exp{ −(1/2) [ (ε + α/2)/σ ]² } ,   for ε < −α/2,
    f (ε) = fU (ε)  = (1/C) ( 1/(√(2π) σ) ) ,                                  for −α/2 ≤ ε ≤ α/2,    (1.36)
    f (ε) = fGR (ε) = (1/C) ( 1/(√(2π) σ) ) exp{ −(1/2) [ (ε − α/2)/σ ]² } ,   for ε > α/2,

where C = 1 + α/(√(2π) σ) is a normalization constant that makes the integral of the PDF equal to one. This PDF is depicted in Fig. 1.5.
To obtain the function B (ε), we have to describe the CDF of the uniform/Gaussian r.v.. If we denote by Φ (ε) the CDF of a standard Gaussian distribution (the CDF for a Gaussian with σ = 1), we obtain the following:

    F (ε) = (1/C) Φ ( (ε + α/2) / σ ) ,                          for ε < −α/2,
    F (ε) = (1/C) [ 1/2 + ( 1/(√(2π) σ) ) (ε + α/2) ] ,          for −α/2 ≤ ε ≤ α/2,    (1.37)
    F (ε) = (1/C) [ α/(√(2π) σ) + Φ ( (ε − α/2) / σ ) ] ,        for ε > α/2.
Figure 1.5: PDF for the uniform/Gaussian distribution. The center region is uniform with
width α, while the left and right sides are Gaussian with standard deviation parameter σ.
Using (1.36) and (1.37) in the expression for B (ε) (1.23), we get

    B (ε) = F (ε) [1 − F (ε)] / f² (ε)
          = 2πσ² exp{ [(ε + α/2)/σ]² } Φ( (ε + α/2)/σ ) [ C − Φ( (ε + α/2)/σ ) ] ,                    for ε < −α/2,
          = 2πσ² [ 1/2 + α/(2√(2π) σ) ]² − ε² ,                                                        for −α/2 ≤ ε ≤ α/2,    (1.38)
          = 2πσ² exp{ [(ε − α/2)/σ]² } [ α/(√(2π) σ) + Φ( (ε − α/2)/σ ) ] [ 1 − Φ( (ε − α/2)/σ ) ] ,   for ε > α/2.

Observe that in the interval [−α/2, α/2] the function is concave, so we indeed have a local maximum at zero.
To observe the global behavior of this bound, we plotted CRB^B_q for a number of samples N = 500, α = 1, σ = 1 and for values of ε in the interval [−2, 2]. To verify that the behavior of the bound is close to the true MSE of the MLE, we simulated the MLE 10⁵ times for N = 500; the simulation results were used to evaluate an empirical MSE. The details on the implementation of the MLE for binary quantization will be presented in (a1.1) in Sec. 1.3.6, and for more specific implementation details about the uniform/Gaussian case see (More? - App. A.2.2). The simulation of the noise was done by exploiting the fact that the uniform/Gaussian distribution is a mixture of distributions that are easy to sample (How? - App. A.3.1). The results are shown in Fig. 1.6.

We can observe the concave behavior of the bound around ε = 0 and the presence of two minima at points different from ε = 0. This shows that for this type of noise, binary quantization must be done in an asymmetric way, by shifting the central threshold to a zone where the noise is not uniform. Note also that if we shift too much, the performance starts to degrade again. We suspect that this asymmetric behavior comes from the fact that, for the uniform distribution, the most informative points in a statistical sense are the boundaries of the distribution (where it passes from a positive value to zero). Finally, we can also see that the MSE for the MLE is quite close to the bound, indicating that we can use the bound to analyze the behavior of the MSE.
Figure 1.6: CRB^B_q and simulated MLE MSE for uniform/Gaussian noise. Both the bound and the simulated MSE were evaluated for a number of samples N = 500 and for ε in the interval [−2, 2]. The MSE for the MLE was evaluated through Monte Carlo simulation using 10⁵ realizations of blocks with 500 samples. We considered the following noise parameters: α = 1 and σ = 1.
The generalized Gaussian case
We can also look if there are noise distributions without the central uniform behavior for which
the condition on the second derivative (1.35) is not respected. All distributions that have zero
second derivative at ε = 0 will not respect the condition. To have zero second derivative at
zero, the PDF must be flat around zero. A class of distributions for which we can control the
flatness around zero by changing a parameter is the generalized Gaussian distribution
(GGD). A more detailed presentation of the GGD will be given in Ch. 3 with the motivation
for using it as a noise model. Here, we will present only its PDF and CDF, which are given
respectively by
    f (ε) = [ β / ( 2 δ Γ(1/β) ) ] exp( − |ε/δ|^β ) ,    (1.39)

    F (ε) = (1/2) [ 1 + sign (ε) γ( 1/β , |ε/δ|^β ) / Γ(1/β) ] ,    (1.40)

where δ is the noise scale parameter and β is a shape parameter which allows controlling the flatness around zero. Both δ and β are constrained to be strictly positive: δ > 0, β > 0. Γ( ) is the gamma function

    Γ (x) = ∫₀^{+∞} z^{x−1} exp (−z) dz
and γ ( · , · ) is the (lower) incomplete gamma function

    γ (x, w) = ∫₀^{w} z^{x−1} exp (−z) dz .
We need to calculate f^(2) (ε) at ε = 0. For doing so, we will evaluate the derivatives for ε < 0 and ε > 0 and then evaluate their limits as ε tends to zero.

For the first derivative we have

    f^(1) (ε) =  D (−ε/δ)^{β−1} exp[ − (−ε/δ)^β ] ,   for ε < 0,
    f^(1) (ε) = −D (ε/δ)^{β−1} exp[ − (ε/δ)^β ] ,      for ε > 0,

where D = β² / ( 2 δ² Γ(1/β) ).

Observe that if β ≤ 1, then the first derivative at zero is not defined. For β > 1, the derivative at zero is zero.
For the evaluation of the second derivative we will consider β > 1. We get the following second derivative:

    f^(2) (ε) = D [ − ((β−1)/δ) (−ε/δ)^{β−2} + (β/δ) (−ε/δ)^{2(β−1)} ] exp[ − (−ε/δ)^β ] ,   for ε < 0,
    f^(2) (ε) = D [ − ((β−1)/δ) (ε/δ)^{β−2} + (β/δ) (ε/δ)^{2(β−1)} ] exp[ − (ε/δ)^β ] ,        for ε > 0.
We can see that for 1 < β < 2, the derivatives when ε approaches zero are both −∞. For
these cases, the point ε = 0 is a local minimum of B (ε). In the Gaussian case β = 2, the
second derivative has a finite negative value and we saw before that ε = 0 is a local minimum
(empirically we also observed that it is a global minimum). For the cases β > 2, the second
derivative is zero, thus corresponding to the special cases of local maximum that we were
looking for.
The function B (ε), which we expect to be a "w"-shaped function for β > 2, can be evaluated using (1.39) and (1.40) in the expression for B (ε) (1.23). This gives

    B (ε) = F (ε) [1 − F (ε)] / f² (ε)
          = [ δ² Γ²(1/β) / β² ] [ 1 − γ²( 1/β , |ε/δ|^β ) / Γ²(1/β) ] exp( 2 |ε/δ|^β ) .    (1.41)
As in the uniform/Gaussian case, we also plotted CRB^B_q and the simulated MSE of the MLE. We used N = 500, β = 4, δ = 1 and values of ε in the interval [−1, 1]. We simulated the MLE 10⁵ times and the results were used to obtain an estimate of the true MSE of this estimator. For more specific implementation details about the MLE in the GGD case see (More? - App. A.2.3). The GGD noise was generated using transformations of gamma variates (How? - App. A.3.2). The results are shown in Fig. 1.7.

We can notice again that the optimal threshold must be placed in an asymmetric way and also that the simulated estimation performance is close to the bound.
Figure 1.7: CRB^B_q and simulated MLE MSE for GGD noise. Both the bound and the simulated MSE were evaluated for a number of samples N = 500 and for ε in the interval [−1, 1]. The MSE for the MLE was evaluated through Monte Carlo simulation using 10⁵ realizations of blocks with 500 samples. We considered the following noise parameters: β = 4 and δ = 1.
Contrary to the uniform/Gaussian case, we do not have a clear interpretation of the position of the minimum point. The minimum point was observed to be sensitive to changes in β and δ. It was also observed that as we set β closer to 2 (the Gaussian case), the difference in performance between the point ε = 0 and the minimum point gets smaller. On the other hand, as we increase β (getting closer to the uniform distribution), the difference seems to increase.
1.3.5 Conclusions on binary quantization performance

To conclude, we can say that the best estimation performance in the binary case for commonly used noise models (CRB^B_q) is obtained for ε = 0 or τ0 = x:

    CRB^{B,⋆}_q = F (0) [1 − F (0)] / [ N f² (0) ] = 1 / [ 4 N f² (0) ] ,    (1.42)
which is also a lower bound on the asymptotically achievable performance. However, even under the unimodal symmetric assumption, this rather intuitive conclusion is not always true. From the local condition on the second derivative, we can see that if the noise PDF is slightly flat around zero, then a "w"-shaped performance function will appear, leading to an optimal threshold that might be placed asymmetrically w.r.t. its input r.v. distribution.

The variation between the performance at the point ε = 0 and the minimum CRB^{B,⋆}_q in the asymmetric cases seems to depend on the flatness of the distribution. An increased flatness around zero seems to be related to an increased performance variation. This strong dependence between the shape of the CRB and the noise distribution seems to be a good subject for future work.
Another interesting direction for future work on this asymmetry issue is to analyze how it can appear in the detection problem using binary quantized measurements. It appears that such behavior will be present for the same noise distributions considered above (uniform/Gaussian and GGD) in the problem of locally optimum detection of signals based on binary quantized measurements. For this problem, it can be shown that the asymptotic performance also depends on the FI for quantized measurements [Kassam 1977].

1.3.6 MLE for binary quantization
The specific implementation of (a1) in the binary case with a fixed threshold can be done
in a simple way [Papadopoulos 2001] (and revisited in [Ribeiro 2006a]). The sequence of N
quantized measurements can be observed as a sequence of N i.i.d. samples from a Bernoulli
distribution with probability p = P (ik = 1) = 1 − F (τ0 − x). Thus, hiding the functional
dependency on x and τ0 , we can calculate the likelihood of p with the sequence i1:N .
The likelihood of p for i1:N can be written in a simple form by observing the following:
• For a measurement ik, P (ik = 1; p) = p and P (ik = −1; p) = 1 − p. We can write P (ik ; p) in the form p^{f1(ik)} (1 − p)^{f−1(ik)}, where the functions f1 and f−1 are respectively 1 and 0 when ik = 1, and 0 and 1 when ik = −1. A simple choice for these functions is f1 (ik) = (ik + 1)/2 and f−1 (ik) = (1 − ik)/2.

• As the measurements are independent, the likelihood for the sequence i1:N will be the product of the marginal likelihoods P (ik ; p).
This leads to

    L (p; i1:N) = ∏_{k=1}^{N} p^{(ik+1)/2} (1 − p)^{(1−ik)/2} .

Calculating its logarithm and then evaluating the MLE for p, denoted P̂ML, we get the following [Wasserman 2003, p. 123]:

    P̂ML = (1/N) Σ_{k=1}^{N} (1 + ik)/2 .    (1.43)
The MLE (in general) has the property that if we want to estimate a parameter x which is an invertible function of z, x = g (z), and we know the MLE for z, ẐML, then the MLE for x is X̂ML = g (ẐML) [Kay 1993, p. 176]. This property is known as functional invariance. For our problem we can write

    x = g (p) = τ0 − F⁻¹ (1 − p) ,    (1.44)

where F⁻¹ is the inverse of the noise CDF. F⁻¹ is well defined, as F is strictly increasing due to the monotonicity assumption on F, so the function g in this case is invertible. Thus, by the functional invariance of the MLE, we can obtain X̂ML,q by replacing p in (1.44) with P̂ML given by (1.43). This leads to an analytical expression for the MLE:

    X̂ML,q = g ( P̂ML ) = τ0 − F⁻¹ ( 1 − P̂ML ) = τ0 − F⁻¹ ( (1/2) [ 1 − (1/N) Σ_{k=1}^{N} ik ] ) .    (1.45)
Therefore, the solution to problem (a) (p. 27) in the binary case can be detailed as follows.

Solution to (a) - MLE for binary quantized measurements and fixed threshold τ0

(a1.1) 1) Estimator

    X̂ML,q = g ( P̂ML ) = τ0 − F⁻¹ ( 1 − P̂ML ) = τ0 − F⁻¹ ( (1/2) [ 1 − (1/N) Σ_{k=1}^{N} ik ] ) .

2) Performance (asymptotic)

X̂ML,q is asymptotically unbiased,

    E[ X̂ML,q ] = x   (N → ∞),

and its asymptotic MSE or variance is given by

    Var( X̂ML,q ) ∼ CRB^B_q = F (τ0 − x) [1 − F (τ0 − x)] / [ N f² (τ0 − x) ]   (N → ∞),

which is minimal for commonly used noise models (Gaussian, Laplacian and Cauchy distributions) if τ0 = x, attaining 1 / [4 N f² (0)], and increases with |τ0 − x|.
Notice that this algorithm can be used for any noise distribution, not only for symmetric
unimodal distributions.
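As a concrete sketch (not from the thesis; Gaussian noise with a known standard deviation is assumed, and all names are illustrative), the closed-form binary MLE (1.45) takes a few lines, with norm.ppf playing the role of F⁻¹:

    import numpy as np
    from scipy.stats import norm

    def mle_binary(i_seq, tau0, noise_std=1.0):
        """Closed-form MLE of eq. (1.45): x_hat = tau0 - F^{-1}(1 - p_hat)."""
        p_hat = np.mean((1 + i_seq) / 2)                 # eq. (1.43)
        p_hat = np.clip(p_hat, 1e-12, 1 - 1e-12)         # avoid F^{-1}(0) or F^{-1}(1)
        return tau0 - norm.ppf(1.0 - p_hat, scale=noise_std)

    # toy experiment: x = 0.3, threshold at tau0 = 0, N = 10000 binary measurements
    rng = np.random.default_rng(1)
    x, tau0, N = 0.3, 0.0, 10_000
    i_seq = np.where(x + rng.standard_normal(N) >= tau0, 1, -1)
    print(mle_binary(i_seq, tau0))                       # close to 0.3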
1.4 Multibit quantization

Now, we study the multiple interval (multibit) case, NI > 2. The expression characterizing the estimation performance in this case is given by (1.13):

    Iq (ε) = Σ_{ik=1}^{NI/2} { [ f (ε + τ′_{ik−1}) − f (ε + τ′_{ik}) ]² / [ F (ε + τ′_{ik}) − F (ε + τ′_{ik−1}) ]
                             + [ f (ε − τ′_{ik}) − f (ε − τ′_{ik−1}) ]² / [ F (ε − τ′_{ik−1}) − F (ε − τ′_{ik}) ] } .

We recall that a larger Iq (ε) gives a better asymptotic estimation performance. We will start by analyzing the influence of the central threshold.
To verify the symmetry, we replace ε by −ε:

    Iq (−ε) = Σ_{ik=1}^{NI/2} { [ f (−ε + τ′_{ik−1}) − f (−ε + τ′_{ik}) ]² / [ F (−ε + τ′_{ik}) − F (−ε + τ′_{ik−1}) ]
                              + [ f (−ε − τ′_{ik}) − f (−ε − τ′_{ik−1}) ]² / [ F (−ε − τ′_{ik−1}) − F (−ε − τ′_{ik}) ] } .

The following equalities come from the symmetry assumptions:

    f (−ε + τ′_{ik}) = f (ε − τ′_{ik}) ,          F (−ε + τ′_{ik}) = 1 − F (ε − τ′_{ik}) ,
    f (−ε + τ′_{ik−1}) = f (ε − τ′_{ik−1}) ,      F (−ε + τ′_{ik−1}) = 1 − F (ε − τ′_{ik−1}) ,
    f (−ε − τ′_{ik−1}) = f (ε + τ′_{ik−1}) ,      F (−ε − τ′_{ik−1}) = 1 − F (ε + τ′_{ik−1}) ,
    f (−ε − τ′_{ik}) = f (ε + τ′_{ik}) ,          F (−ε − τ′_{ik}) = 1 − F (ε + τ′_{ik}) .

Applying these expressions to Iq (−ε) and multiplying by −1 inside the squared terms, we get Iq (ε) = Iq (−ε); thus, the even symmetry observed in the binary case extends to this case.

1.4.1 The Laplacian case
We start with the Laplacian case, which is easy to treat analytically. If we set ε = 0,

    Iq (0) = Σ_{ik=1}^{NI/2} { [ f (τ′_{ik−1}) − f (τ′_{ik}) ]² / [ F (τ′_{ik}) − F (τ′_{ik−1}) ]
                             + [ f (−τ′_{ik}) − f (−τ′_{ik−1}) ]² / [ F (−τ′_{ik−1}) − F (−τ′_{ik}) ] } ;

using also the symmetry assumption (by a development similar to the one above), one can easily observe that the second term inside each sum term is equal to the first, which means that we can rewrite the sum as

    Iq (0) = 2 Σ_{ik=1}^{NI/2} [ f (τ′_{ik}) − f (τ′_{ik−1}) ]² / [ F (τ′_{ik}) − F (τ′_{ik−1}) ] .    (1.46)

Using the PDF and CDF of the Laplacian distribution, (1.27) and (1.28), separating the last term of the sum, and simplifying the notation for the absolute value and sign functions (τ′_{ik} ≥ 0), we obtain

    Iq (0) = 2 { Σ_{ik=1}^{NI/2 − 1} [ (1/(2δ)) ( exp(−τ′_{ik−1}/δ) − exp(−τ′_{ik}/δ) ) ]² / [ (1/2) ( exp(−τ′_{ik−1}/δ) − exp(−τ′_{ik}/δ) ) ]
               + [ (1/(2δ)) exp(−τ′_{NI/2−1}/δ) ]² / [ (1/2) exp(−τ′_{NI/2−1}/δ) ] }
           = (1/δ²) { Σ_{ik=1}^{NI/2 − 1} [ exp(−τ′_{ik−1}/δ) − exp(−τ′_{ik}/δ) ] + exp(−τ′_{NI/2−1}/δ) } .

The terms inside the sum (in the Σ operator) cancel each other except for the first and the last; the remaining last term and the term outside the sum also cancel each other. Iq (0) is then given
by only one term, which is Iq (0) = (1/δ²) exp(−τ′_0/δ); as τ′_0 = 0, we have

    Iq (0) = 1 / δ² .
Surprisingly, this is exactly the same as the FI for continuous measurements (Why? - App.
A.1.3). Thus, this means that not only τ0 = x is optimal for the Laplacian distribution but
also that no loss of performance is observed. As the quantized measurement FI can only
increase by adding quantization intervals and as it is upper bounded by the continuous FI,
we see that once we have placed the threshold at x, the quantized measurement FI will be
the same for all NI ≥ 2. This means that in practice, as we want to minimize the rate, the
optimal choice of number of quantization intervals will be NI = 2.
1.4.2 The Gaussian and Cauchy cases under uniform quantization

Instead of diving into calculus to try to obtain some characterization of Iq as a function of ε, we prefer to directly plot its influence for a given set of thresholds. We evaluated Iq given by (1.13) as a function of ε/δ, with ε/δ ∈ [−10, 10]. The evaluation was done for the Gaussian and Cauchy distributions. The quantizer was assumed to have NI = 8 and a uniform step ∆ between thresholds, which means that τ′ = [0 ∆ 2∆ 3∆ +∞]. Here, uniform quantization was assumed only to simplify the presentation⁵. Three different ∆ were chosen for the evaluation: ∆ = 0.1δ, ∆⋆ and 2δ. ∆⋆ was chosen as the maximizer of Iq when ε = 0 and it was obtained by exhaustive search. The results are given in Fig. 1.8, where the continuous FI Ic is also plotted for comparison. Remember that for the Gaussian distribution Ic = 2/δ². For the Cauchy distribution we have Ic = 1/(2δ²) (Why? - App. A.1.4).
Observe that in all cases the point ε = 0 gives maximum Iq . Note that differently from the
binary case, the FI does not strictly decrease when |ε| increases, this only happens when |ε|
is outside the quantizer range. We can also see that the optimal ∆ gives Iq values very close
to Ic .
It is also interesting to observe that when we choose ∆ very large compared with ∆⋆ , we
obtain a maximum Iq smaller than for ∆⋆ , but this Iq does not decrease to zero inside the
quantizer range. This indicates that when we have a prior information on the interval of values
where x is located, then a more robust solution can be found by using a large quantization step
(for example by using a ∆ that is equal to the prior interval length divided by NI ). Clearly
in this case, the price to pay is that even if we have ε = 0 the performance is lower than the
optimal, being very close to the performance for a binary quantizer.
Differently from the binary case, after evaluating Iq (ε) for the GGD with β > 2 and NI > 2, it was observed that when we use ∆⋆ as the quantization step, the symmetric quantizer assumption seems to force the performance to be optimal for ε = 0. Less surprisingly now, when the quantization step is chosen too large, the asymmetric behavior appears; this is due to the fact that the performance around ε = τ′i is very close to the binary quantizer performance. In the same way as for the other noise distributions considered above, it was also observed that when the parameter is outside the quantizer range the performance is degraded.

⁵ It will be shown in Part II that for large NI the optimal quantization intervals may not be uniform.
Figure 1.8: FI for a range [−10, 10] of the normalized difference ε/δ between the central threshold τ0 and the true parameter x. The quantizer with NI = 8 is uniform with quantization interval length (in the granular region) ∆. In (a), the noise distribution is Gaussian and ∆ = 0.1δ, ∆⋆ = 0.399δ, 2δ; ∆⋆ is the optimal quantization step for ε/δ = 0. In (b), the Cauchy noise distribution is used and ∆ = 0.1δ, ∆⋆ = 0.5878δ, 2δ; ∆⋆ is also the optimal quantization step for ε/δ = 0. For both cases, Ic is the FI for the continuous measurement: in the Gaussian case Ic = 2/δ², while in the Cauchy case Ic = 1/(2δ²). The normalizations on the difference range and also on the FI were done to obtain a plot independent of δ.
In all tested cases⁶ (under the symmetry assumptions), it was observed that when ∆⋆ is used, ε = 0 is the optimal solution. Thus, we can say that for commonly used noise models, if the quantization thresholds are well chosen, τ0 = x is optimal. The class of "commonly used" noise models here seems to be larger than in the binary case, as the GGDs with β > 2 no longer exhibit the asymmetric behavior for the optimal central threshold.

After setting τ0 = x, we still need to characterize the other thresholds to have a full performance characterization depending only on NI. This can be equivalently stated as finding the variations w.r.t. the central threshold, τ′, maximizing Iq (0) given by (1.46):

    I⋆q = argmax_{τ′} Iq (0) .    (1.47)
Unfortunately, an analytical solution cannot be found in general. An efficient solution could be obtained if the problem were convex or convexifiable [Boyd 2004], but this is not the case, so this is a very complicated multidimensional maximization problem. A possible solution is to constrain the quantizer to be uniform; in this case the problem is one-dimensional and it can be solved by exhaustive search (searching for the maximum on a fine grid of possible values). Existence of a non-degenerate solution (0 < ∆⋆ < ∞) is guaranteed by the following argument: for ∆⋆ → +∞, all the distribution is concentrated on the first quantizer interval (remember that ε = 0), thus Iq will be equal to the binary case Iq, and for ∆⋆ = 0 we also get directly the binary quantization performance. As explained above, Iq increases when we add thresholds, so at least one non-degenerate solution must exist. For a non-uniform solution, we can try to use local approximations based on Taylor series; this subject will be left to Part II.

⁶ Two families of distributions were tested, the GGD and the Student-t distribution, which will be presented later in Ch. 3. They were tested with uniform symmetric quantizers.
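The exhaustive search for the optimal uniform step ∆⋆ at ε = 0 is simple to sketch (illustrative code, not from the thesis; Gaussian noise in scipy's unit-variance parametrization and NI = 8 are assumptions):

    import numpy as np
    from scipy.stats import norm

    def iq_uniform(delta_step, n_intervals=8, pdf=norm.pdf, cdf=norm.cdf):
        """I_q(0) of eq. (1.46) for a uniform symmetric quantizer with step delta_step."""
        half = n_intervals // 2
        tau_var = np.concatenate([np.arange(half) * delta_step, [np.inf]])
        lo, hi = tau_var[:-1], tau_var[1:]
        return 2.0 * np.sum((pdf(lo) - pdf(hi)) ** 2 / (cdf(hi) - cdf(lo)))

    steps = np.linspace(0.01, 3.0, 300)            # fine grid of candidate steps
    best = steps[np.argmax([iq_uniform(s) for s in steps])]
    print(best, iq_uniform(best))                  # optimal step and the corresponding I_q(0)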
1.4.3 Summary of the main points

Thus, up to this point we have established the following:

• Estimation performance based on quantized measurements is bounded above by the estimation performance based on continuous measurements.
• Adding quantization levels does not decrease estimation performance (it increases in most cases).
• The optimal central threshold τ0 must be placed at the true parameter x for commonly used noise models (Gaussian, Laplacian, Cauchy distributions). If we consider NI > 2, symmetric thresholds w.r.t. the central threshold and well chosen quantization intervals, then it seems that τ0 = x may be optimal for a large class of symmetric unimodal distributions (all the distributions above plus other members of the GGD family).
• Maximizing the estimation performance w.r.t. the other thresholds (1.47) is in general a complicated problem.
1.4.4 MLE for multibit quantization with fixed thresholds

As was done in the binary case, we still need to specify how to implement the MLE. Note that in this case the likelihood is given by (1.5):

    L (x; i1:N) = ∏_{k=1}^{N} P (ik ; x) .

Now, the MLE cannot be written in a simple form and we must resort to numerical maximization. In general, we could use a steepest ascent algorithm to iteratively climb the likelihood function. As developed in [Ribeiro 2006a], an efficient solution can be found when the noise distribution is log-concave. A log-concave distribution is a distribution whose logarithm is concave; a simple example is the Gaussian distribution. If f is log-concave, it is known that P (ik ; x) is log-concave [Boyd 2004, p. 107] and also that the product above is log-concave [Boyd 2004, p. 105]. Thus, under this assumption the logarithm of L is concave, and an efficient solution for finding the MLE is Newton's algorithm [Boyd 2004, p. 496], given by [Ribeiro 2006a]:

    X̂ML,j = X̂ML,j−1 − [ Σ_{k=1}^{N} ∂ log P (ik ; x) / ∂x ] / [ Σ_{k=1}^{N} ∂² log P (ik ; x) / ∂x² ] |_{x = X̂ML,j−1} ,    (1.48)
where the subscript j represents the iteration index and |_{x = X̂ML,j−1} means that the function on its left is evaluated at the point x = X̂ML,j−1. After starting the algorithm with an arbitrary X̂ML,0, the iterations are carried out until the variation |X̂ML,j − X̂ML,j−1| falls below a pre-specified small value εmin. All the interest in obtaining a concave problem formulation comes from the fact that Newton's algorithm not only guarantees convergence to a global maximum but also does it with quadratic convergence, i.e. when the iterates get close to the optimal value, the number of correct digits of X̂ML,j roughly doubles at each iteration [Boyd 2004, p. 489].
Therefore, for NI > 2, with a fixed set of thresholds and considering that the distribution is log-concave, we have the following solution for problem (a) (p. 27):

Solution to (a) - MLE for quantized measurements with log-concave noise distribution, NI > 2 and fixed τ

(a1.2) 1) Estimator

Define an initial guess on the estimate, X̂ML,0. Until |X̂ML,j − X̂ML,j−1| < εmin, do

    X̂ML,j = X̂ML,j−1 − [ Σ_{k=1}^{N} ∂ log P (ik ; x) / ∂x ] / [ Σ_{k=1}^{N} ∂² log P (ik ; x) / ∂x² ] |_{x = X̂ML,j−1}

and set j = j + 1. Then, X̂ML,q is set to the last X̂ML,j.

2) Performance (asymptotic)

X̂ML,q is asymptotically unbiased,

    E[ X̂ML,q ] = x   (N → ∞),

and its asymptotic MSE or variance is given by

    Var( X̂ML,q ) ∼ CRBq = 1 / (N Iq)   (N → ∞),

with Iq given by (1.13).
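A minimal sketch of this Newton iteration (not from the thesis; it assumes Gaussian noise, an illustrative uniform quantizer, and uses finite-difference derivatives of the log-likelihood instead of the analytic expressions):

    import numpy as np
    from scipy.stats import norm

    tau0, tau_var = 0.0, np.array([0.0, 0.5, 1.0, 1.5, np.inf])   # NI = 8, illustrative thresholds

    def loglik(x, i_seq, F=norm.cdf):
        """log L(x; i_{1:N}), eqs. (1.5)-(1.6), Gaussian noise, symmetric quantizer."""
        mag = np.abs(i_seq)
        a = tau0 + np.sign(i_seq) * tau_var[mag - 1]
        b = tau0 + np.sign(i_seq) * tau_var[mag]
        lo, hi = np.minimum(a, b), np.maximum(a, b)
        return np.sum(np.log(F(hi - x) - F(lo - x)))

    def mle_newton(i_seq, x0=0.0, eps_min=1e-8, h=1e-4, max_iter=50):
        """Newton iteration of eq. (1.48) with numerical first and second derivatives."""
        x = x0
        for _ in range(max_iter):
            g = (loglik(x + h, i_seq) - loglik(x - h, i_seq)) / (2 * h)
            H = (loglik(x + h, i_seq) - 2 * loglik(x, i_seq) + loglik(x - h, i_seq)) / h**2
            x_new = x - g / H
            if abs(x_new - x) < eps_min:
                return x_new
            x = x_new
        return x

    rng = np.random.default_rng(2)
    y = 0.8 + rng.standard_normal(5000)
    i_seq = np.where(y >= tau0, 1, -1) * np.minimum(np.floor(np.abs(y - tau0) / 0.5).astype(int) + 1, 4)
    print(mle_newton(i_seq))    # close to the true value 0.8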
1.5 Adaptive quantizers: the high complexity fusion center approach

The analysis and results above indicate that, to get optimal estimation performance from quantized measurements, we must in general place the central threshold close to the true parameter⁷. This can be done by using the information given by the measurements to adaptively move the central threshold. The main work already done on this subject will be presented in this section.

An adaptive scheme to estimate x based on a sensor network of binary quantizers is presented in [Li 2007]. The main idea is that enhanced estimation performance can be obtained if the sensors can dynamically place their thresholds around x. Here, we present an equivalent sequential version using only one sensor. The following is proposed:
1. A sensor can communicate binary measurements to a fusion center. The sensor measurement noise sequence is supposed to be i.i.d..

2. The sensor starts with a known binary threshold τ0,0, where the second subscript is the discrete-time index. Note that the threshold will now be considered to be time-varying.

3. At each instant k, the sensor obtains a binary quantized measurement ik (ik ∈ {−1, 1}).

4. The sensor then updates the threshold by the following simple cumulative rule:

    τ0,k = τ0,k−1 + γ ik ,    (1.49)

where γ is a constant positive adaptation step (see the remarks after the MLE definition).

5. The sensor sends its measurement ik to the fusion center.

6. The fusion center updates its τ0,k and stores in a memory both ik and τ0,k. Note that the fusion center threshold is exactly the same as the one obtained by the sensor threshold update.

7. After a predefined number of iterations, for example N, or at each iteration k, the fusion center can get a more precise estimate of x (more precise than τ0,k) by using an MLE based on all past ik (a simulation sketch of this scheme is given below).
⁷ The literature on the subject also points in the same direction. The case when x is constrained to lie in a bounded interval X of R was extensively studied in [Papadopoulos 2001]. Main attention was given to the effects of different schemes for setting τ0. The schemes considered were: fixed, varying but random and i.i.d., varying deterministically, and based on feedback. For each scheme, the worst case CRBq (x was chosen to maximize the CRB) was evaluated and divided by the continuous measurement CRB to give a measure of the performance loss induced by quantization. The loss was shown to be more sensitive w.r.t. an equivalent signal-to-noise ratio (the interval X length divided by the noise scale factor) in the fixed case and insensitive in the feedback case. Some solutions based on iterative maximum likelihood techniques, which put the new threshold on the last ML estimate, were presented, but no theoretical proofs that they reach the minimum CRBq were given.
In [Ribeiro 2006a], where the binary quantization Gaussian noise case was mainly studied, it was pointed out that the sensitivity of the estimation performance to ε and its optimality for ε = 0 indicate that, to enhance performance, we could adaptively move the binary threshold, placing it on the last available estimate X̂ to get closer and closer to the true x.
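To make the threshold adaptation concrete, here is a small simulation sketch of update (1.49) (not from the thesis; Gaussian noise and the values of x and γ are illustrative):

    import numpy as np

    rng = np.random.default_rng(3)
    x, gamma, N = 1.3, 0.1, 2000          # true parameter, adaptation step, number of samples

    tau = 0.0                             # initial threshold tau_{0,0}
    tau_hist, bits = [], []
    for _ in range(N):
        y = x + rng.standard_normal()     # noisy continuous measurement
        i = 1 if y >= tau else -1         # binary quantization with the current threshold
        tau = tau + gamma * i             # cumulative update, eq. (1.49)
        tau_hist.append(tau)
        bits.append(i)

    print(np.mean(tau_hist[N // 2:]))     # the threshold fluctuates around x ~ 1.3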
1.5.1 MLE for the adaptive binary scheme
As the threshold depends on the measurements, the measurements are not independent anymore. However, as a measurement ik depends on the past measurements only through τ0,k−1, conditioned on the threshold that was used the measurements are independent. This leads to the following likelihood and log-likelihood for the measurements until time N:

    L (x; i1:N) = P (i1:N ; x) = ∏_{k=1}^{N} P (ik | ik−1, · · · , i1 ; x) = ∏_{k=1}^{N} P (ik | τ0,k−1 ; x)
                = ∏_{k=1}^{N} [ 1 − F (τ0,k−1 − x) ]^{(1+ik)/2} [ F (τ0,k−1 − x) ]^{(1−ik)/2} ,    (1.50)

    log L (x; i1:N) = Σ_{k=1}^{N} { (1+ik)/2 · log [ 1 − F (τ0,k−1 − x) ] + (1−ik)/2 · log F (τ0,k−1 − x) } ,    (1.51)

where the vertical bar inside the probability symbol means that the probability measure for the r.v. on the left side of the bar is evaluated conditionally on the r.v. on the right side of the bar. The conditioning makes the output ik depend on τ0,k−1 as if it were a deterministic parameter; that is why we can use the same notation with the CDF F parametrized by a fixed nonrandom threshold.
At the fusion center at time N, all the thresholds and binary measurements are known; the maximum likelihood estimator can then be calculated by maximizing (1.50) or (1.51):

    X̂ML,q = argmax_x ∏_{k=1}^{N} [ 1 − F (τ0,k−1 − x) ]^{(1+ik)/2} [ F (τ0,k−1 − x) ]^{(1−ik)/2}    (1.52)

or

    X̂ML,q = argmax_x Σ_{k=1}^{N} { (1+ik)/2 · log [ 1 − F (τ0,k−1 − x) ] + (1−ik)/2 · log F (τ0,k−1 − x) } .
Note that the threshold moves with each measurement, while the estimate is obtained only at the end of the measurement block. Observe also that when the noise distribution is log-concave, the MLE can also be obtained by using Newton's algorithm, as was discussed in the previous section.
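A self-contained sketch of the fusion-center MLE (1.52) based on the stored bits and thresholds (illustrative only, not from the thesis; Gaussian noise and the numerical values are assumptions):

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(4)
    x_true, gamma, N = 1.3, 0.1, 2000

    # run the adaptive scheme, keeping the threshold tau_{0,k-1} used for each bit i_k
    tau, tau_used, bits = 0.0, [], []
    for _ in range(N):
        i = 1 if x_true + rng.standard_normal() >= tau else -1
        tau_used.append(tau)
        bits.append(i)
        tau = tau + gamma * i                          # eq. (1.49)

    tau_used, bits = np.array(tau_used), np.array(bits)

    def neg_loglik(x):
        """Negative of the log-likelihood (1.51) for the adaptive binary scheme."""
        F = norm.cdf(tau_used - x)
        return -np.sum((1 + bits) / 2 * np.log(1 - F) + (1 - bits) / 2 * np.log(F))

    x_hat = minimize_scalar(neg_loglik, bounds=(-5, 5), method="bounded").x
    print(x_hat)                                        # close to x_true = 1.3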
Remarks: it is intuitive to expect that the mean of τ0,k will reach an equilibrium after some time. If the threshold is above the parameter, iteration (1.49) will reduce its value on average; in the other case, if the threshold is below the parameter, iteration (1.49) will increase its value. At the mean equilibrium we have E[ (τ0,k − τ0,k−1)/γ ] = E[ ik ] = 0; thus, as ik = 1 or ik = −1, the only possibility for this to happen is P (ik = 1; x) = P (ik = −1; x) = 1/2, which in the case of symmetric noise distributions means that E[ τ0,k ] = x.
The variance of the thresholds will depend on the noise distribution but also on the parameter γ; if we choose γ to be relatively small, then once the threshold is close to the parameter it will fluctuate around it with a small variance. The fact that the threshold updates are easy to implement (a simple cumulative sum) while the estimator is a complex one goes well with real implementation constraints, where complexity is strongly constrained on the sensor side of the problem and less constrained on the fusion center side.
1.5.2 Performance for the adaptive binary scheme

We must now look at the performance of this scheme. The performance analysis presented here was proposed in [Fang 2008].
Even if the measurements are dependent, it is known that, under some conditions that are satisfied here, the MLE will still attain the CRB [Crowder 1976]. Thus, the main problem here will be the evaluation of the FI. As the measurements are dependent, the FI for N measurements is not N times the FI for one measurement and we need to evaluate it using the score function for the entire block of measurements. The FI for N measurements is

    Iq,1:N = E[ S²q ] = E[ ( Σ_{k=1}^{N} ∂ log P (ik | τ0,k−1 ; x) / ∂x )² ] .
It was shown in [Li 2007] that this quantity is equal to

    Iq,1:N = Σ_{k=1}^{N} E[ f² (τ0,k−1 − x) / ( F (τ0,k−1 − x) [1 − F (τ0,k−1 − x)] ) ] ,    (1.53)
where the expectation is taken w.r.t. the only r.v. that still appears in the expression, τ0,k−1. If we assume that τ0,0 = 0, then τ0,k−1 is a random walk on an infinite grid (more specifically, finite for finite k) with values {−∞, · · · , −2γ, −γ, 0, γ, 2γ, · · · , +∞}.

To understand how (1.53) was obtained, one can decompose the squared sum of score functions into a sum of squared scores and a sum of score cross products. As the measurements are independent conditionally on the past thresholds, the expectation of each cross product can be computed by first conditioning on the past; the conditional expectation of a score function is zero [Kay 1993, p. 67], so each cross product has zero expectation. Therefore, Iq,1:N will be the expectation of the sum of squared scores. Decomposing the expectation into an expectation on ik conditioned on the thresholds and an expectation on the thresholds, one gets (1.53).
Denoting the probability of having τ0,k−1 = jγ by P (τ0,k−1 = jγ) = pj,k−1, we have:

    Iq,1:N = Σ_{k=1}^{N} Σ_{j=−∞}^{+∞} [ f² (jγ − x) / ( F (jγ − x) [1 − F (jγ − x)] ) ] pj,k−1 .    (1.54)
Note that this is equivalent to obtaining N measurements from a binary quantizer with a random thresholding scheme whose prior threshold distribution pk−1 changes in time. The prior distribution changes in such a way that, when k → ∞, most of its probability mass is expected to be concentrated around the parameter. This is in contrast to the methods presented in [Ribeiro 2006a], where x is random with a given prior and N binary thresholds are chosen using a function of the prior distribution; in that case, having the right mode of the prior distribution is crucial, while in the adaptive scheme above the mode of pk−1 will be around x for large k, without any initial prior.
Putting the factors of (1.54) in (infinite dimensional) vector notation,

    I′q = [ · · · , f² (−γ − x) / ( F (−γ − x) [1 − F (−γ − x)] ) , f² (0 − x) / ( F (0 − x) [1 − F (0 − x)] ) , f² (γ − x) / ( F (γ − x) [1 − F (γ − x)] ) , · · · ]⊤ ,    (1.55)

    pk−1 = [ · · · , p−1,k−1 , p0,k−1 , p1,k−1 , · · · ]⊤ ,    (1.56)
allows us to rewrite the sum of products as a scalar product. Thus, (1.54) becomes

    Iq,1:N = Σ_{k=1}^{N} I′q⊤ pk−1 .    (1.57)
Using the definition of the threshold evolution (1.49), it is possible to observe that a specific
threshold value jγ has a probability of happening at instant k − 1 that depends on the
probabilities of having thresholds at (j − 1) γ or (j + 1) γ and of measuring ik−1 = 1 or
ik−1 = −1 respectively. This gives rise to a recursive equation for pj,k−1 :
pj,k−1 = pj−1,k−2 [1 − F ((j − 1) γ − x)] + pj+1,k−2 F ((j + 1) γ − x) .
(1.58)
This shows that the threshold values form a Markov chain, as the present probability of the
threshold values depends only on the previous probabilities pj,k−2 . It is possible to write the
vector of threshold pk−1 probabilities in recursive form
(1.59)
pk = Tpk−1 ,
where T is a (infinite dimensional) tridiagonal transition matrix, defined as follows
 T=

 ..
 .


1−F








0
..
..



.
.



(−2γ − x)
0
F (0 − x)
0
0

.
0
1 − F (−γ − x)
0
F (−γ − x)
0


0
0
1 − F (0 − x)
0
F (2γ − x)

.. 
..
..
. 
.
.

0
The stationarity theorem for Markov chains guarantees that pk−1 will attain an asymptotic
distribution p∞ [Fine 1968] (cited in [Fang 2008])8 and this distribution can be obtained by
solving the system of equations
p∞ = Tp∞ .
8
In [Fine 1968], it is shown that the possible threshold values can be separated in two classes of states, which
are periodic. The probability vectors for each class are shown to converge to unique asymptotic probability
vectors. The asymptotic probability vectors when put together form the vector p∞ .
64
Chapter 1. Estimation of a constant parameter
To solve this infinite dimensional system [Fang 2008] considered that only a part of the thresholds around the true parameter will have a non negligible probability, for practical purposes
it was considered that non negligible thresholds are those in the interval
Iτ = [−5σv − |x| , 5σv + |x|] ,
where σv is the standard
of the noise9 . The non negligible probability vector, denoted
l deviation
m
p̃∞ will have size 2 5σvγ+|x| + 1 = 2jmax + 1, where ⌈y⌉ is the closest integer that is larger
than y and the "+1" comes from the zero threshold. The approximate threshold distribution
can then be obtained by solving


p̃−jmax ,∞


..


.



p̃∞ =  p̃∞,0 
(1.60)
 = T̃p̃∞ ,


..


.
p̃jmax ,∞
where T̃ is the truncated transition matrix around the zero threshold (we show only the upper
left corner)

0

 1 − F (−γjmax + ε)

T̃ = 


F (−γ (jmax − 1) + ε)
0
..
.
F (−γ (jmax − 2) + ε)
..
.
0
..
0
One can also truncate I′ q only for the non negligible probability elements
Ĩ′q
0 

0
.

.


f 2 (γjmax + ε)
f 2 (−γjmax + ε)
, ··· ,
=
F (−γjmax + ε) [1 − F (−γjmax + ε)]
F (γjmax + ε) [1 − F (γjmax + ε)]
⊤
.
(1.61)
Following the development in [Fang 2008], after a finite time Nc the probability vector pk will
be indistinguishable from p∞ , thus when N → ∞, an infinity number of terms in Iq,1:N will
⊤
behave approximately as Ĩ′ q p̃∞ , which leads to the following asymptotic approximation of
the FI:
N
X
⊤
⊤
Iq,1:N =
I′ q pk−1 ∼ N Ĩ′ q p̃∞ .
(1.62)
k=1
N →∞
9
For one of the noise distributions considered here, the Cauchy distribution, the standard deviation is
undefined. In this case, one can use the scale parameter δ instead of the standard deviation σv .
1.5. Adaptive quantizers: the high complexity fusion center approach
65
This gives the following solution for problem (a) (p. 27):
Solution to (a) - MLE for binary quantized measurements with
adaptive thresholds given by a simple cumulative sum.
(a2.1) 1) Estimator
Define an initial threshold τ0,0 and a positive γ, then from
k = 1 to N :
• the sensor obtains a binary measurement ik using τ0,k−1 .
• The sensor sends ik to the fusion center and updates
the threshold (1.49):
τ0,k = τ0,k−1 + γik .
• The fusion center stores ik and also evaluates and stores
τ0,k .
With i1,N and τ0,1:N , the fusion center evaluates the MLE
(1.52):
X̂M L,q = argmax
x
N
Y
k=1
[1 − F (τ0,k−1 − x)]
1+ik
2
F (τ0,k−1 − x)
1−ik
2
.
2) Performance (asymptotic and approximate)
X̂M L,q is asymptotically unbiased
i
h
= x
E X̂M L,q
N →∞
and its asymptotic MSE or variance can be approximated
by
1
,
∼ CRBq ≈
Var X̂M L,q
⊤
N →∞
N Ĩ′ q p̃∞
with Ĩ′ q given by (1.61) and p̃∞ by (1.60).
An alternative to have analytical results on the vector p∞ without using an approximation
with truncation can be obtained by considering that x lies in a symmetric interval [−A, A],
where A is a positive real. We can create boundaries on the possible values of the threshold in
such a way that the number of possible thresholds is finite. In this way, the threshold sequence
can be modeled as a Markov chain defined in a domain with a finite number of values and we
can evaluate the asymptotic threshold distribution without using truncation approximations
(More? - App. A.2.4).
66
Chapter 1. Estimation of a constant parameter
1.5.3
Adaptive scheme based on the MLE
One of the disadvantages of (a2.1) is that the threshold will fluctuate around x and it will not
converge to x, producing a performance that is still not optimal. A remedy for this problem
was proposed also in [Fang 2008] (and previously in [Papadopoulos 2001]). By accepting a
feedback from the fusion center and assuming that the fusion center has enough processing
power to evaluate the MLE for the past measurements at each time, instead of using the
cumulative sum for updating the threshold, we can use the last MLE estimate. Intuitively,
with a growing number of measurements for the MLE, the threshold will be placed closer and
closer to x, producing as a result an MLE with performance approaching the optimal one (for
τ0,k−1 = x).
The new update is given by
τ0,k = X̂M L,k ,
(1.63)
where X̂M L,k is the MLE for the measurements i1:k . The asymptotic performance analysis was
also presented in [Fang 2008], the authors claim that in the binary quantization and Gaussian
2
case, the performance (variance) is asymptotically given by πδ
4N . Therefore, this update scheme
4N
is asymptotically optimal as Iq (0) = πδ
2 is the maximum FI that can be achieved.
We will mimic some parts of their proof, but we will change some arguments to obtain a
more general result for NI ≥ 2.
1.5.4
Performance for the adaptive multibit scheme based on the MLE
Under an adaptive τ0 with the vector τ ′ fixed, we can rewrite the FI given in (1.53) for a
general NI using a parametrization on the error εk = τ0,k−1 − x, which now depends on time
and it is given as follows (Why? - App. A.1.5)
NI
Iq,1:N
=
N
X
E [Iq (εk )] ,
(1.64)
k=1
where εk is a sequence of r.v. defined on R, contrary to the previous case when the thresholds
were defined in a grid. The function Iq (εk ) is given by (1.13).
For proceeding, we will make additional assumptions on Iq (ε) (the assumptions on the
noise AN1 p. 34 and AN2 p. 34 are also assumed).
1.5. Adaptive quantizers: the high complexity fusion center approach
67
Assumptions on Iq for the MLE update to have asymptotically optimal performance:
A1.MLE Iq (ε) is maximum for ε = 0.
A2.MLE Iq (ε) is locally decreasing around zero.
A3.MLE The function Iq (ε) has bounded Iq (0),
dIq (ε) dε ε=0
= 0, bounded
fore accepting a Taylor approximation around zero (for small ε′ ):
ε′2 d2 Iq (ε) ′2
+
◦
ε
,
Iq ε′ = Iq (0) +
2
dε2 ε=0
d2 Iq (ε) ,
dε2 ε=0
there-
(1.65)
◦(ε′2 )
where the ◦ ε2 here is equivalent to say that the quantity ε′2 tends to zero when ε′
tends to zero.
If we look to Fig. 1.8 (p. 57), we can see that these assumptions seem to be satisfied
by Gaussian and Cauchy distributions. Except for the Laplacian-like distributions with a
derivative discontinuity at ε = 0, a large class of smooth symmetric unimodal distributions
satisfy these assumptions for NI > 2 and well chosen quantizer intervals. Note that in the
binary cases, where the threshold must be placed asymmetrically, we can add a fixed bias in
the MLE threshold update to obtain a better performance. Also for the asymmetric cases, all
the assumptions can be stated around the maximum point for the FI instead of ε = 0.
NI
The objective now will be to bound above and below the quantity Iq,1:N
in such a way,
NI
that when we make N → ∞ both bounds will "squeeze" Iq,1:N
on an interval that goes
asymptotically to N Iq (0).
For a large number of measurements M < N , the MLE studied here is consistent even if
the measurements are dependent, for verifying this, one can check the regularity conditions
given in [Crowder 1976]. Thus, for ε′ > 0 and ξ > 0, it is possible to choose a number of
measurements M such
P |εk | ≤ ε′ ≥ 1 − ξ, for k ≥ M.
(1.66)
Applying this inequality with the monotonicity property of A2.MLE, we can say that we can
find a M such
(1.67)
P Iq (εk ) ≥ Iq ε′ ≥ 1 − ξ, for k ≥ M.
Now the sum in (1.64) can be separated in two sums, one for the terms with k < M , Iq,1:M −1
and the other with k ≥ M , Iq,M :N :
NI
Iq,1:N
= Iq,1:M −1 + Iq,M :N =
(M −1
X
k=1
E [Iq (εk )]
)
+
(
N
X
k=M
)
E [Iq (εk )] .
(1.68)
Using A1.MLE and the fact that Iq (εk ) is a nonnegative quantity, we know that Iq (εk ) ∈
[0, Iq (0)]. Thus, the first term can be written as:
Iq,1:M −1 = αM (M − 1) Iq (0) ,
(1.69)
68
Chapter 1. Estimation of a constant parameter
with αM ∈ [0, 1]. The terms on Iq,M :N can be lower bounded using the Markov’s inequality.
The Markov’s inequality states that for a nonnegative r.v. Y and value y > 0, we must have
[Wasserman 2003, p. 63]:
E (Y )
P (Y > y) ≤
.
y
Using this inequality for an arbitrary term of Iq,M :N with the value Iq (ε′ ) gives
E [Iq (εk )] ≥ Iq ε′ P Iq (εk ) ≥ Iq ε′ , for k ≥ M.
(1.70)
Then, supposing that the thresholds are updated using the MLE, we can use (1.67) in (1.70)
to have
(1.71)
E [Iq (εk )] ≥ Iq ε′ (1 − ξ) , for k ≥ M.
NI
For sufficiently large M (and consequently N ), Iq,1:N
can be lower bounded using (1.71) and
(1.69)
NI
(1.72)
Iq,1:N
≥ αM (M − 1) Iq (0) + [N − (M − 1)] Iq ε′ (1 − ξ) .
From A1.MLE the FI can be upper bounded by the optimal Iq
NI
Iq,1:N
≤ N Iq (0) .
(1.73)
Joining (1.72) and (1.73) gives the following:
NI
αM (M − 1) Iq (0) + [N − (M − 1)] Iq ε′ (1 − ξ) ≤ Iq,1:N
≤ N Iq (0) .
(1.74)
For small ε′ , we can use A3.MLE to obtain
ε′2 d2 Iq (ε) NI
′2
(1 − ξ) ≤ Iq,1:N
≤ N Iq (0) .
+
◦
ε
αM (M − 1) Iq (0) + [N − (M − 1)] Iq (0) +
2
dε2 ε=0
(1.75)
The term on the left of the inequality can be rewritten as
′2 2
(M − 1) [αM − (1 − ξ)]
ε d Iq (ε) ′2
N Iq (0) (1 − ξ) +
+
◦
ε
+ [N − (M − 1)]
(1 − ξ) .
N
2
dε2 ε=0
Separating a factor N Iq (0) we can write the term above as N Iq (0) (1 − ξ ′ ) with
′
ξ =
(M − 1) [αM − (1 − ξ)]
−ξ +
N
(M − 1)
+ 1−
N
Therefore, the inequality (1.74) becomes
(1 − ξ)
ε′2 d2 Iq (ε) ′2
+◦ ε
.
2
dε2 ε=0
Iq (0)
(1.76)
NI
N Iq (0) 1 − ξ ′ ≤ Iq,1:N
≤ N Iq (0) .
By imposing N ≫ M (N much larger than M ) so that (MN−1) is arbitrary small and by
choosing M sufficiently large so that ξ is small, we can make the first term in ξ ′ to approach
zero. Using also N ≫ M and choosing now M sufficiently large so that ε′ is arbitrary small,
we can make the second term in ξ ′ to approach zero. Therefore, we can make the left side of
1.5. Adaptive quantizers: the high complexity fusion center approach
69
the inequality above to be close to N Iq (0) when N and M tend to infinity with the condition
NI
N ≫ M . As the upper bound on Iq,1:N
is also N Iq (0), we have that
NI
Iq,1:N
∼
N →∞
or equivalently
CRBq
∼
1
N →∞
(1.77)
N Iq (0) ,
N Iq (0)
(1.78)
.
We have now the following solution for problem (a)10 :
Solution to (a) - MLE for quantized measurements with NI ≥ 2 and
adaptive thresholds given by the MLE.
(a2.2) 1) Estimator
Define an initial threshold τ0,0 , then from k = 1 to N :
• the sensor obtains a binary measurement ik using τ0,k−1 .
• The sensor sends ik to the fusion center.
• The fusion center stores ik , evaluates and stores
X̂M L,k = τ0,k following (1.63)
τ0,k = X̂M L,k ,
where the estimate X̂M L,k is given by
X̂M L,k = argmax
x
k
Y
P (ij ; x, τ0,j−1 ) .
j=1
• The fusion center sends τ0,k = X̂M L,k to the sensor.
2) Performance (asymptotic)
X̂M L,q = X̂M L,k=N is asymptotically unbiased
h
i
E X̂M L,q
= x
N →∞
and its asymptotic MSE or variance attains the optimal
value
1
Var X̂M L,q
.
∼
N →∞ N Iq (0)
Now, we have an estimator with adaptive thresholds (mainly the central threshold) that
attains the asymptotically optimal performance. The estimator guides the quantizer dynamic
10
The threshold τ0,k−1 is added in the notation of the probabilities to make the dependence on time more
explicit.
70
Chapter 1. Estimation of a constant parameter
range close to the parameter by setting the central point of the quantizer with a decreasing
fluctuation around τ .
1.5.5
Equivalent low complexity asymptotic scheme
The main disadvantage of (a2.2) is its high complexity, since the MLE must be obtained at each
iteration. In [Papadopoulos 2001] a heuristic based on an approximation of the expectation
maximization method for applying the MLE update with reduced complexity on the binary
quantization and Gaussian noise case was presented. The proposed threshold/estimate update
is given by the following recursive expression:
√
δ π
ik .
(1.79)
X̂k = τ0,k = X̂k−1 +
2k
Observe that the difference in complexity is large. In general, the MLE must be obtained
with a maximization algorithm, e.g. Newton’s algorithm, which itself has an inner recursive
procedure that may need multiple iterations for reaching convergence for each time k. In
(1.79), we have only a recursive procedure in k, which requires a multiplication of ik by a gain
and summation with the last estimate.
We can show that (1.79) can be generalized easily to non Gaussian noise cases. We will
use a less heuristic method (less than the method used to obtain (1.79)). We will assume,
additionally to symmetry, that the noise PDF has f (1) (0) = 0. If we consider that k is
large, then from the convergence of the CRB discussed above and the asymptotic normality
of the MLE [Kay 1993, p. 167] (or [Crowder 1976]), the error between the threshold used to
obtain ik , ε = X̂M L,k−1 − x = τ0,k − x, is Gaussian distributed with zero mean and variance
1
11
(k−1)Iq (0) :
r
(k − 1) Iq (0)
(k − 1) Iq (0) 2
fε (ε) =
(1.80)
exp −
ε ,
2π
2
where fε is the PDF of the error. We can try to estimate the random error using the new
quantized observation ik and the knowledge about its distribution given by the PDF above.
After estimating it, we can correct X̂M L,k−1 using the estimate. As ε is random, we will use an
estimator equivalent to the MLE, but for random parameters. In this case the maximum a
posteriori estimator (MAP) will be used. The posterior distribution (the one that might
be maximized) is the conditional PDF of ε given ik . Using Bayes theorem, it is given by
p (ε|ik ) =
P (ik |ε) fε (ε)
,
P (ik )
(1.81)
where in the binary case the conditional probability P (ik |ε) is given by
P (ik |ε) = [1 − F (ε)]
1+ik
2
F (ε)
1−ik
2
.
(1.82)
The denominator is the marginal probability of the output ik and it does no depend on ε.
The MAP is then given by [Kay 1993, p. 350]
ε̂M AP = argmax p (ε|ik ) .
(1.83)
ε
11
Observe that here we are using the parametrization of the Gaussian distribution with its variance and not
with its scale parameter
1.5. Adaptive quantizers: the high complexity fusion center approach
71
In the same way as for the MLE, we can maximize the logarithm of the posterior, as P (ik )
does not depend on ε, we can write an equivalent form for (1.83) as
ε̂M AP
= argmax log p (ε|ik )
ε
= argmax {log [P (ik |ε)] + log [fε (ε)]} .
(1.84)
ε
Using (1.81) and (1.80) in the RHS (1.84), we obtain
(k − 1) Iq (0) 2
1 + ik
1 − ik
ε .
log p (ε|ik ) =
log [1 − F (ε)] +
log [F (ε)] −
2
2
2
(1.85)
Under consistency of the MLE, it is expected that for large k, the probability of |ε| being
small is close to 1. Thus, we can look for a maximum point of (1.85) around zero. This leads
us to expand log [1 − F (ε)] and log [F (ε)] around zero. The expansions are given by
ε2 d2 log [1 − F (z)] d log [1 − F (z)] +
log [1 − F (ε)] = log [1 − F (0)] + ε
+ ◦ ε2 ,
2
dz
2
dz
z=0
z=0
2
2
ε d log [F (z)] d log [F (z)] + ◦ ε2 .
+
log [F (ε)] = log [F (0)] + ε
2
dz
2
dz
z=0
z=0
Using the symmetry of the distribution (1 − F (0) = F (0) = 12 ) the terms with logarithms are
log 21 = − log (2). The derivatives at the zero point are
f (0)
d log [1 − F (z)] = −
= −2f (0) ,
dz
1 − F (0)
z=0
d log [F (z)] f (0)
=
= 2f (0)
dz
F (0)
z=0
and using the assumption f (1) (0) = 0, the second derivatives are
−f (1) (0)
f 2 (0)
d2 log [1 − F (z)] 2
=
−
2 = −4f (0) ,
dz 2
1
−
F
(0)
[1 − F (0)]
z=0
2
2
(1)
f (0)
f (0)
d log [F (z)] − 2
= −4f 2 (0) .
=
2
dz
F (0)
F (0)
z=0
Applying these expressions to the expansions above, we get
ε2 2
f (0) + ◦ ε2 ,
2
ε2
log [F (ε)] = − log (2) + 2εf (0) − 4 f 2 (0) + ◦ ε2 .
2
log [1 − F (ε)] = − log (2) − 2εf (0) − 4
These expansions can be used in (1.85), this gives the following:
ε2 2
1 + ik
2
+
log p (ε|ik ) =
− log (2) − 2εf (0) − 4 f (0) + ◦ ε
2
2
ε2
1 − ik
− log (2) + 2εf (0) − 4 f 2 (0) + ◦ ε2 +
+
2
2
(k − 1) Iq (0) 2
−
ε .
2
(1.86)
72
Chapter 1. Estimation of a constant parameter
To find the maximum, we differentiate log p (ε|ik ) in (1.86) w.r.t. ε and we equate it to zero.
This gives
1 + ik (k − 1) Iq (0) ε =
−2f (0) − 4εf 2 (0) + ◦ (ε) +
2
1 − ik 2f (0) − 4εf 2 (0) + ◦ (ε)
+
2
= −2f (0) ik − 4εf 2 (0) + ◦ (ε) .
For binary measurements, we know that Iq (0) = 4f 2 (0). Thus adding ε4f 2 (0) on both sides
gives
k4f 2 (0) ε = −2f (0) ik + ◦ (ε) .
Thus, we have
ε̂M AP
∼ −
k→∞
ik
.
2f (0) k
The optimal new threshold/estimate when k → ∞ is then given by
X̂k = X̂k−1 − ε̂M AP ≈ X̂k−1 +
ik
.
2kf (0)
(1.87)
This is exactly the same recursive estimator obtained by [Papadopoulos 2001] when the
noise is Gaussian (f (0) = √1πδ for the Gaussian distribution). Note that this recursive update/estimation procedure is asymptotically equivalent to the MLE update, as both procedures
(MLE and MAP) have equivalent error distribution for k → ∞ [Wasserman 2003, p. 181].
Clearly, some questions arise about the low complexity recursive estimator above:
• can (1.87) converge if we use it when the initial distance |ε| = |τ0 − x| is arbitrary (not
necessarily small)?
• Can we extend this low complexity recursive procedure to the NI > 2 case?
Answers for these questions will be given in Ch. 3.
1.6. Chapter summary and directions
1.6
73
Chapter summary and directions
We conclude this chapter with the main points observed until now and directions for future
work.
• Estimation performance in terms of MSE can be minimized asymptotically under an
unbiasedness constraint by the MLE (a1). The asymptotic performance is then mainly
characterized by the CRB which is given in terms of the FI.
• The FI for quantized measurements is upper bounded by the FI for continuous measurements and lower bounded by the FI for binary quantization. Moreover, it increases as
additional quantization intervals are used.
• The CRB and FI are very sensitive to the central threshold of the quantizer.
– For commonly used noise models (Gaussian, Laplacian and Cauchy), the threshold
must be placed exactly at the parameter.
– In the binary quantization case, even if we restrict the noise distribution to be
symmetric and unimodal this is not always true. We can find cases (GGD) where
quantizing the input r.v. asymmetrically can be optimal. In these cases, it was also
observed that the gain of performance obtained by using an asymmetric quantizer
seems to be dependent on the noise distribution, however, in general, the gain
from using the optimal asymmetric quantizer in the place of a symmetric quantizer
seems to be small when compared with the gain that can be obtained by using a
symmetric quantizer in the place of a poorly chosen asymmetric quantizer.
– An interesting subject for future research is to study in more detail the effect
of the noise distribution on the shape of the performance function B (ε) in the
asymmetric cases, for example, we can try to characterize the loss incurred by
imposing symmetric quantization w.r.t. optimal quantization. Another possible
point for future research is to see if such asymmetric behavior also appears in the
problem of detection using binary quantized measurements.
– In all cases, under symmetry assumptions on the noise and on the quantizer, estimation performance degrades when the quantizer dynamic (the quantizer threshold
in the binary case) is very distant to the true parameter.
– For multibit quantization, also under symmetry assumptions, it seems that if we
choose the quantizer thresholds (or equivalently the quantizer intervals) well, then
for a large class of unimodal distributions it is optimal to place the central threshold at the true parameter. Note that, quantizing "well" in this case means that
we choose the quantization intervals to have a good symmetric quantization performance. An interesting point for future analysis is to see if we can get a better
performance than in the symmetric case, when we optimize the quantizer intervals
for an asymmetric quantizer (one that is not centered at x). A partial answer for
this will be given in Part II, where we will see that when the number of quantization
intervals tends to infinity, the optimal quantizer is symmetric for symmetric noise
distributions.
74
Chapter 1. Estimation of a constant parameter
• Selection of optimal quantization intervals, or equivalently, optimal non central thresholds, was observed to be a difficult problem for nonuniform quantization. The asymptotic
design of the optimal quantizer that approaches the optimal finite solution will also be
studied in Part II.
• The MLE for binary quantized measurements and a fixed threshold can be obtained in
closed form (a1.1). While in the general case it might be obtained numerically. When
the noise distribution is log-concave, the Newton’s algorithm can be used as an efficient
numerical solution (a1.2).
• As the performance degrades when the quantizer range is far from the parameter, the
quantizer central threshold must be placed adaptively around the parameter. A simple
solution in the binary case is to move the threshold up or down with a constant step.
Then, asymptotically, the threshold will settle its mean close to the parameter and it
will fluctuates around it. The measurements obtained in this case can be used to have a
MLE with asymptotic performance less sensitive to uncertainty on the true parameter
value (a2.1).
• By accepting an increased complexity, the central threshold (both in binary and non
binary cases) can be set closer and closer to the true parameter by updating it at each
time with the MLE based on all the past measurements (a2.2). This scheme asymptotically attains a performance equal to the performance obtained when the threshold is
placed at the parameter, which is equivalent to say that this scheme is asymptotically
optimal for commonly used noise models.
• When the time goes to infinity the threshold update based on MLE is equivalent to a
simple recursive update with decreasing correction gain (1.87). Low complexity recursive
schemes of this type and their performance will be studied in detail in Ch. 3.
Chapter 2
Estimation of a varying parameter:
what is done and a little more
In this chapter we study the estimation of a varying parameter based on quantized measurements. First, we will present the parameter evolution model and the measurement model.
Then, we will present the optimal estimator in the MSE sense and its performance. Due to
the difficulties that arise when we want to have analytical expressions for the optimal estimator and its performance, we will obtain the optimal estimator using a numerical method. We
present and discuss a numerical solution known as particle filtering, which is a method based
on Monte Carlo simulation. We give then a bound on its performance using the Bayesian
Cramér–Rao bound. After the analysis of the bound, we present a particle filtering scheme
based on the quantized prediction error, which is commonly known as quantized innovation.
At the end of the chapter, we show that the optimal estimator has, asymptotically, a simple
recursive form for a slowly varying parameter. After obtaining the performance for the asymptotically optimal estimator and comparing it to the lower bound on the MSE, we conclude
the chapter with a summary and directions for work to be presented in other chapters or to
be presented in the future.
Contributions presented in this chapter:
• Motivation to use the quantized innovation. By analyzing a simple signal model, we can
obtain a detailed characterization of the bound on the mean squared error for estimation
based on quantized measurements. From the bound, we can see clearly that a good
estimation scheme can be obtained by quantizing the innovation. This differs from
[Ribeiro 2006c] and [You 2008], where the motivation for using the quantized innovation
does not come from any quantitative analysis and relies only on intuition.
• Asymptotically optimal estimator for a slowly varying parameter. We show that the
asymptotically optimal estimator for slowly varying Wiener process parameter can be
approximated by a low complexity recursive estimator. We also verify its optimality by
comparing it to a lower bound on the mean squared error. The Wiener process model
that we consider is a special case of the model in [Ribeiro 2006c], but we do not consider
that the noise is Gaussian and we do not impose the quantization to be binary.
75
76
Chapter 2. Estimation of a varying parameter
Contents
2.1
Parameter and measurement model . . . . . . . . . . . . . . . . . . . .
77
2.1.1
Parameter model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.1.2
Measurement model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
2.2
Optimal estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
78
2.3
Particle Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
81
2.4
2.5
2.6
2.3.1
Monte Carlo integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
2.3.2
Importance sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
2.3.3
Sequential importance sampling . . . . . . . . . . . . . . . . . . . . . . . . 83
2.3.4
Sequential importance resampling . . . . . . . . . . . . . . . . . . . . . . 85
Evaluation of the estimation performance . . . . . . . . . . . . . . . . .
87
2.4.1
Online empirical evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 87
2.4.2
BCRB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Quantized innovations . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
90
2.5.1
Prediction and innovation . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.5.2
Bound for the quantized innovations . . . . . . . . . . . . . . . . . . . . . 93
2.5.3
Gaussian assumption and asymptotic estimation of a slow parameter . . . 94
Chapter summary and directions . . . . . . . . . . . . . . . . . . . . . . 102
2.1. Parameter and measurement model
2.1
2.1.1
77
Parameter and measurement model
Parameter model
The parameter to be estimated now is a stochastic process X defined on the probability space
P = (Ω, F, P) with values on (R, B (R)). At each instant k ∈ N⋆ , the corresponding scalar r.v.
Xk will be given by the Wiener process model:
Xk = Xk−1 + Wk ,
k > 0,
(2.1)
where Wk is the k-th element of a sequence of independent Gaussian r.v.. Its mean is given by
2 . If u = 0 then X forms a standard discrete-time
uk and its variance is a known constant σw
k
k
Wiener process, otherwise, it is a Wiener process with drift. The initial distribution of Xk
is supposed to be Gaussian with known mean x′0 and known variance σ02 . The PDF of X0 ,
denoted, p (x0 ) is also known as the initial prior of the stochastic process. For estimation
purposes, the initial mean represents a guess on the value of X0 and the initial variance
represents the degree of uncertainty on this guess.
From (2.1), we can see that conditioned on Xk−1 , Xk is independent from the past X0:k−2 .
Therefore, this process is a homogeneous Markov process. Until instant k, it can be characterized by its joint PDF p (x0:k ), which factorizes as follows
p (x0:k ) = p (x0 )
k
Y
j=1
p (xj |xj−1 ) ,
(2.2)
where p (xj |xj−1 ) is the conditional PDF of Xj given Xj−1 . This conditional PDF can be
written using the Gaussian assumption on Wk as
"
#
1 xj − xj−1 − uj 2
1
exp −
.
(2.3)
p (xj |xj−1 ) = √
2
σw
2πσw
Therefore, from the knowledge of p (x0 ), uk and σw , we can describe probabilistically the
process X until any arbitrary instant k using (2.2) and (2.3).
2.1.2
Measurement model
Continuous measurement
The process X is measured with noise
(2.4)
Y k = X k + Vk .
The same assumptions on Vk as for constant x, AN1 and AN2, are considered in this case.
Quantizer
For tracking the varying parameter, the quantizer will be assumed to be dynamic with varying
threshold set τ k :
h
i⊤
τ k = τ− NI ,k · · · τ−1,k τ0,k τ1,k · · · τ NI ,k .
2
2
78
Chapter 2. Estimation of a varying parameter
The assumptions on the labeling of the outputs and symmetry, AQ1 and AQ2, are still considered to be valid. The quantized measurements are then given as the output of the quantization
function Q () defined in (1.2)
ik = Q (Yk ) ,
where as in the adaptive case, the function Q can change in time.
2.2
Optimal estimator
As it was stated at the beginning of this chapter, we are interested in solving problem (b) (p.
29). That is to estimate Xk based on the past and present quantized measurements i1:k . In
what follows, we consider that τ 1:k is a fixed sequence. As in the constant case we want the
estimator, or filter in this case, to have minimum MSE (MMSE). We want for all k an
estimator
X̂ (i1:k )
minimizing the MSE
MSEk = E
X̂k − Xk
2 .
(2.5)
As the parameter itself is random, the expectation is evaluated w.r.t. the joint distribution of
the measurements i1:k and the parameter Xk .
Differently from the deterministic case, when the parameter is random, the general form
of the MMSE estimator can be obtained directly from the minimization of MSEk . It can be
shown that its general form is [Jazwinski 1970, p. 149]
X̂k = EXk |i1:k (Xk ) ,
(2.6)
where the subscript Xk |y1:k means that the expectation is evaluated w.r.t. the probability
measure of Xk given a realization of i1:k . The MMSE estimator is then the posterior mean, i.e.
the conditional mean of the parameter Xk given a specific realization sequence of quantized
measurements i1:k 1 .
The MMSE estimator is unbiased since
h i
E X̂k = Ei1:k EXk |i1:k (Xk ) = EXk ,i1:k (Xk ) = E (Xk ) ,
where the first equality comes from the decomposition of the expectation on the joint variables
and the second equality comes from marginalization of the i1:k .
1
Similarly, we obtain that the MSE is the mean of the posterior variance
n
n
2 oo
= Ei1:k VarXk |i1:k (Xk ) .
MSEk = Ei1:k EXk |i1:k Xk − EXk |i1:k (Xk )
(2.7)
Note that this estimator is different from the MAP, which is the maximum value xk that maximizes the
posterior p (xk |i1:k ). It can be shown that the MAP is the optimal estimator under the mean absolute error
[Van Trees 1968, pp. 56–57].
2.2. Optimal estimator
79
Note that for a given realization i1:k , VarXk |i1:k (Xk ) is the conditional MSE and it can be
used when online assessment of the MSE is needed. Online here means that the performance
is not averaged on the distribution of the measurements, but evaluated for a given realization.
All the information is contained in the posterior distribution. Its mean is the optimal
estimator and its averaged variance is the MMSE. Assuming that the posterior distribution
accepts a PDF p (xk |i1:k ), the MMSE estimator and its MSE are given respectively by
Z
X̂k = EXk |i1:k (Xk ) = xk p (xk |i1:k ) dxk ,
(2.8)
R
MSEk = Ei1:k VarXk |i1:k (Xk )



X Z
2
=
xk − EXk |i1:k (Xk ) p (xk |i1:k ) dxk P (i1:k ) .


⊗k
i1:k ∈I
(2.9)
R
where I ⊗k is the joint set where the quantized measurements are defined. To simplify the
evaluation of the quantities above, a recursive form for p (xk |i1:k ), and as a byproduct for
P (i1:k ), can be obtained by using the Markovian property of the dynamical model for the
process X. The main idea is to write the PDF for prediction p (xk |i1:k−1 ) as a function of
p (xk−1 |i1:k−1 ) using the dynamical model information p (xk |xk−1 ) and then pass from the
prediction PDF to the posterior p (xk |i1:k ) using the information given by the measurement
P (ik |xk ). These two expressions, one for prediction using the model and the other for update
using the measurement are given respectively by (Why? - App. A.1.6):
Z
p (xk |i1:k−1 ) =
p (xk |xk−1 ) p (xk−1 |i1:k−1 ) dxk−1 ,
(2.10)
R
p (xk |i1:k ) =
R
R
P (ik |xk ) p (xk |i1:k−1 )
.
P ik |x′k p x′k |i1:k−1 dx′k
(2.11)
The denominator in the RHS of (2.11) is equal to P (ik |i1:k−1 ) (Why? - App. A.1.6), thus this
integral can be reused for writing P (i1:k ) in recursive form for k > 1

P (i1:k ) = P (ik |i1:k−1 ) P (i1:k−1 ) = 
Z
R

P (ik |xk ) p (xk |i1:k−1 ) dxk  P (i1:k−1 ) ,
(2.12)
for k = 1 this probability is
P (i1 ) =
Z
P (i1 |x0 ) p (x0 ) dx0 .
R
In these expressions the prior p (x0 ), as stated above, is a Gaussian function
"
#
1 x0 − x′0 2
1
exp −
,
p (x0 ) = √
2
σ0
2πσ0
(2.13)
80
Chapter 2. Estimation of a varying parameter
the conditional PDF p (xk |xk−1 ) is given by (2.3) and the probability P (ik |xk ) is given by
(1.6) with the dynamical threshold set τ k instead of only one fixed set:
P (ik |xk ) =
(
F (τik ,k − xk ) − F (τik −1,k − xk ) , if ik > 0,
F (τik +1,k − xk ) − F (τik ,k − xk ) , if ik < 0.
(2.14)
The general solution to (b) (p. 29) given by the optimal filter is the following:
Solution to (b) - MMSE estimator for a fixed threshold set sequence τ 1:k
(b1) 1) Estimator
For each time k, the estimator is given by
Z
X̂k = EXk |i1:k (Xk ) = xk p (xk |i1:k ) dxk ,
R
where the posterior PDF p (xk |i1:k ) can be evaluated recursively using (2.10) and (2.11).
2) Performance (exact)
X̂k is unbiased
h i
E X̂k = E [Xk ]
and its MSE for each time k is
MSEk = Ei1:k VarXk |i1:k (Xk )



X Z
2
=
xk − EXk |i1:k (Xk ) p (xk |i1:k ) dxk P (i1:k )


⊗k
i1:k ∈I
R
where now not only (2.10) and (2.11) are used, but also
(2.12) and (2.13) to obtain P (i1:k ).
Some attention must be given to the fact that the MMSE estimator given above and the
recursive form for the evaluation of the posterior PDF are quite general and can be applied
in many other nonlinear filtering problems.
A major drawback with (b1) is that evaluating the integrals in the prediction/update
expressions and in the expectation is analytically intractable. Therefore, we must look for a
numerical method for solving it approximately. This will be done next.
2.3. Particle Filtering
2.3
81
Particle Filtering
To obtain the posterior mean (2.8), we must evaluate the integral
R
R
xk p (xk |i1:k ) dxk . A
general solution is to evaluate it numerically, for example, using a Monte Carlo integration
method.
2.3.1
Monte Carlo integration
The Monte Carlo integration method consists in approximating the expectation of a function
g (X)
Z
E [g (X)] = g (x) p (x) dx,
R
where p (x) is the PDF of X, by the sample mean calculated using multiple i.i.d. samples X (j)
from the distribution of X [Robert 1999, p. 83]
E [g (X)] ≈ ḡNS =
NS 1 X
g x(j) ,
NS
j=1
with NS the number of samples and x(j) the j-th i.i.d. sample realization.
The approximation is clearly unbiased


NS NS
h i
X
1
1 X
E
g X (j)  =
E g X (j) = E [g (X)] .
NS
NS
j=1
j=1
By the strong law of large numbers, it converges with probability one to the true expectation
E [g (X)] [Robert 1999, p. 83]
P
lim ḡNS = E [g (X)] = 1.
NS →+∞
Moreover, by using a central limit theorem, the asymptotic normalized approximation error
εḡ tends to a zero mean Gaussian distribution with variance given by
Var (εḡ ) =
1
Var [g (X)] .
NS
Thus, if g (X) has finite variance, the variance of the approximation reduces by increasing the
number of samples.
In our case, we want to approximate the posterior mean
NS
1 X
(j)
X̂k ≈
Xk ,
NS
j=1
(j)
with Xk
i.i.d. samples from the posterior distribution.
(2.15)
82
Chapter 2. Estimation of a varying parameter
Observe that we can also rewrite the posterior mean in an equivalent way using the joint
posterior PDF p (x1:k |i1:k ):
Z
X̂k = xk p (x1:k |i1:k ) dx1:k .
(2.16)
R
(j)
In this case we will sample independent trajectories X1:k from p (x1:k |i1:k ) and the posterior
mean is also given by (2.15).
The main problem here is that the posterior distribution and the joint posterior distribution
are usually difficult to sample directly. Therefore, to solve this problem we will use a method
called importance sampling.
2.3.2
Importance sampling
Retaining the second form of the posterior mean (2.16), the main idea of importance sampling
[Robert 1999, p. 92] is to multiply and divide the integrand in the expectation by a PDF
q (x1:k |i1:k )2 from which we know how to sample the trajectories X1:k . This gives
Z
p (x1:k |i1:k )
X̂k = xk
q (x1:k |i1:k ) dx1:k .
q (x1:k |i1:k )
R
Note that the support of the PDF q (x1:k |i1:k ) might be strictly larger than the support of the
posterior. Denoting the ratio between PDF as an importance weight w (x1:k )
w (x1:k ) =
p (x1:k |i1:k )
,
q (x1:k |i1:k )
(2.17)
the expectation can be approximated by
X̂k =
Z
xk w (x1:k ) q (x1:k |i1:k ) dx1:k
R
NS
1 X
(j)
(j)
≈
Xk w X1:k ,
NS
(2.18)
j=1
(j)
where X1:k are i.i.d. trajectories from q (x1:k |i1:k ). We can divide the expectation by the
integral of the posterior as its value is equal to one, this gives
N
PS (j) (j) R
Xk w X1:k
xk w (x1:k ) q (x1:k |i1:k ) dxk
j=1
R
≈ N
X̂k = R
.
w (x1:k ) q (x1:k |i1:k ) dxk
PS (j) w X1:k
R
j=1
(j)
Defining the normalized weights w̃ X1:k
(j)
w̃ X1:k
as
(j)
w X1:k
,
= N
PS (j) w X1:k
(2.19)
j=1
2
Note that q (x1:k |i1:k ) can depend on the measurements, after we will choose a simplified form which does
not depend on the measurements.
2.3. Particle Filtering
83
we have that the posterior mean can be approximated by
X̂k ≈
NS
X
j=1
(j)
(j)
Xk w̃ X1:k .
By comparing the approximation in (2.20) with the integral
(2.20)
R
R
xk p (xk |i1:k ) dxk , we realize
that this method is equivalent to approximate the posterior by a discrete distribution with
support values chosen randomly and with probabilities given by the normalized weights
p (xk |i1:k ) ≈
NS
X
j=1
(j)
(j)
w̃ x1:k δD xk − xk ,
(2.21)
where δD () is a Dirac distribution.
2.3.3
Sequential importance sampling
The remaining problems now are the choice of a PDF q (x1:k |i1:k ) easy to sample and the
evaluation of the weights.
(j)
(j)
To be able to sample the trajectory X1:k without modifying the past trajectory X1:k−1 (so
that we do not need to resample the past trajectory), we must choose a distribution q (x1:k |i1:k )
for which the marginal distribution for k − 1 is exactly q (x1:k−1 |i1:k−1 ). This can be done
using the following form for q (x1:k |i1:k ) [Doucet 1998]:
q (x1:k |i1:k ) = q (x1:k−1 |i1:k−1 ) q (xk |x1:k−1 , i1:k ) .
(j)
(2.22)
(j)
In this case to extend a sample trajectory from realization x1:k−1 to x1:k , we sample
(j)
(j)
q xk |x1:k−1 , i1:k to generate the new point of the trajectory xk .
To evaluate the weights, we develop p (x1:k |i1:k ) using conditioning and the independence
assumptions on the model:
• Xk is independent of X1:k−2 and i1:k−1 conditioned on Xk−1 ,
• ik is independent of X1:k−1 and i1:k−1 conditioned on Xk .
This gives
p (x1:k |i1:k ) =
P (ik |xk ) p (xk |xk−1 )
p (x1:k−1 |i1:k−1 ) .
P (ik |i1:k−1 )
(2.23)
Replacing the simplified form of q (x1:k |i1:k ) (2.22) and the joint posterior above (omitting
P (ik |i1:k−1 ), which is constant in x1:k ) in the expression (2.17), we have the following weight
for the trajectory j:
(j)
(j) (j)
P ik |x(j)
p
x
|i
p
x
|x
1:k−1 1:k−1
k
k
k−1
(j)
,
w x1:k ∝
(2.24)
(j)
(j)
q x1:k−1 |i1:k−1
q x1:k−1 |i1:k−1
84
Chapter 2. Estimation of a varying parameter
where ∝ is the symbol for proportional. The fact that the weights are defined up to a proportional factor is not important because for approximating
the posterior mean we use the
(j)
normalized weights. Note that the factor
p x1:k−1 |i1:k−1
(j)
q x1:k−1 |i1:k−1
is the weight for the samples at time
k − 1. Thus, we can write a recursive expression that relates the normalized weights for time
k − 1 with the weights for time k
(j) (j)
P ik |x(j)
p
x
|x
k
k
k−1
(j)
w̃ x(j)
(2.25)
w x1:k ∝
1:k−1 .
(j) (j)
q xk |x0:k−1 , i1:k
We need now to define the PDF q (xk |x0:k−1 , i1:k ) which is used to generate the samples.
The two most commonly used choices are the following:
• choice 1: p (xk |xk−1 , ik ), minimum weight variance distribution. The quality of the
approximation of the posterior by the discrete distribution (2.21) is dependent on the
variance of the weights and the variance depends on the PDF q (xk |x0:k−1 , i1:k ). It
(j)
can be shown that conditioned on the past trajectory x1:k−1 realization and on the
measurements realization i1:k , the variance of the weights is minimized for [Doucet 1998]
q (xk |x0:k−1 , i1:k ) = p (xk |xk−1 , ik ) .
(2.26)
Unfortunately this distribution is difficult to sample directly. In our case, we can sample
from it by using a rejection method (More? - App. A.2.5).
• Choice 2: p (xk |xk−1 , ik ), prior distribution. In order to simplify the evaluation of the
weights we can choose
"
#
1 xk − xk−1 − uk 2
1
q (xk |x0:k−1 , i1:k ) = p (xk |xk−1 ) = √
.
(2.27)
exp −
2
σw
2πσw
(j)
(j)
Thus for each previous xk−1 , we are going to obtain a sample from a r.v. Xk using
(j)
the distribution p xk |xk−1 3 . In our case, this choice reduces the problem to sampling
from a Gaussian distribution, which is very simple, and updating the weights following
(we chose the proportionality factor to be one)
(j)
(j)
(j)
w x1:k = P ik |xk w̃ x1:k−1 .
(2.28)
Note that in both cases the sampling and evaluation of the weights do not require the past
(j)
measurements and the samples x1:k−2 . This leads to memory requirements that do not increase
over time. If we compare both choices in terms of complexity, the second choice is better
because it only requires sampling from a Gaussian distribution and evaluating the weights
with the likelihood. Therefore, from now on, we will use the second choice for the sampling
distribution.
We have the following procedure:
3
For details on how to sample from it using a standard Gaussian variate see (How? - App. A.3.3).
2.3. Particle Filtering
85
(1:NS )
1. Sample the prior distribution
p (x0 ). This will generate NS samples x0
normalized weights w̃
For time k,
(j)
x0
=
. Set uniform
1
NS .
(j)
2. Create NS samples each from the corresponding r.v. Xk with PDF given by (2.27):

!2 
(j)
1 xk − xk−1 − uk 
1
(j)
.
p xk |xk−1 = √
exp −
2
σw
2πσw
3. Evaluate the sample weights using the measurement and the last weights with (2.28):
(j)
(j)
(j)
w x1:k = P ik |xk w̃ x1:k−1 .
4. Normalize the weights using (2.19):
(j)
w̃ x1:k
(j)
w x1:k
.
= N
PS (j) w x1:k
j=1
5. Obtain the estimate with the weighted mean
x̂k ≈
NS
X
j=1
(j)
(j)
xk w̃ x1:k .
This procedure is the sequential extension of importance sampling applied to filtering and
this is the reason for its commonly used name - sequential importance sampling filter. As
this method is a special case of importance sampling, it has the same general characteristics,
namely, it is biased for a fixed number of samples, but it converges with probability one to
the optimal estimator when NS → ∞ [Doucet 1998].
2.3.4
Sequential importance resampling
We would expect that by increasing the number of samples the filter would get closer and
closer to the optimal estimate. However, the convergence result is asymptotic, it works only
when NS tends to infinity. When NS is finite, it can be shown that the variance of the weights
increases over the time [Kong 1994]. This problem is known as the degeneracy problem and
what happens in practice is that after some time most of the normalized weights are close to
zero, which is equivalent to say that most of the samples are useless [Doucet 1998].
(j)
In the case of sampling with p xk |xk−1 , the cause of this problem is easy to understand.
We start with a given prior distribution, then
duringthe procedure we evaluate the posterior
(j)
for values of Xk sampled randomly using p xk |xk−1 , as there is no feedback from the measurements in the sampling processes, after some time, the samples can lie very far from the
values of Xk where the posterior has larger values. As a consequence, this will produce a very
poor discrete approximation of the posterior. A possible remedy for this problem is to drive
the sampling process using the measurements i1:k .
86
Chapter 2. Estimation of a varying parameter
Resampling
(j)
This can be donein asimple way by reproducing the samples xk for which the posterior
(j)
approximation w̃ x1:k is large and deleting the samples for which the posterior is small.
This procedure, known as resampling, can be carried out in practice by sampling NS times4
the posterior discrete approximation given by
 w̃ x(j) , if x = x(j) ,
k
k
1:k
(2.29)
P (xk ) =
0, otherwise.
After resampling, for retaining the posterior approximation, the weights of the samples are
set to
1
(j)
.
(2.30)
w̃ x1:k =
NS
As the posterior approximation is a multinomial distribution, the procedure of resampling
using the approximation of the posterior (2.29) is known as multinomial resampling. Multinomial resampling can be easily implemented using NS independent uniform samples, for details
see (How? - App. A.3.4) (app4) and for other types of resampling techniques see [Hol 2006].
The process of resampling should not be performed every time as it leads to the impoverishment of the sample set [Berzuini 1997]. Sample impoverishment comes as the opposite
extremum of the degeneracy problem, as in this case we simply neglect possible trajectories
of Xk with medium and low likelihood, leading to a not sufficiently rich approximation of the
posterior. For triggering the resampling process we can monitor the number of effective samples Neff , that is to say, the equivalent number of samples if we were using the true posterior
for Monte Carlo evaluation. This number can be approximated by [Doucet 1998]
Neff =
N
PS
j=1
1
w̃2
(j)
x1:k
.
(2.31)
Therefore, each time Neff < Nthresh , where Nthresh ∈ [1, NS ] is a minimum acceptable number
of effective samples, the resampling process is triggered.
Sequential importance sampling with the resampling step for general Bayesian estimation
was first suggested in [Rubin 1988](cited in [Doucet 1998]) under the name sequential importance resampling. Its widespread use in filtering with the specific choice of p (xk |xk−1 ) as the
sampling distribution was initiated with [Gordon 1993] under the name of bootstrap filter.
This method was proposed for solving general nonlinear non Gaussian filtering problems.
The method presented above can be found in the literature under many other names, the
most common is particle filter (PF). In this case "particle" is the name given for a sample
(j)
xk . We will use the terms particle filter and particle from now on.
A proof of convergence of the general PF is given in [Berzuini 1997] for the case with
resampling at each iterate. It is shown that when NS → ∞ the error between the optimal
4
We could resample more or less than NS samples, we chose NS because it is the most commonly used
choice in the literature.
2.4. Evaluation of the estimation performance
87
√
estimator and the PF estimate multiplied by NS tends to a Gaussian r.v. with fixed finite
variance. This means that, for a large number of particles, when the number of particles
increases the PF estimate is more and more concentrated around the optimal estimator.
Application of PF for estimation based on quantized measurements with a fixed sequence
of threshold sets are reported in [Ruan 2004] and [Karlsson 2005]. In [Ruan 2004] the main
focus is on analyzing the main issues related to the fusion of quantized measurements from
multiple sensors for tracking in general, the results reported therein are given by simulation.
A more restricted model with Xk given by a vector linear Gaussian evolution and quantized
linear Gaussian measurement is used in [Karlsson 2005], where a theoretical lower bound
on estimation performance is obtained and compared with simulation results. The bound
that is used is the equivalent counterpart of the CRB for random parameters the Bayesian
Cramér–Rao bound (BCRB).
2.4
Evaluation of the estimation performance
We have already explained how to obtain the estimates for our problem (b) (p. 29). We still
need to evaluate its performance.
2.4.1
Online empirical evaluation
The variance of the posterior approximation (supposing that NS is sufficiently large for the
bias to be negligible)
NS 2 X
(j)
(j)
xk − x̂k w̃ x1:k ,
MSEk ≈
j=1
gives an online estimate of the MSE. The problem with this approach is that the performance is
conditioned on the given measurement sequence i1:k . In this case, approximated performance
can be obtained only after having the measurements, thus no design of the system (choice of
the number of quantization intervals NI , choice of the sensor quality δ) can be done. Even if we
push more into the Monte Carlo philosophy and try to evaluate the mean of the approximated
MSE above using Monte Carlo integration, we will have to simulate a large number of times the
2 ). Therefore,
PF procedure by changing the parameters needed for system design (NI , δ , σw
it is better to turn our attention to analytical results on performance.
2.4.2
BCRB
The analytical form of the MSE (2.9) depends on the posterior distribution. Thus, for the
same reason, we cannot have an analytical expression for the estimator, we are not going to
have an analytical expression for the MSE. We must resort then to a bound on the MSE. As a
consequence, we will follow [Karlsson 2005] and we will also analyze the BCRB. As our case is
simpler (Xk is a scalar Wiener process) than the vector linear case studied in [Karlsson 2005],
we will be able to analyze the effects of the measurement system parameters in a more clear
and simple way.
88
Chapter 2. Estimation of a varying parameter
The BCRB at instant k, BCRBk , is a lower bound on MSEk , it is given by the inverse of
the Bayesian information (BI) [Van Trees 1968, p. 84]
MSEk ≥ BCRBk =
1
.
Jk
(2.32)
The BI at time k, JK , is given by
∂ 2 log p (Xk , i1:k )
.
Jk = −E
∂Xk2
(2.33)
As Xk is random, the expectation here is evaluated using the joint probability measure of Xk
and i1:k . This result is general and it is not linked particularly to the quantization problem,
we could replace i1:k by any measurement related to Xk .
By assuming that Xk is a Markov process (also here i1:k can be any type of measurement),
in [Tichavsky 1998] a recursive form for evaluating the BI is obtained
Jk = C k −
where5
h
∂ 2 log p(X0 )
∂X02
i
Bk2
,
Ak + Jk−1
and Ak = −E
J0 = −E
h 2
i
h 2
i
∂ log p(Xk |Xk−1 )
∂ log P(ik |Xk )
Ck = −E
−
E
.
∂X 2
∂X 2
k
(2.34)
∂ 2 log p(Xk |Xk−1 )
2
∂Xk−1
, Bk = −E
h
∂ 2 log p(Xk |Xk−1 )
∂Xk ∂Xk−1
i
,
k
Using (2.3) for evaluating the terms Ak , Bk and Ck , we have

 2 
2 




1 Xk −Xk−1 −uk
2 log √ 1
2 − 1 Xk −Xk−1 −uk




∂
exp
−
∂




2
σw
2
σw
2πσw
Ak = −E
=
−E
2
2




∂Xk−1
∂Xk−1








=
1
.
2
σw
In the same way
1
,
2
σw
2
1
∂ log P (ik |Xk )
−E
.
2
σw
∂Xk2
Bk = −
Ck =
Decomposing the expectation above, we obtain
2
∂ log P (ik |Xk )
1
.
Ck = 2 + EXk −Eik |Xk
σw
∂Xk2
The inner expectation is another form of expressing the FI for estimating Xk when Xk is considered to be a deterministic parameter [Kay 1993, p. 34]. Thus, by using the parametrization
of the FI for quantized measurements with the r.v. εk = τ0,k − Xk , we can write
Ck =
5
1
+ E [Iq (εk )] .
2
σw
Note that we are using the notation for discrete measurements i1:k with P (ik |xk ).
2.4. Evaluation of the estimation performance
89
where the expectation is evaluated using the probability measure of εk . Using these results in
(2.34) gives
1
1
1
,
Jk = 2 + E [Iq (εk )] − 4 (2.35)
1
σw
σw
+J
k−1
2
σw
with J0 given by
J0 = −E


2 log √

∂




1
2πσ0

X0 −x′0 2


exp − 12

σ0
∂X02



=
1
.
σ02
(2.36)
For commonly used noise models (Gaussian, Laplacian and Cauchy), the FI is maximized
for εk = 0. Thus, we can obtain a simple upper bound on the BI by assuming εk = 0 with
probability one. This gives
Jk ≤ Jk′ =
1
1
+ Iq (0) − 4 2
σw
σw
1
1
2
σw
′
+ Jk−1
,
(2.37)
with J0′ = J0 . This will give a simple lower bound on the BCRB and consequently on the
MSE, which can be used to assess approximately the performance of the PF.
90
Chapter 2. Estimation of a varying parameter
The solution to problem (b) (p. 29) given by the PF is
Solution to (b) - Particle filter for a fixed threshold set sequence τ 1:k
(b1.1) 1) Estimator
(j)
• Set uniform normalized weights w̃ x0
= N1S and initialize NS
n
o
(1)
(N )
particles x0 , · · · , x0 S
by sampling the prior
p (x0 ) = √
"
2 #
1 x0 − x′0
1
.
exp −
2
σ0
2πσ0
For each time k,
(j)
• for j from 1 to NS , sample the r.v. Xk with PDF
(How? - App. A.3.3)

!2 
(j)
x
−
x
−
u
k
k
1
1
(j)
k−1
,
p xk |xk−1 = √
exp −
2
σw
2πσw
• for j from 1 to NS , evaluate and normalize the weights
(j)
w x1:k
(j)
w̃ x1:k = N
,
(j)
(j)
(j)
PS (j) w x1:k = P ik |xk w̃ x1:k−1 ,
w x1:k
j=1
(j)
is given by (2.14).
where P ik |xk
• Obtain the estimate with the weighted mean
x̂k ≈
NS
X
j=1
(j)
(j)
xk w̃ x1:k .
• Evaluate the number of effective particles
Neff =
N
PS
j=1
1
(j)
w̃2 x1:k
,
if Neff < Nthresh , then resample using multinomial resampling
(How? - App. A.3.4) (app4).
2) Performance (lower bound)
The MSE can be lower bounded as follows
1
MSEk ≥ ′ ,
Jk
with Jk′ given recursively by
1
1
Jk′ = 2 + Iq (0) − 4 σw
σw
2.5
1
1
2
σw
′
+ Jk−1
.
Quantized innovations
For commonly used symmetrically distributed noise models (Gaussian, Laplacian and Cauchy
distributions), we saw in Ch. 1 that Iq (ε) around ε = 0 is a locally decreasing function with
|ε|, thus from (2.35) we can see that closer τ0,k is to the parameter realization xk , higher will
2.5. Quantized innovations
91
be the BI. If we assume that the BCRB is sufficiently tight for accepting its behavior as an
approximation of the behavior of the MSE, closer τ0,k is to the parameter realization xk , lower
will be the MSE. This indicates that the dynamical range of the quantizer must vary in time
in order to follow the parameter and produce enhanced estimation performance.
2.5.1
Prediction and innovation
Prediction. The main problem with the approach –τ0,k = xk – is that we do not know xk , if
we knew, we would not need to estimate it. We might then accept a small loss of performance
by using the closest value to xk that we have in hand, in our case, a prediction of xk using the
last estimate value x̂k−1 and the drift uk . If X̂k−1 is the MMSE estimator based on i1:k−1 ,
then the MMSE estimator of Xk based also on i1:k−1 , denoted X̂k|k−1 , is the conditional mean
[Jazwinski 1970, p. 149]
X̂k|k−1 = EXk |i1:k−1 (Xk ) .
Using the dynamical model for Xk and the linearity of the conditional expectation
X̂k|k−1 = EXk |i1:k−1 (Xk−1 + Wk ) = EXk−1 |i1:k−1 (Xk−1 ) + EWk |i1:k−1 (Wk ) .
The first term is the optimal estimate for Xk−1 . As Wk is independent of all Wn with n 6= k
and it is also independent of all Vk , it does not depend on i1:k−1 . Thus, the second term is
simply EWk (Wk ), which is uk . This gives
X̂k|k−1 = τ0,k = X̂k−1 + uk .
Considering that the estimator is good enough (at least for large k), we expect to have the r.v.
εk with most of its probability concentrated around zero, thus leading to a higher E [Iq (εk )]
and, consequently, to a lower MSE.
Quantizing the Innovation. Quantizing the prediction error is a known subject in standard quantization. It is widely known under the name predictive quantization [Gersho 1992,
Ch. 7]. Note that the procedure considered here is different. Instead of quantizing the prediction error of reconstruction, we quantize the error between the prediction in estimation sense
and the noisy measurement Yk − X̂k|k−1 . The prediction error in this case is commonly called
the innovation process in continuous measurement linear filtering theory [Kay 1993, p. 433].
The name comes from the fact that it represents the previously unknown information brought
by the new measurement. As a consequence, the quantized prediction error for estimation
purposes is called quantized innovation.
We can slightly change solution (b1.1) (p. 90) by adding the adaptive replacement of the
central threshold with the prediction6 . This is what was done in [Sukhavasi 2009b] under
the assumption of Gaussian noise, linear and Gaussian vector Xk ( Xk = AXk−1 + Wk ) and
binary quantization. Constraining Xk to be the scalar Wiener process considered here and
generalizing the algorithm for symmetrically distributed noise and NI ≥ 2, we have
6
As the measurements are now linked through the use of the prediction for quantizing, we cannot guarantee
the convergence of the particle approximation through standard results and more advanced results are needed
[Crisan 2000].
92
Chapter 2. Estimation of a varying parameter
Solution to (b) - Particle filter with adaptive threshold sequence τ 1:k
quantizing the innovation.
(b2.1) 1) Estimator
(j)
• Set uniform normalized weights w̃ x0
= N1S and initialize NS
n
o
(1)
(N )
particles x0 , · · · , x0 S
by sampling with prior
"
2 #
1
1 x0 − x′0
exp −
p (x0 ) = √
.
2
σ0
2πσ0
For each time k,
(j)
• for j from 1 to NS , sample the r.v. Xk with PDF
(How? - App. A.3.3)

!2 
(j)
xk − xk−1 − uk
1
1
(j)
,
p xk |xk−1 = √
exp −
2
σw
2πσw
• for j from 1 to NS , evaluate and normalize the weights
(j)
w x1:k
(j)
,
w̃ x1:k = N
(j)
(j)
(j)
PS (j) w x1:k = P ik |xk w̃ x1:k−1 ,
w x1:k
j=1
(j)
is given by (2.14).
where P ik |xk
• Obtain the estimate with the weighted mean
x̂k ≈
NS
X
j=1
(j)
(j)
xk w̃ x1:k .
• Set the central threshold of the quantizer to the new estimate
τ0,k = x̂k−1 + uk .
• Evaluate the number of effective particles
Neff =
N
PS
j=1
1
(j)
w̃2 x1:k
,
if Neff < Nthresh , then resample using multinomial resampling
(How? - App. A.3.4) (app4).
2) Performance (lower bound)
The MSE can be lower bounded as follows
1
MSEk ≥ ′ ,
Jk
with Jk′ given recursively by
1
1
Jk′ = 2 + Iq (0) − 4 σw
σw
1
1
2
σw
′
+ Jk−1
.
2.5. Quantized innovations
2.5.2
93
Bound for the quantized innovations
Observe that the lower bound is still valid because we still have E [Iq (εk )] ≤ Iq (0). But, we
might have a performance closer to the bound, as εk might be concentrated mostly around
zero. It is also important to note that now even if the bound is tight (which might not be
true), we can get very close to it, but in general we cannot attain it. This is due to the fact
that the MSE for a varying parameter never goes to zero, leading to a residual spread on the
PDF of εk , which makes E [Iq (εk )] ≤ Iq (0).
To have an approximation on the evolution of the MSE, we can analyze the lower bound
on the BCRB. Therefore, we are interested in analyzing the evolution of Jk′ . We can start by
′
analyzing the evolution of its increments. Subtracting the expressions for Jk′ and Jk−1
, we
have
!
′
′
Jk−1
− Jk−2
1
1
1
1
′
′
.
−
=
(2.38)
Jk − Jk−1 = 4
1
4
′
′
1
1
σw σ12 + Jk−2
σw
+ Jk−1
+ J′
+ J′
σ2
w
2
σw
w
k−1
2
σw
k−2
2 is also
The BI is positive by definition, as it is an expectation of a squared quantity, and σw
positive by definition, thus the denominator of the expression above is always positive. This
leads to a sign of the increment at time k − 1 that is the same as the sign of the increment at
k − 2. As a conclusion, we can say that the BI is monotonic, it always increases or decreases.
For determining if the BI increases or decreases, we can see from the recursive expression
above that this will be determined by the first increment J1′ − J0′ . By subtracting J0′ = σ12
0
from J1′ , we obtain
1
1
+ Iq (0) − 4
2
σw
σw
J1′ − J0′ =
Regrouping the terms with factor
J1′ − J0′ = Iq (0) +
1
2
σw
1
2
σw
1
+
1
σ02
−
1
.
σ02
gives


1 
1
1
1
1 
1−
− 2.
− 2 = Iq (0) + 2
2
2
2
σ
σw
σ0
σw + σ0
σ0
1 + σw2
0
Thus, if
Iq (0) >
1
1
− 2
,
σ02 σw
+ σ02
J1′ − J0′ is positive and the BI is always increasing, otherwise it always decreases. As a
consequence, if the inequality is satisfied the BCRB is always decreasing, otherwise always
increasing.
As stated before, the information bound Jk′ is bounded below by zero. By looking to (2.37)
Jk ≤ Jk′ =
1
1
+ Iq (0) − 4 2
σw
σw
1
1
2
σw
′
+ Jk−1
,
we can see that it is bounded above by Iq (0) + σ12 , as the other term that is subtracted is
w
always positive. Joining the facts that Jk′ is lower and upper bounded with the fact that it
94
Chapter 2. Estimation of a varying parameter
is always increasing or decreasing, we can conclude that Jk′ will converge to a fixed point (a
fixed value). Except in the cases when the inequality above is an equality, from (2.38), we
′
see that the increment Jk′ − Jk−1
cannot be zero, as it is equal to the last increment (which
is positive) multiplied by a positive value. Therefore, the fixed point of Jk′ is expected to be
attained only asymptotically.
Jk′
′ , by definition it is the value of J ′ for which
Denoting this asymptotic fixed point J∞
k
′
= Jk−1 . Thus, it can be found by solving
1
1
+ Iq (0) − 4 2
σw
σw
′
J∞
=
which is equivalent to solve
2
′
′
− Iq (0) J∞
−
J∞
1
1
2
σw
′
+ J∞
,
Iq (0)
= 0.
2
σw
The solutions for the equation above are
Iq (0) ±
q
Iq2 (0) +
4Iq (0)
2
σw
2
.
′ positive (it is positive by definition), we must take the positive solution.
In order to have J∞
4Iq (0)
As σ2 is positive, the positive solution is obtained for the positive sign. Therefore,
w
′
J∞
=
Iq (0) +
q
Iq2 (0) +
4Iq (0)
2
σw
(2.39)
2
′
and the asymptotic MSE is then lower bounded by the inverse of J∞
MSE∞ ≥
2
q
Iq (0) + Iq2 (0) +
4Iq (0)
2
σw
.
(2.40)
The following behaviors can then be obtained for the evolution of the bound: if we start with
a very small σ02 (small compared with Iq1(0) ), as we can see from the inequality related to the
monotonicity pattern, the lower bound on the MSE will always increase, tending asymptotically to J1′ . If we start with a large σ02 , the lower bound will always decrease, also tending
∞
asymptotically to J1′ .
∞
From the analysis we can see that the MSE, as expected, is always strictly positive, it is
lower bounded by σ02 when this value is very small compared with Iq1(0) and it is lower bounded
by
1
′
J∞
2.5.3
when σ02 is large compared with
1
Iq (0) .
Gaussian assumption and asymptotic estimation of a slowly varying
parameter
Other filtering methods based on the quantized innovation are proposed in the literature
under the Gaussian noise assumption. In [Ribeiro 2006c], binary measurements are obtained
2.5. Quantized innovations
95
by applying the sign function to the innovation. A similar procedure to the well-known Kalman
filter [Kay 1993, Ch. 13] is derived by assuming that the posterior at instant k −1 is Gaussian.
In the same line, [You 2008] proposes a Kalman-like procedure for quantized innovations with
NI > 2. A careful reader of the literature on the subject might note that the idea of considering
Gaussian approximations of the posterior for filtering based on quantized data with Gaussian
noise dates back to [Curry 1970]. Also, the idea of quantizing the innovation seems to be first
exploited in [Borkar 1995]7 (cited in [Sukhavasi 2009a]).
The general algorithm presented in [You 2008] has its approximate performance dependent
on Iq (0), with Iq √
(0) being evaluated for the Gaussian distribution with variance σv2 = 1 (noise
scale factor δ = 2). The performance of the algorithms is enhanced by maximizing Iq (0).
This is in accordance with the lower bound on the MSE studied above for the Wiener process
model with symmetric noise, MSEk ≥ J1′ , which decreases with increasing Iq (0) ((2.37) shows
k
that Jk′ increases with Iq (0)). This gives additional motivation for studying how to maximize
Iq (0) w.r.t. the thresholds.
The assumption that the posterior is a Gaussian distribution for all k and all σw stated
in [Curry 1970], [Ribeiro 2006c] and [You 2008] is a very rough approximation. For observing
this, consider that the assumption that the prediction PDF p (xk |i1:k−1 ) is Gaussian is correct.
Then, from the update expression (2.11), we know that the posterior is proportional to the
function P (ik |xk ) p (xk |i1:k−1 ). The probability P (ik |xk ) is a difference of CDF, which is a
function that is approximately a rectangular window with slowly decreasing borders centered
at the quantization interval for ik . If the standard deviation of the prediction is large or has
similar value of the equivalent width of P (ik |xk ) and the prediction distribution has a mean
that is different of the quantization interval center, then it is easy to see that the resultant
P (ik |xk ) p (xk |i1:k−1 ) will be a skewed function, not similar at all to a Gaussian function. As
an additional remark, we can see that differently from the continuous measurement case, where
the measurement noise must be Gaussian for having a Gaussian posterior, the assumption of
Gaussian noise does not help here, as the function P (ik |xk ) is not close to Gaussian even in
the Gaussian case.
We will use the Gaussian assumption when σw is small and k tends to infinity. Under
these assumptions and considering that we quantize the innovations, we will obtain an approximation of the asymptotically optimal estimator and its performance. To verify that the
approximation is reasonable, we will compare the approximate asymptotic performance with
the asymptotic BCRB.
2.5.3.1
Asymptotic estimator for a slowly varying parameter
As it was discussed above, it is reasonable to accept that the estimator MSE will converge to
2 . When the Wiener process increment standard deviation σ is small compared
a constant σ∞
w
with the noise scale factor, the estimator has sufficient time for reducing the estimation vari2 is small. If
ance before Xk changes significantly, thus it is also reasonable to state that σ∞
7
In this case, the true innovation is quantized, i.e., the innovation obtained by using the estimator based on
the continuous measurements, this is different from the methods in [Ribeiro 2006c] and [You 2008], where the
quantized innovation is the innovation obtained using the estimator based on the quantized measurements.
96
Chapter 2. Estimation of a varying parameter
we assume that the previous posterior after some time is approximately Gaussian with mean
2 , then, as the prediction PDF is the convolution (2.10) of p (x |x
E (Xk−1 ) and variance σ∞
k k−1 ),
which is Gaussian, with p (xk |i1:k−1 ) which is also Gaussian, we obtain that the prediction
PDF conditioned on the past observations is Gaussian distributed with mean E (Xk−1 ) + uk
2 + σ 2 . For estimating the optimal X we must evaluate the conditional mean
and variance σ∞
k
w
R
xk P (ik |xk ) p (xk |i1:k−1 ) dxk
Z
P (ik |xk ) p (xk |i1:k−1 )
R
R
R
dxk =
.
X̂k = xk
P ik |x′k p x′k |i1:k−1 dx′k
P ik |x′k p x′k |i1:k−1 dx′k
R
R
R
The numerator in the last term of the RHS can be seen as the prediction mean of the r.v.
Xk P (ik |Xk ) (the mean w.r.t. p (xk |i1:k−1 )), under the assumption that σ∞ is small, the factor
P (ik |Xk ), which is given by (2.14)
 F τ ′ + X̂k|k−1 − Xk − F τ ′
+
X̂
−
X
if ik > 0,
k ,
k|k−1
|ik |
|ik−1|
P (ik |Xk ) =
′
F −τ ′
if ik < 0,
|ik +1| + X̂k|k−1 − Xk − F −τ|ik | + X̂k|k−1 − Xk ,
can be well approximated by a first order Taylor series expansion around X̂k|k−1 − Xk = 0
X̂
−
X
+
P (ik |Xk ) = P (ik |Xk )|X̂
f
i
,
X̂
,
X
k
d
k
k
k|k−1
k|k−1
k|k−1 =Xk
X̂k|k−1 =Xk
+ ◦ X̂k|k−1 − Xk , (2.41)
where fd ik , X̂k|k−1 , Xk is the first derivative of P (ik |Xk ) w.r.t. the prediction error X̂k|k−1 −
Xk . It can be written as a function of the noise PDF f
fd ik , X̂k|k−1 , Xk =
 f τ ′ + X̂k|k−1 − Xk − f τ ′
+
X̂
−
X
,
if ik > 0,
k
k|k−1
|ik |
|ik−1|
=
(2.42)
′ + X̂
f −τ ′
+
X̂
−
X
−
f
−τ
−
X
,
if
i
<
0.
k
k
k
k|k−1
k|k−1
|ik +1|
|ik |
Note that when X̂k|k−1 = Xk the function fd ik , X̂k|k−1 , Xk depends only on ik . Using
(2.41), the numerator in the estimator expression is then the prediction mean of
2
f
X̂
−
X
+
X
Xk P (ik |Xk ) = Xk P (ik |Xk )|X̂
i
,
X̂
,
X
d
k
k
k
k|k−1
k|k−1
k
=X
k
k|k−1
X̂k|k−1 =Xk
+ ◦ Xk X̂k|k−1 − Xk2 .
The prediction mean of Xk is the prediction X̂k|k−1 , while X̂k|k−1 is simply a constant for the
evaluation of this mean. Thus, using linearity and the fact that
2
2
E2Xk |i1:k−1 (Xk ) − EXk |i1:k−1 Xk2 = −VarXk |i1:k−1 (Xk ) = − σ∞
+ σw
,
we have
Z
xk P (ik |xk ) p (xk |i1:k−1 ) dxk = X̂k|k−1 P (ik |Xk )|X̂
k|k−1 =Xk
+
R
2
2
− σ∞
+ σw
fd ik , X̂k|k−1 , Xk X̂k|k−1 =Xk
2
2
+ ◦ σ∞
+ σw
. (2.43)
2.5. Quantized innovations
97
To obtain the denominator in the estimation expression, we can use a similar procedure. Note
that now the prediction expectation is evaluated for the r.v. P (ik |Xk ) instead of Xk P (ik |Xk ).
We will use the second order Taylor expansion of P (ik |Xk ) in this case:
X̂
−
X
+
f
P (ik |Xk ) = P (ik |Xk )|X̂
i
,
X̂
,
X
+
k
d
k
k k|k−1
k|k−1
=Xk
k|k−1
+
X̂k|k−1 − Xk
2
2
X̂k|k−1 =Xk
fd′ ik , X̂k|k−1 , Xk X̂k|k−1 =Xk
+◦
X̂k|k−1 − Xk
2 , (2.44)
where fd′ ik , X̂k|k−1 , Xk is the second derivative of P (ik |Xk ) w.r.t. the prediction error. By
differentiating fd in (2.42), we can observe that, for X̂k|k−1 = Xk , this function also depends
. For the
only on ik . The mean of the first term above is the constant P (ik |Xk )|X̂
k|k−1 =Xk
second term fd is a constant and the mean of the prediction is zero, as the optimal predictor
is unbiased [Jazwinski 1970, p. 150]. The third and last terms depend on the prediction
2
2 + σ 2 , also due to the
mean of X̂k|k−1 − Xk , which is equal to the prediction variance σ∞
w
unbiasedness of the optimal predictor. This gives the following
Z
P ik |x′k p x′k |i1:k−1 dx′k = P (ik |Xk )|X̂
=Xk +
k|k−1
R
2 + σ2
σ∞
w
2
2
+
fd′ ik , X̂k|k−1 , Xk .
+ σw
+ ◦ σ∞
2
X̂k|k−1 =Xk
(2.45)
Dividing the RHS of the expressions (2.43) and (2.45), we have an expression for the estimator
2 + σ2 f
2 + σ2
−
σ
X̂k|k−1 P (ik |Xk )|X̂
i
,
X̂
,
X
+ ◦ σ∞
d
k
k
k|k−1
∞
w
w
k|k−1 =Xk
X̂k|k−1 =Xk
.
X̂k =
2 +σ 2 )
(σ∞
′ i , X̂
w
2 + σ2 )
+
◦
(σ
+
P (ik |Xk )|X̂
f
,
X
k
k
k|k−1
∞
w
d
2
=X
k
k|k−1
X̂k|k−1 =Xk
Dividing the numerator and denominator by P (ik |Xk )|X̂
X̂k =
2 + σ2
X̂k|k−1 − σ∞
w
1+
2 +σ 2 )
(σ∞
w
2
fd (ik ,X̂k|k−1 ,Xk )|X̂
P(ik |Xk )|X̂
fd′ (ik ,X̂k|k−1 ,Xk )|X̂
P(ik |Xk )|X̂
k|k−1
, we get
k|k−1 =Xk
k|k−1 =Xk
k|k−1 =Xk
k|k−1 =Xk
+
2 + σ2
+ ◦ σ∞
w
2
◦ (σ∞
+
.
(2.46)
2)
σw
′
fd (ik ,X̂k|k−1 ,Xk )|X̂
k|k−1 =Xk 2 + σ 2 ≪ 1, the denominator is approximately
If is bounded and σ∞
w
P(ik |Xk )|X̂
=X
k
k|k−1
one and we can approximate the optimal estimator by8
fd ik , X̂k|k−1 , Xk X̂k|k−1 =Xk
2
2
.
X̂k ≈ X̂k|k−1 − σ∞
+ σw
P (ik |Xk )|X̂
=Xk
(2.47)
k|k−1
8
1
around x = 0 would produce a more precise approximation,
Note that a first order Taylor series of 1+x
but this would generate a more complicated algorithm for the performance analysis.
98
Chapter 2. Estimation of a varying parameter
2.5.3.2
Performance of the asymptotic estimator
2 . The idea here
Now, we need to calculate the asymptotic MSE of this estimator, which is σ∞
to start is to rewrite the asymptotic prediction error as a sum of the estimation error plus a
function of the observations. Using (2.47) we have
fd ik , X̂k|k−1 , Xk X̂k|k−1 =Xk
2
2
+ σw
X̂k − Xk = X̂k|k−1 − Xk − σ∞
,
P (ik |Xk )|X̂
=Xk
k|k−1
subtracting the term with fd from both sides, we have
fd ik , X̂k|k−1 , Xk X̂k|k−1 =Xk
2
2
+ σw
X̂k − Xk + σ∞
= X̂k|k−1 − Xk .
P (ik |Xk )|X̂
=Xk
k|k−1
Squaring and taking the expectation gives
E
X̂k − Xk

2 

+ 2E  X̂k − Xk
 2
2 2
+E  σ∞
+ σw
2
2
σ∞
+ σw
fd2 ik , X̂k|k−1 , Xk P2 (ik |Xk )|X̂
fd ik , X̂k|k−1 , Xk P (ik |Xk )|X̂

X̂k|k−1 =Xk 
k|k−1 =Xk
=E
X̂k|k−1 =Xk 
k|k−1 =Xk

X̂k|k−1 − Xk
+
2 .
2 . The second term is the expectation of
The first term is the asymptotic squared error σ∞
2 + σ 2 the
the estimation error multiplied by a function of the measurement ik , for small σ∞
w
estimation procedure is optimal (it minimizes
the
MSE),
thus
this
expectation
equals
zero
2 + σ 2 2 can leave the expectation and the term on the
[Rhodes 1971]9 . The constant σ∞
w
2 + σ 2 . Therefore, we have
RHS is the prediction error σ∞
w

 fd2 ik , X̂k|k−1 , Xk X̂k|k−1 =Xk 
2
2
2 2 
2
2
(2.48)
+ σw
E
+ σ∞
σ∞
 = σ∞ + σw .
2
P (ik |Xk )|X̂
=Xk
k|k−1
The expectation that still needs to be evaluated is an expectation under the marginal probability measure of ik , P (ik ). This probability measure can be evaluated by marginalizing on
the prediction error εk = X̂k − Xk
Z
P (ik ) = P (ik |xk ) p (εk ) dεk .
R
Remember that in the quantized innovation scheme P (ik |xk ) is a function of ik and εk . The
marginal can be also observed as the mean of P (ik |Xk ) evaluated w.r.t. the distribution of
9
This is a more general form of the well-known orthogonal projection theorem.
2.5. Quantized innovations
99
εk . For evaluating the mean, we can again use a second order Taylor series expansion around
εk = 0. This will lead to the same expression as in (2.45). Then, the remaining expectation is

 
 2 i , X̂


f
,
X
fd2 ik , X̂k|k−1 , Xk k
k
k|k−1


X d
X̂k|k−1 =Xk
X̂k|k−1 =Xk 

+
=
E



P2 (ik |Xk )|X̂
P (ik |Xk )|X̂


=Xk
=Xk
ik ∈I
k|k−1
2
2
+ σ∞
+ σw
f2
X d
ik ∈I
ik , X̂k|k−1 , Xk X̂k|k−1 =Xk
k|k−1
fd′ ik , X̂k|k−1 , Xk P2 (ik |Xk )|X̂
X̂k|k−1 =Xk
+
k|k−1 =Xk
2
2
+ σw
.
+ ◦ σ∞
The first term on the RHS can be identified as the FI Iq (0). Considering that the sum in the
2 + σ 2 2 , the second term
second term on the RHS is bounded, then,
after multiplying
by σ∞
w
i
h
2 + σ 2 2 term. This leads to
2 + σ 2 3 which is a ◦
σ∞
is multiplied by σ∞
w
w
2
σ∞
+

2 2 
σw
E
fd2 ik , X̂k|k−1 , Xk P2 (ik |Xk )|X̂

X̂k|k−1 =Xk 
k|k−1 =Xk
2
σ∞
=
+
2 2
σw
Iq
(0) + ◦
h
2
σ∞
+
2 2
σw
i
.
Using the expression above in (2.48), we obtain
h
i
2
2 2
2
2
2
2
2 2
+ σw
Iq (0) + ◦ σ∞
= σ∞
+ σw
,
σ∞
+ σ∞
+ σw
or equivalently
2 2
2
σ∞
+ σw
+◦
"
2 + σ2
σ∞
w
Iq (0)
2 #
=
2
σw
.
Iq (0)
2 is small enough so that the ◦ term is negligible, we can obtain the following approximation
If σw
2 :
for σ∞
σw
2
2
≈p
σ∞
− σw
.
(2.49)
Iq (0)
p
Considering that σw is small compared with Iq (0) and with one, we have a rough approximation for the asymptotic performance
σw
2
.
σ∞
≈p
Iq (0)
(2.50)
2 from (2.49) in the approximate expression for X̂ (2.47), the following
Finally, replacing σ∞
k
is obtained:
fd ik , X̂k|k−1 , Xk X̂k|k−1 =Xk
σw
X̂k ≈ X̂k|k−1 − p
=
P
(i
|X
)|
Iq (0)
k
k X̂k|k−1 =Xk
fd ik , X̂k|k−1 , Xk X̂k|k−1 =Xk
σw
.
(2.51)
= X̂k−1 + uk − p
P (ik |Xk )|X̂
Iq (0)
=Xk
k|k−1
100
Chapter 2. Estimation of a varying parameter
A few remarks are important here. First, in a similar way as it happened for the adaptive
estimator of a constant parameter, the asymptotic estimation procedure is very simple, it is a
correction on the last estimate which depends on the observation through
fd (ik ,X̂k|k−1 ,Xk )|X̂
P(ik |Xk )|X̂
k|k−1 =Xk
k|k−1 =Xk
,
a function of ik only. This means that the corrections can be stored in a table. Second, the
correction gain now is even simpler than in the constant parameter case, it is a constant. Third,
2 (2.50) agrees with the intuition on estimation performance, if
the rough approximation for σ∞
σw increases, the MSE increases, as the estimator has less effective samples to estimate before
the parameter changes significantly. If Iq (0) increases, which is equivalent to say that the
noise level is reduced and/or that the quantizer resolution is increased, the MSE decreases as
the statistical information given by each sample is reduced.
2.5.3.3
Asymptotic lower bound on the BCRB
for a slowly varying parameter
To check if the asymptotic estimator above is indeed close to optimal, we can compare its
estimation performance with the asymptotic MSE lower bound, which is given by (2.40):
MSE∞ ≥
2
q
Iq (0) + Iq2 (0) +
4Iq (0)
2
σw
.
Comparing with (2.50) must be done for small σw . For evaluating the RHS above in this case,
we can multiply its numerator and its denominator by √σw . This will lead to
2
2
Iq (0) +
q
Iq2 (0) +
Using the expansion around x = 0,
√
4Iq (0)
2
σw
Iq (0)σw
√
2 Iq (0)
1+x=1+
2
q
Iq (0) + Iq2 (0) +
=
4Iq (0)
2
σw
=
x
2
Iq (0)
√σw
Iq (0)
q
.
2
Iq (0)σw
+
+
1
4
+ ◦ (x), on the square root above gives
√σw
Iq (0)
Iq (0)σw
√
2 Iq (0)
+1+
2
Iq (0)σw
8
2)
+ ◦ (σw
,
2 is small compared with I (0) for making the I (0) to disappear
where we used the fact that σw
q
q
from the ◦ term. Note that this was also supposed to get the rough approximation (2.50) above.
1
= 1 − x + ◦ (x). Supposing,
We can use again an expansion around zero. Now, we use 1+x
Iq (0)
additionally that σw is small compared with √
, we can use a ◦ term depending only on
Iq (0)
σw . Thus, we obtain
2
Iq (0) +
q
Iq2 (0) +
4Iq (0)
2
σw
"
#
Iq (0) σw
σw
1− p
+ ◦ (σw ) .
=p
Iq (0)
2 Iq (0)
2.5. Quantized innovations
101
The squared terms can be assimilated to ◦ (σw ) leading finally to
EQM∞ ≥
2
q
Iq (0) + Iq2 (0) +
4Iq (0)
2
σw
=p
σw
+ ◦ (σw ) ,
Iq (0)
which for small σw is exactly the same as the rough approximation of the asymptotic estimator
performance. Consequently, we can say that the asymptotic estimator obtained above is
optimal, as in this specific case, it attains the lower bound.
As in the previous section, where the adaptive MLE scheme was shown to have a simple
recursive form asymptotically, a question arise:
• can the asymptotic estimator (2.47) for slowly varying Xk converge when we use it with
an arbitrary (not necessarily small) initial error?
The answer for this question will be given in Ch. 3.
102
2.6
Chapter 2. Estimation of a varying parameter
Chapter summary and directions
We sum up the main points of this chapter:
• instead of considering that the parameter is constant, we assumed that the parameter
can vary in time, more specifically, following a Wiener process model. We saw that, in
general, the optimal estimator can be obtained by evaluating the mean of the parameter
conditioned on the past and present quantized measurements. Thus, the core of the
problem was observed to be the evaluation of the posterior PDF (PDF of Xk conditioned
on i1:k ). For a Markov process Xk , which is the case for a Wiener process model, the
posterior can be evaluated in a recursive way, first by obtaining a prediction PDF using
the posterior at time k − 1 and the evolution model, then by updating the prediction
PDF to the posterior at time k, incorporating the new measurement ik .
• The integrals involved in the recursive expressions are complicated to be evaluated analytically, so we must resort to numerical algorithms for solving them. One way of doing
this is to apply Monte Carlo integration. This leads to a PF solution. The PF solution
is a recursive simplified form of Monte Carlo integration applied to the filtering problem
with an additional resampling step. The performance of the optimal estimator could
also be obtained using Monte Carlo integration, but it would be very difficult and time
consuming to study the effects of the system parameters (noise level, Wiener process
increments variance and quantizer resolution) using the Monte Carlo results. Therefore,
we considered a simpler solution by using a bound on the MSE for which we can have a
simple analytical expression.
• In our case, we used the BCRB, which is the inverse of the Bayesian information. The BI
for the Wiener process Xk can be evaluated recursively. From its recursive expression,
we could see that the BI and consequently the bound were affected by the quantization
through a E [Iq (εk )] term, where εk is the difference between the central threshold and
the parameter. If E [Iq (εk )] is increased, then the bound decreases, if it decreases, the
bound increases. For commonly used noise models, Iq (εk ) is maximum at εk = 0,
therefore, a practical lower bound can be obtained by using Iq (0) instead of E [Iq (εk )].
• If we accept that the bound is tight enough to mimic the behavior of the MSE, another
consequence of the dependence of the bound on E [Iq (εk )] and the fact that Iq (εk ) is large
close to εk is that the central threshold must be placed as close as possible to the true
parameter. This can be done in an approximate way, by setting the central threshold
to the prediction of Xk based on the past measurements. Thus, a good estimation
procedure might be based on the quantized innovation. In this case, it is expected that
the estimation performance will be closer to the bound, when compared with a quantizer
with arbitrary central threshold.
• When σw is small and k tends to infinity, the optimal estimator can be approximated by
a low complexity recursive expression, with its MSE attaining the BCRB and given by
√σw . This shows one more time, the importance of studying the maximization of Iq (0)
Iq (0)
w.r.t. the quantization threshold variations. As stated before, the asymptotic analysis
2.6. Chapter summary and directions
103
of this problem will be done in Part II. The simplicity of the asymptotic estimator when
Xk varies slowly will be used as a motivation to study in more detail recursive algorithms
of the type - prediction + correction based on ik . This will be done in Ch. 3.
• A generalization of the signal model used here can be obtained by considering that the
dynamical parameter is a vector Xk with size N and that it obeys a linear Gaussian
model of the type
Xk = Φk Xk−1 + Wk ,
where Φk is a N × N matrix and Wk is a sequence of independent Gaussian vectors.
The continuous measurement is a vector Yk with dimension M
Yk = Hk Xk + Vk ,
where Hk is a M × N matrix and Vk is a sequence of independent vectors. Quantization
can be done also scalarly, but we might consider two possibilities for the quantization
of each Yk , scalar quantization of each dimension or vector quantization of the entire
vector.
A direct application of the estimation problem with this model is the control of linear
systems under rate constraints. We will not go further in this direction in this thesis,
but we will keep this generalized version of the problem for future work.
Chapter 3
Adaptive quantizers for estimation
As we saw in the previous chapters, to obtain good estimation performance, the quantizer
dynamic might be adaptively set around the true parameter to be estimated. We also saw
that the asymptotically optimal estimator in the constant parameter case or in the slowly
varying parameter case has a simple recursive form. We asked at the end of each chapter:
• can the asymptotic estimator based on binary measurements converge when we use
its simplified form (the low complexity equivalent) with an arbitrary initial error (not
necessarily small)?
• Can we extend this low complexity recursive procedure to the case NI > 2?
• Can the asymptotic estimator for slowly varying Xk converge when we use its simplified
form (the low complexity equivalent) with an arbitrary initial error (not necessarily
small)?
In this chapter we will answer these questions. For doing so, we will impose the estimation
algorithm to have a general recursive form that includes the asymptotically optimal estimators
as special cases.
We will start the chapter with a brief review of the signal models that will be used (constant, Wiener process without and with drift) and with the definition of the quantizer to be
used. Then, we will define the estimation algorithm form and we will study its performance for
the signal models defined previously in terms of the mean error and of the MSE . Based on the
performance analysis, we will obtain the optimal estimator parameters and the corresponding
optimal performance. As in related work [Papadopoulos 2001], the optimal performance will
be used to obtain a measurement of performance loss due to quantization. This loss will be
evaluated for each signal model by using the corresponding optimal performance for estimation based on continuous measurements. The performance results will be verified through
simulation.
We will also propose extensions of the adaptive algorithm in the following cases:
• quantized measurements from a sensor are used for estimating a constant parameter,
but in this case, the noise scale parameter is considered to be unknown.
• Multiple sensors and a fusion center are used to estimate a constant parameter. The
sensors can send only quantized measurements to the fusion center, while the fusion
center can broadcast continuous values to the sensors.
105
106
Chapter 3. Adaptive quantizers for estimation
In each of these cases we will follow a similar procedure. We will define the problem and the
estimation algorithm to be used. Then, we will obtain the optimal estimator parameters and
the corresponding optimal performance. Simulation will be used to check the validity of the
results.
At the end of the chapter, we will summarize the main results of the chapter and we will
give some directions for future work.
Contributions presented in this chapter:
• Design and analysis of an adaptive estimation algorithm based on multibit quantized
noisy measurements. This differs from [Li 2007] and [Fang 2008], where only binary
quantization is treated.
• Explicit performance analysis for tracking of a slowly varying parameter. Differently
from [Papadopoulos 2001, Ribeiro 2006a, Li 2007, Fang 2008], where the parameter is
set to be constant and all subsequent analysis is based on this hypothesis. Even if tracking is treated in a more general way in [Ribeiro 2006c] and [You 2008], we do not state
assumptions on noise Gaussianity. Note that the assumption that the parameter varies
slowly seems more restrictive than the parameter models considered in [Ribeiro 2006c]
and [You 2008], actually, the slowly varying assumption is hidden in the performance
evaluation for the binary case given in [Ribeiro 2006c], where it is shown that the performance of the proposed filter reaches the equivalent continuous when the sampling time
tends to zero.
• Low complexity algorithms. The algorithms proposed here are based on simple recursive
techniques that have lower complexity than the methods proposed in [Li 2007] and
[Fang 2008].
• Joint location and scale adaptive estimator. The algorithm that we propose is an extension of the location estimation problem. This extension is discussed in [Ribeiro 2006b]
but only for fixed quantization thresholds.
• Fusion center approach. This approach can be seen as a multisensor, multibit, low
complexity alternative to the adaptive techniques presented in [Li 2007] and [Fang 2008]
and also as an adaptive alternative for the optimal threshold distribution approach given
in [Ribeiro 2006a], where a prior distribution on the parameter is needed.
107
Contents
3.1
Parameter model and measurement model . . . . . . . . . . . . . . . . 108
3.1.1
Parameter model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.1.2
Noise model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
3.1.3
Adjustable quantizer model . . . . . . . . . . . . . . . . . . . . . . . . . . 109
3.2
General estimation algorithm . . . . . . . . . . . . . . . . . . . . . . . . 111
3.3
Estimation performance . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
3.4
3.5
3.6
3.7
3.3.1
Mean ordinary differential equation . . . . . . . . . . . . . . . . . . . . . . 113
3.3.2
Asymptotic MSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
Optimal algorithm parameters and performance . . . . . . . . . . . . . 125
3.4.1
Optimal algorithm parameters . . . . . . . . . . . . . . . . . . . . . . . . 125
3.4.2
Algorithm performance for optimal gain and coefficients . . . . . . . . . . 128
Simulations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
3.5.1
General considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
3.5.2
Theoretical performance loss due to quantization . . . . . . . . . . . . . . 137
3.5.3
Simulated loss . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
3.5.4
Comparison with the high complexity algorithms . . . . . . . . . . . . . . 143
3.5.5
Discussion on the results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
Adaptive quantizers for estimation: extensions
. . . . . . . . . . . . . 149
3.6.1
Joint estimation of location and scale parameters . . . . . . . . . . . . . . 149
3.6.2
Fusion center approach with multiple sensors . . . . . . . . . . . . . . . . 155
Chapter summary and directions . . . . . . . . . . . . . . . . . . . . . . 164
108
3.1
3.1.1
Chapter 3. Adaptive quantizers for estimation
Parameter model and measurement model
Parameter model
We will join the constant and varying models by using the dynamic model (2.1)
Xk = Xk−1 + Wk ,
where {Wk , k = 1, 2, · · · } is a sequence of independent Gaussian r.v. (also independent of Xn ,
for n < k) whose means form a deterministic sequence {uk , k = 1, 2, · · · } and its standard
deviation is σw :
2
.
W k ∼ N u k , σw
Symbol ∼ means "distributed according to" and N is the symbol for the Gaussian distribution.
Differently from what was considered previously, the sequence uk will be considered to be
a known or unknown constant u and we will assume that it has small value. We will also
assume that σw is a known, small constant. "Small" in both cases means that these constants
are small when compared with the noise scale parameter. In the Gaussian noise case, this is
equivalent to say that they are small when compared with the noise standard deviation.
The fact that we use a constant u instead of the varying uk will allow to have asymptotic
performance results. In practice, all the results that will be presented will be valid for varying
uk , as long as the sequence uk is small and slowly varying.
The model above is a compact form to describe the three parameter models that are
studied in this thesis:
• constant: by taking u = σw = 0 and X0 = x, we have the constant parameter model.
• Wiener process: if u = 0, (small) nonzero σw and Gaussian X0 with unknown mean and
variance, then Xk is a (slowly) varying Wiener process.
• Wiener process with drift: in this case u and σw are non zero (and with small amplitudes).
3.1.2
Noise model
The continuous amplitude measurement is again given by the additive model
Y k = Xk + Vk ,
where the noise r.v. sequence Vk respects the assumptions considered previously:
• the sequence is i.i.d..
• AN1 (p. 34) – The marginal noise CDF denoted F (v) accepts a PDF denoted f (v).
• AN2 (p. 34) – f (v) is a strictly positive even function that strictly decreases w.r.t. |v|.
3.1. Parameter model and measurement model
109
An additional assumption will be considered on the noise CDF.
Assumption (on the noise distribution):
AN3 F is locally Lipschitz continuous.
A function F (v) is Lipschitz continuous in an interval V, if for every two points v1 and v2 in
V there exists a constant L such that
|F (v1 ) − F (v2 )| ≤ L |v1 − v2 | ,
the function is locally Lipschitz continuous if for every v ∈ R, we can find an interval V ′
containing v such that the function is Lipschitz continuous.
This assumption is required by the method of analysis that will be used to assess the
performance of the proposed algorithm. Most noise CDF considered in practice are Lipschitz
continuous, thus this assumption is generally satisfied.
3.1.3
Adjustable quantizer model
We saw in Ch. 1 and 2 that the quantizer central threshold must be dynamically updated
to obtain a good estimation performance. We will make explicit this feature by imposing the
quantizer to have an adjustable offset bk . For adjusting the amplitude of the quantizer input,
we can also consider that after offsetting the input, we apply an adjustable gain ∆1k . The
quantized measurements at the output of the adjustable quantizer are given by
Y k − bk
.
(3.1)
ik = Q
∆k
By considering dynamic input offset and gain, we can fix the quantizer to have a static structure
with a central threshold that now can be set to zero. Thus, the quantizer thresholds are equal
to the threshold variations. This modifies assumption AQ2 (p. 37).
Assumption (on the quantizer):
AQ2’ The quantizer is symmetric around the central threshold which is equal to zero. This
means that the vector of thresholds τ is given by the vector of threshold variations
⊤
′
′
′
′
′
τ = τ = −τ NI · · · − τ1 0 + τ1 · · · + τ NI
,
2
2
where the threshold variations τi′ form an increasing sequence.
The adjustable quantizer output is given by
Y k − bk
= i sign (Yk − bk ) ,
ik = Q
∆k
for
|Yk − bk | ′
∈ τi−1 , τi′ .
∆k
(3.2)
A scheme representing the adjustable quantizer is given in Fig. 3.1 . Note that even if the
quantizer is not uniform (with constant step-length between thresholds), it can be implemented
using a uniform quantizer with a compander approach [Gersho 1992].
110
Chapter 3. Adaptive quantizers for estimation
Static quantizer
τ2′
Adjustable gain
Yk
τ1′
1
∆k
−
bk
Adjustable offset
0
ik
−τ1′
−τ2′
Figure 3.1: Scheme representing the adjustable quantizer. The offset and gain can be adjusted
dynamically, while the quantizer thresholds are fixed.
Based on the quantizer outputs, the main objective is to estimate Xk . A secondary objective is to adjust the parameters bk and ∆k to enhance estimation performance. As the
estimate X̂k of Xk will be possibly used in real time applications, it might be interesting to
estimate it online. Therefore, we are again interested in solving problems (a) and (b), the
main difference is that now we want to solve (a) for each time index k.
It was observed in the previous chapters that
• when estimating a constant, we can place the central threshold in the last estimate to
have an asymptotically optimal algorithm.
• When estimating a Wiener process, we can place the central threshold at the prediction.
For Wiener process without drift the prediction is exactly X̂k−1 and for Wiener process
with drift the prediction is X̂k−1 + uk .
Based on these observations and for simplification purposes, we will set for all cases bk =
X̂k−1 . Also to simplify, we will consider that the gain is set to be a constant. For the algorithm
presented later, the fact that the offset is set to X̂k−1 will have as a consequence asymptotically
optimal parameters that do not depend on the mean of Xk , thus simplifying the analysis.
Some remarks here are important:
• We will see that imposing the use of bk = X̂k−1 , instead of using the prediction, will
make the algorithm parameters and the performance to be different for Wiener process
with and without drift.
• If we use the prediction, instead of the last estimate, for setting the quantizer offset and
for estimating Xk , then all the results that we will obtain for a Wiener process without
drift will be valid also for the process with drift.
• In the special cases where the optimal central threshold is not the median of the continuous amplitude measurement, we can evaluate the optimal quantizer offset ε⋆ w.r.t. the
true parameter (the point of minimum in the "w" shaped CRB curves) and then add
this value to the offset of the adaptive quantizer bk = X̂k−1 + ε⋆ .
3.2. General estimation algorithm
111
• As most performance results will be given asymptotically, the simplification brought by
1
using a constant gain ∆
can still be partially achieved if we constrain this gain to be
constant after some time or to achieve asymptotically a constant value. In this case,
the analysis of error convergence will have to take into account that the measurement
system varies in time and we must be able also to evaluate its asymptotic value.
1
will be again considered to be variable further in the chapter, where we will
• The gain ∆
estimate jointly a constant Xk and the scale parameter of the noise. In this case, the
gain will not only be variable, but it will also depend on the measurements.
The general scheme for the estimation of Xk is depicted in Fig. 3.2 and the main objective
will be to find the algorithm that will be placed in the block named Update.
Adjustable
Quantizer
Vk
Xk
τ2′
τ1′
Yk
1
∆
0
−τ1′
−
ik
Quantized
measurements
−τ2′
∆
X̂k−1
Update
X̂k
Estimate
Figure 3.2: Block representation of the estimation scheme. The estimation algorithm and the
procedures to set the offset and the gain are represented by the Update block.
3.2
General estimation algorithm
At the end of Ch. 1, we saw that the estimator in the adaptive binary quantization scheme
based on the MLE is asymptotically given by (1.87)
X̂k = X̂k−1 +
ik
,
2kf (0)
whereas at the end of Ch. 2, we saw that the asymptotic expression for the optimal estimator
of a slowly varying Wiener process is (2.51)
fd ik , X̂k|k−1 , Xk X̂k|k−1 =Xk
σw
.
X̂k ≈ X̂k|k−1 − p
P
(i
|X
)|
Iq (0)
k
k X̂k|k−1 =Xk
112
Chapter 3. Adaptive quantizers for estimation
Both asymptotic estimators have low complexity and both are special cases of the following
adaptive algorithm:
"
!#
Yk − X̂k−1
X̂k = X̂k−1 + γk η Q
,
(3.3)
∆
where, γk is a sequence of positive real gains and η[·] is a mapping from I to R
η: I → R
j → ηj ,
(3.4)
n
o
which is characterized by the sequence of NI coefficients η− NI , . . . , η−1 , η1 , . . . , η NI . Notice
2
2
that the coefficients η[·] can be seen as the "estimation equivalent" of the output quantization
levels used in standard quantization theory.
Even if nothing guarantees that the algorithm (3.3) is optimal for finite time, the fact that
it can be equivalent asymptotically to the optimal estimator and that it has low complexity
are strong motivations for using it. Other more intuitive motivations are the following:
• similarly to the binary grid method proposed by [Fang 2008], for a slowly varying or
constant parameter, we can choose the coefficients η[·] in a way that the algorithm will
tend to be around true parameter at least in the mean.
• When estimating a constant, the maximum likelihood estimator can be approximated
by a simpler online algorithm using a stochastic gradient ascent algorithm, which has
the same form as (3.3). It will be shown later that for the optimal choice of ηi , algorithm
(3.3) is equivalent to a stochastic gradient ascent method to maximize the log-likelihood.
• To estimate a Wiener process, an approximate choice of estimator is a Kalman filter like
method based on the quantized innovation, which is also (3.3).
Due to the symmetry of the problem for commonly used noise models, when X̂k is close to
Xk , it seems reasonable to suppose that the corrections given by the output quantizer levels
have odd symmetry with positive values for positive ik . This symmetry will be useful later
for simplification purposes and we will add it to the other assumptions.
Assumption (on the quantizer output levels):
AQ3 The quantizer output levels have odd symmetry w.r.t. i:
ηi = −η−i ,
(3.5)
with ηi > 0 for i > 0.
In the special cases where the threshold must be placed asymmetrically and we put an additional constant value in the quantizer offset (ε⋆ ), the assumption above may lead to an asymptotic estimation bias. For observing this, consider that the quantization offset is already at
the parameter. Then, the mean of the correction η [ik ] will be zero, as the distribution of the
ik is even and ηi is odd. Thus the algorithm is in a mean equilibrium point. As the offset is
3.3. Estimation performance
113
placed ε⋆ away from the estimate, the mean of the estimate has an equilibrium point that is
different from the true parameter.
The non differentiable non linearity Q in (3.3) makes it difficult to be analyzed. Fortunately, an analysis based on mean approximations was developed in [Benveniste 1990] for a
wide class of adaptive algorithms. Within this framework, the function η can be a general
nonlinear non-differentiable function of Yk and X̂k and it is shown that the gains γk that
optimize the estimation of Xk can be chosen as follows:
• γk ∝
1
k
when Xk is constant.
• γk is constant for a Wiener process Xk .
2
• γk is a constant which is proportional to u 3 when Xk is a Wiener process with drift.
Notice that the gains for the constant and Wiener process models given above have the same
form of the asymptotically optimal gains found in Ch. 1 and Ch. 2. The only difference is
2
the gain proportional to u 3 in the case with drift, which reflects the choice of using X̂k−1 in
the place of the prediction.
In the following sections we will consider the gains given above for the algorithm (3.3) and
we will apply the general analysis presented in [Benveniste 1990] to obtain its performance.
3.3
Estimation performance
To obtain the estimation performance, the analysis is separated in
• the analysis of the estimator mean. This gives a rough approximation of the estimator
behavior. With this information we can see if the estimator converges in the mean and
we can also characterize its bias.
• The analysis of the estimation variance. This analysis will give the details on the fluctuation around the mean and it will be obtained, in most cases, asymptotically.
3.3.1
Mean ordinary differential equation
The core of the analysis that we use here and
that
is presented in a general setting in
[Benveniste 1990] is to approximate the mean E X̂k by x̂ (tk ), where x̂ (t) is the solution of
the ordinary differential equation (ODE)
dx̂
= h (x̂) .
dt
The correspondence between continuous and discrete time is given by tk =
(3.6)
k
P
γj and h (x̂) is
j=1
the following:
x − x̂ + V
h (x̂) = E η Q
,
∆
(3.7)
114
Chapter 3. Adaptive quantizers for estimation
where the expectation is evaluated w.r.t. the distribution of V , which is the noise marginal
distribution.
A simple heuristic to obtain the approximation is the following: first we rewrite (3.3) as
"
!#
Xk − X̂k−1 + Vk
X̂k − X̂k−1
=η Q
,
γk
∆
then we consider that the parameter is approximately a constant Xk = x and that X̂k−1 on the
RHS can be approximated by the mean at time k, i.e. X̂k−1 = x̂. Evaluating the expectation
on both sides
E X̂k − E X̂k−1
x − x̂ + Vk
=E η Q
,
γk
∆
we see now that the RHS is h (x̂) and if we consider the algorithm gain as a small time step,
E(X̂k )−E(X̂k−1 )
then
is an approximation of the time derivative.
γk
For the approximation given by the ODE (3.6) to be valid as an approximation of E X̂k
at least after some time k and for using the results from [Benveniste 1990], some conditions
must be satisfied:
• conditions on the Gains. The gains must sum to infinity
∞
X
γk = +∞,
k=1
when they are decreasing, the sum of their power must be finite
∞
X
γkα < +∞,
for some α > 1
k=1
and when they are not decreasing, they must tend to a finite limit
γ∞ = lim γk < +∞.
k→∞
As the cumulative sum of the gains is an equivalent for the time in the ODE approximation, the condition that the sum of the gains goes to infinity is equivalent to say that the
time in the ODE can go to infinity, so that the algorithm does not get "stuck" in time.
The condition on the sum of the powers of the decreasing gains is used to guarantee that
the fluctuations of the estimator will decrease when we want to estimate a constant. The
last condition on the limit of the gains is used to have fixed asymptotic performance
results. We can see that all these conditions are satisfied for the three types of gain
defined previously.
• Conditions on the continuous measurements. For a fixed Xk = x, the continuous measurements Yk form a Markov chain with a unique stationary asymptotic distribution.
This condition is also necessary to have fixed asymptotic results. In the problem considered here, the distribution of the continuous measurements given a fixed parameter x
is the distribution of the noise shifted by the parameter x. As the noise distribution is
i.i.d., the distribution of Yk is stationary for all k, thus clearly respecting this condition.
3.3. Estimation performance
115
• Regularity conditions on h (x̂). The function h (x̂) is locally Lipschitz continuous.
The main point of using the analysis presented in [Benveniste 1990] is that it is not
necessary to have a continuous correction function η. The analysis is mainly based on
replacing the mean of the algorithm by the ODE approximations and then evaluating the
fluctuations around it. This analysis then reposes mainly on h and not on η. For the local
existence, uniqueness and regularity of the ODE solution, we might impose regularity
conditions on h. Also, for evaluating the fluctuations around the ODE solution we might
look to local expansions of h, which then leads naturally to conditions as the one stated
above.
Using the assumptions on the quantizer thresholds and output levels, the expectation in
(3.7) can be written as:
NI
h (x̂) =
2
X
i=1
[ηi Fd (i, x̂, x) − ηi Fd (−i, x̂, x)] ,
where Fd is a difference of CDFs:

F (τ ′ ∆ + x̂ − x) − F τ ′ ∆ + x̂ − x ,
i
i−1
Fd =
F τ ′ ∆ + x̂ − x − F (τ ′ ∆ + x̂ − x) ,
i+1
i
o
n
if i ∈ 1, · · · , N2I ,
o
n
if i ∈ −1, · · · , − N2I .
(3.8)
(3.9)
From assumption AN3, the function h is a linear combination of locally Lipschitz continuous functions, this implies that h is also locally Lipschitz continuous, and the condition
is satisfied.
All the conditions are satisfied in our case, therefore, we can apply the performance results
from [Benveniste 1990].
Mean of the algorithm for estimating a constant
For estimating a constant, the gain of the algorithm is of the form [Benveniste 1990]
γk =
γ
.
k
(3.10)
The ODE is given by (3.6)
dx̂
= h (x̂) ,
dt
with the time given by tk = γ
k
P
j=1
this case, it is valid for large k.
1
j.
The ODE approximation is valid for small gains, so in
The estimation bias after a transient time can be approximated using the ODE above. By
denoting the bias as ε (t) = x̂ (t) − x, the bias ODE is
dε
= h̃ (ε) ,
dt
(3.11)
116
Chapter 3. Adaptive quantizers for estimation
where h̃ (ε) = h (ε + x) is a function that does not depend on the true parameter x (to verify
this, use ε + x in the place of x̂ in the expression for Fd ).
As the function h̃ (ε) depends on a sum of CDF which might not even have analytical form,
it is difficult to find analytical solutions for (3.11). The solution in general can be obtained
using a numerical method, for example a Runge–Kutta method (see [Golub 1991] for details
on numerical solvers).
Even if we cannot obtain in general a characterization of the bias for all k using the ODE,
we can at least analyze what happens asymptotically to the mean of the algorithm.
Asymptotic stability and asymptotic unbiasedness
An interesting point to study is the asymptotic mean convergence of the algorithm. More
precisely, if we prove that ε → 0 as t → ∞ for every ε (0) ∈ R, then we prove that the
algorithm is asymptotically unbiased, as its true mean can be approximated by the ODE. The
convergence in the mean is not only useful for showing that the algorithm indeed works, at
least in the mean, but it is also a requirement for the evaluation of the MSE that will be
presented later.
The fact that ε → 0 as t → ∞ for every ε (0) ∈ R means that ε = 0 is a globally
asymptotically stable point [Khalil 1992]. Global asymptotic stability of ε = 0 can be shown
using an asymptotic stability theorem for nonlinear ODEs. This will require the definition of
an unbounded Lyapunov function of the error. To simplify, a quadratic function will be used:
L (ε) = ε2 ,
(3.12)
which is a positive definite function and tends to infinity when ε tends to infinity.
If h̃ (ε) = 0 for ε = 0 and dL
dt < 0 for ε 6= 0, then by the Barbashin–Krasovskii theorem
[Khalil 1992, p. 124] ε = 0 is a globally asymptotically stable point.
To show that both conditions are met, expression (3.8) can be rewritten as a function of
ε:
NI
h̃ (ε) =
2
X
i=1
i
h
ηi F̃d (i, ε) − F̃d (−i, ε) ,
(3.13)
where F̃d (i, ε) = Fd (i, ε + x, x) is also a function that does not depend on x.
When ε = 0, the differences between F̃d in the sum are differences between probabilities
on symmetric intervals. The symmetry of the noise PDF stated in AN2 and the symmetry of
the quantizer stated in AQ2’ imply that h̃ (0) = 0, fulfilling the first condition.
The second condition can be written in more detail by using the chain rule for the derivative:
dL dε
dL
=
= 2εh̃ (ε) < 0, for ε 6= 0.
(3.14)
dt
dε dt
Thus, h̃ (ε) has to respect the following constraints:
h̃ (ε) > 0, for ε < 0 and h̃ (ε) < 0, for ε > 0.
(3.15)
3.3. Estimation performance
117
When ε 6= 0, the terms in the sum that gives h̃ (ε) are the difference between integrals of
the noise PDF under the same interval size but with asymmetric interval centers. Using the
symmetry assumptions, for ε > 0, F̃d (i, ε) is the integration of f over an interval more distant
to zero than for F̃d (−i, ε), then by the decreasing assumption on f , F̃d (i, ε) < F̃d (−i, ε) and
consequently h̃ (ε) < 0. Using the same reasoning for ε < 0 one can show that h̃ (ε) > 0.
Therefore, the inequalities in (3.15) are satisfied and dL
dt < 0 for ε 6= 0.
Finally, as both conditions are satisfied, one can say that ε = 0 is globally asymptotically
stable, which means that the estimator is asymptotically unbiased for estimating a constant.
Mean of the algorithm for estimating a Wiener process
When we want to estimate a Wiener process, the gain of the algorithm is considered to be a
constant
γk = γ.
In this case, if we consider γ to be a small constant, we can also write the ODE approximation
to the mean with (3.6)
dx̂
= h (x̂) .
dt
Now, the constant x in the expression for h is the mean of the Wiener process (which is also
the mean of the initial condition X0 ) and the time is tk = kγ.
Note that in this case, by imposing a γ sufficiently small the ODE will be valid for all k and
there will be no transient time. Actually, this could also be done for the constant parameter,
but as we will see later, the optimal γ minimizing the asymptotic MSE may not be small for
estimating a constant and it will indeed be small for estimating a Wiener process with small
σw .
The bias ODE is also given by (3.11), therefore, for small γ the algorithm is also asymptotically unbiased in this case.
To show an example for which the ODE approximates well the estimation bias, we simulated the adaptive algorithm for NI = 2 and NI = 4 in the Gaussian noise case. The quantizer
1
= 1, the threshold variations and the output coefficients were chosen to be uniform,
gain was ∆
′
′
τ = [τ1 = 1 τ2′ = 2]⊤ , {η1 = 1, η2 = 2} for NI = 2 and τ ′ = [τ1′ = 1 τ2′ = 2 τ3′ = 3 τ4′ = 4]⊤ ,
{η1 = 1, η2 = 2, η3 = 3, η4 = 4} for NI = 4. The noise scale parameter was chosen to be
δ = 1, the Wiener process increment standard deviation σw = 10−3 and the adaptive gain
γ = 10−3 . We considered the mean of the Wiener process to be E (Xk ) = 0 and the initial
condition of the algorithm was set to be X̂0 = 1. To obtain an estimation of the bias, we
simulated the algorithm 10 times for blocks of 104 samples. For each sample (each index k)
we averaged the error through the different simulations. The solution of the bias ODE (3.11)
was obtained numerically with a Runge–Kutta method with order 4 and 5. The results are
displayed in Fig. 3.3.
118
Chapter 3. Adaptive quantizers for estimation
E (εk )
1
ODE approx. – NI = 2
Sim. – NI = 2
ODE approx. – NI = 4
Sim. – NI = 4
0.5
0
0
0.2
0.4
0.6
Time [k]
0.8
1
·104
Figure 3.3: ODE bias approximation and simulated bias for the estimation of a Wiener process
with the adaptive algorithm. The noise was considered to be Gaussian with δ = 1. Both
NI = 2 and NI = 4 were considered with τ ′ = [τ1′ = 1 τ2′ = 2]⊤ , {η1 = 1, η2 = 2} and τ ′ =
[τ1′ = 1 τ2′ = 2 τ3′ = 3 τ4′ = 4]⊤ , {η1 = 1, η2 = 2, η3 = 3, η4 = 4}. In both cases, the quantizer
input gain was considered to be one. The Wiener process increment standard deviation σw
and the adaptive gain were set to 10−3 . The algorithm was initialized with X̂0 = 1, while the
true mean of the Wiener process was set to zero. To obtain the simulated bias, we simulated
10 realizations of the estimation procedure for blocks with 104 samples. The simulated bias
was obtained through averaging of the simulations. The ODE approximation of the bias was
obtained by solving numerically the ODE (3.11) with a Runge–Kutta method.
We note that the ODE approximation corresponds well to the mean trajectory of the
estimation error. For this specific choice of parameters, which corresponds to the binary
constant step update presented in [Li 2007] and [Fang 2008] and to a multibit extension of it
(when NI = 4), we see that the algorithm can set the mean of the central threshold, which in
this case is also the estimator, at the parameter mean even if the parameter is time-varying.
We also observe that for the choice of simulation parameters used here, the convergence time
of the algorithm for NI = 4 is smaller than the convergence time for NI = 2.
As a final remark on the Wiener process case, when γ → 0, the ODE approximation is
increasingly accurate as the inherent discretization error (from time discretization) decreases
to zero. Also when γ → 0, we get the constant Xk case studied in [Li 2007] and [Fang 2008].
Thus, the proof of asymptotic mean convergence given above is also a proof of convergence of
the fixed step algorithms presented in [Li 2007] and [Fang 2008] and multibit extensions of it,
when the step of the algorithm is small.
Mean of the algorithm for estimating a Wiener process with drift
When the Wiener process has a drift, we consider again that the algorithm has a constant
gain
γk = γ,
3.3. Estimation performance
119
However, in this case as the mean of the parameter is not stationary, we cannot consider the
ODE approximation with a constant x in the function h.
To obtain the ODE, we will use again the heuristic presented above, but in this case, we
will include the dynamical model of the parameter. We start with the expectation of the
increments divided by γ
E (Xk ) − E (Xk−1 )
γ
E X̂k − E X̂k−1
γ
u
,
γ
x − x̂ + Vk
= E η Q
,
∆
=
then we approximate it by a pair of coupled ODEs
dx
dt
dx̂
dt
=
u
,
γ
= h̃ (x̂ − x) ,
where the time for both equations is tk = kγ. Note that the algorithm ODE now depends on
the solution of the parameter ODE. By subtracting both expressions, we have an ODE for the
bias ε
dε
u
= h̃ (ε) − .
(3.16)
dt
γ
As the parameter is now moving deterministically with the drift u, we can assume that most
of the algorithm tracking effort will be done to remove the bias ε. Therefore, the algorithm
must be fast enough to follow the parameter and we must have γ ≫ u. This also makes uγ
to be small, thus if all ηi are not too small, we can find an ε∞ such that h̃ (ε∞ ) = uγ , which
means that ε∞ is an equilibrium point for the bias.
It was shown above that the bias ODE without the forcing term uγ is globally asymptotically
stable, thus for a slowly varying parameter, we can expect that the algorithm will tend to get
close to the true parameter. After a time tk−1 , we can assume that the algorithm is sufficiently
close to the true parameter, so that we can approximate the function h̃ (ε) with a first order
taylor expansion around ε = 0
h̃ (ε) = h̃ (0) + h̃(1) (0) ε + ◦ (ε) ,
where h̃(1) (0) is the derivative of h̃ (ε) with respect to ε evaluated at ε = 0. The ODE can
then be rewritten as
u
dε
= h̃(1) (0) ε − + ◦ (ε) , for t > tk−1 .
(3.17)
dt
γ
For tk sufficiently large we can neglect the ◦ (ε) term. Thus, the bias ODE can be approximated
by a linear ODE. For the linear ODE approximation not to diverge we must impose the
condition
h̃(1) (0) < 0.
(3.18)
Therefore, under this condition the approximate bias will tend to an approximation of the
equilibrium point ε∞ .
120
Chapter 3. Adaptive quantizers for estimation
Before obtaining the asymptotic bias given by the equilibrium point ε∞ , we will verify
condition (3.18). The derivative of h̃ (ε) w.r.t. ε is given by
NI
2
i
h
dh X
=
h̃(1) (ε) =
ηi f˜d (i, ε) − f˜d (−i, ε) ,
dε
(3.19)
i=1
where f˜d (i, ε) is

f (τ ′ ∆ + ε) − f τ ′ ∆ + ε ,
i
i−1
f˜d (i, ε) =
f τ ′ ∆ + ε − f (τ ′ ∆ + ε) ,
i+1
i
o
n
if i ∈ 1, · · · , N2I ,
n
o
if i ∈ −1, · · · , − N2I .
(3.20)
o
n
′
and the
At point ε = 0, f˜d (i, ε) = f˜d (i, 0) for i ∈ 1, · · · , N2I is negative because τi′ > τi−1
noise PDF is strictly decreasing by assumption. For −i, f˜d (−i, 0) has the same absolute value
as f˜d (i, 0) by the symmetry assumptions, but it is positive. Therefore, f˜d (i, 0) − f˜d (−i, 0) =
2f˜d (i, 0) and this difference is always negative. The sum h̃(1) (ε) is then given by
NI
h̃(1) (0) = 2
2
X
ηi f˜d (i, 0)
(3.21)
i=1
and it is also negative, as the output quantizer levels ηi are positive for positive i by assumption.
This means that condition (3.18) is satisfied and the ODE linear approximation will converge
to an equilibrium point. For simplifying the notation, we will use hε in the place of h̃(1) (0)
from now on.
As the system is linear, the equilibrium point will be unique and independent of the initial
condition. We can obtain its expression by setting dε
dt to zero in the ODE approximation. This
leads to the following equation:
u
hε ε∞ − = 0,
γ
for which the solution is
ε∞ =
u
.
γhε
As the bias ODE is an approximation of the true bias, this is equivalent to say that for
small u
u
E X̂k − Xk
≈
.
(3.22)
k→∞ γhε
Note that differently from the constant and Wiener process cases, the estimator is not asymptotically unbiased. Observe also that if uk is not a small constant, but a small amplitude
slowly varying sequence, we could replace u by u (t) in the ODE approximation above and
for each time step (t ∈ [tk , tk + γ)) approximate the varying u (t) by the constant uk . This
would lead to replace u by uk in the bias approximate expression above (3.22) and instead of
considering it as a valid expression for k → ∞, we would say that it is valid for a large k.
3.3. Estimation performance
3.3.2
121
Asymptotic MSE
After characterizing the mean behavior of the algorithm, we must quantify its random fluctuations. For doing so, we will mainly use asymptotic results on the variance of the algorithm.
With the asymptotic bias and the asymptotic variance we can obtain the asymptotic MSE.
The asymptotic MSE is a function of the parameter γ, thus by minimizing it through γ, we
will obtain expressions for the MSE independent of γ.
Asymptotic variance for estimating a constant
Under the condition that the algorithm is asymptotically unbiased, it can be shown using a
central limit theorem, that the normalized estimation error is asymptotically distributed as a
Gaussian r.v. [Benveniste 1990, p. 109]
√ k X̂k − x
the symbol
k→∞
2
N 0, σ∞
,
(3.23)
2 is given by
means convergence in distribution. The asymptotic variance σ∞
2
=
σ∞
γ2R
,
−2γhε − 1
(3.24)
where the term h̃ε is the derivative of h̃ (ε) w.r.t. ε at ε = 0, as it was defined before. The
term
R inthe numerator is the variance of the adaptive algorithm normalized increments
X̂k −X̂k−1
when the mean of the algorithm, which is approximated by the ODE solution
γk
x̂, is equal to x. From the symmetry assumptions on the noise and on the quantizer, the
normalized mean of the increments h (x̂) is zero when x̂ = x. Thus, this variance is given by
the second order moment of the quantizer output levels:
R =
x − x̂ + V
Var η Q
∆
x̂=x
NI
NI
=
2
X
i=1
2
ηi2 Fd (i, x, x) + η−i
Fd (−i, x, x) = 2
NI
2
= 2
X
2
X
ηi2 Fd (i, x, x)
i=1
ηi2 F̃d (i, 0) ,
(3.25)
i=1
where the third equality comes from the symmetry of the quantizer and the noise distribution
and the last equality is obtained using the F̃d notation.
For minimizing the asymptotic variance w.r.t. γ, we must find the positive γ for which
= 0. The expression for the derivative is
2 (γ)
dσ∞
dγ
2 (γ)
dσ∞
2γ 2 hε
2γ
R
=R
+
=
−2γ 2 hε − 2γ ,
2
2
dγ
−2γhε − 1 (−2γhε − 1)
(−2γhε − 1)
122
Chapter 3. Adaptive quantizers for estimation
which equals zero for γ = − h1ε . Note that this gain is positive as hε is negative. By rewriting
the derivative above as
2 (γ)
dσ∞
1
−2Rγhε
γ+
,
=
dγ
hε
(−2γhε − 1)2
we can see that for γ > − h1ε , the derivative is positive and for γ < − h1ε , the derivative is
2 . The optimum gain γ ⋆ and its corresponding
negative, thus γ = − h1ε gives a minimum σ∞
variance are
1
(3.26)
γ⋆ = − ,
hε
R
2
= 2.
(3.27)
σ∞
hε
Note that this result is valid under the condition that the estimator is asymptotically unbiased,
a condition that was shown to be true in the previous subsection.
Asymptotic variance for estimating a Wiener process
The MSE for a varying parameter and a constant adaptive gain can be expressed as a sum of
three terms
nh
i
o2 2
+ ◦ (γ)
MSEk = E [x̂ (tk ) − x (tk )] + E
X̂k − x̂ (tk ) − [Xk − x (tk )]
(3.28)
= ε2 (tk ) + E ξk2 + ◦ (γ) ,
i
h
where ε2 (tk ) = E2 [x̂ (tk ) − x (tk )] and ξk = X̂k − x̂ (tk ) − [Xk − x (tk )]. The first term
ε2 (tk ) is an approximation of the squared bias E2 [εk ]. The second term is an approximation
of the error variance, which can be obtained by evaluating the second order moment of the
total fluctuation of the error ξk . The last term is the error due to the approximations and if
γ is small this term is negligible. As σw is small by assumption, γ must be small for tracking
Xk without large fluctuations, thus this last term is expected to be negligible.
It was shown in the last subsection that the algorithm is asymptotically unbiased, thus
the first term of the decomposition tends to zero as k tends to infinity. As a consequence, the
asymptotic MSE, that we denote MSEq,∞ , depends mainly on the asymptotic characterization
of ξk . Under the conditions that the estimator is asymptotically unbiased and that hε < 0,
which were both shown to be true in the previous subsection, it can be shown [Benveniste 1990,
pp. 130–131]
that ξk tends to be a stationary Gaussian process with marginal distribution
2
N 0, σξ . The asymptotic variance σξ2 is given as a sum of two terms, one produced by the
fluctuations of the estimator itself and equal to
the parameter and equal to
2
σw
−2γhε ,
thus giving
σξ2 =
and leading to the asymptotic MSE
MSEq,∞ =
γR
−2hε
and the other due to the fluctuations of
2
σw
γR
+
−2hε −2γhε
2
γR
σw
+
+ ◦ (γ) .
−2hε −2γhε
(3.29)
3.3. Estimation performance
123
Neglecting the ◦ (γ), we can find the approximately optimal gain by equating to zero its
derivative w.r.t. γ. This gives the equation
2
dMSEq,∞ (γ)
σw
1
−R + 2 = 0,
≈
dγ
2hε
γ
which is zero for γ =
σw
√
.
R
The second derivative can be approximated by
d2 MSEq,∞ (γ)
σ2
≈ − w3 ,
2
dγ
hε γ
2 is positive, the second derivative is positive for positive γ. This means
as hε is negative and σw
σ
that choosing γ = √wR leads to a minimum MSE. Thus,
σw
γ⋆ = √
R
(3.30)
and the corresponding asymptotic MSE is
MSEq,∞
√
σw R
+ ◦ (γ ⋆ ) .
=
−hε
(3.31)
We can express MSEq,∞ as a function of σ∞ given in (3.27). This gives
MSEq,∞ = σw σ∞ + ◦ (γ ⋆ ) .
(3.32)
Observe that both the asymptotic MSE for estimating a Wiener process and for estimating
a constant depend on the quantizer parameters (ηi , ∆ and τ ′ ) through an increasing function
2 , therefore the asymptotically optimal quantizer parameters is the same in both cases.
of σ∞
The only difference in the adaptive algorithm for these two cases is the sequence of gains γk .
Asymptotic MSE for estimating a Wiener process with drift
When the Wiener process has a drift, the MSE can still be written as the sum of three terms
(3.28)
MSEk = ε2 (tk ) + E ξk2 + ◦ (γ) .
Even if γ ≫ u, we still expect it to be small, so that the algorithm is able to reduce the effects
of the measurement noise. Thus, we can still neglect the ◦ (γ).
We will proceed similarly as for the Wiener process without drift. We will evaluate the
asymptotic MSE and then we will obtain the asymptotically optimal gain.
Differently, from the Wiener process without drift, the algorithm is not asymptotically
unbiased and we must use the expression for the asymptotic bias approximation (3.22)
ε∞ =
u
γhε
in the first term of MSEq,∞ . As it is explained in [Benveniste 1990, p. 133], by using γ ≫ u,
the fluctuations of the parameter around its ODE approximation are negligible when compared
124
Chapter 3. Adaptive quantizers for estimation
with the fluctuations of the algorithm. Therefore, we can approximate the asymptotic variance
of the fluctuation by
γR
.
(3.33)
σξ2 ≈
−2hε
Using the bias from (3.22) and the variance from (3.33), we obtain
MSEq,∞ ≈
γR
u2
+
+ ◦ (γ) .
2
2
γ hε
−2hε
(3.34)
To obtain the minimum w.r.t. γ, we must find γ satisfying
dMSEq,∞ (γ)
R
2u2
≈− 3 2 +
= 0.
dγ
γ hε
−2hε
2 1
3
4u
The solution of this equation in the variable γ is γ = −h
. To verify that this value of γ
εR
corresponds to a minimum of MSEq,∞ , we evaluate the second derivative
d2 MSEq,∞ (γ)
6u2
.
≈
dγ 2
γ 4 h2ε
We can verify that this quantity is positive. Therefore,
31
4u2
⋆
γ =
−hε R
(3.35)
and its corresponding asymptotic MSE is
MSEq,∞ ≈ 3
uR
4h2ε
2
3
+ ◦ (γ ⋆ ) .
(3.36)
Note that in practice, u may be unknown and it will be necessary to replace its value in
γ ⋆ by an estimate of it û, which can be also obtained adaptively, for example by calculating
a recursive mean on X̂k − X̂k−1 .
2 with a dependence
The asymptotic MSE in (3.36) can also be rewritten as a function of σ∞
on u
2
u
3
2
σ∞
+ ◦ (γ ⋆ ) .
(3.37)
MSEq,∞ ≈ 3
4
2 .
Also in this case the asymptotic MSE is an increasing function of σ∞
Remark: in the previous subsection, we remarked that if uk is a small amplitude slowly
uk
for large k. Thus, following
varying parameter, the bias could be approximated by εk ≈ γh
ε
the same development and considering that the gains γk can be slowly variable, we have for
large k
31
4u2k
⋆
γk =
−hε R
and the corresponding asymptotic MSE
2
uk R 3
+ ◦ (γk⋆ ) .
MSEk ≈ 3
4h2ε
3.4. Optimal algorithm parameters and performance
3.4
125
Optimal algorithm parameters and performance
Now, we focus on the asymptotically optimal design of the quantizer parameters. From the
previous results, we can see that the asymptotic performance for the three cases is dependent
2 . Also, for the three cases the asymptotic performance deon an increasing function of σ∞
2 . Therefore, the optimal
pends on the quantizer parameters (ηi , ∆ and τ ′ ) only through σ∞
parameters are the same for the three cases.
2 w.r.t. to the quantizer update coefficients
In the next subsections, first we will minimize σ∞
1
. After that, we will present the
ηi , then we will discuss on the choice of the input gain ∆
2
optimal algorithm general form and its corresponding σ∞ . We then discuss on how to optimize
the performance w.r.t. the threshold variations set τ ′ . Finally, we will present the optimal gain
and performance for each of the three parameter models, by considering the optimal update
coefficients. In each case, we will also evaluate the performance loss due to quantization.
3.4.1
Optimal algorithm parameters
Update coefficients (output levels)
2 (3.27), the optiUsing the expressions for hε (3.21) and R (3.25) in the expression for σ∞
mization of the algorithm performance w.r.t. the update coefficients can be written as the
following minimization problem:
argmin
η
R
η ⊤ Fd η
=
argmin
2,
h2ε
η
2 (η ⊤ fd )
(3.38)
where η is a vector with the coefficients
h
i⊤
η = η 1 · · · η NI
,
(3.39)
2
Fd is a diagonal matrix given by
Fd = diag F̃d (1, 0) , · · · , F̃d
NI
,0 ,
2
(3.40)
with diag [] the function that creates a matrix with the input sequence added to the diagonal
of a zero matrix. fd is the following vector
fd = f˜d (1, 0) · · · f˜d
⊤
NI
,0
.
2
(3.41)
The minimization problem is equivalent to the following maximization problem:
2
η ⊤ fd
.
argmax ⊤
η Fd η
η
(3.42)
126
Chapter 3. Adaptive quantizers for estimation
Using the fact that Fd is a positive semidefinite matrix (it is a diagonal matrix with nonzero
diagonal elements), we can rewrite (3.42) as
⊤ 2
1
− 12
2
F d fd
Fd η
argmax
1 ⊤ 1 ,
η
Fd 2 η
Fd 2 η
1
1
the matrices Fd 2 and Fd − 2 are obtained by taking the square root and the inverse of the
square root of the diagonal elements in Fd . Using the Cauchy–Schwarz inequality on the
expression in the numerator gives


2 
1 ⊤ 
1


−


F d 2 fd

 Fd 2 η
≤ fd ⊤ Fd −1 fd
1 ⊤ 1 





Fd 2 η
Fd 2 η


and the equality happens for
1
1
F d 2 η ∝ F d − 2 fd .
Under the assumption that the update coefficients are positive for positive i AQ3 (p. 112),
the optimal η can be chosen to be
η ⋆ = −Fd −1 fd .
(3.43)
2 w.r.t. η is
The minimum σ∞
2
=
σ∞
2 fd
⊤

−1
NI
2
2
1
 X f˜d (i, 0) 
2
=


Fd −1 fd
F̃ (i, 0)
i=1 d
.
(3.44)
We can recognize that the sum above is exactly equal to the FI given in (1.13) when the
central threshold is placed exactly at the parameter x, Iq (0)
NI
Iq (0) = 2
2
X
f˜d2 (i, 0)
i=1
F̃d (i, 0)
.
(3.45)
Choice of the input gain
To simplify the choice of the constant ∆, we can consider that the noise CDF is parametrized
by a known scale parameter δ, which means that
x
F (x) = Fn
,
δ
where Fn is the CDF for δ = 1. In this case the key quantity that appears in the evaluation of
the quantizer output levels is ∆
δ . Thus, the evaluation of the output levels can be simplified
by setting
∆ = c∆ δ,
(3.46)
3.4. Optimal algorithm parameters and performance
127
where c∆ is a constant used to adjust the input gain when the quantizer threshold variation
range is fixed or to adjust the quantization step-length when the threshold variations are
uniform and fixed to a value that cannot be changed.
For given δ, c∆ and Fn , the coefficients do not depend on the true parameter value, neither
on the estimator value, so that they can be pre-calculated and stored in a table. In scalar
form the coefficients are
f˜d (i, 0)
.
(3.47)
ηi⋆ = −
F̃d (i, 0)
Note that for ∆ given by (3.46), ηi depends on δ only through a 1δ multiplicative factor, the
other factor can be written as a function of the normalized PDF and CDF, thus it can be
pre-calculated based only on the normalized distribution.
An interesting observation is that ηi⋆ is given by the score function for estimating a constant
location parameter when considering that the offset is fixed and placed exactly at x, therefore
this algorithm is equivalent to a gradient ascent technique to maximize the log-likelihood that
iterates only one time per observation and sets the offset each time at the last estimate.
Optimal algorithm and general performance for the three cases
Using the ηi⋆ from (3.47) and the assumption on the symmetry of the output levels AQ3, the
adaptive estimator is
X̂k = X̂k−1 + γk sign (ik ) η|i⋆ k | ,
(3.48)
Y −X̂
with ik = Q k c∆ δk−1 .
The asymptotic (γ, ηi )-optimized adaptive algorithm performance is approximated for all
the three cases (for the constant case it is exact) by
MSEq,∞ ≈ ψ [Iq (0)] ,
(3.49)
where ψ is a decreasing function of Iq (0):
• constant: M SEk ≈
1
kIq (0) .
• Wiener process: MSEq,∞ ≈ √σw .
Iq (0)
• Wiener process with drift: MSEq,∞ ≈ 3
u
4Iq (0)
2
3
.
Optimal threshold variations
In the performance given in (3.49), the threshold variations set τ ′ is influent only through
Iq (0). Therefore, for optimizing the algorithm through τ ′ , we will have the same optimization
problem discussed in Ch. 1, namely (1.47)
Iq⋆ = argmax Iq (0) .
τ′
128
Chapter 3. Adaptive quantizers for estimation
In Ch. 1, we saw that this problem is difficult in general. Two alternatives were proposed:
the first one would be to constrain the quantizer to be uniform and then obtain the optimal
quantizer interval step-length. The second would be to consider a general quantizer but with
a very large (tending to infinity) number of quantizer intervals. For the simulated results to
be presented later, Sec. 3.5, we will use the first approach. We consider that the positive
threshold variations are uniform and fixed to be
⊤
′
′
′
′
′
τ = −τ NI = −∞ · · · − τ1 = −1 0 + τ1 = +1 · · · + τ NI = +∞ .
(3.50)
2
2
Then in this case, only c∆ need to be maximized and, as it was stated before, this can be done
using a grid method.
3.4.2
Algorithm performance for optimal gain and coefficients
We now present for each parameter model the optimal adaptive gain γk⋆ and the asymptotic
MSE for the update coefficients η ⋆ . In each case, after evaluating the asymptotic MSE, we
will also evaluate the effect of quantization on the estimation performance. This will be done
by evaluating the performance loss due to quantization Lq defined by
!
MSEq,∞
,
(3.51)
Lq = 10 log10
˜ c,∞
MSE
where MSEq,∞ is the asymptotic MSE for the adaptive algorithm based on quantized measure˜ c,∞ is a quantity related to the asymptotic performance of estimation based
ments and MSE
˜ c,∞ will be specified later for each case. Observe that the
on continuous measurements. MSE
loss Lq is a relative measure and it is expressed in decibels (dB).
Before proceeding to the performance evaluation for each case, we still need to determine
the quantities hε and R for the optimal update coefficients. Using the expression for ηi⋆ (3.47)
in the expression for hε (3.21) and R (3.25), we have
NI
hε = −2
3.4.2.1
2
X
f˜d2 (i, 0)
i=1
F̃d (i, 0)
NI
= −Iq (0) ,
(3.52)
R=2
2
X
f˜d2 (i, 0)
i=1
F̃d (i, 0)
= Iq (0) .
(3.53)
Constant case: gain and performance
Replacing hε given by (3.52) in (3.26) and then the result in (3.10), we have the following
gains
1
γk⋆ =
.
(3.54)
kIq (0)
2 (3.27), we get
Also, replacing (3.52) and (3.53) in the expression for σ∞
2
σ∞
=
1
.
Iq (0)
(3.55)
3.4. Optimal algorithm parameters and performance
129
In practice, this means that, for large k, the MSE will be
MSEk ≈
1
.
kIq (0)
(3.56)
˜ c,∞ can be obtained through the CRB. As the
The continuous asymptotic performance MSE
measurements are independent, the FI for k continuous measurements is k times the FI for
continuous measurements Ic , thus the continuous measurement bound CRBc is
CRBc =
1
.
kIc
(3.57)
The expression for Ic can be obtained by evaluating the expectation E Sc2 , where the score
is given by (1.15)
∂ log f (y − x)
.
Sc (y) =
∂x
Changing variables, Ic is given by the following integral:
Ic =
Z
f (1) (x)
f (x)
!2
f (x) dx.
MSEk
k→∞ CRBc
=
Ic
Iq (0) ,
R
The ratio
MSEq,∞
˜ c,∞
MSE
is then given by lim
Lq = −10 log10
Iq (0)
Ic
(3.58)
leading to the loss
.
(3.59)
130
Chapter 3. Adaptive quantizers for estimation
We have the following solution to problem (a) (p. 27):
Solution to (a) - Adaptive algorithm with decreasing gain
(a3) 1) Estimator
For each time k, the estimate and threshold update is given
by (3.48)
X̂k = τ0,k = X̂k−1 + γk sign (ik ) η|i⋆ k | ,
˜
Y −X̂
with ik = Q k c∆ δk−1 , γk = kIq1(0) and ηi⋆ = − F̃fd (i,0)
.
(i,0)
d
2) Performance (asymptotic)
X̂k is asymptotically unbiased and its bias for large k can
be approximated by ε (tk ), which is the solution of the ODE
(3.11)
dε
= h̃ (ε) ,
dt
where h̃ (ε) = h (ε + x), h is given by (3.7) and the time is
k
P
γj . Its asymptotic MSE or variance is given by (3.56)
tk =
j=1
MSEk ∼
k→∞
1
,
kIq (0)
where Iq (0) is given by (1.13) with ε = 0, representing a loss
of performance w.r.t. the asymptotically optimal estimator
based on continuous measurements of (3.59)
Iq (0)
,
Lq = −10 log10
Ic
with Ic the continuous measurement FI given by (3.58).
3.4.2.2
Wiener process case: gain and performance
Using (3.53) in (3.30), we obtain the optimal constant gain
σw
γ⋆ = p
Iq (0)
(3.60)
and for this gain, the asymptotic MSE is given by substituting (3.55) in (3.32)
σw
MSEq,∞ = p
+ ◦ (σw ) .
Iq (0)
(3.61)
Note that we used the fact that γ ⋆ in this case depends linearly on σw for writing the ◦ term.
3.4. Optimal algorithm parameters and performance
131
The comparison with the continuous case can be done by using the asymptotic BCRB for
˜ c,∞ . The evaluation of the asymptotic BCRB follows in the
continuous measurements as MSE
same line as the one presented in Ch. 2 for estimation based on quantized measurements.
The main difference is that in the continuous case, the FI Ic is independent of the parameter
value, thus E [Ic ] = Ic and we do not need to consider a lower bound on the BCRB. For small
σw (small compared with Ic ) the asymptotic BCRB can be approximated exactly in the same
way as for the lower bound on the MSE for quantized measurements
σw
BCRBc,∞ = √ + ◦ (σw ) .
Ic
(3.62)
The loss of performance, in this case denoted LW
q , is given as follows
LW
q = 10 log10
MSEq,∞
BCRBc,∞

= 10 log10 
√σw
+ ◦ (σw )
Iq (0)
σw
√
+
Ic
◦ (σw )

.
(3.63)
√
We multiply the numerator and the denominator inside the logarithm of (3.63) by σwIc . This
gives

q
◦(σw )
Ic
+
σ
I
(0)
w
q
,

LW
q = 10 log10
◦(σw )
1 + σw
√
where we have assimilated the Ic in the ◦ (σw ) term. Using the first order Taylor expansion
1
around x = 0, 1+x
= 1 − x + ◦ (x), we can obtain
LW
q
Then factorizing
x
ln(10)
q
Ic
Iq (0)
= 10 log10
s
Ic
◦ (σw )
+
Iq (0)
σw
!
.
and using the first order Taylor expansion around x = 0, log10 (1 + x) =
+ ◦ (x), where ln is the natural logarithm, we have
LW
q
= 10 log10
s
Ic
Iq (0)
!
◦ (σw )
= −5 log10
+
σw
Iq (0)
Ic
+
◦ (σw )
.
σw
Note that the first term is half the loss of performance for the constant case
1
◦ (σw )
LW
.
q = Lq +
2
σw
From the definition of the ◦ term we also have
1
lim LW
q = Lq .
σw →0
2
(3.64)
132
Chapter 3. Adaptive quantizers for estimation
This gives the following solution to problem (b) (p. 29) when the parameter is modeled
by a Wiener process without drift:
Solution to (b) - Adaptive algorithm with constant gain for
tracking a Wiener process with small σw .
(b3.1) 1) Estimator
For each time k, the estimate and threshold update is given
by (3.48)
X̂k = τ0,k = X̂k−1 + γ sign (ik ) η|i⋆ k | ,
with ik = Q
Yk −X̂k−1
c∆ δ
, γ = √σw
Iq (0)
˜
and ηi⋆ = − F̃fd (i,0)
.
(i,0)
d
2) Performance (approximated and asymptotic)
X̂k is asymptotically unbiased and its bias can be approximated by ε (tk ), which is the solution of the ODE (3.11)
dε
= h̃ (ε) ,
dt
where h̃ (ε) = h (ε + x), h is given by (3.7) and the time is
tk = kγ. Its asymptotic MSE or variance is given by (3.61)
σw
MSEq,∞ = p
+ ◦ (σw ) ,
Iq (0)
where Iq (0) is given by (1.13) with ε = 0, representing
a loss of performance w.r.t. the asymptotically optimal
estimator based on continuous measurements of (3.64)
Iq (0)
1
◦ (σw )
◦ (σw )
W
Lq = −5 log10
= Lq +
,
+
Ic
σw
2
σw
with Ic the continuous measurement FI given by (3.58)
and Lq the loss of the adaptive algorithm for estimating a
constant.
3.4.2.3
Wiener process with drift case: gain and performance
Replacing the expressions for hε (3.52) and R (3.53) in the expressions for γ ⋆ (3.35) and
MSEq,∞ (3.36), we obtain
1
2
3
4u2 3
u
⋆
γ =
,
(3.65)
MSEq,∞ ≈ 3
+ ◦ (γ ⋆ ) .
(3.66)
2
Iq (0)
4Iq (0)
If u is unknown, it might be estimated. It can be estimated by smoothing the differences
3.4. Optimal algorithm parameters and performance
133
between successive estimates
Ûk = Ûk−1 + γku
h
i
X̂k − X̂k−1 − Ûk−1 .
(3.67)
where γku is a sequence of small positive gains. The estimator Ûk can replace u in the evaluation of the gain and of the asymptotic MSE. If the drift is not constant but slowly varying,
the adaptive algorithm above can also be used. In this case, additional information on the
evolution of the drift might be incorporated in (3.67) to have more precise estimates and get
an adaptive gain closer to the optimal.
For the evaluation of the loss due to quantization, we could use BCRBc,∞ for the continuous measurement performance. However, this would result in an unfair comparison, as
the imposition of using X̂k−1 instead of the prediction is known to be suboptimal. Therefore,
the evaluation of the loss will be done using the approximate performance for an adaptive
algorithm of the same form, but using continuous measurements instead of quantized measurements. The algorithm has the following form:
X̂k = X̂k−1 + γkc ηc Yk − X̂k−1 ,
where γkc and the non linearity ηc (x) are optimized to minimize the asymptotic MSE.
Using the same theory described for the quantized case it is possible to show that the
optimal γkc and ηc (x) are
2 31
f ′ (x)
4u
c,⋆
,
η
(x)
=
−
c
,
γk =
f (x)
Ic2
which exist under the constraint that Ic converges and is not zero and that f ′ (x) exists for
every x.
The MSE can be approximated in a similar way as before
MSEc,∞
u
≈3
4Ic
2
3
.
(3.68)
˜ c,∞ in the evaluation of the loss. Using similar
This asymptotic MSE can be used as MSE
D,
taylor expansions as in the previous Wiener model and denoting the loss in this case by LW
q
we have
2
◦ u 32
◦
u3
Iq (0)
2
20
D
≈ − log10
LW
= Lq +
.
(3.69)
+
2
2
q
3
Ic
3
u3
u3
Note that here the limit result is on u
2
D
= Lq .
lim LW
q
3
u→0
However, note also that hidden in the approximation is the fact that σw must also tend to
zero.
134
Chapter 3. Adaptive quantizers for estimation
We have the following solution to problem (b) (p. 29) when the parameter is modeled by
a Wiener process with deterministic drift:
Solution to (b) - Adaptive algorithm with constant gain for
tracking a Wiener process with small σw and small u.
(b3.2) 1) Estimator
For each time k, the estimate and threshold update is given
by (3.48)
X̂k = τ0,k = X̂k−1 + γ sign (ik ) η|i⋆ k | ,
with ik = Q
Yk −X̂k−1
c∆ δ
,γ=
4u2
Iq2 (0)
1
3
˜
.
and ηi⋆ = − F̃fd (i,0)
(i,0)
d
2) Performance
(approximated and approximated asymptotic)
The estimation bias can be approximated by ε (tk ), which
is the solution of the ODE (3.16)
u
dε
= h̃ (ε) − ,
dt
γ
where h̃ (ε) = h (ε + x), h is given by (3.7), x is the mean of
the Wiener process and the time is tk = kγ. Its asymptotic
MSE or variance is approximated as follows (3.66)
MSEq,∞ ≈ 3
u
4Iq (0)
2
3
2
+ ◦ u3 ,
where Iq (0) is given by (1.13) with ε = 0, representing a loss
of performance w.r.t. the asymptotically optimal adaptive
estimator based on continuous measurements of (3.69)
2
◦ u 32
◦
u3
Iq (0)
20
2
D
+
log
L
+
LW
≈
−
=
,
q
10
2
2
q
3
Ic
3
u3
u3
with Ic the continuous measurement FI given by (3.58)
and Lq the loss of the adaptive algorithm for estimating a
constant.
Observe that the losses for the three models of Xk depend directly on Lq , thus Lq allows
to approximate how much of performance is lost for a specific type of noise and thresholds set
when comparing to the equivalent continuous measurements based algorithm.
3.5. Simulations
3.5
135
Simulations
Now, we are going to check the validity of the results through simulation. We will mainly
focus on obtaining a simulated version of the loss of performance for the three parameter
models and then we will compare the simulated loss with the theoretical one. After that,
we will compare the adaptive algorithm performance with the algorithms presented in the
previous chapters, namely the adaptive MLE scheme for estimating a constant and the PF
with dynamical central threshold for estimating a Wiener process. This comparison will allow
us to know if we lose in estimation performance and what we lose in estimation performance,
when we use the low complexity adaptive algorithm presented in this chapter, instead of the
algorithms presented in the previous chapters.
3.5.1
General considerations
Threshold variations. In what follows the threshold variations are considered to be uniform and given by (3.50)
′
′
τ = −τ NI = −∞ · · · −
2
τ1′
= −1 0
+
τ1′
′
= +1 · · · + τ NI
2
⊤
= +∞ .
Evaluation of Iq (0) and the algorithm parameters
For a given type of noise, supposing that its noise scale parameter δ is known, for a fixed
NI , Iq (0) can be evaluated by using the normalized CDF and PDF, Fn and fn (CDF and
PDF for δ = 1),
in (3.45) (or (1.46)). Using the parametrization ∆ = c∆ δ and the fact that
f (x) = 1δ fn xδ , we have
NI
2
2 X
{fn [(i − 1) c∆ ] − fn [ic∆ ]}2
Iq (0) = 2
.
δ
{Fn [ic∆ ] − Fn [(i − 1) c∆ ]}
(3.70)
i=1
As Iq (0) is now a function of c∆ only, it can be maximized by adjusting this parameter. Being
a scalar maximization problem this can be done by using grid optimization (searching for the
maximum in a fine grid of possible c∆ ). After finding the optimal c⋆∆ , the coefficients ηi = ηi⋆
can be evaluated using the normalized CDF and PDF in (3.47). This gives
ηi⋆ =
1 fn [(i − 1) c⋆∆ ] − fn [ic⋆∆ ]
.
δ Fn ic⋆∆ − Fn (i − 1) c⋆∆
Then, with δ, the optimal Iq (0) and depending on the model, σw or u, we can evaluate
and then all the algorithm parameters are defined.
(3.71)
1
∆,
γk
Discussion on the signal model
Note that it is supposed that the model for Xk is known, as setting γk depends on it. As
a consequence of this assumption, in a real application the choice between the three models
must be clear. When this choice is not clear from the application, it is always simpler to
choose Xk to be a Wiener process, first, because the complexity of the algorithm is lower and
136
Chapter 3. Adaptive quantizers for estimation
second, because supposing that the increments are Gaussian and i.i.d. does not impose too
much information on the evolution of Xk . Still, σw must be known, in practice it can be set
based on prior knowledge on the possible variation of Xk or by accepting a slower convergence
and a small loss of asymptotic performance, it can be estimated jointly with Xk using an extra
adaptive estimator for it.
In the last case, when it is known that the increments of Xk have a deterministic component, the fact that the γk depends on u is not very useful and prior information on the
variations of Xk are not normally as detailed as knowing u itself, making it necessary to accept a small loss of performance to estimate u jointly. The estimation of u can be done using
(3.67) where prior knowledge on the variations of uk can be integrated in the gain γku . If precise
knowledge on the evolution of uk is known through dynamical models, it might be more useful
to use other forms of adaptive estimators known as multi-step algorithms [Benveniste 1990,
Ch. 4].
Discussion on the noise model
The evaluation of the loss and the verification of the results will be done considering two
different classes of noise that verify assumptions AN1, AN2 and AN3, namely, generalized
Gaussian (GGD) noise and noise distributed according to the Student’s-t distribution
(STD). The motivation for the use of these two distributions comes from signal processing,
statistics and information theory.
In signal processing, when additive noise is not constrained to be Gaussian, a common
assumption is that the noise follows a GGD [Varanasi 1989]. This distribution not only contains the Gaussian case as a specific example, but also by changing one of its parameters,
one can model the impulsive Laplacian distribution as well as distributions close to uniform.
In robust statistics, when the additive noise is considered to be impulsive, a general class
for the distribution of the noise is the STD [Lange 1989]. STD includes as a specific case
the Cauchy distribution, known to be heavy-tailed and used intensively in robust statistics.
Also, by changing a parameter of the distribution, an entire class of heavy-tailed distributions
can be represented. When looking from an information point of view, if no prior is used for
the noise, noise models must be as random as possible to ensure that the noise is an uninformative part of the measurement. Thus, noise models must maximize some criterion of
randomness. Commonly used criteria for randomness are entropy measures and both distributions considered above are entropy maximizers. The GGD maximizes the Shannon entropy
under constraints on the moments [Cover 2006, Ch. 12] and the STD maximizes the Rényi
entropy under constraints on the second order moment [Costa 2003].
Both families of distributions are parametrized by a shape parameter β ∈ R+ and a scale
parameter δ. The CDF and PDF of the GGD were given in Ch. 1 by (1.39) and (1.40)
β
x β
exp − ,
fGGD (x) =
δ
2δΓ β1


1 x β
γ
β, δ
1
,
FGGD (x) =
1 + sign (x)
2
Γ 1
β
3.5. Simulations
137
while for the STD, the CDF and PDF are respectively
β+1
Γ β+1
2
1 x 2 − 2
1+
fST D (x) =
,
√
β δ
δ βπΓ β2
(
"
#)
1
β 1
1 + sign (x) 1 − I β
,
,
FST D (x) =
2
2
( x ) +β 2 2
(3.72)
(3.73)
δ
I (, ) is the incomplete beta function
Iw (x, y) =
Zw
0
3.5.2
z x−1 (1 − z)y−1 dz.
Theoretical performance loss due to quantization
The main quantity that must be evaluated before simulating the algorithm is the theoretical
loss Lq . This quantity will not only be useful to check the simulation results, but will also
be useful to observe how the performance evolves as we change the number of quantization
intervals and as we change the noise model.
To evaluate Lq , after evaluating Iq (0) based on the CDF and PDF given above, we also
need to evaluate Ic . The continuous measurement FI for the GGD can be obtained by using
(1.39) in the integral expression (3.58), this gives (Why? - App. A.1.7)
1
β
(β
−
1)
Γ
1
−
β
1
.
(3.74)
Ic,GGD = 2
1
δ
Γ
β
For the STD the continuous measurement FI is given by using (3.72) also in (3.58). Integrating,
we obtain (Why? - App. A.1.8)
1 β+1
.
(3.75)
Ic,ST D = 2
δ β+3
We evaluated the theoretical loss for NI ∈ {2, 4, 8, 16, 32}, which corresponds to NB =
log2 (NI ) ∈ {1, 2, 3, 4, 5} numbers of bits, for shape parameters β ∈ {1.5, 2, 2.5, 3} for GGD
noise and β ∈ {1, 2, 3} for STD noise. The results are shown in Fig. 3.4. As it was intuitively
expected, the loss reduces with increasing NB . It is interesting to note that the maximum
loss, observed for NB = 1, goes from approximately 1dB to 4dB, which represents factors less
than 3 in MSE increase for estimating a constant with 1 bit quantization. Also interesting is
the fact that the loss decreases rapidly with NB , for 2 bit quantization all the tested types of
noise produce losses below 1dB, resulting in linear increases in MSE not larger than 1.3. This
indicates that when using the adaptive estimators developed here, it is not very useful to use
more than 4 or 5 bits for quantization.
The performance for one bit seems to be related to the noise tail. Note that smaller losses
were obtained for distributions with heavier tail (STD in general and GGD with β = 1.5).
This is due to the fact that for large tail distributions a small region around the median of the
138
Chapter 3. Adaptive quantizers for estimation
4
Loss [dB]
3
-
β
β
β
β
1
= 1.5
= 2 (Gaussian)
= 2.5
=3
STD - β = 1 (Cauchy)
STD - β = 2
STD - β = 3
0.8
Loss [dB]
GGD
GGD
GGD
GGD
2
0.6
0.4
1
0.2
0
1
2
3
4
Number of bits [NB ]
(a)
5
0
1
2
3
4
Number of bits [NB ]
5
(b)
Figure 3.4: Adaptive algorithm loss of estimation performance due to quantization of measurements corresponding to the constant case Lq (theoretical). The loss is evaluated for different
types of noise, GGD noise in (a) and STD noise in (b), and different numbers of quantization
bits. For the other models of parameter studied here, the loss is proportional to Lq .
distribution is very informative, thus (as most of the information is contained there) when the
only threshold available is placed close to the median, the relative gain of information is greater
than in the other cases, leading to smaller losses. This can also be the reason for the slow
decrease of the loss for these distributions. As the quantizer thresholds are placed uniformly,
some of them will be placed in the non informative amplitude region and consequently, the
decrease in loss will be not as sharp as in the other cases.
The loss was not shown in Fig. 3.4 for the Laplacian distribution, because for this distribution the adaptive optimal estimator in the continuous case is already an adaptive estimator
with a binary quantizer. One can see this by evaluating the coefficients ηi , which in this case
are constant for positive i showing that only the sign of the difference between the measurement
and the last estimate is important. This behavior of optimality for binary quantization was
already observed in Ch. 1, where we showed that the CRB for binary quantized measurements
can be equal to the CRB for continuous measurements in the Laplacian case. Consequently,
the loss in this case is zero dB for all NB .
3.5.3
Simulated loss
To validate the results, we will simulate the loss of performance. The simulation results will
be presented in the same order as the theoretical results presented in the previous sections.
First the constant case, then the Wiener process case and finally the Wiener process with
drift. All the simulations are done for NB ∈ {2, 3, 4, 5}.
Simulated loss: constant case
In the constant case, the 7 types of noise with previously evaluated Lq were tested, the value
of X0 = x was set to zero and the initial condition of the adaptive algorithm was set with a
small error (X̂0 ∈ {0, 10}). The number of samples was set to 5000 to ensure convergence.
The algorithm was simulated 2.5 × 106 times and the error results were averaged yielding a
3.5. Simulations
139
Loss [dB]
simulated MSE. Based on the simulated MSE a simulated loss was calculated. GGD noise
was simulated using a transformation of gamma variates (How? - App. A.3.2), while STD
noise was simulated using a transformation of independent uniform variates similar to the
transformation used for generating Gaussian variates (How? - App. A.3.5). The results are
shown in Fig. 3.5 for GGD noise and in Fig. 3.6 for STD noise.
1
0.5
β
β
β
β
= 1.5
= 2 (Gaussian)
= 2.5
=3
0
100
101
102
Time [k]
103
(a)
Loss [dB]
0.15
0.1
0.05
0
100
β
β
β
β
= 1.5
= 2 (Gaussian)
= 2.5
=3
101
102
Time [k]
103
(b)
Figure 3.5: Quantization loss of performance for GGD noise and NB ∈ {2, 3, 4, 5} when Xk
is constant. For each type of noise there are 4 curves, the constant losses are the theoretical
results and the decreasing losses are the simulated results, thus producing pairs of curves of
the same type, for each pair the higher results represent lower number of quantization bits. In
(a) results for NB = 2 and 3 are shown. In (b) the results for NB = 4 and 5 are shown. The
simulated results were obtained through Monte Carlo simulation using 2.5 × 106 realizations
of blocks of 5000 error samples, the true parameter value in all simulations was set to zero,
while X̂ was set to have a small initial error (X̂0 ∈ {0, 10}). We used δ = 1 in all simulations.
140
Chapter 3. Adaptive quantizers for estimation
Loss [dB]
1
0.8
0.6
β = 1 (Cauchy)
β=2
β=3
0.4
0.2
0
100
101
102
Time [k]
103
(a)
Loss [dB]
0.4
0.3
0.2
β=1 (Cauchy)
β=2
β=3
0.1
0
100
101
102
Time [k]
103
(b)
Figure 3.6: Quantization loss of performance for STD noise and NB ∈ {2, 3, 4, 5} when Xk
is constant. For each type of noise there are 4 curves, the constant losses are the theoretical
results and the decreasing losses are the simulated results, thus producing pairs of curves of
the same type, for each pair the higher results represent lower number of quantization bits.
In (a) results for NB = 2 and 3 are shown. In (b) results for NB = 4 and 5 are shown. The
simulated results were obtained through Monte Carlo simulation using 2.5 × 106 realizations
of blocks of 5000 error samples, the true parameter value in all simulations was set to zero,
while X̂ was set to have a small initial error (X̂0 ∈ {0, 10}). We used δ = 1 in all simulations.
Remarks:
• note that the losses are independent of δ as both Iq (0) and Ic depend on it through the
same multiplicative constant δ12 .
• The simulated results seem to converge to the theoretical approximations of Lq , thus
validating these approximations. This also means that the variance of estimation tends
in simulation to the CRB for quantized observations kIq1(0) , showing that the algorithm
is asymptotically optimal.
• The convergence time seems to be related to NB (when NB increases, the time to get
closer to the optimal performance decreases).
3.5. Simulations
141
Simulated loss: Wiener process case
For a Wiener process, LW
q was evaluated by setting X̂0 randomly around 0 and X0 = 0, then
4
10 realizations with 105 samples were simulated and the MSE was estimated by averaging
the realizations of the squared error for each instant. As it was observed that the error was
approximately stationary after k = 1000, the sample MSE was also averaged resulting in an
estimate of the asymptotic MSE. Based on the obtained values of the MSE, a simulated loss
was evaluated. The results for the 7 types of noise and σw = 0.001 are shown in Fig. 3.7. As
expected, the results have the same form of the theoretical loss given in Fig. 3.4.
0.5
GGD - β = 1.5
GGD - β = 2 (Gaussian)
GGD - β = 2.5
GGD - β = 3
STD - β = 1 (Cauchy)
STD - β = 2
STD - β = 3
Loss [dB]
0.4
0.3
0.2
0.1
2
3
4
Number of bits [NB ]
5
Figure 3.7: Simulated quantization performance loss for a Wiener process Xk with σw = 0.001,
different types of noise and numbers of quantization bits. The simulated losses were obtained
through Monte Carlo simulation. For each evaluated loss (each symbol on the curves) 104
realizations with 105 samples were simulated. As it was observed that the error is stationary
after k = 1000, the sample MSE was also averaged leading to an estimate of the asymptotic
MSE and consequently of the loss. The simulations were done by setting the initial estimate
randomly around zero (with a Gaussian distribution) and also by setting X0 = 0. In all
simulations, we considered δ = 1.
To verify the results for different values of σw , the loss was evaluated through simulation
also for σw = 0.1 in the Gaussian (GGD with β = 2) and Cauchy cases (STD with β = 1). The
results are shown in Fig. 3.8, where the theoretical losses for these cases are also shown. These
results clearly show that Xk may move slowly to give a performance close to the theoretical
results. However, it is also interesting to note that the simulated loss seems to have the same
decreasing rate as a function of NB when compared with the theoretical results. This means
that the dependence on Iq (0) of the MSE seems to be still correct. Moreover, it indicates that
even in a faster regime for Xk , the threshold variations can be set by maximizing Iq (0).
142
Chapter 3. Adaptive quantizers for estimation
Cauchy - σw = 0.1
Cauchy - σw = 0.001
Gaussian - σw = 0.1
Gaussian - σw = 0.001
Cauchy – Theo.
Gaussian – Theo.
Loss [dB]
1
0.5
0
2
3
4
Number of bits [NB ]
5
Figure 3.8: Comparison of simulated and theoretical losses in the Gaussian and Cauchy noise
cases when estimating a wiener process with σw = 0.1 or σw = 0.001. The simulated losses
were obtained through Monte Carlo simulation. For each evaluated loss (each symbol on the
curves) 104 realizations with 105 samples were simulated. As it was observed that the error
is stationary after k = 1000, the sample MSE was also averaged leading to an estimate of the
asymptotic MSE and consequently of the loss. The simulations were done by setting the initial
estimate randomly around zero (with a Gaussian distribution) and also by setting X0 = 0. In
all simulations, we considered δ = 1.
Simulated loss: Wiener process with drift case
For a Wiener process Xk with drift, Wk was simulated with mean and standard deviations
u = σw = 10−4 , which represents a slow drift with small random fluctuations. The initial
conditions were set to X0 = X̂ = 0 and the drift estimator was set with constant gain
γku = 10−5 . Its initial condition was set to the true u to reduce the transient time and,
consequently, the simulation time. As uk is constant, the loss evaluation was done in the same
form as for Xk without drift, after averaging the squared error through realizations and time.
The results for the Gaussian and Cauchy cases are shown in Fig. 3.9.
The small offset between the simulated and theoretical results is justified by the joint
estimation of u and Xk . Note that keeping γku small allows one to adaptively follow slow
variations in the drift. The convergence to the simulated loss in Fig. 3.9 was also obtained
for simulations including errors in the initial conditions. However, in this case, the transient
regime was very long, indicating that other schemes might be considered when the theoretical
performance is needed in a short period of time.
Note also that if the drift is known, the procedure simulated for tracking Xk is clearly
suboptimal. In this case, we can obtain better asymptotic results by using the prediction
(which includes the drift) in the adaptive algorithm. However, in practice, as we have to
estimate jointly the unknown drift, the simulated algorithm normally has a shorter transient
than the version using the prediction. This is an advantage when the drift can vary in time.
3.5. Simulations
143
Gaussian - Sim.
Cauchy - Sim.
Gaussian - Theo.
Cauchy - Theo.
Loss [dB]
0.4
0.3
0.2
0.1
2
3
4
Number of bits [NB ]
5
Figure 3.9: Comparison of simulated and theoretical losses in the Gaussian and Cauchy noise
cases for estimating a Wiener process with constant mean drift uk = 10−4 and standard deviation σw = 10−4 . The simulation results were obtained with 104 realizations of 105 samples, for
evaluating the simulated asymptotic MSE, the squared error samples were averaged through
the realizations and through the time samples after the transient time (for k > 1000). The
initial estimate value and initial parameter value were both set to zero. The initial value of
the estimate of the drift was also set to the true parameter value to reduce the transient time.
3.5.4
Comparison with the high complexity algorithms
The adaptive algorithms that we propose will be compared with their equivalent counterparts
given in previous chapters. When the parameter is constant, we will compare the adaptive
algorithm with decreasing gain (a3) with the adaptive algorithm based on the MLE (a2.2)
presented in Ch. 1 (p. 69). We will discuss the main differences in terms of performance and
computational complexity.
Adaptive algorithm vs adaptive MLE
Asymptotic performance. Asymptotically both algorithms are equivalent, since they are
asymptotically unbiased and their asymptotic variance is equivalent to kIq1(0) . This means that
for commonly used noise distributions both algorithms are asymptotically optimal under the
unbiasedness constraint. Thus, if there is a difference in performance, this difference might be
found in the transient, before getting close to the asymptotic performance.
Transient performance. The transient for both algorithms is difficult to study analytically.
For the adaptive scheme with decreasing gain, the first few steps will be mainly characterized
by the bias. Unfortunately, the bias approximation given by the ODE approximation cannot
be used in the initial transient as the size of the steps is too large. For the adaptive scheme
based on the MLE, we cannot obtain any result either, as the general behavior of the MLE is
known only asymptotically. Therefore, we will analyze the transient through simulations.
We simulated both algorithms for NI = 8 and two different types of noises, Gaussian
and Cauchy noises. The threshold variations were considered to be uniform with step-length
144
Chapter 3. Adaptive quantizers for estimation
chosen in the same way as for the evaluation and simulation of the losses. For evaluating
the simulated MSE for the transient, we simulated 1000 realizations of the algorithms, each
realization with 50 samples. The noise scale factor used for both cases was δ = 1 and the
parameter and initial estimate were x = 0 and X̂0 = 1. For starting the adaptive scheme based
on the MLE, 10 samples with fixed thresholds were used for obtaining the first estimate. The
algorithm used in the maximization procedure of the MLE was a search algorithm1 . The
results are shown in Fig. 3.10, where we also show the CRB for quantized measurements
when the central threshold is placed at the true parameter CRBq⋆ = kIq1(0) .
MSE
1
CRB⋆q
Sim. – adaptive alg.
Sim. – adaptive MLE
0.5
0
10
20
30
Time [k]
40
50
(a)
MSE
1
CRB⋆q
Sim. – adaptive alg.
Sim. – adaptive MLE
0.5
0
10
20
30
Time [k]
40
50
(b)
Figure 3.10: Minimum CRB and simulated MSE for the adaptive algorithm with decreasing
gain and for the adaptive algorithm based on the MLE. Both algorithms were simulated with
NI = 8, optimal uniform thresholds, Gaussian and Cauchy noise with δ = 1, x = 0 and X̂0 = 1.
For evaluating the transient MSE, 50 samples were simulated 1000 times for each algorithm.
The scheme based on the MLE is started by applying the MLE with samples obtained with
fixed thresholds. The maximization in the MLE is done with a search algorithm1 . In (a),
results for Gaussian noise are shown. In (b), the results for Cauchy noise are shown.
We would expect that the MLE based algorithm would produce better results, as it seems
that we treat the data in an intuitively better way (we maximize the likelihood of the data).
This is indeed the case when we consider Cauchy noise, but the opposite happens when we
test it with Gaussian noise. The decreasing gain algorithm is even slightly below the bound
initially (which is possible only because the algorithm is initially biased). Thus, we cannot
R
1
function fminsearch. We chose this function instead of Newton’s
More precisely we used the MATLAB
method because it can handle non-convex problems. For Cauchy noise the likelihood is not convex.
3.5. Simulations
145
say that one of the algorithms is better than the other.
As the algorithms performance seems equivalent, a practical choice can be done in terms
of complexity.
Complexity. At time k, the adaptive scheme based on MLE must solve a maximization
problem using the last k measurements i1:k . Each measurement produces an additional term
on the log-likelihood to be maximized, thus at time k, the evaluation of the log-likelihood
function itself requires k evaluations of the logarithm of the marginal likelihood. Note that
the marginal likelihood can be very costly to be evaluated as it is a difference of CDF.
For the adaptive algorithm with decreasing gains, the gains can be precalculated and stored
in a table, or they can be obtained by using one division, the update coefficients can also be
precalculated and stored in a table. To generate one estimate the adaptive algorithm then
requires: one search in a table to have the update coefficient, one division or one search in a
table to have the gain, one multiplication to obtain the total correction and one sum to have
the final estimate.
One can conclude that the adaptive algorithm with decreasing gains has far lower complexity requirements when compared with the scheme based on the MLE. Note also that the
adaptive algorithm based on MLE needs a certain number of measurements with fixed (or not
adaptive) thresholds to start. This is due to the fact that the MLE for one measurement is
ill defined and produces estimates equal to +∞ or −∞. Note that it can also happen with
more than one measurement, if all measurements are equal to +1 or if they are all equal to
−1. Thus, for the adaptive algorithm based on MLE we can have realizations with unbounded
values and this will happen especially in the cases when the initial quantizer dynamics is far
away from the parameter. Such behavior will not happen for the adaptive algorithm with
decreasing gains as the update coefficients are bounded above (considering PDF with upper
bounded f˜d and lower bounded from zero F̃d ). Therefore, for practical purposes the choice
between them is clear, we should choose the algorithm with decreasing gains (a3).
Adaptive algorithm vs PF
We compare now the adaptive algorithm (with fixed gain) and the PF procedure for tracking
a Wiener process.
Asymptotic performance for fast parameter evolution. In this case, for any σw , the
PF is known to be optimal if the number of particles tends to infinity. Thus, for a very large
number of particles we expect the PF procedure to be as good as the adaptive algorithm.
Asymptotic performance for slow parameter evolution. When σw is small, the procedures have equivalent asymptotic performance. The PF is approximately unbiased, if we
choose a sufficiently large number of particles and the adaptive procedure is asymptotically
unbiased. Their asymptotic MSE is approximately √σw . Thus, when σw is small, the
Iq (0)
differences, if they exist, will also occur in the transient performance.
146
Chapter 3. Adaptive quantizers for estimation
Transient performance. Similarly to the constant case, we analyze the transient performance through simulation. We simulated both the adaptive algorithm and the PF for NI = 8
and asymptotically optimal uniform quantization. The parameter model was a Wiener process
with increment standard deviation σw = 0.001, with initial standard deviation Var (X0 ) = 0.1
and with initial mean equal to zero. We simulated the algorithms both for Gaussian and
Cauchy noise with δ = 1. For obtaining the simulated transient MSE, 1000 samples were
simulated 2500 times for each algorithm and each noise distribution. The initial estimate for
both algorithms X̂0 was set to zero in all the cases. We used 5000 particles in the PF and its
resampling procedure was triggered each time the number of effective particles was below 50.
The results are shown in Fig. 3.11 where the asymptotically optimal performance ( √σw ) for
Iq (0)
small σw is also presented.
1 · 10−2
Asymp. optimal
Sim. – adaptive alg.
Sim. – PF
MSE
8 · 10−3
6 · 10−3
4 · 10−3
2 · 10−3
0
200
400
600
800
1,000
Time [k]
(a)
Asymp. optimal
Sim. – adaptive alg.
Sim. – PF
MSE
1 · 10−2
5 · 10−3
0
200
400
600
800
1,000
Time [k]
(b)
Figure 3.11: Asymptotic MSE for the optimal estimator of a Wiener process with small σw and
simulated MSE for the adaptive algorithm with constant gain and for the PF with dynamic
central threshold. Both algorithms were simulated with NI = 8, optimal uniform thresholds,
Gaussian and Cauchy noise with δ = 1, σw = 0.001, E (X0 ) = 0, Var (X0 ) = 0.1 and X̂0 = 1.
The evaluation of the transient MSE was done with 2500 simulations of the algorithms for
blocks with 1000 samples. The PF was simulated with 5000 particles and its threshold for the
resampling procedure was set at Nthresh = 50. In (a) results for Gaussian noise are shown,
while in (b) we have the results for Cauchy noise.
In this case the expected results are obtained. The PF, which might be close to optimal
when the number of particles is large, is clearly faster to converge when compared with the
adaptive algorithm.
Complexity. When comparing the complexity of the algorithms the difference is impressive.
3.5. Simulations
147
At time k, for each particle in the PF, a Gaussian r.v. has to be simulated in the prediction
step and its likelihood has to be evaluated. After that, the weighted mean of the particles is
computed. It is then followed by the evaluation of the effective number of particles with a
possible resampling step.
For the adaptive algorithm the complexity is one search in a table, to obtain the update
coefficient, one multiplication with the constant gain and one sum with the previous estimate.
Therefore, one might choose the PF whenever there is no restriction on the complexity
of the algorithm2 . If there is a strong complexity restriction, by paying the price of a slower
convergence, the adaptive algorithm can be a good solution.
3.5.5
Discussion on the results
We summarize the main points observed until now and we will discuss some of them.
• We proposed a low complexity adaptive algorithm to track one of three models, constant,
Wiener process and Wiener process with drift. Under the hypothesis that the noise PDF
is symmetric and strictly decreasing and that the quantizer is also symmetric with its
center placed on the previous parameter estimate, we could prove by using Lyapunov
theory that the algorithm is asymptotically unbiased for the estimation of a constant
and of a Wiener process. We showed that the asymptotic performance for the optimal
update coefficients is a function of the FI Iq (0), which shows that this function plays an
important role in the choice of the threshold variations, as it was also observed in Ch.
1 and 2.
• For the optimal update coefficients, the adaptive algorithm that is obtained is a generalization of the recursive algorithm found at the end of Ch. 1, being exactly equal if we
constrain NI = 2.
In the case of estimating a Wiener process, the adaptive algorithm with optimal update
coefficients is equal to the asymptotic recursive algorithm presented at the end of Ch.
2. Therefore, the adaptive algorithm is a low complexity alternative to the algorithms
presented in Ch. 1 and 2 with equivalent asymptotic performance.
• For testing the results, we considered two different families of noises, generalized Gaussian noises and Student’s-t noises, both tested with uniform quantization. First, we
evaluated the theoretical loss of performance due to quantization w.r.t. the continuous
measurement equivalent estimator for different numbers of quantization intervals. The
results indicate that with only a few quantization bits (4 and 5) the adaptive algorithm performance is very close to the continuous measurement case, and it was observed that uniform quantization seems to penalize estimation performance more under heavy-tailed distributions.
• Estimation in the three possible scenarios was simulated and the results validated the
accuracy of the theoretical approximations.
² Note that the number of particles necessary to have close to optimal performance can be reduced by using the optimal proposal distribution, thus reducing complexity. This can have an impact on the choice of the algorithm when the restriction on complexity is not strong.
In the constant case, it was observed that the algorithm performance was very close to
the Cramér–Rao bound.
In the Wiener process case it was observed that the theoretical results are very accurate
for small increments of the Wiener process and in the drift case it was seen that by
accepting a small increase in the MSE it is possible to estimate jointly the drift.
• As the algorithms are asymptotically equivalent in performance to the adaptive scheme based on the MLE in the constant case and to the PF in the Wiener process case, we simulated their transient performance, to see whether and by how much we lose performance by using the low complexity approach.
In the constant case, we cannot say that the adaptive scheme based on the MLE is
better, thus in practice, the adaptive algorithm with decreasing gain might be used as
it requires far lower complexity.
In the Wiener process case, the PF is superior to the adaptive algorithm with constant
gain, thus if no complexity constraints are considered, we might use the PF. If we have
strong complexity constraints, by accepting a slower convergence, the adaptive algorithm
gives a good solution.
• An interesting link between standard quantization and the adaptive algorithm for tracking the Wiener process can be observed. In the binary case, the adaptive algorithm proposed here is similar to delta modulation [Gersho 1992, p. 214]; the difference is that here we do not use the quantization noise approach for obtaining its performance, and we also consider the effect of the measurement noise on the final performance.
When NI > 2 the algorithm that we propose can be seen as a form of predictive quantization intended for estimation and not for reconstruction of the measurements.
• Another interesting result is that a varying parameter suffers a smaller loss of performance due to quantization than a constant parameter, thus a type of dithering effect seems to be present. In this case, the variations of the input signal bring the tracking performance of the estimator closer to the continuous measurement performance.
• The fact that the number of quantization bits does not influence much the estimation performance leads to the conclusion that it seems more reasonable to focus on using more sensors rather than high resolution quantizers for increasing performance. Consequently,
this motivates the use of sensor network approaches. An approach of this type will be
presented in Subsec. 3.6.2.
• As in practice the sensor noise scale parameter and the Wiener process increment standard deviation can be unknown and slowly varying, it would also be interesting to study how the algorithm design and performance would change when all these parameters are estimated jointly.
We will study the joint estimation of the constant x and the scale parameter in Subsec.
3.6.1. The joint estimation of σw in the Wiener process case will lead to a scheme similar to delta modulation with variable gain; this is left for future work.
3.6 Adaptive quantizers for estimation: extensions to location-scale estimation and to the multiple sensor approach
We present now the two extensions discussed in the previous section.
The first extension that will be presented is the joint estimation of the unknown noise scale factor. We will see that the adaptive estimation of x does not change; the only thing that changes is the addition of the adaptive estimator of the scale parameter δ. We will also see
that the fact that we do not know the scale parameter value does not degrade the asymptotic
estimation performance when compared with the location-only estimation problem.
We then present the multiple sensor approach based on a fusion center architecture. We
will see that the optimal correction of the adaptive algorithm based on multiple quantized
measurements from different sensors will be simply a weighted sum of their corrections in the
single sensor case.
3.6.1 Joint estimation of location and scale parameters
We start by stating the problem and defining the adaptive estimator. In a second step, we look
for its performance and we optimize the algorithm in a similar way as it was done previously.
We find the optimal adaptive gain, i.e. the optimal adaptive gain matrix. The optimal update
coefficients are obtained in a third step. At the end of the section, we present some simulations
and we discuss the results.
Problem statement and estimator
We consider that a sequence of i.i.d. r.v. $Y_k$ with marginal CDF $F_n\left(\frac{y-x}{\delta}\right)$ is quantized with an adjustable quantizer ($F_n(\cdot)$ is the noise CDF for $\delta = 1$), resulting in a sequence of discrete measurements $i_{1:k}$. The pair of parameters $(x, \delta)$ is unknown and the objective is to estimate it based on the quantized measurements. This is equivalent to the following modification of problem (a) (p. 27):
(a’) Solve problem (a) when the noise scale parameter δ is unknown
and must be estimated jointly with x.
Observe that this problem is a joint location-scale estimation based on quantized measurements.
The adjustable quantizer is given by (3.1), where for enhancing the estimation performance, we set the offset and the input gain to be
$$b_k = \hat{X}_{k-1}, \qquad \Delta_k = c_\Delta \hat{\delta}_{k-1}. \qquad (3.76)$$
Note that the main difference with the adjustable quantizer used previously is the use of the
last scale parameter estimate for setting the input gain.
The adaptive estimation algorithm can be extended to include the joint estimation of the scale parameter. The extended version is
$$\begin{bmatrix} \hat{X}_k \\ \hat{\delta}_k \end{bmatrix} = \begin{bmatrix} \hat{X}_{k-1} \\ \hat{\delta}_{k-1} \end{bmatrix} + \hat{\delta}_{k-1}\,\frac{\Gamma}{k}\begin{bmatrix} \eta_x\left(i_k\right) \\ \eta_\delta\left(i_k\right) \end{bmatrix}, \qquad (3.77)$$
where $\Gamma$ is a $2\times 2$ matrix of gains, and $\eta_x[i]$ and $\eta_\delta[i]$ are sequences of $N_I$ update coefficients $\left\{\eta_x\left[-\frac{N_I}{2}\right], \ldots, \eta_x\left[\frac{N_I}{2}\right]\right\}$ and $\left\{\eta_\delta\left[-\frac{N_I}{2}\right], \ldots, \eta_\delta\left[\frac{N_I}{2}\right]\right\}$.
The advantages of this extended version are the following:
• it is still a low complexity algorithm, requiring only a few operations more than the
initial adaptive algorithm.
• It is an online algorithm, making it possible for real-time applications to access the most recent estimate at any time k.
• Its performance can also be studied using the general results from [Benveniste 1990].
The noise and quantizer follow the assumptions AN1–AQ1, AQ2' and AN3. For simplification purposes and to have a stable algorithm, we will assume that both ηx[i] and ηδ[i] are symmetric: ηδ[i] have even symmetry with negative³ ηδ[1] = ηδ[−1], while ηx[i] are defined with odd symmetry and are positive for positive i, similarly as stated in AQ3.
Assumption (on the quantizer output levels):
AQ3' The quantizer output levels ηx[i] are odd and the output levels ηδ[i] are even:
$$\eta_x[i] = -\eta_x[-i], \qquad \eta_\delta[i] = \eta_\delta[-i], \qquad (3.78)$$
with ηx[i] > 0 for i > 0 and ηδ[1] < 0.
The estimation scheme is depicted in Fig. 3.12, where the UPDATE block is the estimation
algorithm.
³ This constraint on ηδ[1] is imposed to guarantee the convergence of δ̂k. The idea here is that when the quantized measurements are small, it means asymptotically (when X̂k is close to x) that the quantizer range is too large, thus the range and, consequently, δ̂k must be reduced. If we set the coefficients with the opposite sign, δ̂k will diverge.
[Figure 3.12: block diagram of the adjustable quantizer; the input Yk is offset by X̂k−1, scaled by the gain 1/(c∆ δ̂k−1) and quantized with fixed threshold variations ±τ′1, ±τ′2, producing the quantized measurement ik that feeds the UPDATE block, which outputs the estimate X̂k.]
Figure 3.12: Scheme representing the adjustable quantizer. The offset and gain are adjusted
dynamically using the estimates while the quantizer thresholds (the threshold variations) are
fixed.
Optimal parameters and performance
The analysis of the algorithm will be done using the results from [Benveniste 1990, Ch. 3].
We will analyze the bias and the asymptotic covariance matrix of the estimation error.
Similarly to the estimation of the constant location parameter, the algorithm mean can
be approximated by the solution of an ODE. However, in this case, we have a vectorial ODE
with one component for x̂ and one component for δ̂:
d x̂
= Γh x̂, δ̂ .
(3.79)
dt δ̂
The relation between continuous and discrete time is $t_k = \sum_{j=1}^{k}\frac{1}{j}$ and $h$ is the following mean vector field:
$$h\left(\hat{x}, \hat{\delta}\right) = \mathbb{E}\begin{bmatrix} \hat{\delta}\,\eta_x\left[Q\left(\frac{Y-\hat{x}}{c_\Delta \hat{\delta}}\right)\right] \\[4pt] \hat{\delta}\,\eta_\delta\left[Q\left(\frac{Y-\hat{x}}{c_\Delta \hat{\delta}}\right)\right] \end{bmatrix} = \hat{\delta}\begin{bmatrix} \displaystyle\sum_{i=1}^{\frac{N_I}{2}} \eta_x[i]\left\{\tilde{F}_d\left(i, \hat{x}, x, \hat{\delta}, \delta\right) - \tilde{F}_d\left(-i, \hat{x}, x, \hat{\delta}, \delta\right)\right\} \\[8pt] \displaystyle\sum_{i=1}^{\frac{N_I}{2}} \eta_\delta[i]\left\{\tilde{F}_d\left(i, \hat{x}, x, \hat{\delta}, \delta\right) + \tilde{F}_d\left(-i, \hat{x}, x, \hat{\delta}, \delta\right)\right\} \end{bmatrix}, \qquad (3.80)$$
where the expectation is w.r.t. the noise marginal probability measure, the second equality comes from the symmetry assumptions and $\tilde{F}_d$ is
$$\tilde{F}_d = \begin{cases} F_n\left(\tau_i c_\Delta \frac{\hat{\delta}}{\delta} + \frac{\hat{x}-x}{\delta}\right) - F_n\left(\tau_{i-1} c_\Delta \frac{\hat{\delta}}{\delta} + \frac{\hat{x}-x}{\delta}\right), & \text{if } i \in \left\{1, \cdots, \frac{N_I}{2}\right\}, \\[6pt] F_n\left(\tau_{i+1} c_\Delta \frac{\hat{\delta}}{\delta} + \frac{\hat{x}-x}{\delta}\right) - F_n\left(\tau_i c_\Delta \frac{\hat{\delta}}{\delta} + \frac{\hat{x}-x}{\delta}\right), & \text{if } i \in \left\{-1, \cdots, -\frac{N_I}{2}\right\}. \end{cases} \qquad (3.81)$$
The conditions on the mean convergence of the algorithm are then conditions on the global asymptotic stability of the point $\hat{x} = x$ and $\hat{\delta} = \delta$. One necessary condition for asymptotic stability is that the true parameters must be an equilibrium point of the ODE, which means that $h\left(\hat{x} = x, \hat{\delta} = \delta\right)$ must be zero. From the symmetry assumptions:
$$h\left(\hat{x} = x, \hat{\delta} = \delta\right) = \delta\begin{bmatrix} 0 \\ 2\,\boldsymbol{\eta}_\delta^\top \mathbf{F}_d^{vec} \end{bmatrix},$$
where the vector $\mathbf{F}_d^{vec}$ is
$$\mathbf{F}_d^{vec} = \left[\tilde{F}_d[1] \;\cdots\; \tilde{F}_d\left[\tfrac{N_I}{2}\right]\right]^\top,$$
with elements $\tilde{F}_d[i] = \tilde{F}_d(i, x, x, \delta, \delta)$ independent of the parameters. Then, the condition for the parameters to be the equilibrium point is
$$\boldsymbol{\eta}_\delta^\top \mathbf{F}_d^{vec} = 0. \qquad (3.82)$$
Other conditions are necessary for the mean convergence of the algorithm. These conditions
can be found by the analysis of the ODE using Lyapunov theory. The analysis of these
other conditions will not be detailed here and under the assumptions already stated and the
constraint on η δ given in (3.82), it will be assumed that the algorithm converges in the mean
to the true parameters.
We turn our attention now to the asymptotic fluctuation of the algorithm, which is given
by its asymptotic covariance matrix. Under the assumptions stated previously (assumptions
AN1–AQ3' and the assumption that the algorithm is asymptotically unbiased), it can be shown [Benveniste 1990, pp. 110–113] that the normalized estimation error $\sqrt{k}\,\boldsymbol{\varepsilon}_k$ tends in distribution to a zero mean Gaussian random variable as follows
$$\sqrt{k}\,\boldsymbol{\varepsilon}_k \xrightarrow[k\to\infty]{} \mathcal{N}\left(\mathbf{0}, \mathbf{P}\right), \qquad (3.83)$$
where $\mathbf{P}$ is the covariance matrix given by the optimal gain $\Gamma^\star$. The matrices $\mathbf{P}$ and $\Gamma^\star$ are the following:
$$\mathbf{P} = \frac{\delta^2}{2}\begin{bmatrix} \dfrac{\boldsymbol{\eta}_x^T \mathbf{F}_d\, \boldsymbol{\eta}_x}{\left(\boldsymbol{\eta}_x^T \mathbf{f}_d^{(x)}\right)^2} & 0 \\[10pt] 0 & \dfrac{\boldsymbol{\eta}_\delta^T \mathbf{F}_d\, \boldsymbol{\eta}_\delta}{\left(\boldsymbol{\eta}_\delta^T \mathbf{f}_d^{(\delta)}\right)^2} \end{bmatrix} \qquad (3.84)$$
and
$$\Gamma^\star = -\frac{1}{2}\begin{bmatrix} \dfrac{1}{\boldsymbol{\eta}_x^T \mathbf{f}_d^{(x)}} & 0 \\[10pt] 0 & \dfrac{1}{\boldsymbol{\eta}_\delta^T \mathbf{f}_d^{(\delta)}} \end{bmatrix}, \qquad (3.85)$$
where $\mathbf{F}_d$ is a diagonal matrix $\mathbf{F}_d = \operatorname{diag}\left[\mathbf{F}_d^{vec}\right]$, and $\mathbf{f}_d^{(x)} = \left[\tilde{f}_d^{(x)}[1] \;\cdots\; \tilde{f}_d^{(x)}\left[\tfrac{N_I}{2}\right]\right]^T$ and $\mathbf{f}_d^{(\delta)} = \left[\tilde{f}_d^{(\delta)}[1] \;\cdots\; \tilde{f}_d^{(\delta)}\left[\tfrac{N_I}{2}\right]\right]^T$ are the derivatives in vector form of the quantizer output probabilities $\tilde{F}_d\left(i, \hat{x}, x, \hat{\delta}, \delta\right)$ multiplied by $\hat{\delta}$ when $\hat{x} = x$ and $\hat{\delta} = \delta$:
$$\tilde{f}_d^{(x)}[i] = f_n(\tau_i) - f_n(\tau_{i-1}), \qquad (3.86)$$
$$\tilde{f}_d^{(\delta)}[i] = c_\Delta\left[\tau_i f_n(\tau_i) - \tau_{i-1} f_n(\tau_{i-1})\right]. \qquad (3.87)$$
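As an illustration of how these quantities can be evaluated numerically, the sketch below computes the cell probabilities $\tilde F_d[i]$ and the two derivative vectors for Gaussian noise with unit scale and uniform unit threshold variations. The function name, the uniform-threshold choice and the Gaussian assumption are illustrative; the quantities are obtained directly from the definition of the output probabilities.

```python
import numpy as np
from scipy.stats import norm

def gaussian_building_blocks(n_i, c_delta):
    """Cell probabilities and their derivatives at the equilibrium point
    (x_hat = x, delta_hat = delta), for Gaussian noise and uniform unit
    threshold variations, positive output indices i = 1 .. NI/2 (sketch)."""
    half = n_i // 2
    # Normalized cell edges seen by the noise CDF: c_delta * tau'_i,
    # with tau'_0 = 0 and tau'_{NI/2} = +infinity.
    edges = c_delta * np.append(np.arange(half, dtype=float), np.inf)
    lo, hi = edges[:-1], edges[1:]

    F_tilde = norm.cdf(hi) - norm.cdf(lo)       # cell probabilities
    f_x = norm.pdf(hi) - norm.pdf(lo)           # derivative w.r.t. the location estimate
    hi_fin = np.where(np.isfinite(hi), hi, 0.0) # the infinite edge contributes 0
    f_delta = hi_fin * norm.pdf(hi_fin) - lo * norm.pdf(lo)  # derivative w.r.t. the scale estimate
    return F_tilde, f_x, f_delta
```

Since $\mathbf{F}_d$ is diagonal, the optimal coefficients (3.88)–(3.89) derived below are then simply the elementwise ratios $-\tilde f_d^{(x)}[i]/\tilde F_d[i]$ and $-\tilde f_d^{(\delta)}[i]/\tilde F_d[i]$.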
These results are obtained in an equivalent way as the results presented for the estimation of x. But in this case $\Gamma^\star$ is not the inverse of the scalar derivative of $-h$; instead it is the inverse of the Jacobian matrix of $-h$ evaluated at the point $\left(\hat{x} = x, \hat{\delta} = \delta\right)$. In the same way, the normalized covariance for the optimal gain is the normalized covariance of the vector of corrections $\begin{bmatrix}\eta_x\left(i_k\right)\\ \eta_\delta\left(i_k\right)\end{bmatrix}$ pre- and post-multiplied by the inverse of the Jacobian of $h$, with all the factors being evaluated at $\left(\hat{x} = x, \hat{\delta} = \delta\right)$. The specific diagonal pattern of $\Gamma^\star$ and $\mathbf{P}$ comes from the symmetry assumptions on the noise and the quantizer.
Minimization of the estimation variance can be done through the minimization of the
diagonal terms of P w.r.t. η x and η δ . The two minimization problems can be solved separately.
In the case of the optimization w.r.t. η δ , the equilibrium constraint (3.82) has to be taken
into account. The optimal η x can be found by using the Cauchy-Schwarz inequality, while
the optimal η δ are obtained by casting the constrained minimization problem as a modified
eigenvalue problem solved in [Golub 1973] (Why? - App. A.1.9).
The optimal coefficients are
$$\boldsymbol{\eta}_x \propto \mathbf{F}_d^{-1}\mathbf{f}_d^{(x)}, \qquad \boldsymbol{\eta}_\delta \propto \mathbf{F}_d^{-1}\mathbf{f}_d^{(\delta)} - \mathbf{1}\mathbf{f}_d^{(\delta)} = \mathbf{F}_d^{-1}\mathbf{f}_d^{(\delta)},$$
where $\mathbf{1}$ is a square matrix of ones. The second equality comes from the fact that the sum of the elements of $\mathbf{f}_d^{(\delta)}$ is zero. To respect the assumptions we can set
$$\boldsymbol{\eta}_x = -\mathbf{F}_d^{-1}\mathbf{f}_d^{(x)}, \qquad (3.88)$$
$$\boldsymbol{\eta}_\delta = -\mathbf{F}_d^{-1}\mathbf{f}_d^{(\delta)}. \qquad (3.89)$$
Therefore, the optimal $\mathbf{P}$ and $\Gamma^\star$ are
$$\mathbf{P} = \delta^2\,\Gamma^\star = \frac{\delta^2}{2}\begin{bmatrix} \dfrac{1}{\mathbf{f}_d^{(x)T}\mathbf{F}_d^{-1}\mathbf{f}_d^{(x)}} & 0 \\[10pt] 0 & \dfrac{1}{\mathbf{f}_d^{(\delta)T}\mathbf{F}_d^{-1}\mathbf{f}_d^{(\delta)}} \end{bmatrix}. \qquad (3.90)$$
Note that the asymptotic variances are equal to the CRB for estimating the parameters based on the quantized measurements, when the quantizer offset and input gain are placed exactly at $x$ and $\frac{1}{c_\Delta\delta}$.
We have the following solution to problem (a’) (p. 149):
Solution to (a') - Adaptive algorithm with decreasing gain for estimating x and δ

(a'1) 1) Estimator
For each time k, the estimate, the quantizer offset and the quantizer input gain are obtained using (3.77):
$$\begin{bmatrix} \hat{X}_k \\ \hat{\delta}_k \end{bmatrix} = \begin{bmatrix} \hat{X}_{k-1} \\ \hat{\delta}_{k-1} \end{bmatrix} + \hat{\delta}_{k-1}\,\frac{\Gamma}{k}\begin{bmatrix} \eta_x\left(i_k\right) \\ \eta_\delta\left(i_k\right) \end{bmatrix}, \qquad \begin{bmatrix} \tau_{0,k} \\[2pt] \dfrac{\Delta_k}{c_\Delta} \end{bmatrix} = \begin{bmatrix} \hat{X}_{k-1} \\ \hat{\delta}_{k-1} \end{bmatrix},$$
with $i_k = Q\left(\frac{Y_k - \hat{X}_{k-1}}{c_\Delta\hat{\delta}_{k-1}}\right)$,
$$\begin{bmatrix} \eta_x\left(i_k\right) \\ \eta_\delta\left(i_k\right) \end{bmatrix} = \begin{bmatrix} -\dfrac{\tilde{f}_d^{(x)}[i_k]}{\tilde{F}_d[i_k]} \\[10pt] -\dfrac{\tilde{f}_d^{(\delta)}[i_k]}{\tilde{F}_d[i_k]} \end{bmatrix} \quad\text{and}\quad \Gamma = \frac{1}{2}\begin{bmatrix} \dfrac{1}{\mathbf{f}_d^{(x)T}\mathbf{F}_d^{-1}\mathbf{f}_d^{(x)}} & 0 \\[10pt] 0 & \dfrac{1}{\mathbf{f}_d^{(\delta)T}\mathbf{F}_d^{-1}\mathbf{f}_d^{(\delta)}} \end{bmatrix}.$$

2) Performance (assumed and asymptotic)
The estimator is assumed to be asymptotically unbiased. When $k \to \infty$ the normalized estimation error vector $\sqrt{k}\,\boldsymbol{\varepsilon}_k$ is Gaussian distributed with covariance matrix $\mathbf{P}$ given by (3.90)
$$\mathbf{P} = \frac{\delta^2}{2}\begin{bmatrix} \dfrac{1}{\mathbf{f}_d^{(x)T}\mathbf{F}_d^{-1}\mathbf{f}_d^{(x)}} & 0 \\[10pt] 0 & \dfrac{1}{\mathbf{f}_d^{(\delta)T}\mathbf{F}_d^{-1}\mathbf{f}_d^{(\delta)}} \end{bmatrix}.$$
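As a concrete illustration of the solution box above, the following minimal simulation sketch implements the recursion for Gaussian noise with uniform unit threshold variations. All names and numerical choices are illustrative assumptions; the gain matrix is the diagonal Γ of (3.85) with the coefficients (3.88)–(3.89). This is a sketch of the scheme, not the thesis code.

```python
import numpy as np
from scipy.stats import norm

def joint_location_scale(y, n_i=8, c_delta=1.0, x0=1.0, delta0=2.0):
    """Adaptive estimation of (x, delta) from quantized measurements (sketch)."""
    half = n_i // 2
    edges = c_delta * np.append(np.arange(half, dtype=float), np.inf)
    lo, hi = edges[:-1], edges[1:]
    F = norm.cdf(hi) - norm.cdf(lo)                  # F~_d[i], i = 1..NI/2
    f_x = norm.pdf(hi) - norm.pdf(lo)                # f~_d^(x)[i]
    hi_fin = np.where(np.isfinite(hi), hi, 0.0)      # infinite edge contributes 0
    f_d = hi_fin * norm.pdf(hi_fin) - lo * norm.pdf(lo)   # f~_d^(delta)[i]

    eta_x, eta_d = -f_x / F, -f_d / F                # optimal coefficients (3.88)-(3.89)
    g_x = 0.5 / np.sum(f_x**2 / F)                   # diagonal of Gamma, (3.85)
    g_d = 0.5 / np.sum(f_d**2 / F)

    x_hat, d_hat = x0, delta0
    tau = np.arange(1, half)                         # inner thresholds tau'_1..tau'_{NI/2-1}
    for k, yk in enumerate(y, start=1):
        u = (yk - x_hat) / (c_delta * d_hat)
        mag = np.searchsorted(tau, abs(u))           # 0 .. NI/2 - 1
        s = 1.0 if u >= 0 else -1.0
        x_hat += d_hat * (g_x / k) * s * eta_x[mag]  # odd coefficients for x
        d_hat += d_hat * (g_d / k) * eta_d[mag]      # even coefficients for delta
        d_hat = max(d_hat, 1e-6)                     # keep the scale estimate positive
    return x_hat, d_hat

# Example: true x = 0 and delta = 1, Gaussian noise.
rng = np.random.default_rng(0)
print(joint_location_scale(rng.standard_normal(40_000)))
```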
Observe also that the asymptotic performance can still be optimized through $\boldsymbol{\tau}'$ and $c_\Delta$. As optimization through $\boldsymbol{\tau}'$ is difficult, in the simulation section we will consider again that the threshold variations are uniform as in (3.50)
$$\boldsymbol{\tau}' = \left[-\tau'_{\frac{N_I}{2}} = -\infty \;\cdots\; -\tau'_1 = -1 \;\; 0 \;\; +\tau'_1 = +1 \;\cdots\; +\tau'_{\frac{N_I}{2}} = +\infty\right]^\top,$$
thus the only free parameter for optimization is c∆ .
Simulations
The algorithm will be simulated to validate the theoretical results. The simulation will be
focused on the performance for the estimation of x. As it was mentioned, the quantizer is
uniform and c∆ will be chosen so as to minimize the variance of estimation of x. As this is a
scalar problem, it can be solved by an exhaustive search using a fine grid. After finding the
optimal c∆ , the other parameters of the algorithm Γ, η x and η δ can be evaluated using the
information from the noise distribution.
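The one-dimensional exhaustive search mentioned above can be written in a few lines. The sketch below is an illustrative assumption (Gaussian noise, δ = 1, uniform unit threshold variations): it evaluates the quantized-measurement Fisher information for the location parameter with the quantizer centered at the parameter, and picks the $c_\Delta$ maximizing it on a fine grid.

```python
import numpy as np
from scipy.stats import norm

def location_fi(c_delta, n_i):
    """I_q(0) for Gaussian noise (delta = 1), uniform unit threshold variations,
    quantizer centered at the true parameter (illustrative sketch)."""
    half = n_i // 2
    edges = c_delta * np.append(np.arange(half, dtype=float), np.inf)
    lo, hi = edges[:-1], edges[1:]
    cell_prob = norm.cdf(hi) - norm.cdf(lo)
    d_cell = norm.pdf(hi) - norm.pdf(lo)          # derivative of each cell probability
    return 2.0 * np.sum(d_cell**2 / cell_prob)    # factor 2: symmetric negative outputs

# Exhaustive search on a fine grid of input-gain coefficients.
grid = np.linspace(0.05, 3.0, 600)
c_best = max(grid, key=lambda c: location_fi(c, n_i=8))
print(c_best, location_fi(c_best, n_i=8))
```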
The Gaussian and Cauchy distribution will be used for modeling the noise. The algorithm
will be simulated for 5 × 105 blocks with 4 × 104 samples each. The simulated MSE for the
estimation of the location parameter will be evaluated by calculating the mean of the squared
error for each sample. Other simulation parameters are δ = 1, δ̂0 = 2, x = 0, X̂0 = 1 and
NI ∈ {4, 8, 16, 32}. For comparison purposes, the CRB for the estimation of x based on
continuous measurements CRBc will be also evaluated for Gaussian and Cauchy distributions.
Using the fact that the measurements are independent and the expressions for $I_c$ for the GGD given in (3.74) with β = 2 and for the STD given in (3.75) with β = 1, the CRB$_c$ for Gaussian and Cauchy noise are respectively $\frac{1}{2}\frac{\delta^2}{k}$ and $2\frac{\delta^2}{k}$.
The results of the simulation are shown in Fig. 3.13, where we also plotted the CRB for
the estimation with quantized measurements when the offset and gain are static and set with
the true parameter values. The MSE was normalized by k and a logarithmic scale is used on both axes for better visualization.
It can be observed that after a transient time, the simulated performance becomes very close to the asymptotic theoretical results. It can also be seen that the gain in performance when increasing NI is very small even for a small number of quantization intervals (NI = 8 or 16) and that the gap between the performance given by NI = 32 and the continuous measurement bound is negligible.
Discussion on the results
Despite the very low complexity of the algorithm, its asymptotic performance for estimating
the parameters is not only decoupled (the covariance is diagonal) but it is also optimal. The
normalized asymptotic variance for estimating x is $\frac{1}{I_q(0)}$ and the variance for estimating δ is
also the inverse of the corresponding FI. This optimal decoupling means that no degradation
of performance is brought by estimating jointly the scale parameter. As no degradation is
present, the asymptotic performance of the estimator of x has the same behavior as it was
shown previously, if we choose NI = 4 or 5 the estimation performance is very close to the
optimal continuous measurement performance. This indicates that even when δ is unknown
there is no need to use high resolution quantizers, if we have a large number of samples.
3.6.2 Fusion center approach with multiple sensors
We present now the adaptive algorithm for estimating a constant parameter, when a fusion
center has access to quantized measurements from multiple sensors. We will define first, the
problem, the architecture to be used and the adaptive estimator. Then, similarly to the joint
location-scale problem, we obtain the algorithm performance and its optimal parameters. We
close this section with simulations and a discussion about our results.
[Figure 3.13: MSE × k versus time k (log-log scale), panels (a) and (b); curves: CRBc, CRB⋆q and simulated MSE for the adaptive algorithm.]
Figure 3.13: CRB for estimating a location parameter of Gaussian and Cauchy distributions
based on quantized and continuous measurements and simulated MSE for the estimation of
the location parameter with the adaptive location-scale parameter estimator. In all cases, we
considered the true scale parameter and its initial estimate δ = 1, δ̂0 = 2, for the location
parameter we considered x = 0, X̂0 = 1. The numbers of quantization intervals simulated
were NI ∈ {4, 8, 16, 32}. For obtaining the simulated MSE for the location parameter, the
algorithm was simulated for 5 × 105 blocks with 4 × 104 samples each. The curves that are
asymptotically lower are related to a higher number of quantization intervals.
Problem statement and estimator
The scalar parameter is supposed to be a constant x and it is measured by Ns sensors. Each
sensor measures the parameter with additive noise
$$Y_k^{(j)} = x + V_k^{(j)}, \qquad \text{for } j \in \{1, \cdots, N_s\}, \qquad (3.91)$$
where $V_k^{(j)}$ is the noise r.v. for the sample k obtained at the sensor j. The sensor noises are independent and each sensor noise is i.i.d.. The noise r.v. also respects assumptions AN1, AN2 and AN3. Its marginal CDF for sample k of sensor j will be denoted as $F^{(j)}(v)$ and its PDF as $f^{(j)}(v)$.
The measurements at each sensor are quantized by a scalar adjustable quantizer, similar to the quantizer used in the previous sections. The quantizers for the sensors are then characterized by their input gains $\frac{1}{\Delta_k^{(j)}}$, input offsets $b_k^{(j)}$ and the vector of threshold variations (considered to be static) that defines the $N_I^{(j)}$ quantizer intervals
$$\boldsymbol{\tau}'^{(j)} = \left[\tau'^{(j)}_{-\frac{N_I^{(j)}}{2}} \;\cdots\; \tau'^{(j)}_{-1} \;\; \tau'^{(j)}_{0} \;\; \tau'^{(j)}_{1} \;\cdots\; \tau'^{(j)}_{\frac{N_I^{(j)}}{2}}\right].$$
We will consider again the following assumptions:
• AQ1 on the quantizer outputs: the set of possible quantizer outputs of the sensor j is $\mathcal{I}^{(j)} = \left\{-\frac{N_I^{(j)}}{2}, \cdots, -1, 1, \cdots, \frac{N_I^{(j)}}{2}\right\}$.
• AQ2 on the quantizer threshold variations: the quantizers have symmetric threshold variations $\tau'^{(j)}_i = -\tau'^{(j)}_{-i}$ with $\tau'^{(j)}_0 = 0$ and $\tau'^{(j)}_{\frac{N_I^{(j)}}{2}} = +\infty$.
The output of quantizer j is then given by
$$i_k^{(j)} = Q^{(j)}\left(\frac{Y_k^{(j)} - b_k^{(j)}}{\Delta_k^{(j)}}\right) = i\,\operatorname{sign}\left(Y_k^{(j)} - b_k^{(j)}\right), \quad \text{for } \frac{\left|Y_k^{(j)} - b_k^{(j)}\right|}{\Delta_k^{(j)}} \in \left[\tau'^{(j)}_{i-1}, \tau'^{(j)}_{i}\right). \qquad (3.92)$$
The noise CDFs are also considered to have a known scale parameter $\delta^{(j)}$. Therefore, similarly to what was done before, we can use the noise scale factor to normalize the input of the quantizer
$$\Delta_k^{(j)} = c_\Delta^{(j)}\,\delta^{(j)}, \qquad (3.93)$$
where c∆ is a free parameter which, as it was explained before, can be used to adjust the
quantizer input range or to optimize quantization performance when the threshold variations
are fixed.
After obtaining the quantized measurements, the sensors send their measurements to a
fusion center. The transmission of the quantized measurements is supposed to be perfect,
as it was explained in the Introduction. The fusion center can feedback information to the
sensors through perfect continuous amplitude channels. Thus, we want to solve the following
modification of problem (a) (p. 27):
(a'') Solve problem (a) with independent quantized measurements from Ns sensors. The measurements from the Ns sensors are available at a fusion center that can process these measurements and feed back information to the sensors through perfect continuous amplitude channels.
Note that the simplifying assumption of perfect feedback channels means that the fusion center has enough power and/or bandwidth for feeding back real-valued (or very finely quantized) noiseless estimates.
To solve problem (a''), the fusion center generates an online estimate X̂k that will be broadcast to the quantizers through the feedback channels, so that they can use it as their next input offset for enhancing estimation performance. At time k, this means that
$$b_k^{(j)} = \hat{X}_{k-1}. \qquad (3.94)$$
[Figure 3.14: block diagram of the sensor network; each sensor j adds noise V_k^{(j)} to x, quantizes the result into i_k^{(j)} and sends it to the fusion center, whose UPDATE block produces the estimate X̂_k.]
Figure 3.14: Scheme representing the sensor network. The fusion center updates the estimate
of the parameter and broadcasts it through a perfect channel to the sensors. The sensors then
use the new estimate as their quantizer input offset (their quantizer central threshold).
The general scheme is depicted in Fig. 3.14, where the UPDATE block contains an online
estimator of the parameter.
For estimating the parameter, we can use an extension of the adaptive algorithm with decreasing gains
$$\hat{X}_k = \hat{X}_{k-1} + \frac{\gamma}{k}\,\eta\left(\mathbf{i}_k\right), \qquad (3.95)$$
where $\gamma$ is a positive gain, $\mathbf{i}_k$ is the vector of quantized observations $\left[i_k^{(1)} \cdots i_k^{(N_s)}\right]^T$ and $\eta\left[\mathbf{i}\right]$ is the update coefficient (or the quantizer output level) defined as a function from $\mathcal{I}^{(1)}\times\cdots\times\mathcal{I}^{(N_s)}$ to $\mathbb{R}$. The main advantage of this algorithm when compared with an adaptive scheme based on the MLE is its low complexity both in terms of processing and memory requirements.
Optimal parameters and performance
Using the results from [Benveniste 1990, Ch. 3], the asymptotic variance of the estimation
error can be obtained under the condition that the mean error converges to zero as k → ∞.
To prove this convergence, it would be sufficient to use the ODE approximation of the mean
of X̂k and then prove global convergence properties for the ODE using Lyapunov theory. Such
analysis is left for future work. Here, only the mean behavior of the algorithm at equilibrium
(X̂k = x) will be studied.
When $\hat{X}_{k-1} = x$, the normalized mean increment $\frac{k}{\gamma}\,\mathbb{E}\left(\hat{X}_k - \hat{X}_{k-1}\right)$ is given by
$$\frac{k}{\gamma}\,\mathbb{E}\left(\hat{X}_k - \hat{X}_{k-1}\right) = \mathbb{E}\left[\eta\left(\mathbf{i}\right)\right] = \boldsymbol{\eta}^T\mathbf{F}_d^{vec}, \qquad (3.96)$$
where $\boldsymbol{\eta}$ is a vector regrouping all possible values of the output coefficients
$$\boldsymbol{\eta} = \left[\eta\left(i^{(1)}_{-\frac{N_I^{(1)}}{2}}, \cdots, i^{(N_s)}_{-\frac{N_I^{(N_s)}}{2}}\right) \;\cdots\; \eta\left(i^{(1)}_{\frac{N_I^{(1)}}{2}}, \cdots, i^{(N_s)}_{\frac{N_I^{(N_s)}}{2}}\right)\right]^T$$
and $\mathbf{F}_d^{vec} = \left[\cdots\; \tilde{F}_d\left[\mathbf{i}\right] \;\cdots\right]^\top$ with
$$\tilde{F}_d\left[\mathbf{i}\right] = \prod_{j=1}^{N_s}\tilde{F}_d^{(j)}\left[i^{(j)}\right], \qquad (3.97)$$
where $\tilde{F}_d^{(j)}\left[i^{(j)}\right]$ is the probability of having the output $i^{(j)}$ at the sensor j when $\hat{X}_k = x$:
$$\tilde{F}_d^{(j)}\left[i^{(j)}\right] = \begin{cases} F^{(j)}\left(\tau'^{(j)}_{i}\, c_\Delta^{(j)}\delta^{(j)}\right) - F^{(j)}\left(\tau'^{(j)}_{i-1}\, c_\Delta^{(j)}\delta^{(j)}\right), & \text{if } i^{(j)} \in \left\{1, \cdots, \frac{N_I^{(j)}}{2}\right\}, \\[8pt] F^{(j)}\left(\tau'^{(j)}_{i+1}\, c_\Delta^{(j)}\delta^{(j)}\right) - F^{(j)}\left(\tau'^{(j)}_{i}\, c_\Delta^{(j)}\delta^{(j)}\right), & \text{if } i^{(j)} \in \left\{-1, \cdots, -\frac{N_I^{(j)}}{2}\right\}. \end{cases} \qquad (3.98)$$
Thus, the following condition is needed to have an equilibrium point at the true parameter:
$$\boldsymbol{\eta}^T\mathbf{F}_d^{vec} = 0. \qquad (3.99)$$
Note that this is a necessary condition for asymptotic unbiasedness of the algorithm.
Assuming that the algorithm is asymptotically unbiased, similarly to the single sensor case, we can use the results in [Benveniste 1990, pp. 110–113] to obtain the asymptotic distribution of the estimation error, the optimal gain $\gamma^\star$ and the minimum normalized asymptotic estimation error variance $\sigma_\infty^2$. The asymptotic estimation error is Gaussian distributed and it is given as follows
$$\sqrt{k}\,\varepsilon_k \xrightarrow[k\to\infty]{} \mathcal{N}\left(0, \sigma_\infty^2\right). \qquad (3.100)$$
The optimal $\gamma$ and minimum $\sigma_\infty^2$ are then given by
$$\gamma^\star = -\frac{1}{\boldsymbol{\eta}^T\mathbf{f}_d} \qquad (3.101)$$
and
$$\sigma_\infty^2 = \frac{\boldsymbol{\eta}^T\mathbf{F}_d\,\boldsymbol{\eta}}{\left(\boldsymbol{\eta}^T\mathbf{f}_d\right)^2}. \qquad (3.102)$$
The matrix $\mathbf{F}_d$ is a diagonal matrix $\operatorname{diag}\left[\mathbf{F}_d^{vec}\right]$ and $\mathbf{f}_d$ is the vector form (as $\boldsymbol{\eta}$ and $\mathbf{F}_d^{vec}$) regrouping the elements
$$\tilde{f}_d\left[\mathbf{i}\right] = \sum_{j=1}^{N_s}\tilde{f}_d^{(j)}\left[i^{(j)}\right]\prod_{\substack{j'=1 \\ j'\neq j}}^{N_s}\tilde{F}_d^{(j')}\left[i^{(j')}\right], \qquad (3.103)$$
where
$$\tilde{f}_d^{(j)}\left[i^{(j)}\right] = \begin{cases} f^{(j)}\left(\tau'^{(j)}_{i}\, c_\Delta^{(j)}\delta^{(j)}\right) - f^{(j)}\left(\tau'^{(j)}_{i-1}\, c_\Delta^{(j)}\delta^{(j)}\right), & \text{if } i^{(j)} \in \left\{1, \cdots, \frac{N_I^{(j)}}{2}\right\}, \\[8pt] f^{(j)}\left(\tau'^{(j)}_{i+1}\, c_\Delta^{(j)}\delta^{(j)}\right) - f^{(j)}\left(\tau'^{(j)}_{i}\, c_\Delta^{(j)}\delta^{(j)}\right), & \text{if } i^{(j)} \in \left\{-1, \cdots, -\frac{N_I^{(j)}}{2}\right\}. \end{cases} \qquad (3.104)$$
The asymptotic performance can also be optimized through the choice of $\boldsymbol{\eta}$; this can be done by minimizing (3.102) w.r.t. $\boldsymbol{\eta}$ under the equilibrium constraint (3.99). This problem can be solved in the same way as it was done for finding the optimal vector $\boldsymbol{\eta}_\delta$ in the joint estimation of location and scale parameters. Consequently, we find the following optimal vector $\boldsymbol{\eta}$ (Why? - App. A.1.9):
$$\boldsymbol{\eta} \propto \mathbf{F}_d^{-1}\mathbf{f}_d - \mathbf{1}\mathbf{f}_d = \mathbf{F}_d^{-1}\mathbf{f}_d.$$
The second equality comes from the fact that the sum of the elements of $\mathbf{f}_d$ is zero. To prove this, note that for each possible $\mathbf{i}$ there is $-\mathbf{i}$. As the functions $\tilde{f}_d^{(j)}\left[i^{(j)}\right]$ are odd and $\tilde{F}_d^{(j)}\left[i^{(j)}\right]$ are even, we have $\tilde{f}_d\left[\mathbf{i}\right] = -\tilde{f}_d\left[-\mathbf{i}\right]$. Therefore, when adding $\tilde{f}_d\left[\mathbf{i}\right]$ for all possible $\mathbf{i}$, the pairs $\left(\tilde{f}_d\left[\mathbf{i}\right], \tilde{f}_d\left[-\mathbf{i}\right]\right)$ cancel each other, resulting in a zero sum. Similarly to the previous cases we will choose
$$\boldsymbol{\eta} = -\mathbf{F}_d^{-1}\mathbf{f}_d. \qquad (3.105)$$
For the update coefficients given by (3.105), the asymptotic normalized variance and the optimal gain are
$$\sigma_\infty^2 = \gamma^\star = \frac{1}{\mathbf{f}_d^T\mathbf{F}_d^{-1}\mathbf{f}_d}. \qquad (3.106)$$
Using the expressions for $\tilde{F}_d\left[\mathbf{i}\right]$ (3.97) and for $\tilde{f}_d\left[\mathbf{i}\right]$ (3.103), for a given measurement vector $\mathbf{i}$ the update coefficients are⁴
$$\eta\left(\mathbf{i}\right) = -\sum_{j=1}^{N_s}\frac{\tilde{f}_d^{(j)}\left[i^{(j)}\right]}{\tilde{F}_d^{(j)}\left[i^{(j)}\right]}. \qquad (3.107)$$
If we use the symmetry assumptions, the expression for the asymptotic normalized variance and for the optimal gain (3.106) becomes (Why? - App. A.1.10)
$$\sigma_\infty^2 = \gamma^\star = \frac{1}{\displaystyle\sum_{j=1}^{N_s}\sum_{i^{(j)}\in\mathcal{I}^{(j)}}\frac{\left(\tilde{f}_d^{(j)}\left[i^{(j)}\right]\right)^2}{\tilde{F}_d^{(j)}\left[i^{(j)}\right]}}. \qquad (3.108)$$
Observe that the update coefficients are the sum of the update coefficients obtained in the
single sensor approach. The asymptotic normalized variance is equal to the inverse of the sum
of the FI Iq (0) for each sensor, which means that the algorithm is asymptotically efficient.
⁴ Using this specific form for the update coefficients, we can prove, similarly as it was done for the single sensor case, that the algorithm is asymptotically unbiased.
We have then the following solution to problem (a”) (p. 157):
Solution to (a'') - Adaptive algorithm with decreasing gain for estimating x using multiple sensors and a fusion center

(a''1) 1) Estimator
For each time k,
• the sensors send $i_k^{(j)} = Q^{(j)}\left(\frac{Y_k^{(j)} - \hat{X}_{k-1}}{c_\Delta^{(j)}\delta^{(j)}}\right)$ to the fusion center.
• The fusion center estimates the parameter using (3.95)
$$\hat{X}_k = \hat{X}_{k-1} + \frac{\gamma}{k}\,\eta\left(\mathbf{i}_k\right),$$
where $\gamma^\star = \dfrac{1}{\displaystyle\sum_{j=1}^{N_s}\sum_{i^{(j)}\in\mathcal{I}^{(j)}}\frac{\left(\tilde{f}_d^{(j)}\left[i^{(j)}\right]\right)^2}{\tilde{F}_d^{(j)}\left[i^{(j)}\right]}}$ and $\eta\left(\mathbf{i}\right) = -\displaystyle\sum_{j=1}^{N_s}\frac{\tilde{f}_d^{(j)}\left[i^{(j)}\right]}{\tilde{F}_d^{(j)}\left[i^{(j)}\right]}$.
• The fusion center then broadcasts the estimate to the sensors through perfect channels to be used as the next quantizers input offset.

2) Performance (assumed and asymptotic)
The estimator is assumed to be asymptotically unbiased. When $k \to \infty$ the normalized estimation error $\sqrt{k}\,\varepsilon_k$ is Gaussian distributed with variance $\sigma_\infty^2$ given by (3.108)
$$\sigma_\infty^2 = \frac{1}{\displaystyle\sum_{j=1}^{N_s}\sum_{i^{(j)}\in\mathcal{I}^{(j)}}\frac{\left(\tilde{f}_d^{(j)}\left[i^{(j)}\right]\right)^2}{\tilde{F}_d^{(j)}\left[i^{(j)}\right]}}.$$
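To make the fusion center recursion concrete, here is a minimal simulation sketch for Gaussian noise with unit scale, uniform unit threshold variations and a common input gain coefficient. The two-sensor configuration, the function names and the numerical values are illustrative assumptions, not taken from the text.

```python
import numpy as np
from scipy.stats import norm

def sensor_tables(n_i, c_delta):
    """Per-sensor tables F~_d[i] and f~_d[i] for positive outputs i = 1..NI/2
    (Gaussian noise with unit scale, uniform unit threshold variations)."""
    half = n_i // 2
    edges = c_delta * np.append(np.arange(half, dtype=float), np.inf)
    lo, hi = edges[:-1], edges[1:]
    return norm.cdf(hi) - norm.cdf(lo), norm.pdf(hi) - norm.pdf(lo)

def fusion_center(y, n_i_list, c_delta=1.0, x0=1.0):
    """Adaptive fusion-center estimate of x from Ns quantized sensors (sketch)."""
    tables = [sensor_tables(n_i, c_delta) for n_i in n_i_list]
    taus = [np.arange(1, n_i // 2) for n_i in n_i_list]
    # Optimal decreasing gain (3.108): inverse of the summed per-sensor FI terms.
    gamma = 1.0 / sum(2.0 * np.sum(f**2 / F) for F, f in tables)

    x_hat = x0
    for k in range(1, y.shape[1] + 1):
        corr = 0.0
        for j, (F, f) in enumerate(tables):        # corrections add over sensors, (3.107)
            u = (y[j, k - 1] - x_hat) / c_delta    # delta^(j) = 1 assumed
            mag = np.searchsorted(taus[j], abs(u))
            s = 1.0 if u >= 0 else -1.0
            corr += -s * f[mag] / F[mag]
        x_hat += (gamma / k) * corr                # broadcast x_hat back to the sensors
    return x_hat

# Example: Ns = 2 sensors with NI = 4 and NI = 8, true parameter x = 0.
rng = np.random.default_rng(1)
print(fusion_center(rng.standard_normal((2, 5000)), n_i_list=[4, 8]))
```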
Note that, again here, we can still optimize the performance through $\boldsymbol{\tau}'^{(j)}$ and $c_\Delta^{(j)}$. In the same way as it was done previously, in what follows we consider that the threshold variations are uniform with unitary step-length and that only the $c_\Delta^{(j)}$ are used for optimizing the performance.
Simulations
The validity of the results will be verified through simulations. All the sensors within a
simulation will be considered to have the same type of noise and the same noise scale factor
δ = 1. The noise considered will be Gaussian or Cauchy distributed. Optimization w.r.t. c∆
(the same gain for all sensors in this case, as the noise is identically distributed) will be done
by searching the maximum of the corresponding FI in a fine grid. After finding the optimal $c_\Delta$, the coefficients $-\frac{\tilde{f}_d[i]}{\tilde{F}_d[i]}$ and the gain $\gamma^\star$ can be calculated.
For all the following simulations, the length of the block of samples will be 5000 and for
evaluating the MSE the average of the squared error will be calculated using 5 × 104 blocks.
The parameter value and initial estimator value are x = 0 and X̂0 = 1.
In the first simulation, it will be considered that all the quantizers have NI = 4 and Ns will be 1, 2 or 3; the results can be observed in Fig. 3.15 in log scale both in time and MSE. The simulated results are compared with the theoretical approximations; for this algorithm they are asymptotically equal to the CRB for quantized measurements obtained from a number of sensors Ns, $\mathrm{CRB}_q^{N_s,\star}$.
[Figure 3.15: MSE versus time k (log-log scale); curves: $\mathrm{CRB}_q^{N_s,\star}$ and simulated MSE for Gaussian and Cauchy noise.]
Figure 3.15: Cramér–Rao bound and simulated MSE for the adaptive algorithm when NI = 4,
Ns = 1, 2, 3 and the noise is Gaussian or Cauchy distributed, both with δ = 1. For obtaining
the simulated MSE, the algorithm was simulated 5 × 104 times for blocks with 5000 samples.
For all simulations the true parameter was set to zero and the initial estimate was X̂0 = 1.
In each set of curves the results for the three different numbers of sensors are represented: the highest MSE curves represent the performance for Ns = 1 and the lowest MSE curves represent Ns = 3. The curves are plotted in log-log scale for better visualization.
As it was expected, the MSE decreases with the number of sensors and the simulated results
are very close to the theoretical approximation for a large number of samples. To have a more
appropriate comparison between different numbers of sensors, channel bandwidth constraints
must be considered.
In the second simulation, the total rate will be fixed to 5 bits. Two possible settings will
be considered, a single sensor approach using the 5 bits (NI = 32) and a multisensor approach
with one sensor quantizing the measurements with 2 (NI = 4) bits and the other with 3
bits (NI = 8). We keep all the other simulation parameters from the previous simulation.
The results are shown in Fig. 3.16, also with a comparison with the asymptotic performance
(which again is equal to the optimal CRB for quantized measurements).
[Figure 3.16: MSE versus time k (log-log scale); curves: $\mathrm{CRB}_q^{N_s,\star}$ and simulated MSE for Gaussian and Cauchy noise.]
Figure 3.16: Cramér–Rao bound and simulated MSE for the adaptive algorithm for Ns = 1
and NB = 5 and for Ns = 2, one sensor with NB1 = 2 bits and the other with NB2 = 3
bits. The noise was considered to be Gaussian or Cauchy distributed, both with δ = 1. For
obtaining the simulated MSE, the algorithm was simulated 5 × 104 times for blocks with 5000
samples. For all simulations the true parameter was set to zero and the initial estimate was
X̂0 = 1. In each set of results the higher curve represents the performance for Ns = 1. The
curves are plotted in log log scales for better visualization.
For both types of noise, the theoretical and simulated results show that the multisensor
approach is superior.
Discussion on the results
The proposed algorithm shows that in practice, in a rate constrained context, a multiple sensor approach with low resolution quantizers might be superior to a high resolution single sensor approach. Such an observation motivates the use of low resolution sensor networks for estimation purposes.
Note that in the case studied, we did not analyze the interaction between the noise scale
factor (it is considered to be constant over the sensors) and the number of quantization bits
used in each sensor. When the total number of bits to be transmitted to the fusion center
is constrained, an interesting problem for further investigation is the problem of optimal
allocation of number of bits to sensors as a function of their noise scale factors. This problem
will be studied in an approximate form in Part II.
The adaptive algorithm that is implemented in the fusion center has very low complexity.
The complexity is roughly linear in the number of sensors, as the optimal correction η is
equivalent to a weighted sum of the corrections given by the single sensor algorithm. Despite
this fact, it can be very costly to implement this algorithm due to the perfect feedback channels
requirement. Thus in future work, we can consider that the feedback channels are not perfect,
for example by considering that the estimates are fed back after being quantized and that they are corrupted with additive noise.
3.7 Chapter summary and directions
We summarize now the main points observed in this chapter and we also present some subjects
that are interesting for further research.
• We presented an adaptive algorithm that can estimate three types of parameter: constant, slowly varying with a Wiener process model without drift or slowly varying with
a Wiener process model with drift. The adaptive algorithm can be used for any even
number of quantization intervals and under the assumption that the noise is symmetric,
unimodal and has a regular CDF (locally Lipschitz continuous) it was shown that
– using decreasing gains, when the parameter is a constant, the algorithm converges
asymptotically in the mean to the true parameter value and its asymptotic performance in terms of variance attains the minimum CRB for common noise distributions CRB⋆q . Thus, the answers to the two initial questions (p. 105) are positive:
the algorithm with gain proportional to k1 converges and it can be extended to a
multibit setting.
– Using a constant gain, when the parameter is modeled by a slowly varying Wiener
process without drift, the algorithm also converges in the mean to the true parameter and the algorithm is approximately asymptotically optimal. This answers the
third question also in a positive way.
– Using a constant gain, when the parameter is modeled by a slowly varying Wiener
process with drift, the algorithm is biased and its asymptotic MSE can be minimized
by setting the gain as a function of the drift.
• Using the asymptotic performance results, we evaluated the loss of estimation performance due to quantization for the algorithm. We observed the following:
– the loss in all cases is a function of Iq (0), showing one more time the importance of
studying the behavior of this quantity as a function of the threshold variations. We
remind that this problem will be studied with an asymptotic approach NI → ∞ in
Part II.
– When the parameter varies, the loss due to quantization is smaller than when the
parameter is constant. Thus, when using quantized measurements for estimation,
it seems that a type of dithering effect is present.
– The loss of performance is almost negligible in all cases for 4 or 5 quantization bits.
In a rate constrained scenario this seems to be a strong motivation for using a low
to medium resolution multiple sensor approach instead of a high resolution single
sensor approach. This was validated using an extension of the adaptive algorithm
designed for multiple sensors that can communicate their measurements to a fusion
center.
• When comparing the adaptive algorithm with its equivalent counterparts studied in Ch.
1 and 2, the following was observed:
– for estimating a constant, the adaptive algorithm has a very low complexity when
compared with the adaptive scheme based on the MLE and their performance is
equivalent.
– for estimating a slowly varying Wiener process, the algorithm also has a very low complexity when compared with the PF scheme using the dynamic central threshold.
In this case the only drawback of the adaptive algorithm is that it has a longer
transient time.
Therefore, if complexity constraints are present, the adaptive algorithm seems to be the
best analyzed solution.
If no constraints on complexity are considered, then the adaptive algorithm is still the
best choice for estimating a constant, but it should be replaced by the PF for estimating
the slowly varying parameter. An interesting point for future work would be to look
for ways of choosing the quantizer update coefficients during the transient, so that the
adaptive algorithm performance would be similar to the PF performance.
• We presented two extensions of the algorithm, both for estimating a constant parameter.
They are the following:
– the joint location-scale adaptive estimator, for which we showed that even if we
do not know the noise scale parameter it is possible to estimate it with the same
asymptotic performance obtained for the case with known scale parameter.
– The fusion center approach with multiple sensors. In this extension of the algorithm, we considered that measurements from multiple sensors are sent to a fusion
center. The role of the fusion center is to estimate the parameter and then broadcast
the estimate to the sensors, so that it can be used for setting the quantizers offset.
As it was mentioned above, with this approach we showed that a low to medium
resolution multiple sensor approach might be better for estimation purposes than a
high resolution single sensor approach. We remind that this was shown for sensors
with the same type of noise distribution and the same noise scale parameter value.
Thus, an interesting subject to study is the bit allocation problem among sensors,
when the total bandwidth is constrained and the sensors have the same type of
noise distribution but different scale parameters. This will be done in Part II, in
the case of a weak bandwidth constraint.
• Many other extensions can be the subject of future work. They are
– the joint estimation of the drift when we track a Wiener process with drift. Actually,
this was already done by adding a simple adaptive estimator of the drift. However,
in some cases we can have a more detailed dynamical model for the drift. By
using adaptive multistep algorithms [Benveniste 1990, Sec. 4.2], we can use this
additional information to have a better estimate of the Wiener process.
– The joint estimation of the Wiener process increment standard deviation and the
Wiener process itself. This will lead to a robust multibit generalization of delta
modulation with varying gain.
– The joint estimation of a location parameter and the shape of the noise distribution.
In this case, we can consider that the noise CDF has an unknown shape but a known
structure, for example that it is locally polynomial, and that we want to estimate
jointly the location parameter and the parameters of the noise distribution.
– The nonparametric estimation of the location parameter. We can consider for
example that we only know that the noise distribution is symmetric, without any
specific parametrization. Then we can try to define a nonparametric adaptive
algorithm based on adaptive histograms for getting as close as possible to the
parametric performance.
– The joint estimation of location-scales parameters when the multiple sensor fusion
center approach is considered. This extension can be directly implemented by
joining the features of the adaptive location-scale estimator with the fusion center
approach. The main difference in this case is that for reducing the communication
complexity, the sensors will have to estimate their individual scale parameters for
setting their quantizers.
• We can also consider some extensions of the estimation problem itself for which modifications of the adaptive algorithm would be a good solution. Some examples are:
– a fusion center approach where the quantized measurements from the sensors are
transmitted through noisy channels. This is a problem that we decided not to treat
but it is an interesting and more realistic point for further development.
– A fusion center approach where the information that is fedback is quantized and
passed through noisy channels. For this extension, we can consider an additional
adaptive algorithm at the sensors for smoothing out the noise from the feedback
channels. For dealing with quantization of the estimates we can consider including
a dither signal. With this extension we will be able to assess the importance of the
feedback channel quality, thus giving a more realistic global estimate of the sensor
network cost.
The main issue that makes these two extensions far more difficult to be studied
is that the output quantizer indexes cannot be defined arbitrarily, as they are
corrupted by the channel noise.
– The estimation of a scalar parameter following an autoregressive model
Xk = aXk−1 + Wk
instead of the Wiener process model. This will lead to a robust generalization of
scalar predictive quantization.
– Compression of a Wiener process with drift. We can consider that at sensor level we
can store continuous measurements (or very finely quantized) and then we can apply
the adaptive algorithm for a block of measurements in both time directions (forward
and backward) and average the results to have final estimates with reduced bias.
By storing the initial and final continuous measurements and both the forward and
backward quantized sequence, we equivalently have stored a compressed estimate
of the true parameter sequence.
Conclusions of Part I
The main objective of Part I was to propose and study the performance of algorithms for estimation based on quantized measurements. We assumed simple parameter and measurement
models:
• Parameter model – a scalar parameter that can be either constant or varying with a
Wiener process model.
• Measurement model (noise model) – the scalar constant is measured with independent,
unimodal and symmetrically distributed noise.
• Measurement model (quantizer) – the quantizer is symmetric.
Under these settings, we obtained the following conclusions:
• Adaptiveness is important. The performance of estimation based on quantized measurements is mainly dependent on the FI for quantized measurements and this dependence is direct. Increased FI is equivalent to increased estimation performance.
For the noise distributions considered, the FI is increased if we set the quantizer dynamic
range close to the parameter to be estimated and for commonly used noise distributions
(Gaussian, Laplacian, Cauchy), we must put the quantizer central threshold exactly at
the true parameter value. As we do not know the value of the parameter, for obtaining
optimal estimation performance, we must resort to adaptive algorithms that place the
quantizer range close to the true parameter value, for example by placing the quantizer
central threshold at the most recent estimate of the parameter. Therefore, this indicates
that adaptiveness of the quantizer is a main requirement for optimal estimation.
• Low complexity is possible and it might be even asymptotically optimal. It
is possible to estimate a constant and a slowly varying parameter with a low complexity
adaptive algorithm. The adaptive algorithm is not only convergent in the mean (with
a small bias in the drift case) but its parameters can be chosen in such a way that it is
exactly equivalent to the asymptotically optimal estimator. This observation goes in the
exact opposite direction of some proposed solutions (the adaptive scheme based on the MLE and the PF) which require high complexity both in terms of memory and processing.
• Low to medium resolution is enough. Both for a constant and slowly varying parameter model, the loss of performance that is incurred by using quantized measurements
instead of continuous amplitude measurements seems to be negligible for a number of
quantization bits larger than 4 or 5. In a rate constrained context this means that using more sensors with lower resolution may be better than using fewer sensors with higher resolution.
Part II
Estimation based on quantized measurements: high-rate approximations
"Finite – to fail, but infinite to venture" - part of a poem of Emily Dickinson.
Motivation
The introduction of this part will also be done using a motivational example.
To maintain their economic growth, emerging economies will have to look for new mineral
and material sources. This will generate a potential increase of exploration in unusual places,
for example the seafloor. Sulfur- and base-metal-rich mineral deposits can be found on the seafloor at hydrothermal vent sites [Hoagland 2010].
Hydrothermal vents, also called black smokers, occur when seawater penetrates the oceanic crust through fissures. The water penetrates so deeply into the oceanic crust that it comes into contact with the upper parts of magma chambers. A large increase in temperature (from ≈ 2◦C to ≈ 400◦C) is observed, along with a decrease in pH and Eh. The hot corrosive liquid then rises through fissures, carrying metal and sulfur from the rocks. The mineral-rich water is released into the seawater as hot black smoke. Precipitation of the elements present in the smoke happens when the hot water mixes with the cold ocean water. As a result, a mineral-rich chimney and a massive sulfide deposit are formed around the hot water releasing point [Herzig 2002].
To mine the sulfide deposit, first, it must be located. One possible way to locate it is
by measuring the concentration of chemical compounds and elements in the seawater. The
chemical plume generated by the hydrothermal vent can be detected using CH4 , Fe, H, He
or Mn concentration measurements [Baker 2004]. After detecting the plume, for example
using sensor measurements from multiple autonomous underwater vehicles (AUV) that
communicate with a fusion center, the source location must be found. This can be done by
following the ascent gradient direction of a chemical compound concentration. The gradient
direction can be obtained by exploiting the local information measured by the AUV.
Underwater communication is challenging as bandwidth is severely constrained in this
environment. To overcome this problem, quantization of the concentration measurements
from the AUV can be considered. As a consequence, to calculate an approximated gradient
at the fusion center we will have to deal with the same problem treated in Part I, which is the
following:
• How to estimate a scalar constant (the concentration) based on noisy quantized measurements?
Algorithms for doing this were presented in Part I, where it was noted that estimation performance is given (at least asymptotically) by the FI for quantized measurements. However,
a question remained without answer:
• How to set the quantizer thresholds to have an optimized estimation performance?
Only in some cases the answer for this question was given:
1. In the binary case it was observed that for commonly used noise models, the quantization threshold should be placed exactly at the parameter.
2. In the multibit uniform quantization case, after setting the central threshold at the
parameter, the corresponding performance maximization is a one-dimensional optimization problem, which can be solved using exhaustive search.
In the general nonuniform case, setting the thresholds was observed to be a complicated optimization problem.
Similarly to standard quantization [Gersho 1992, pp. 185–186], where analytical characterization of quantization performance is difficult for a finite number of quantization intervals,
when the number of quantization intervals is large, the set of intervals can be approximated
by an interval density. The interval density is a function whose integral over an interval gives
the fraction of the number of quantization intervals contained in that interval. By using small
interval approximations of the FI, we can obtain an asymptotic expression (NI → ∞) for the
FI as a function of the interval density. The resulting FI can be maximized w.r.t. the interval
density to get an approximation of the optimal interval set. After that, the optimal interval
density can be used to have an approximated analytical expression for the optimal FI, thus
giving a complete asymptotic characterization of the estimation algorithms.
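One common way to make this notion precise (a standard high-resolution quantization convention, stated here as an assumed formalization rather than something already introduced in the text) is the following: for a quantizer with $N_I$ intervals defined by thresholds $\tau_0 < \tau_1 < \cdots < \tau_{N_I}$, the interval density $\lambda(\cdot)$ is a nonnegative function with unit integral such that
$$\int_{\tau_{i-1}}^{\tau_i}\lambda\left(y\right)dy \approx \frac{1}{N_I}, \qquad i = 1, \ldots, N_I,$$
so that the fraction of intervals contained in a set is approximately the integral of $\lambda$ over that set, and the local interval length around a point $y$ behaves as $\frac{1}{N_I\,\lambda(y)}$.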
As the interval density is an asymptotic quantity, a main issue that must also be solved
is how to do a practical approximation of this density with a finite number of intervals. An
interesting question would be to find an analytical expression for the approximately optimal
quantization thresholds as a function of the number of intervals.
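As a small illustration of such a practical approximation (an assumed construction on my part: the standard companding rule of placing thresholds at equally spaced quantiles of the interval density, not a method stated in the text), the sketch below turns a given density into a finite set of thresholds.

```python
import numpy as np

def thresholds_from_density(density, support, n_i, grid_size=100_000):
    """Place the NI - 1 finite thresholds at the i/NI quantiles of the interval
    density (companding construction, illustrative sketch)."""
    y = np.linspace(support[0], support[1], grid_size)
    pdf = density(y)
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]                              # numerical CDF of the interval density
    levels = np.arange(1, n_i) / n_i            # target quantile levels i/NI
    return np.interp(levels, cdf, y)            # thresholds tau_1, ..., tau_{NI-1}

# Example with a hypothetical density concentrated around zero.
lam = lambda t: np.exp(-np.abs(t))
print(thresholds_from_density(lam, (-10.0, 10.0), n_i=8))
```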
Writing it in a more detailed form, we want to do the following:
• Find an asymptotic (in terms of number of quantization bits)
approximation of the Fisher information for estimating a constant parameter embedded in noise as a function of an interval
density.
(c)
• Find the asymptotically optimal interval density.
• Give an analytical expression approximating the maximum FI
for the optimal interval density.
• Obtain a practical approximation for the optimal quantization
thresholds.
Now, with the optimal thresholds given by the asymptotic approximation, the adaptive
estimation algorithms from Part I work (at least asymptotically) in an optimal way. We can
imagine that for reliability issues or for reducing measurement latency, multiple concentration
sensors are installed in each AUV. Due to deterioration, the sensors do not have exactly the
same noise levels. Thus, under a given rate constraint, another question arises:
• How many quantization bits do we allocate for each sensor?
For an array of sensors with the same type of noise and considering the independence between
sensor noise, the only parameter that can change from sensor to sensor is the noise scale factor.
Thus, what we want to do precisely is
(d) Find the optimal or approximately optimal number of bits per
sensor as a function of the noise scale factors under a maximum
constraint on the total number of bits.
The problems presented above are quite general and they can appear in the performance
analysis of any optimal estimation algorithm in a constrained rate context. In what follows,
we will obtain insight on how we can solve these problems. As we have only one chapter in
this part, its outline will be given directly at the chapter introduction.
Chapter 4
High rate approximations of the FI
To obtain insight on how we solve problems (c) and (d), we will resort to an asymptotic approach: we will let the number of quantization intervals go to infinity, NI → ∞, and we will see how the FI behaves as a function of the quantizer. This approach can also be found under the names high resolution or high-rate (as in the title); the former is used to emphasize that the quantizer intervals are supposed to be very small and the latter is used to make explicit that the communication rate must be high, as the number of quantization bits is large.
Note that making NI → ∞ seems to contradict one of the conclusions of Part I, which states that with only a few quantization bits we have a negligible loss of performance due to quantization. However, even if we make NI → ∞, we will see that the asymptotic approximations still depend on NI, so that, as we stated above, we can use them to gain some insight on the estimation performance for finite NI. Actually, we will see that for the location parameter estimation problem studied in Part I, the asymptotic approximations are valid even for small numbers of quantization bits (NB = 4 and NB = 5), which is very fortunate, as these cases were observed to be the practical useful limit in quantization for estimation and they are also the cases with the lowest number of bits for which the maximization of the estimation performance w.r.t. the quantizer thresholds is difficult to carry out directly.
When NI → ∞, the quantizer can be characterized by its density of quantization intervals,
thus asymptotically, the behavior of the FI as a function of the quantizer can be characterized
by studying its behavior as a function of the interval density. As a consequence, one of the
main objectives of this chapter is to obtain an asymptotic analytic expression of the FI as a
function of the interval density.
• Fixed rate encoding. We will obtain this expression for scalar quantization and we will
not impose any strong constraints on the type of estimation problem that is treated (for
example, we will not constrain it to be a location estimation problem).
• Variable rate encoding. Additionally to the fixed rate encoding scheme, where all the
quantizer outputs use the same number of bits for encoding, we will also obtain the
optimal interval density maximizing the FI for the variable rate encoding scheme, where
we can use different numbers of bits for different quantizer outputs. We will also discuss the difficulties of implementing the variable rate encoding scheme in practice.
• Practical implementation. We will describe how to implement in practice an approximation of the optimal interval density.
We will check the validity of the results in the location parameter case by comparing the
theoretical results for the maximum FI obtained with the optimal interval density with the FI
obtained with the practical approximation of the optimal density and with the FI for optimal
uniform quantization. We will show that in practice we can obtain the asymptotic performance
results by using the adaptive algorithm presented in Ch. 3. We will also look in some detail at the location and scale parameter estimation problems for GGD and STD measurements.
In the single sensor location parameter case, we will study the problem of deciding how
many quantization bits we might allocate to each sensor in a sensor network, when the total
rate is constrained and all the sensors have the same type of noise distribution but different noise scale parameters. Approximate solutions for this problem will be given using the
asymptotic approximations.
To show the connections between the results found here and asymptotic results for other
inference problems, we will study the asymptotic approximation of a generalized inference
performance measure known as the generalized f–divergence. The asymptotic results for this
divergence were proposed in [Poor 1988], mainly for the fixed rate encoding case with uniform
vector quantization; they were stated but not proved in the non uniform case. Here, we will
give a simple derivation of the asymptotic approximation of this divergence in the scalar case
using the same procedure as for the FI, and we will also extend the results
to the variable rate encoding case. After obtaining the general optimal density of thresholds,
we will point out the similarities and differences between the way quantization must be done
for three different inference problems: classical estimation (considered in this thesis), Bayesian
estimation and detection.
At the end of the chapter we will summarize the main results and we will indicate some
possible points for future work.
Contributions presented in this chapter:
• Asymptotic approximation of the optimal interval density for classical parameter estimation. The asymptotic analysis presented in [Poor 1988] is only detailed for uniform
quantization, whereas the development presented here considers non uniform quantization
through the interval density approach.
• Practical implementation of the optimal quantizer in the location parameter estimation
problem. In this chapter, we show that the asymptotically optimal quantizer depends
on the true parameter value and we also show that in practice we can achieve the
asymptotically optimal performance using the adaptive algorithm with decreasing gain
presented in Ch. 3. This shows the importance of the adaptive approach. No result of
this type seems to be present in the literature.
• Approximate bit allocation for the multiple sensor approach. The approximations of the
optimal bit allocation among sensors seem to be new in the context of classical location
parameter estimation.
Contents
4.1 Asymptotic approximation . . . . . . . . . . . . . . . . . . . . . . . . . . 180
    4.1.1 General setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180
    4.1.2 Loss of estimation performance due to quantization . . . . . . . . . 180
    4.1.3 Asymptotic approximation of the loss . . . . . . . . . . . . . . . . . 181
    4.1.4 Optimal fixed rate encoding . . . . . . . . . . . . . . . . . . . . . . 183
    4.1.5 Variable rate encoding . . . . . . . . . . . . . . . . . . . . . . . . . 187
    4.1.6 Estimation of GGD and STD location and scale parameters . . . . 190
    4.1.7 Location parameter estimation . . . . . . . . . . . . . . . . . . . . 196
4.2 Bit allocation for scalar location parameter estimation . . . . . . . . . 200
    4.2.1 Unconstrained numbers of bits . . . . . . . . . . . . . . . . . . . . 201
    4.2.2 Positive numbers of bits . . . . . . . . . . . . . . . . . . . . . . . . 205
4.3 Generalization with the f–divergence . . . . . . . . . . . . . . . . . . . 207
    4.3.1 Definition of the generalized f–divergence . . . . . . . . . . . . . . 207
    4.3.2 Generalized f–divergence in inference problems . . . . . . . . . . . 207
    4.3.3 Asymptotic results . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
    4.3.4 Interval densities for inference problems . . . . . . . . . . . . . . . 211
4.4 Chapter summary and directions . . . . . . . . . . . . . . . . . . . . . . 213
4.1 Asymptotic approximation

4.1.1 General setting
The general setting considered here is the estimation of a scalar deterministic parameter x ∈ R
of a continuous distribution based on N independent measurements from this distribution
Y = [Y1 Y2 · · · YN ]⊤ . Again here, we will consider that the estimation of x is not based on
Y. Instead, it is based on a scalar quantized version of Y denoted
i = [i1 i2 · · · iN ]T = [Q (Y1 ) Q (Y2 ) · · · Q (YN )]T .
The function Q represents the scalar quantizer and is given by
$$Q(Y) = i, \quad \text{if } Y \in q_i = [\tau_{i-1}, \tau_i), \qquad (4.1)$$
where i ∈ {1, · · · , NI }, NI is the number of quantization intervals qi and τi are the quantizer
thresholds. The first and last thresholds will be set to τ0 = τmin and τNI = τmax . Note that
the setting considered here is more general than the setting presented in Part I, as we do not
restrict the estimation problem to be a location estimation problem and we do not impose
any symmetry on the quantizer. Observe also that the quantizer interval indexes now go from
1 to NI .
It will be assumed that the marginal CDF of the continuous measurements parametrized
by x, F(y; x), admits a PDF f(y; x) that is positive, smooth in both x and y and defined on
a bounded support. The bounded support assumption is needed to simplify the derivation of
the asymptotic results.
4.1.2 Loss of estimation performance due to quantization
For estimating a constant with quantized or continuous noisy measurements, we saw in Ch. 1
that the asymptotic performance of an optimal unbiased estimator attains the corresponding
CRB. This asymptotic characterization is not restricted to location parameter estimation.
Under regularity conditions on the likelihood, it can be applied to any situation where we
want to estimate a constant embedded in noisy measurements. Thus, for a general parameter
x and for a large number of samples, the estimation performance is still linked to the FI as
follows
$$\mathrm{Var}\left[\hat{X}\right] \sim \mathrm{CRB}_q = \frac{1}{N I_q}, \qquad (4.2)$$
where $I_q$ is the FI for a quantized measurement that was already presented in Ch. 1 (for a
location parameter). Rewriting the FI with the notation from this part, we have
$$I_q = \mathbb{E}\left[S_q^2\right] = \mathbb{E}\left[\left(\frac{\partial \log P(i;x)}{\partial x}\right)^2\right] = \sum_{i=1}^{N_I}\left[\frac{\partial \log P(i;x)}{\partial x}\right]^2 P(i;x), \qquad (4.3)$$
Sq is again the score function for quantized measurements and P (i; x) is the probability of
having the quantizer output i (parametrized by x):
$$P(i;x) = F(\tau_i; x) - F(\tau_{i-1}; x). \qquad (4.4)$$
The FI for quantized measurements can be written as a function of the FI for continuous
measurements and the score functions, exactly in the same way as it was done in Ch. 1 (1.16
p. 42) (Why? - App. A.1.1)
$$I_q = I_c - \mathbb{E}\left[(S_c - S_q)^2\right], \qquad (4.5)$$
where $S_c = \frac{\partial \log f(y;x)}{\partial x}$ is the score function for continuous measurements and $L = \mathbb{E}\left[(S_c - S_q)^2\right]$
is the loss of FI, and consequently of estimation performance, due to quantization. The main
objective from now on will be to minimize L through the choice of the quantizer intervals when
NI is large. Notice that minimizing L defined here is equivalent to minimizing Lq defined in
Ch. 3.
4.1.3 Asymptotic approximation of the loss
Similarly to standard quantization for measurement reconstruction, where optimal nonuniform
quantization intervals can be approximated for large NI , an approximation for Iq will now be
developed.
The loss L, which is an expectation under the measure F, can be rewritten as a sum of
integrals, each term corresponding to the loss produced by a quantization interval:
$$L = \sum_{i=1}^{N_I}\int_{q_i}\left[\frac{\partial \log f(y;x)}{\partial x} - \frac{\partial \log P(i;x)}{\partial x}\right]^2 f(y;x)\, dy. \qquad (4.6)$$
First term: $\frac{\partial \log f(y;x)}{\partial x}$
For the interval with index i, the PDF can be approximated with a Taylor series around the
central point $y_i = \frac{\tau_i + \tau_{i-1}}{2}$:
$$f(y;x) = f_i + f_i^{(y)}(y-y_i) + \frac{f_i^{(yy)}}{2}(y-y_i)^2 + o\!\left((y-y_i)^2\right), \qquad (4.7)$$
where the superscripts indicate the variables for which the function is differentiated and the
subscript indicates that the function (after differentiation) is evaluated at $y_i$. It will be
assumed that the sequences of intervals for increasing $N_I$ are chosen such that, for any $\varepsilon > 0$,
it is possible to find an $N_I^\ast$ for which
$$\left|\frac{o\!\left((y-y_i)^2\right)}{(y-y_i)^2}\right| < \varepsilon, \quad \text{for } N_I > N_I^\ast,\ y \in q_i. \qquad (4.8)$$
Under the assumption that $f > 0$, the logarithm of $f$ at interval $q_i$ can also be approximated
using a Taylor series:
$$\log f(y;x) = \log f_i + (\log f)_i^{(y)}(y-y_i) + (\log f)_i^{(yy)}\frac{(y-y_i)^2}{2} + o\!\left((y-y_i)^2\right)$$
and the derivative w.r.t. x is
$$\frac{\partial \log f(y;x)}{\partial x} = (\log f)_i^{(x)} + (\log f)_i^{(yx)}(y-y_i) + (\log f)_i^{(yyx)}\frac{(y-y_i)^2}{2} + o\!\left((y-y_i)^2\right), \qquad (4.9)$$
which is an expression for the continuous score function on $q_i$ to be used in (4.6).
Second term: $\frac{\partial \log P(i;x)}{\partial x}$
Now, the other term in the squared factor must be calculated. Integrating the PDF in (4.7)
on the interval $q_i$, which has length denoted by $\Delta_i$, one gets
$$P(i;x) = f_i\Delta_i + f_i^{(yy)}\frac{\Delta_i^3}{24} + o\!\left(\Delta_i^3\right). \qquad (4.10)$$
Note that the term in $\Delta_i^2$ is zero as $y_i$ is the interval central point and the integral of $(y-y_i)$
around it is zero. The logarithm of $P(i;x)$ can be obtained by dividing the second and third
terms of the right hand side of (4.10) by the first term and then using the Taylor series
$\log(1+x) = x + o(x)$. Differentiating the resulting expression w.r.t. x gives
$$\frac{\partial \log P(i;x)}{\partial x} = (\log f)_i^{(x)} + \left(\frac{f^{(yy)}}{f}\right)_i^{(x)}\frac{\Delta_i^2}{24} + o\!\left(\Delta_i^2\right). \qquad (4.11)$$
Loss L
Subtracting (4.11) from (4.9) and squaring makes the leading term with least power in $(y-y_i)$
or in $\Delta_i$ be $(\log f)_i^{(yx)}(y-y_i)$. When we square this difference and multiply by the Taylor
series of $f$, we have a leading term $\left[(\log f)_i^{(yx)}\right]^2 f_i (y-y_i)^2$ and all other terms have larger
powers of $(y-y_i)$ and/or $\Delta_i$. Therefore, after integrating the squared difference multiplied
by the Taylor series of $f$, we get
$$L = \sum_{i=1}^{N_I}\left[(\log f)_i^{(yx)}\right]^2 f_i\,\frac{\Delta_i^3}{12} + o\!\left(\Delta_i^3\right) = \sum_{i=1}^{N_I}\left[S_{c,i}^{(y)}\right]^2 f_i\,\frac{\Delta_i^3}{12} + o\!\left(\Delta_i^3\right), \qquad (4.12)$$
where we have used the fact that $f$ is smooth enough so that we can change the derivative
order between $y$ and $x$ to get $(\log f)_i^{(yx)} = S_{c,i}^{(y)}$.
To obtain a characterization w.r.t. the quantization intervals, an interval density function
$\lambda(y)$ is defined as
$$\lambda(y) = \lambda_i = \frac{1}{N_I\Delta_i}, \quad \text{for } y \in q_i. \qquad (4.13)$$
The interval density, when integrated over an interval, gives roughly the fraction of the
quantization intervals contained in that interval. It is a positive function that always sums
to one (the Riemann sum $\sum_{i=1}^{N_I}\frac{1}{N_I\Delta_i}\Delta_i = 1 \approx \int\lambda(y)\,dy$). Rewriting (4.12) with this density gives
$$L = \sum_{i=1}^{N_I}\frac{\left[S_{c,i}^{(y)}\right]^2 f_i}{12 N_I^2 \lambda_i^2}\,\Delta_i + o\!\left(\frac{1}{N_I^2}\right)\Delta_i. \qquad (4.14)$$
As $N_I \to \infty$, it will be supposed that all $\Delta_i$ converge uniformly to zero. Therefore,
$$\lim_{N_I\to\infty} N_I^2 L = \frac{1}{12}\int\frac{\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)}{\lambda^2(y)}\,dy. \qquad (4.15)$$
This asymptotic expression for the loss gives the following approximation for the FI:
$$I_q \approx I_c - \frac{1}{12 N_I^2}\int\frac{\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)}{\lambda^2(y)}\,dy, \qquad (4.16)$$
which is valid for large NI . Note that when NI in (4.16) tends to infinity, if the quantizer
intervals are chosen in a way such that all ∆i tend to zero uniformly, then the asymptotic
estimation performance for quantized measurements will tend to the estimation performance
for continuous measurements.
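As a quick numerical illustration of (4.16), the following sketch (not part of the original text; all names, the Gaussian noise choice and the quantizer range are our own illustrative assumptions) compares the exact quantized FI (4.3) with the asymptotic approximation (4.16) for a Gaussian location parameter and a uniform quantizer.

```python
import numpy as np
from scipy import stats, integrate

x, delta = 0.0, 1.0                      # true parameter and GGD scale (beta = 2)
sigma = delta / np.sqrt(2.0)             # GGD with beta = 2 is Gaussian with this std
tau_min, tau_max = x - 4.0, x + 4.0      # granular region of the quantizer

def exact_quantized_fi(n_intervals):
    """Quantized FI, eq. (4.3): sum_i (dP(i;x)/dx)^2 / P(i;x)."""
    tau = np.linspace(tau_min, tau_max, n_intervals + 1)
    F = stats.norm.cdf(tau, loc=x, scale=sigma)
    f = stats.norm.pdf(tau, loc=x, scale=sigma)
    P = np.diff(F)
    dP_dx = -np.diff(f)                  # dF(tau; x)/dx = -f(tau; x)
    return np.sum(dP_dx**2 / P)

def asymptotic_fi(n_intervals):
    """Approximation (4.16) with a uniform interval density on [tau_min, tau_max]."""
    Ic = 1.0 / sigma**2                  # continuous FI of a Gaussian location parameter
    lam = 1.0 / (tau_max - tau_min)      # lambda(y) constant for a uniform quantizer
    dSc_dy = 1.0 / sigma**2              # S_c = (y - x)/sigma^2, so dS_c/dy is constant
    integrand = lambda y: dSc_dy**2 * stats.norm.pdf(y, loc=x, scale=sigma) / lam**2
    loss, _ = integrate.quad(integrand, tau_min, tau_max)
    return Ic - loss / (12.0 * n_intervals**2)

for NB in (3, 4, 5, 6):
    NI = 2**NB
    print(NB, exact_quantized_fi(NI), asymptotic_fi(NI))
```

The two printed values get closer as NB grows, which is the behaviour claimed above.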
4.1.4 Optimal fixed rate encoding
In the fixed rate encoding scheme, all the outputs of the quantizer are encoded with (binary)
words that have the same binary size, namely, NB = log2 (NI ). Thus, we can rewrite (4.16)
using the number of bits NB instead of the number of intervals NI . This gives
$$I_q \approx I_c - \frac{2^{-2N_B}}{12}\int\frac{\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)}{\lambda^2(y)}\,dy. \qquad (4.17)$$
This shows that the FI for quantized measurements under fixed rate encoding tends exponentially to the FI for continuous measurements with increasing number of bits. Moreover, the
constant that multiplies the exponential depends not only on the measurement distribution
and on the estimation problem, through f and Sc , but also on the quantizer intervals through
λ.
Optimal interval density

We can characterize asymptotically the optimal quantizer for estimation by defining an optimization problem using (4.16) as the function to be maximized w.r.t. $\lambda$. To find the optimal
$\lambda$ when $N_B$ is large, we must solve the following optimization problem:
$$\underset{\lambda(y)}{\text{minimize}}\quad \int\frac{\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)}{\lambda^2(y)}\,dy, \qquad \text{subject to}\quad \int\lambda(y)\,dy = 1,\quad \lambda(y) > 0,$$
where the equality and inequality constraints on $\lambda$ come from its definition as a density.
This minimization problem can be solved using Hölder's inequality, which states [Hardy 1988,
p. 140] that for two functions $h(y)$ and $g(y)$
$$\left(\int|h(y)|^p\,dy\right)^{\frac{1}{p}}\left(\int|g(y)|^q\,dy\right)^{\frac{1}{q}} \geq \int|h(y)g(y)|\,dy,$$
with equality when $h^p(y) \propto g^q(y)$ and $\frac{1}{p}+\frac{1}{q}=1$.
Setting $p = 3$, $q = \frac{3}{2}$, $h(y) = \left[\frac{\left(\frac{\partial S_c(y;x)}{\partial y}\right)^2 f(y;x)}{\lambda^2(y)}\right]^{\frac{1}{3}}$ and $g(y) = \lambda^{\frac{2}{3}}(y)$ in Hölder's inequality
and using the constraint that the integral of the density must equal one, we have the following
optimal interval density:
$$\lambda^\star(y) = \frac{\left|\frac{\partial S_c(y;x)}{\partial y}\right|^{\frac{2}{3}} f^{\frac{1}{3}}(y;x)}{\int\left|\frac{\partial S_c(y;x)}{\partial y}\right|^{\frac{2}{3}} f^{\frac{1}{3}}(y;x)\,dy} \propto \left|\frac{\partial S_c(y;x)}{\partial y}\right|^{\frac{2}{3}} f^{\frac{1}{3}}(y;x) \qquad (4.18)$$
and the corresponding maximum FI given by this density is
$$I_q^\star \approx I_c - \frac{1}{12 N_I^2}\left[\int\left|\frac{\partial S_c(y;x)}{\partial y}\right|^{\frac{2}{3}} f^{\frac{1}{3}}(y;x)\,dy\right]^3. \qquad (4.19)$$
Remark: in standard quantization for minimum MSE measurement reconstruction the optimal
interval density is given by [Gersho 1992, p. 186]
$$\lambda^\star_{rec}(y) = \frac{f^{\frac{1}{3}}(y;x)}{\int f^{\frac{1}{3}}(y;x)\,dy} \propto f^{\frac{1}{3}}(y;x).$$
Therefore, the main difference from standard quantization is the additional factor depending
on the derivative of the score function.
Practical approximation of the interval density
From the definition of the interval density, the fraction of intervals up to interval $q_i$, $\frac{i}{N_I}$,
must be equal to the integral of the interval density from $\tau_{min}$ to $\tau_i$. Thus, a practical way of
approximating the optimal thresholds is to set
$$\tau_i^\star = F_\lambda^{-1}\!\left(\frac{i}{N_I}\right), \quad \text{for } i \in \{1, \cdots, N_I - 1\}, \qquad (4.20)$$
where $F_\lambda^{-1}$ is the inverse of the cumulative distribution function (CDF) related to $\lambda$.
An important issue for evaluating the τi is that they may depend explicitly on x, which is
the parameter we want to estimate. A possible solution for this problem is to initially set τi
with an arbitrary guess of x, then estimate x using an initial set of measurements and finally
update the thresholds with the estimate. This procedure can be performed in an adaptive
way to get closer and closer to the optimal thresholds. We can use, for example, an adaptive
scheme based on the MLE to perform estimation and threshold setting at the same time. For
the location parameter estimation problem, it was shown that this adaptive scheme converges,
thus in this case, if we set τi according to τi⋆ , we expect to obtain the optimal asymptotic
performance when N → ∞ and NI is large. Also in the location parameter case, a low
complexity alternative, which gives asymptotically the same performance as the scheme based
on the MLE, is the adaptive algorithm presented in Ch. 3. We will see through simulation
later that the low complexity adaptive algorithm with the thresholds chosen using τi⋆ achieves
asymptotically (N → ∞) a performance close to Iq⋆ given by (4.19), even for a moderate
number of quantization intervals.
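For concreteness, the following sketch (our own illustration, not from the text; the grid bounds, the grid size and the Gaussian example are assumptions) shows one simple way to evaluate the thresholds of (4.20) when the interval density is only available pointwise: numerically integrate it and invert the resulting CDF by interpolation.

```python
import numpy as np

def thresholds_from_density(unnormalized_density, y_min, y_max, n_intervals, n_grid=20001):
    """Approximate tau_i = F_lambda^{-1}(i / N_I) on a bounded grid (crude but simple)."""
    y = np.linspace(y_min, y_max, n_grid)
    w = unnormalized_density(y)
    cdf = np.cumsum(w)
    cdf = (cdf - cdf[0]) / (cdf[-1] - cdf[0])        # normalized CDF of the interval density
    probs = np.arange(1, n_intervals) / n_intervals  # targets i/N_I, i = 1, ..., N_I - 1
    return np.interp(probs, cdf, y)                  # invert the CDF by interpolation

# Example: Gaussian location parameter (x = 0, delta = 1), for which (4.18) is
# proportional to f^(1/3) because dS_c/dy is constant.
density = lambda y: np.exp(-y**2 / 3.0)
print(np.round(thresholds_from_density(density, -5.0, 5.0, n_intervals=16), 3))
```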
We have the following solution to problem (c) (p. 174):

Solution to (c) - Asymptotic approximation of the FI for fixed rate encoding

(c1) The asymptotic approximation of the FI is given by (4.16)
$$I_q \approx I_c - \frac{1}{12 N_I^2}\int\frac{\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)}{\lambda^2(y)}\,dy \approx I_c - \frac{2^{-2N_B}}{12}\int\frac{\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)}{\lambda^2(y)}\,dy,$$
where $I_c$ and $S_c$ are the FI and the score function for continuous measurements and $\lambda(y)$ is the interval density.
• Maximization of $I_q$ gives the optimal interval density (4.18)
$$\lambda^\star(y) = \frac{\left|\frac{\partial S_c(y;x)}{\partial y}\right|^{\frac{2}{3}} f^{\frac{1}{3}}(y;x)}{\int\left|\frac{\partial S_c(y;x)}{\partial y}\right|^{\frac{2}{3}} f^{\frac{1}{3}}(y;x)\,dy}.$$
• The corresponding asymptotic approximation of $I_q$ is (4.19)
$$I_q^\star \approx I_c - \frac{1}{12 N_I^2}\left[\int\left|\frac{\partial S_c(y;x)}{\partial y}\right|^{\frac{2}{3}} f^{\frac{1}{3}}(y;x)\,dy\right]^3.$$
• A practical approximation of the asymptotically optimal thresholds using a finite number of quantization intervals is (4.20)
$$\tau_i^\star = F_\lambda^{-1}\!\left(\frac{i}{N_I}\right), \quad \text{for } i \in \{1, \cdots, N_I - 1\},$$
where $F_\lambda^{-1}$ is the inverse of the CDF related to the interval density. This CDF may depend on the true parameter $x$, therefore it may be necessary to use an adaptive solution to obtain approximately optimal thresholds.
4.1.5 Variable rate encoding: dead end A
It is known from information theory that the minimum average length H required for describing a discrete r.v. with a binary word is obtained by encoding its possible values (index j)
with lengths $l_j$ given by the negative logarithm of their probabilities $p_j$ [Cover 2006, p. 111]:
$$l_j = -\log_2(p_j).$$
For a r.v. with n possible values, this way of encoding the r.v. gives the following average length
$$H = -\sum_{j=1}^{n} p_j \log_2(p_j),$$
which is the minimum average length and is also the entropy of the r.v.
For achieving rate requirements in the problem of estimation based on quantized measurements, instead of using the fixed rate encoding scheme, we can use a scheme with variable
rate, where the outputs of the quantizer are coded with binary words of possibly different
lengths. The lengths of the outputs can be defined as above, leading to the following minimum
average length
$$H_q = -\sum_{i=1}^{N_I} P(i;x)\log_2\left[P(i;x)\right]. \qquad (4.21)$$
Suppose that the communication channel imposes a constraint on the maximum $H_q$, so that
for $H_q$ lower than (or equal to) this maximum, transmission through the channel occurs without
any error; this constraint, which is the capacity of the channel, will be denoted R (we have
supposed in the Introduction that efficient channel coding is used, so that we can assume
error-free transmission for rates below the channel capacity). The main
objective now is to set the quantizer thresholds for a given $N_I$ so that the FI $I_q$ is maximized
under the constraint $H_q \leq R$. As this problem is complicated to solve for finite $N_I$, we will
use again the asymptotic approach to obtain the characterization of the optimal quantizer
through $\lambda$.
The asymptotic expression for $I_q$ was already developed above and it is given by (4.16)
$$I_q \approx I_c - \frac{1}{12 N_I^2}\int\frac{\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)}{\lambda^2(y)}\,dy.$$
We need now to develop an asymptotic approximation for the entropy $H_q$. Using the Taylor
series development for $P(i;x)$ given in (4.10) in the expression for $H_q$ (4.21), we have
$$H_q = -\sum_{i=1}^{N_I}\left[f_i\Delta_i + f_i^{(yy)}\frac{\Delta_i^3}{24} + o\!\left(\Delta_i^3\right)\right]\log_2\left[f_i\Delta_i + f_i^{(yy)}\frac{\Delta_i^3}{24} + o\!\left(\Delta_i^3\right)\right].$$
Separating the factor $f_i\Delta_i$ inside the logarithm, using the Taylor expansion of $\log_2(1+x)$
and multiplying the terms in the resulting expression gives
$$H_q = -\sum_{i=1}^{N_I}\left[f_i\Delta_i\log_2(f_i) + f_i\Delta_i\log_2(\Delta_i) + o\!\left(\Delta_i^2\right)\right].$$
Using the interval density, $\Delta_i = \frac{1}{N_I\lambda_i}$, in the term with $\log_2(\Delta_i)$ leads to
$$H_q = -\sum_{i=1}^{N_I}\left[f_i\Delta_i\log_2(f_i) - f_i\Delta_i\log_2(\lambda_i) - f_i\Delta_i\log_2(N_I) + o\!\left(\Delta_i^2\right)\right].$$
When $N_I$ is large and the $\Delta_i$ are small, the sums can be approximated by integrals
$$H_q \approx -\int f(y;x)\log_2\left[f(y;x)\right]dy + \int f(y;x)\log_2\left[\lambda(y)\right]dy + \log_2(N_I),$$
where for obtaining the term $\log_2(N_I)$, we used the fact that $\sum_{i=1}^{N_I} f_i\Delta_i$ is asymptotically close
to one as it is approximately the integral of the PDF. The integral $-\int f(y;x)\log_2\left[f(y;x)\right]dy$
is known [Cover 2006, p. 243] as the differential entropy of the r.v. Y, therefore, from now on
we will denote it $h_y$:
$$H_q \approx h_y + \int f(y;x)\log_2\left[\lambda(y)\right]dy + \log_2(N_I). \qquad (4.22)$$
For large $N_I$, using the integral in expression (4.16) and the approximation of the entropy
(4.22), we can define the following optimization problem:
$$\underset{\lambda(y)}{\text{minimize}}\quad \int\frac{\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)}{\lambda^2(y)}\,dy, \qquad \text{subject to}\quad \int f(y;x)\log_2\left[\lambda(y)\right]dy \leq R - h_y - \log_2(N_I),\quad \int\lambda(y)\,dy = 1,\quad \lambda(y) > 0.$$
The solution for this problem can be adapted from the development presented in [Li 1999].
First, we define the function $p(y)$
$$p(y) = \frac{\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)}{\int\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)\,dy},$$
then the integral that must be minimized can be rewritten as
$$\int\frac{\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)}{\lambda^2(y)}\,dy = \left\{\int\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)\,dy\right\}\left\{\int\frac{p(y)}{\lambda^2(y)}\,dy\right\},$$
where we note that only the second factor depends on $\lambda$. Thus we can redefine the optimization
problem as
$$\underset{\lambda(y)}{\text{minimize}}\quad \int\frac{p(y)}{\lambda^2(y)}\,dy, \qquad \text{subject to}\quad \int f(y;x)\log_2\left[\lambda(y)\right]dy \leq R - h_y - \log_2(N_I),\quad \int\lambda(y)\,dy = 1,\quad \lambda(y) > 0.$$
To find the optimal $\lambda$, we take the logarithm of the integral to be minimized
$$\log_2\int\frac{p(y)}{\lambda^2(y)}\,dy = \log_2\int\frac{p(y)}{\lambda^2(y)f(y;x)}\,f(y;x)\,dy$$
and we apply Jensen's inequality (the logarithm is a concave function)
$$\log_2\int\frac{p(y)}{\lambda^2(y)f(y;x)}\,f(y;x)\,dy \geq \int\log_2\!\left[\frac{p(y)}{\lambda^2(y)f(y;x)}\right]f(y;x)\,dy;$$
now we exponentiate both sides of the inequality
$$\int\frac{p(y)}{\lambda^2(y)f(y;x)}\,f(y;x)\,dy \geq 2^{\left\{\int\log_2\left[\frac{p(y)}{\lambda^2(y)f(y;x)}\right]f(y;x)\,dy\right\}}. \qquad (4.23)$$
To obtain equality in Jensen's inequality the term in the argument of the logarithm in
the RHS of (4.23) must be a constant, thus
$$\lambda^\star(y) \propto \left[\frac{p(y)}{f(y;x)}\right]^{\frac{1}{2}} = \frac{\left|\frac{\partial S_c(y;x)}{\partial y}\right|}{\left[\int\left(\frac{\partial S_c(y;x)}{\partial y}\right)^2 f(y;x)\,dy\right]^{\frac{1}{2}}}.$$
Integrating the constraint that $\lambda(y)$ is a PDF makes the constant in the denominator of the
expression above disappear, thus giving
$$\lambda^\star(y) = \frac{\left|\frac{\partial S_c(y;x)}{\partial y}\right|}{\int\left|\frac{\partial S_c(y;x)}{\partial y}\right|dy}. \qquad (4.24)$$
The exponential in (4.23) can be written as a function of the rate constraint. We multiply
the rate constraint by −2 and we add $h_y$ on both sides. We have
$$\int\log_2\!\left[\frac{1}{\lambda^2(y)f(y;x)}\right]f(y;x)\,dy \geq -2R + 3h_y + 2\log_2(N_I).$$
Finally, we add $\int\log_2\left[p(y)\right]f(y;x)\,dy$ to obtain
$$\int\log_2\!\left[\frac{p(y)}{\lambda^2(y)f(y;x)}\right]f(y;x)\,dy \geq -2R + 3h_y + 2\log_2(N_I) + \int\log_2\left[p(y)\right]f(y;x)\,dy. \qquad (4.25)$$
The integral in the RHS of (4.25) is
$$\int\log_2\left[p(y)\right]f(y;x)\,dy = \int\log_2\!\left[\left(\frac{\partial S_c(y;x)}{\partial y}\right)^2\right]f(y;x)\,dy + \int\log_2\left[f(y;x)\right]f(y;x)\,dy - \log_2\!\left\{\int\left[\frac{\partial S_c(y';x)}{\partial y'}\right]^2 f(y';x)\,dy'\right\}$$
$$= \int\log_2\!\left[\left(\frac{\partial S_c(y;x)}{\partial y}\right)^2\right]f(y;x)\,dy - h_y - \log_2\!\left\{\int\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)\,dy\right\}.$$
Substituting the expression above in (4.25) and the result in (4.23), we obtain the minimum
value of the integral in the optimization problem. This value is
$$2^{-2\left[R - h_y - \int\log_2\left|\frac{\partial S_c(y;x)}{\partial y}\right|f(y;x)\,dy + \frac{1}{2}\log_2\left(\int\left[\frac{\partial S_c(y;x)}{\partial y}\right]^2 f(y;x)\,dy\right) - \log_2(N_I)\right]}.$$
Substituting this value in the approximation of the FI, we get
$$I_q \approx I_c - \frac{1}{12}\,2^{-2\left[R - h_y - \int\log_2\left|\frac{\partial S_c(y;x)}{\partial y}\right|f(y;x)\,dy\right]}. \qquad (4.26)$$
Notice that again here the FI for quantized measurements tends exponentially to the FI for
continuous measurements; the exponential decay rate is sensitive to the randomness of the
continuous measurements and to the derivative of the score function. The difference in the
quantizer characterization w.r.t. the fixed rate encoding scheme is that now the interval
density is dictated only by the derivative of the score function: we must put more intervals
around values of Y that have a larger score function variation.
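To make this contrast concrete, the following short sketch (our own illustration; the GGD shape, the bounded grid and all names are assumptions) evaluates the fixed rate density (4.18) and the variable rate density (4.24) for a GGD location parameter and compares how much of the quantization effort each places close to the parameter.

```python
import numpy as np

beta = 2.5
y, dy = np.linspace(-4.0, 4.0, 4001, retstep=True)   # bounded support, as assumed in the text
f = np.exp(-np.abs(y)**beta)                          # unnormalized GGD PDF, x = 0, delta = 1
score_deriv = beta * (beta - 1.0) * np.abs(y)**(beta - 2.0)   # |dS_c/dy| up to constants

def normalize(w):
    return w / (w.sum() * dy)

lam_fixed = normalize(score_deriv**(2.0 / 3.0) * f**(1.0 / 3.0))   # fixed rate, eq. (4.18)
lam_var = normalize(score_deriv)                                   # variable rate, eq. (4.24)

inside = np.abs(y) <= 1.0
print("interval mass with |y - x| <= delta, fixed rate   :", lam_fixed[inside].sum() * dy)
print("interval mass with |y - x| <= delta, variable rate:", lam_var[inside].sum() * dy)
```

As expected from (4.24), the variable rate density ignores the PDF itself and spreads the intervals according to the score variation only.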
Observe that the quantizer interval distribution may also depend on the true parameter
value, as the score function may be a function of it. Thus, similarly to the fixed rate scheme,
it will be necessary to set the thresholds adaptively. The main problem now is that we need
to know the quantizer output probabilities to encode the outputs with their proper lengths;
however, as we do not know the measurement distribution completely, we cannot encode the
words properly. As a solution, we can also use an adaptive procedure for encoding, taking as
encoding distribution the one obtained with the most recent estimate of the parameter.
The problem with this solution is that we cannot encode correctly at the beginning of the
adaptive estimation procedure: we will be penalized in terms of average length in the initial
part of the procedure and, as a consequence, we will not respect the rate constraints. Thus,
this solution is still not complete. Further work will be necessary: we can try to quantify the
increase in rate at the beginning of the estimation procedure, or we can try to find a variable
rate encoding scheme that quantizes the measurements properly without knowing the true
parameter value.
4.1.6 Estimation of GGD and STD location and scale parameters
We will apply the results given in solution (c1) (p. 186) for obtaining the approximately
optimal quantization thresholds for the estimation of location and scale parameters of the GGD
and the STD. Notice that even if their support is unbounded, as in standard quantization
theory, it is expected that the error caused by neglecting the extremal regions (overload
region) will be small.
Results for the estimation of a GGD location parameter
The first step for obtaining the approximately optimal thresholds is to evaluate the optimal
interval density given by (4.18). Thus, we start by calculating the derivative of the score
function w.r.t. x and y. Differentiating the logarithm of the GGD PDF (1.39)
$$f(y;x) = \frac{\beta}{2\delta\Gamma\!\left(\frac{1}{\beta}\right)}\exp\!\left(-\left|\frac{y-x}{\delta}\right|^\beta\right)$$
for $\beta > 1$, we obtain
$$\frac{\partial S_c^x(y;x)}{\partial y} = \frac{\beta(\beta-1)}{\delta^2}\left|\frac{y-x}{\delta}\right|^{\beta-2}.$$
Note that for $\beta \leq 1$, which includes the Laplacian case, the score function is not differentiable
at x. Thus, we cannot evaluate the interval density in these cases. For $\beta > 1$, taking the
power $\frac{2}{3}$ of the expression above and multiplying it by $f^{\frac{1}{3}}(y;x)$, we have the following interval
density:
$$\lambda^x_{GGD}(y) = \frac{1}{C}\left|\frac{y-x}{\delta}\right|^{\frac{2\beta-4}{3}}\exp\!\left[-\frac{1}{3}\left|\frac{y-x}{\delta}\right|^\beta\right], \qquad (4.27)$$
where C is a constant normalizing the density. Using the symmetry of the density, this
constant can be evaluated as the following integral:
$$C = 2\int_x^{+\infty}\left(\frac{y-x}{\delta}\right)^{\frac{2\beta-4}{3}}\exp\!\left[-\frac{1}{3}\left(\frac{y-x}{\delta}\right)^\beta\right]dy.$$
An expression for this integral can be obtained by using the change of variables $\varepsilon = \frac{1}{3}\left(\frac{y-x}{\delta}\right)^\beta$
and identifying the resulting integral factor with the gamma function. This gives
$$C = \frac{2\delta}{\beta}\,3^{\frac{1}{3}\left(2-\frac{1}{\beta}\right)}\,\Gamma\!\left[\frac{1}{3}\!\left(2-\frac{1}{\beta}\right)\right].$$
Now, we can obtain the CDF related to the interval density. Exploiting again the symmetry
of the distribution, we can obtain the CDF by integrating the PDF only for values of y larger
than x. Also, by using the same change of variables used above for calculating C, we get
$$F^x_{\lambda,GGD}(y) = \frac{1}{2} + \frac{\mathrm{sign}(y-x)}{2}\,\frac{\gamma\!\left[\frac{1}{3}\!\left(2-\frac{1}{\beta}\right),\,\frac{1}{3}\left|\frac{y-x}{\delta}\right|^\beta\right]}{\Gamma\!\left[\frac{1}{3}\!\left(2-\frac{1}{\beta}\right)\right]}.$$
Using the inverse of this function we can obtain the approximately optimal thresholds (4.20).
For $i \in \{1, \cdots, N_I\}$,
$$\tau^{\star,x}_{i,GGD} = x + \delta\,\mathrm{sign}\!\left(\frac{2i}{N_I}-1\right)\left\{3\,\gamma^{-1}\!\left[\frac{1}{3}\!\left(2-\frac{1}{\beta}\right),\,\left|\frac{2i}{N_I}-1\right|\Gamma\!\left(\frac{1}{3}\!\left(2-\frac{1}{\beta}\right)\right)\right]\right\}^{\frac{1}{\beta}}, \qquad (4.28)$$
where $\gamma^{-1}(\cdot,\cdot)$ is the inverse incomplete gamma function.
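As a practical illustration of (4.28), a minimal sketch follows (our own code, not part of the thesis; x, delta, beta and N_I are arbitrary illustrative values, and SciPy's gammaincinv, which inverts the regularized lower incomplete gamma, absorbs the Gamma factor written explicitly in (4.28)).

```python
import numpy as np
from scipy.special import gammaincinv

def ggd_location_thresholds(x, delta, beta, n_intervals):
    """Approximately optimal GGD location thresholds (4.28); requires beta > 1."""
    a = (2.0 - 1.0 / beta) / 3.0
    i = np.arange(1, n_intervals)          # inner thresholds, i = 1, ..., N_I - 1
    q = 2.0 * i / n_intervals - 1.0        # runs over (-1, 1)
    return x + delta * np.sign(q) * (3.0 * gammaincinv(a, np.abs(q)))**(1.0 / beta)

print(np.round(ggd_location_thresholds(x=0.0, delta=1.0, beta=2.0, n_intervals=8), 4))
```

For beta = 2 this reduces to Gaussian-like thresholds proportional to the inverse error function, consistent with the remark below that the Gaussian case coincides with standard quantization.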
The interval densities for three GGD (β = 1.5, 2 and 2.5) are shown in Fig. 4.1.
Figure 4.1: Interval densities for the estimation of a GGD location parameter. The GGD
shape parameters are β = 1.5, 2 and 2.5. Both axes are normalized to have plots independent
of x and δ.
A few remarks can be made based on the results above:
• In Ch. 1 we saw that binary quantization is optimal for the Laplacian distribution, as
long as the quantizer threshold is placed at the true parameter. This singular behavior
might be related to the difficulty of defining the optimal interval density in this case.
• Observe that for 1 < β < 2 (see Fig. 4.1 for β = 1.5), the interval density tends
to infinity at zero showing the importance of quantizing around this point for these
distributions.
• Notice that within the subclass of GGD for which the density at x is finite, the Gaussian
distribution is the distribution with the lowest β. Notice also that for β > 2 (see Fig.
4.1 for β = 2.5), the maximum of the interval density is not placed exactly at zero,
showing that a possible relation might exist between the multimodality of the threshold
distribution and the asymmetric behavior of optimal binary quantization. It shows also
that the Gaussian distribution is exactly between two subclasses of the GGD family, one
subclass for which quantizing around the true parameter is very informative (1 < β < 2)
and another subclass for which quantizing symmetrically around the parameter, but not
at the parameter, is informative (β > 2).
• Observing the symmetry of the interval density we can see that, asymptotically, the
best quantizer is symmetric around the parameter. Thus if we choose NI to be a large
even number, the optimal central threshold might be placed at x. For β > 2, if we
have a moderate odd number of quantization intervals, the interval density indicates
that the optimal quantizer will probably be asymmetric, as we will have to place more
quantization intervals around one of the modes of the interval density.
Results for the estimation of a GGD scale parameter
We now evaluate the derivative of the logarithm of f(y; δ) w.r.t. δ and y. This gives
$$\frac{\partial S_c^\delta(y;\delta)}{\partial y} = \frac{\beta^2}{\delta^2}\left|\frac{y-x}{\delta}\right|^{\beta-1}.$$
Differently from the location problem, the derivative above exists for all positive β. Using this
derivative and the expression for f(y; δ) in (4.18), we have
$$\lambda^\delta_{GGD}(y) = \frac{1}{C}\left|\frac{y-x}{\delta}\right|^{\frac{2\beta-2}{3}}\exp\!\left[-\frac{1}{3}\left|\frac{y-x}{\delta}\right|^\beta\right], \qquad (4.29)$$
where the normalizing constant can be obtained using the symmetry of the numerator
$$C = 2\int_x^{+\infty}\left(\frac{y-x}{\delta}\right)^{\frac{2\beta-2}{3}}\exp\!\left[-\frac{1}{3}\left(\frac{y-x}{\delta}\right)^\beta\right]dy.$$
Changing the variables, $\varepsilon = \frac{1}{3}\left(\frac{y-x}{\delta}\right)^\beta$, and using the gamma function to rewrite the result,
we get
$$C = \frac{2\delta}{\beta}\,3^{\frac{1}{3}\left(2+\frac{1}{\beta}\right)}\,\Gamma\!\left[\frac{1}{3}\!\left(2+\frac{1}{\beta}\right)\right].$$
Using again the symmetry and a development similar to the one used for the CDF of the
interval density in the location problem, we have
$$F^\delta_{\lambda,GGD}(y) = \frac{1}{2} + \frac{\mathrm{sign}(y-x)}{2}\,\frac{\gamma\!\left[\frac{1}{3}\!\left(2+\frac{1}{\beta}\right),\,\frac{1}{3}\left|\frac{y-x}{\delta}\right|^\beta\right]}{\Gamma\!\left[\frac{1}{3}\!\left(2+\frac{1}{\beta}\right)\right]}.$$
Its inverse gives the threshold approximation. For i ∈ {1, · · · , NI }
$$\tau^{\star,\delta}_{i,GGD} = x + \delta\,\mathrm{sign}\!\left(\frac{2i}{N_I}-1\right)\left\{3\,\gamma^{-1}\!\left[\frac{1}{3}\!\left(2+\frac{1}{\beta}\right),\,\left|\frac{2i}{N_I}-1\right|\Gamma\!\left(\frac{1}{3}\!\left(2+\frac{1}{\beta}\right)\right)\right]\right\}^{\frac{1}{\beta}}. \qquad (4.30)$$
The main difference w.r.t. the location parameter case is that, now, it is the Laplacian
distribution that is at the border between the distributions for which x is a very informative
point and the distributions for which most of the information is around x but not at x. Note
also that the interval density still depends on δ, thus, as was said before, an adaptive scheme
is necessary for placing the thresholds optimally.
Results for the estimation of a STD location parameter
Using the STD PDF (3.72)
$$f(y;x) = \frac{\Gamma\!\left(\frac{\beta+1}{2}\right)}{\delta\sqrt{\beta\pi}\,\Gamma\!\left(\frac{\beta}{2}\right)}\left[1+\frac{1}{\beta}\left(\frac{y-x}{\delta}\right)^2\right]^{-\frac{\beta+1}{2}},$$
the derivative of the score function is
$$\frac{\partial S_c^x(y;x)}{\partial y} = \frac{\beta+1}{\beta\delta^2}\,\frac{1-\frac{1}{\beta}\left(\frac{y-x}{\delta}\right)^2}{\left[1+\frac{1}{\beta}\left(\frac{y-x}{\delta}\right)^2\right]^2}.$$
Replacing this expression and the PDF expression above in the interval density (4.18), we
obtain
$$\lambda^x_{STD}(y) = \frac{1}{C}\,\frac{\left|1-\frac{1}{\beta}\left(\frac{y-x}{\delta}\right)^2\right|^{\frac{2}{3}}}{\left[1+\frac{1}{\beta}\left(\frac{y-x}{\delta}\right)^2\right]^{\frac{9+\beta}{6}}}. \qquad (4.31)$$
The constant C and the corresponding CDF cannot be expressed analytically with known
special functions. For obtaining a general expression for the thresholds, it might be necessary
to use numerical integration of the density for each y and then invert an interpolation of the
numerical integration.
In the special case of a Cauchy distribution (β = 1), we can evaluate analytically the
constant in the density and the CDF. For this distribution the interval density is
$$\lambda^x_C(y) = \frac{1}{C}\,\frac{\left|1-\left(\frac{y-x}{\delta}\right)^2\right|^{\frac{2}{3}}}{\left[1+\left(\frac{y-x}{\delta}\right)^2\right]^{\frac{5}{3}}}. \qquad (4.32)$$
From the symmetry, the constant is
$$C = 2\int_x^{+\infty}\frac{\left|1-\left(\frac{y-x}{\delta}\right)^2\right|^{\frac{2}{3}}}{\left[1+\left(\frac{y-x}{\delta}\right)^2\right]^{\frac{5}{3}}}\,dy.$$
Using the change of variables $\tan\!\left(\frac{\theta}{2}\right) = \frac{y-x}{\delta}$, we obtain
$$C = \delta\int_0^\pi\left|\cos^2\!\left(\tfrac{\theta}{2}\right)-\sin^2\!\left(\tfrac{\theta}{2}\right)\right|^{\frac{2}{3}}d\theta = 2\delta\int_0^{\frac{\pi}{2}}\cos^{\frac{2}{3}}(\theta)\,d\theta,$$
where the second equality was obtained using a relation between trigonometric functions and
the periodic pattern of the resulting function on the interval [0, π). Using another change of
variables $u = \cos^2(\theta)$ and identifying the resulting integral factor with the beta function, we
have
$$C = \delta\, B\!\left(\tfrac{1}{2},\tfrac{5}{6}\right).$$
Exploiting the symmetry of the interval density and using a similar development, we can
obtain the CDF related to the interval density
$$F^x_{\lambda,C}(y) = \frac{1}{2} + \frac{\mathrm{sign}(y-x)}{2}\,\frac{\int_0^\phi\left[\cos^2(\theta)\right]^{\frac{1}{3}}d\theta}{B\!\left(\tfrac{1}{2},\tfrac{5}{6}\right)},$$
with $\phi = 2\arctan\!\left(\frac{y-x}{\delta}\right)$. Using again the change of variables $u = \cos^2(\theta)$, we can rewrite
the integral above using the incomplete beta function:
$$\int_0^\phi\left[\cos^2(\theta)\right]^{\frac{1}{3}}d\theta = \begin{cases}\frac{1}{2}\left[B\!\left(\tfrac{1}{2},\tfrac{5}{6}\right) - I_{\cos^2(\phi)}\!\left(\tfrac{5}{6},\tfrac{1}{2}\right)\right], & \text{for } \phi \in \left[0,\tfrac{\pi}{2}\right],\\[4pt] \frac{1}{2}\left[B\!\left(\tfrac{1}{2},\tfrac{5}{6}\right) + I_{\cos^2(\phi)}\!\left(\tfrac{5}{6},\tfrac{1}{2}\right)\right], & \text{for } \phi \in \left(\tfrac{\pi}{2},\pi\right],\end{cases}$$
where $I_x(a,b) = \int_0^x t^{a-1}(1-t)^{b-1}dt$ denotes the incomplete beta function.
Transforming φ into the initial variable through $\cos(\phi) = \frac{1-\left(\frac{y-x}{\delta}\right)^2}{1+\left(\frac{y-x}{\delta}\right)^2}$, we have the following CDF:
$$F^x_{\lambda,C}(y) = \frac{1}{2} + \frac{\mathrm{sign}(y-x)}{4\,B\!\left(\tfrac{1}{2},\tfrac{5}{6}\right)}\left\{B\!\left(\tfrac{1}{2},\tfrac{5}{6}\right) + \mathrm{sign}\!\left(|y-x|-\delta\right) I_{\left[\frac{1-\left(\frac{y-x}{\delta}\right)^2}{1+\left(\frac{y-x}{\delta}\right)^2}\right]^2}\!\left(\tfrac{5}{6},\tfrac{1}{2}\right)\right\}.$$
Inverting the CDF, we obtain the approximate expression for the thresholds. For $i \in \{1, \cdots, N_I\}$ and $i' = i - \frac{N_I}{2}$,
$$\tau^{\star,x}_{i,C} = \begin{cases} x + \delta\,\mathrm{sign}(i')\sqrt{\dfrac{1-\sqrt{\bar{I}^{-1}\!\left(1-\frac{4|i'|}{N_I};\,\tfrac{5}{6},\tfrac{1}{2}\right)}}{1+\sqrt{\bar{I}^{-1}\!\left(1-\frac{4|i'|}{N_I};\,\tfrac{5}{6},\tfrac{1}{2}\right)}}}, & \text{when } |i'| \leq \frac{N_I}{4},\\[14pt] x + \delta\,\mathrm{sign}(i')\sqrt{\dfrac{1+\sqrt{\bar{I}^{-1}\!\left(\frac{4|i'|}{N_I}-1;\,\tfrac{5}{6},\tfrac{1}{2}\right)}}{1-\sqrt{\bar{I}^{-1}\!\left(\frac{4|i'|}{N_I}-1;\,\tfrac{5}{6},\tfrac{1}{2}\right)}}}, & \text{when } |i'| > \frac{N_I}{4},\end{cases} \qquad (4.33)$$
where $\bar{I}^{-1}(\cdot\,;a,b)$ is the inverse of the regularized incomplete beta function $I_x(a,b)/B(a,b)$.
An interesting point about the optimal interval density for the estimation of a location
parameter of the STD is that it equals zero exactly at $x \pm \sqrt{\beta}\,\delta$, indicating that around these
points not much statistical information can be obtained about the location parameter. If we observe
the score function we will see that it has an "∽" shape; the zero-derivative points are
then related to the maximum and minimum of the score. In a practical sense, the points larger
than the maximum and smaller than the minimum can be seen as outliers, so for estimation
purposes we might not be interested in quantizing around the transition points. Note however
that, from a threshold placement point of view, the only practical way of having a zero interval
density at such a point is to place a threshold at it; therefore, in practice, we are interested in
knowing whether the measurement is an outlier or not.
Results for the estimation of a STD scale parameter
For estimating the scale parameter of the STD, we have the following derivative of the score
function:
$$\frac{\partial S_c^\delta(y;\delta)}{\partial y} = \frac{2(\beta+1)}{\delta^2}\,\frac{\frac{1}{\beta}\frac{y-x}{\delta}}{\left[1+\frac{1}{\beta}\left(\frac{y-x}{\delta}\right)^2\right]^2}.$$
This leads to the following interval density
$$\lambda^\delta_{STD}(y) = \frac{1}{C}\,\frac{\left[\frac{1}{\beta}\left(\frac{y-x}{\delta}\right)^2\right]^{\frac{1}{3}}}{\left[1+\frac{1}{\beta}\left(\frac{y-x}{\delta}\right)^2\right]^{\frac{\beta+9}{6}}}, \qquad (4.34)$$
with C given by
$$C = 2\int_x^{+\infty}\frac{\left[\frac{1}{\beta}\left(\frac{y-x}{\delta}\right)^2\right]^{\frac{1}{3}}}{\left[1+\frac{1}{\beta}\left(\frac{y-x}{\delta}\right)^2\right]^{\frac{\beta+9}{6}}}\,dy.$$
Using a change of variables, $\varepsilon = \frac{1}{\sqrt{\beta}}\frac{y-x}{\delta}$, and identifying the resulting integral with the
beta function, we obtain
$$C = \sqrt{\beta}\,\delta\, B\!\left(\tfrac{5}{6},\tfrac{\beta+4}{6}\right).$$
Exploiting the symmetry of the interval density and using the previous change of variables,
we have the following CDF related to the interval density:
$$F^\delta_{\lambda,STD}(y) = \frac{1}{2} + \frac{\mathrm{sign}(y-x)}{2}\,\frac{B\!\left(\tfrac{5}{6},\tfrac{\beta+4}{6}\right) - I_{\frac{1}{1+\frac{1}{\beta}\left(\frac{y-x}{\delta}\right)^2}}\!\left(\tfrac{\beta+4}{6},\tfrac{5}{6}\right)}{B\!\left(\tfrac{5}{6},\tfrac{\beta+4}{6}\right)}.$$
For $i \in \{1, \cdots, N_I\}$, the approximately optimal thresholds are then given by
$$\tau^{\star,\delta}_{i,STD} = x + \delta\,\mathrm{sign}\!\left(\frac{2i}{N_I}-1\right)\left\{\beta\left[\frac{1}{\bar{I}^{-1}\!\left(1-\left|\frac{2i}{N_I}-1\right|;\,\tfrac{\beta+4}{6},\tfrac{5}{6}\right)}-1\right]\right\}^{\frac{1}{2}}, \qquad (4.35)$$
with $\bar{I}^{-1}$ the inverse of the regularized incomplete beta function, as defined above.
Note that similarly to the GGD scale parameter estimation case, the point x is not very
informative. Most of the quantizer intervals must be placed around x but not very close to x.
4.1.7 Location parameter estimation
To check the results, we will now focus on location parameter estimation.
First observe that, using the normalized form of the PDF $f(y;x) = \frac{1}{\delta}f_n\!\left(\frac{y-x}{\delta}\right)$, we can
rewrite the interval density given by (4.18):
$$\lambda^\star(y) \propto \left|\frac{\partial^2\log\!\left[\frac{1}{\delta}f_n\!\left(\frac{y-x}{\delta}\right)\right]}{\partial y\,\partial x}\right|^{\frac{2}{3}}\left[\frac{1}{\delta}f_n\!\left(\frac{y-x}{\delta}\right)\right]^{\frac{1}{3}} \propto \frac{\left[\left(f_n^{(1)}\!\left(\frac{y-x}{\delta}\right)\right)^2 - f_n\!\left(\frac{y-x}{\delta}\right)f_n^{(2)}\!\left(\frac{y-x}{\delta}\right)\right]^{\frac{2}{3}}}{f_n\!\left(\frac{y-x}{\delta}\right)},$$
where $f_n^{(1)}$ and $f_n^{(2)}$ are the first and second derivatives of $f_n$ w.r.t. its argument. For an $f_n$
with even symmetry, $\left(f_n^{(1)}\right)^2$ is even, $f_n^{(2)}$ is even and consequently $\lambda^\star(y)$ is symmetric around
x. This means that for large $N_I$ the optimal quantizer is symmetric around x, indicating
that, asymptotically, the asymmetry of the optimal quantizer for binary quantization under
some distributions (Subsec. 1.3.4, p. 48) might disappear.
The asymptotic approximation of $I_q^\star$ given by (4.19) can also be rewritten using the normalized PDF:
$$I_q^\star \approx \frac{1}{\delta^2}\left\{I^x_{c,n} - \frac{2^{-2N_B}}{12}\left[\int\left(\frac{\left[\left(f_n^{(1)}(\varepsilon)\right)^2 - f_n^{(2)}(\varepsilon)f_n(\varepsilon)\right]^2}{f_n^3(\varepsilon)}\right)^{\frac{1}{3}}d\varepsilon\right]^3\right\}, \qquad (4.36)$$
where $I^x_{c,n}$ is the FI for estimating a location parameter when $\delta = 1$. Note that the FI approximation can be written as $\frac{\kappa(f_n)}{\delta^2}$, where $\kappa$ is a functional depending only on the normalized
PDF and independent of x and δ. Therefore, we can characterize the optimal
estimation performance based on quantized measurements for a family of distributions with
different δ and x only by evaluating $\kappa(f_n)$.
FI for the Gaussian and Cauchy cases
We will check the results using the Gaussian (GGD with β = 2) and Cauchy (STD with β = 1)
distributions.
For the Gaussian distribution, the interval density (4.27) and the asymptotic approximation of the FI (4.19) are given by
$$\lambda^x_G(y) = \frac{1}{\delta\sqrt{3\pi}}\exp\!\left[-\left(\frac{y-x}{\sqrt{3}\,\delta}\right)^2\right], \qquad I^x_{q,G} \approx \frac{1}{\delta^2}\left[2-\sqrt{3}\,\pi\,2^{-2N_B}\right]. \qquad (4.37)$$
We can note that the interval density in this case is exactly the same as for standard quantization (proportional to $f^{\frac{1}{3}}$). Thus, in the Gaussian case when $N_I$ is large, the optimal quantizer
for estimating the location parameter and for recovering the continuous measurement is the
same. This coincidence between the optimal quantizer for estimation and for reconstruction
happens whenever the derivative of the score function w.r.t. y is constant. In the location
parameter estimation case, this happens only for the Gaussian distribution. If we look at the
scale parameter case, this happens for the Laplacian distribution.
Observe also that if it was possible to implement the variable rate encoder in the Gaussian
case, then the optimal quantizer would be a uniform quantizer and it would coincide with the
optimal variable rate quantizer for reconstruction which is uniform [Gersho 1992, p. 299].
For the Cauchy distribution, the interval density (4.32) and the asymptotic FI approximation are the following:
$$\lambda^x_C(y) = \frac{1}{\delta B\!\left(\tfrac{1}{2},\tfrac{5}{6}\right)}\,\frac{\left|1-\left(\frac{y-x}{\delta}\right)^2\right|^{\frac{2}{3}}}{\left[1+\left(\frac{y-x}{\delta}\right)^2\right]^{\frac{5}{3}}}, \qquad I^x_{q,C} \approx \frac{1}{2\delta^2}\left[1-\frac{B^3\!\left(\tfrac{1}{2},\tfrac{5}{6}\right)}{3\pi}\,2^{-2N_B+1}\right]. \qquad (4.38)$$
To evaluate the validity of the results, the FI (4.3) under both distributions for δ = 1 was
evaluated for
• the optimal set of thresholds for NB ∈ {1, 2, 3}. The optimal thresholds were obtained
through exhaustive search. For NB ∈ {4, 5, 6, 7, 8} the theoretical results (4.37) and
(4.38) were used as an approximation.
• uniform quantization considering NB ∈ {1, · · · , 8}. After setting the central threshold
to x, the optimal quantization interval step-length ∆⋆ was found by maximizing the FI
also using exhaustive search.
• the approximate optimal set of thresholds given by (4.28) and by (4.33), for NB ∈
{1, · · · , 8}.
The results are given in Tab. 4.1.
          Gaussian (I^x_{c,n} = 2)                        Cauchy (I^x_{c,n} = 0.5)
NB   Optimal       Uniform      Practical approx.    Optimal       Uniform      Practical approx.
1    1.27323954†   –            1.27323954           0.40528473†   –            0.40528473
2    1.76503630†   1.76503630   1.75128300           0.43433896†   0.43433896   0.40528473
3    1.93090199†   1.92837814   1.92740111           0.48474865†   0.45600797   0.47893785
4    1.97874454⋆   1.97841622   1.98038526           0.49533850⋆   0.48136612   0.49504170
5    1.99468613⋆   1.99353005   1.99489906           0.49883463⋆   0.49204506   0.49879785
6    1.99867153⋆   1.99807736   1.99869886           0.49970866⋆   0.49656712   0.49970408
7    1.99966788⋆   1.99943563   1.99967136           0.49992716⋆   0.49851056   0.49992659
8    1.99991697⋆   1.99983649   1.99991741           0.49998179⋆   0.49935225   0.49998172

Table 4.1: FI for the estimation of Gaussian and Cauchy location parameters based on quantized measurements. NB is the number of quantization bits. Optimal† is the maximum FI obtained by exhaustive search over the thresholds; Optimal⋆ is the theoretical asymptotic approximation of the FI. Uniform gives the FI for optimal uniform quantization and Practical approx. gives the FI for the practical approximation of the asymptotically optimal thresholds.
In all cases the fast convergence to the continuous FI with increasing NB is verified. Again
here, 4 or 5 bits are enough for obtaining an estimation performance close to the continuous
measurement performance. The difference of performance between uniform and nonuniform
quantization seems to be higher for the Cauchy distribution. In the Gaussian case, this
difference is negligible, indicating that in practice uniform quantization should be used (as it
is easier to implement). It can also be observed that the asymptotic approximation of the FI
and its true value for the practical approximation of the optimal threshold set are very close,
even for small values of NB (NB = 4).
Verification with the adaptive algorithm
As it was pointed out before, an important issue for evaluating the practical approximation
of the optimal thresholds $\tau_i^\star$ is that they depend explicitly on x. Thus, a possible solution to
obtain an estimate of the parameter and set the quantizer thresholds at the same time is to
use the adaptive algorithm proposed in Ch. 3
$$\hat{X}_k = \hat{X}_{k-1} + \frac{1}{k I_q}\,\eta(i_k),$$
with the threshold variation set $\tau'$ given by the practical approximation $\tau^\star$ with x in (4.20)
set to zero and $\eta(i_k)$ given by $\eta(i) = -\frac{f(\tau_i^\star;x)-f(\tau_{i-1}^\star;x)}{F(\tau_i^\star;x)-F(\tau_{i-1}^\star;x)}$. If $N_B \geq 4$, for large k, the asymptotic
variance of the algorithm will be close to optimal and it will be given approximately by
$$\mathrm{Var}\left[\hat{X}_k\right] \approx \mathrm{CRB}_q = \frac{1}{k I_q}, \qquad (4.39)$$
where Iq is the asymptotic approximation given by (4.19).
This algorithm was tested under both distributions for NB = 4 and 5. The MSE of the
algorithm was evaluated using Monte Carlo simulation: 4 × 10^6 realizations of blocks with
5 × 10^4 samples were used. The initial error x − X̂0 and δ were both set to be 1 in all
simulations. The MSE for the algorithm and the approximation given by (4.39) are both
given in Fig. 4.2, where they are multiplied by k for better visualization.
Figure 4.2: Simulated MSE for the adaptive algorithm considering Gaussian (a) and Cauchy
(b) measurement distributions. The numbers of quantization bits are NB = 4 and 5. The
initial estimation error and δ were set to 1 in all the cases. The simulated MSE was obtained
through Monte Carlo simulation; 4 × 10^6 realizations of blocks with 5 × 10^4 error samples were
used. The curves that have asymptotically higher values correspond to NB = 4.
200
Chapter 4. High-rate approximations of the FI
We observe that the asymptotic algorithm performance is very close to the approximation.
For small k the CRB is not tight, and that seems to be why the algorithm performs better
than the bound. In other simulations, it was also observed that using uniform thresholds
leads to faster convergence to the asymptotic performance. This indicates that in practice an
algorithm with changing thresholds can be used for obtaining better results: in the
convergence phase, a uniform set of thresholds is used; then, after a given number of samples,
the thresholds change to the approximately optimal set.
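A compact simulation sketch of this verification follows (our own illustrative code, not the thesis implementation; the Gaussian noise, the value of the true parameter, NB = 4 and the simulation sizes are assumptions).

```python
import numpy as np
from scipy import stats
from scipy.special import erfinv

rng = np.random.default_rng(0)
x_true, NB = 0.4, 4
sigma = 1.0 / np.sqrt(2.0)                    # GGD beta = 2 with delta = 1 has this std
NI = 2**NB
q = 2.0 * np.arange(1, NI) / NI - 1.0
tau = np.sqrt(3.0) * erfinv(q)                # threshold variations: (4.28) with beta = 2, x = 0

edges = np.concatenate(([-np.inf], tau, [np.inf]))
F = stats.norm.cdf(edges, scale=sigma)
f = stats.norm.pdf(edges, scale=sigma)
P = np.diff(F)
eta = (f[:-1] - f[1:]) / P                    # quantized score per output, evaluated at x = 0
Iq = np.sum(eta**2 * P)                       # quantized FI (4.3) for these thresholds

def run(n_samples):
    est = 0.0
    for k in range(1, n_samples + 1):
        y = x_true + sigma * rng.standard_normal()
        i = np.searchsorted(tau, y - est)     # quantize the innovation
        est += eta[i] / (k * Iq)              # decreasing-gain update of Ch. 3
    return est

errors = np.array([run(2000) - x_true for _ in range(200)])
print("k * MSE ~", 2000 * np.mean(errors**2), " vs  1/Iq =", 1.0 / Iq)
```

With enough realizations, k times the empirical MSE should be roughly 1/Iq, which is the behaviour shown in Fig. 4.2.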
4.2 Bit allocation for scalar location parameter estimation
The objective now is to solve problem (d) (p. 175). We have Ns sensors measuring independently the same location parameter x and the continuous measurements from one sensor to
another have all the same noise type with normalized PDF fn . The only difference between
the noise distribution from one sensor to another is the scale factor. For the Ns sensors, the
scale factors are denoted {δ1 , · · · , δNs }. Each sensor i quantizes its measurements with a
number of bits NB,i such that the total number of bits among the sensors is constrained to be
NB . The objective then is to find the allocation of bits {NB,1 , · · · , NB,Ns } that maximizes
the estimation performance.
The estimation performance of unbiased estimators in terms of variance can be characterized asymptotically by the CRB, which is the inverse of the FI. Thus, by maximizing
the FI, the asymptotic estimation performance is maximized. As the sensor measurements
are independent, the FI for the measurements from all the sensors, $I_q$, is the sum of the FIs
$I_{q,i}(N_{B,i})$ of each sensor:
$$I_q = \sum_{i=1}^{N_s} I_{q,i}(N_{B,i}), \qquad (4.40)$$
where we made explicit the dependence of the FI of each sensor on the allocated number of
bits.
We will assume that the thresholds can be chosen so that Iq,i (NB,i ) is maximum. This
can be done for example by using the adaptive algorithm with decreasing gain to set optimally
the central threshold and then by choosing optimally the threshold variations. Thus, we want
to solve the following optimization problem:
$$\underset{N_{B,i}}{\text{maximize}}\quad I_q = \sum_{i=1}^{N_s} I_{q,i}(N_{B,i}), \qquad \text{subject to}\quad \sum_{i=1}^{N_s} N_{B,i} = N_B,\quad N_{B,i} \in \mathbb{N},$$
where $I_{q,i}(N_{B,i})$ is the maximum FI for $N_{B,i}$ bits.
This problem can be solved exactly by evaluating $I_q$ for all possible combinations of
the $N_{B,i}$. The numbers of allocated bits $N_{B,i}$ can assume values from 0 to $N_B$ but their sum
must be $N_B$. Therefore, the $N_{B,i}$ form a weak composition of $N_B$ into $N_s$ parts. The number
of possible allocations is $\binom{N_B+N_s-1}{N_s-1} = \frac{(N_B+N_s-1)!}{(N_s-1)!\,N_B!}$. If we have to solve the allocation
problem for $N_s = 20$ and $N_B = 100$, then the number of possible allocations that we have to
compare is approximately $4.9\times 10^{21}$, which indicates that in practice the exact solution of
this problem is difficult to obtain by exhaustive search.
4.2.1 Unconstrained numbers of bits
If we neglect the constraint that $N_{B,i}$ must be a non-negative integer and we suppose that
the asymptotic approximation of $I_q$ (4.36) is valid for all real $N_{B,i}$, then we can define a
maximization problem that can be solved analytically. Using the approximation (4.36), the
total FI can be approximated by
$$I_q \approx \sum_{i=1}^{N_s}\frac{1}{\delta_i^2}\left\{I^x_{c,n} - \frac{2^{-2N_{B,i}}}{12}\left[\int\left(\frac{\left[\left(f_n^{(1)}(\varepsilon)\right)^2 - f_n^{(2)}(\varepsilon)f_n(\varepsilon)\right]^2}{f_n^3(\varepsilon)}\right)^{\frac{1}{3}}d\varepsilon\right]^3\right\}. \qquad (4.41)$$
Maximizing the approximation in the RHS of (4.41) is equivalent to minimizing $\sum_{i=1}^{N_s}\frac{2^{-2N_{B,i}}}{\delta_i^2}$, as
$I^x_{c,n}$ and the integral are constants if all the sensor noise types are equal. Thus the relaxed
form (without the integer constraints) of the bit allocation problem is the following:
$$\underset{N_{B,i}}{\text{minimize}}\quad \sum_{i=1}^{N_s}\frac{2^{-2N_{B,i}}}{\delta_i^2}, \qquad \text{subject to}\quad \sum_{i=1}^{N_s} N_{B,i} = N_B.$$
We can solve this optimization problem by integrating the constraint into the function to be
minimized using a Lagrange multiplier. The Lagrangian (the function to be minimized) for
this minimization problem is
$$\mathcal{L} = \sum_{i=1}^{N_s}\frac{2^{-2N_{B,i}}}{\delta_i^2} + \lambda\left(\sum_{i=1}^{N_s} N_{B,i} - N_B\right),$$
where $\lambda$ is the Lagrange multiplier. As the function is convex (a sum of decaying exponentials plus a sum of linear terms), the zero gradient point of the Lagrangian w.r.t. the $N_{B,i}$
gives a global minimum. The derivative of the Lagrangian w.r.t. $N_{B,i}$ is
$$\frac{\partial\mathcal{L}}{\partial N_{B,i}} = \frac{-2\ln(2)\,2^{-2N_{B,i}}}{\delta_i^2} + \lambda,$$
which is zero for
$$N_{B,i} = \frac{\log_2\!\left[\frac{\lambda\delta_i^2}{2\ln(2)}\right]}{-2}. \qquad (4.42)$$
To find $\lambda$ it is necessary to use (4.42) in the sum constraint
$$\sum_{i=1}^{N_s} N_{B,i} = \frac{N_s\log_2(\lambda) - N_s\log_2\left[2\ln(2)\right] + 2\sum_{i=1}^{N_s}\log_2(\delta_i)}{-2} = N_B,$$
thus,
$$\log_2(\lambda) = -2\frac{N_B}{N_s} + \log_2\left[2\ln(2)\right] - \frac{2}{N_s}\sum_{i=1}^{N_s}\log_2(\delta_i). \qquad (4.43)$$
Using (4.43) in (4.42) gives
$$N_{B,i} = \frac{N_B}{N_s} - \frac{\log_2\left[2\ln(2)\right]}{2} + \frac{1}{N_s}\sum_{j=1}^{N_s}\log_2(\delta_j) - \log_2(\delta_i) + \frac{\log_2\left[2\ln(2)\right]}{2} = \frac{N_B}{N_s} + \frac{1}{N_s}\sum_{j=1}^{N_s}\log_2(\delta_j) - \log_2(\delta_i),$$
which can be rewritten as
$$N_{B,i} = \frac{N_B}{N_s} - \log_2\!\left(\frac{\delta_i}{\sqrt[N_s]{\prod_{j=1}^{N_s}\delta_j}}\right). \qquad (4.44)$$
This is a correction to the uniform bit allocation that depends on the ratio of each sensor's
scale parameter to the geometric mean of the scale factors.
Note that the approximate allocation depends only on $\delta_i$ and no other information about
the distribution is required. In practice we can estimate $\delta_i$ for each sensor with an arbitrary
allocation, use the estimates in (4.44) and round the results in a proper way to obtain
integer $N_{B,i}$, as sketched below.
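A direct sketch of the relaxed rule (4.44) follows (our own illustration; the scale factors and the total rate are arbitrary values, and the final rounding is left as the naive step mentioned above).

```python
import numpy as np

def unconstrained_allocation(deltas, total_bits):
    """Real-valued bit allocation (4.44); the result sums exactly to total_bits."""
    deltas = np.asarray(deltas, dtype=float)
    n_sensors = deltas.size
    geo_mean = np.exp(np.mean(np.log(deltas)))     # geometric mean of the scale factors
    return total_bits / n_sensors - np.log2(deltas / geo_mean)

deltas = [0.5, 1.0, 1.0, 2.0, 8.0]
nb = unconstrained_allocation(deltas, total_bits=20)
print(np.round(nb, 2), "sum =", nb.sum())
```

Sensors with small scale (precise measurements) receive more bits than the uniform share, imprecise sensors fewer.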
If we use the approximate solution from (4.44), we obtain
$$I_q \approx I^x_{c,n}\sum_{i=1}^{N_s}\frac{1}{\delta_i^2} - \frac{\kappa'(f_n)}{12}\sum_{i=1}^{N_s}\frac{2^{-2N_{B,i}}}{\delta_i^2}.$$
Since $2^{-2N_{B,i}} = 2^{-2\bar{N}_B}\,\delta_i^2/\mathrm{GM}\!\left(\delta_1^2,\cdots,\delta_{N_s}^2\right)$ for the allocation (4.44), this becomes
$$I_q \approx N_s\left[\frac{I^x_{c,n}}{\mathrm{HM}\!\left(\delta_1^2,\cdots,\delta_{N_s}^2\right)} - \frac{\kappa'(f_n)}{12}\,\frac{2^{-2\bar{N}_B}}{\mathrm{GM}\!\left(\delta_1^2,\cdots,\delta_{N_s}^2\right)}\right], \qquad (4.45)$$
2
where HM δ12 , · · · , δN
s
=
Ns
N
Ps
1
2
i=1 δi
2
and GM δ12 , · · · , δN
s
=
s
Ns
203
N
Qs
j=1
δj2 are the harmonic
and geometric means of the squared scale factors, κ′ (fn ) is the integral factor in (4.36) and
B
N̄B = N
Ns is the number of allocated bits per sensor that would be obtained if we had used a
uniform bit allocation.
If we compare this result to uniform bit allocation
Ns
κ′ (fn ) −2N̄B
x
Ic,n −
,
Iq ≈
2
2
12
HM δ12 , · · · , δN
s
we can verify that, as the geometric mean is larger than the harmonic mean, the approximate
optimal bit allocation performs better than or equal to the uniform bit allocation.
If it was possible to implement this allocation scheme, an interesting point for future study
would be the influence of the variability of the sensor precisions $\frac{1}{\delta_i^2}$ on the
estimation performance. This might be done, for example, by considering that the $\frac{1}{\delta_i^2}$ are i.i.d.
r.v. with a given distribution (a gamma distribution for example) with known parameters; then,
by assuming large $N_s$ for a fixed $\bar{N}_B$, we can apply the law of large numbers to the harmonic
and geometric means in the approximation of $I_q$ (4.45) to obtain a characterization of the
approximately optimal FI as a function of the parameters of the precision distribution. This
approach, even if approximate, might give some insight on the estimation performance of
asymptotically large heterogeneous sensor arrays under communication rate constraints.
We have the following solution to problem (d) (p. 175):

Solution to (d) - Unconstrained approximate optimal bit allocation for location parameter estimation

(d1) For $i \in \{1, \cdots, N_s\}$, the approximate optimal bit allocation is given by (4.44)
$$N_{B,i} = \frac{N_B}{N_s} - \log_2\!\left(\frac{\delta_i}{\sqrt[N_s]{\prod_{j=1}^{N_s}\delta_j}}\right).$$
Appropriate rounding can be used to obtain $N_{B,i} \in \mathbb{N}$.
• For the approximate optimal bit allocation, the FI is given by (4.45)
$$I_q \approx N_s\left[\frac{I^x_{c,n}}{\mathrm{HM}\!\left(\delta_1^2,\cdots,\delta_{N_s}^2\right)} - \frac{\kappa'(f_n)}{12}\,\frac{2^{-2\bar{N}_B}}{\mathrm{GM}\!\left(\delta_1^2,\cdots,\delta_{N_s}^2\right)}\right],$$
where $I^x_{c,n}$ is the continuous FI for $\delta = 1$, $\kappa'(f_n) = \left[\int\left(\frac{\left[\left(f_n^{(1)}(\varepsilon)\right)^2 - f_n^{(2)}(\varepsilon)f_n(\varepsilon)\right]^2}{f_n^3(\varepsilon)}\right)^{\frac{1}{3}}d\varepsilon\right]^3$, $\bar{N}_B = \frac{N_B}{N_s}$ is the average number of bits per sensor, and HM and GM are the harmonic and geometric means of the squared scale factors.
4.2.2 Positive numbers of bits
For obtaining a more realistic solution, we can constrain the numbers of bits to be nonnegative
reals. This gives the following optimization problem:
$$\underset{N_{B,i}}{\text{minimize}}\quad \sum_{i=1}^{N_s}\frac{2^{-2N_{B,i}}}{\delta_i^2}, \qquad \text{subject to}\quad \sum_{i=1}^{N_s} N_{B,i} = N_B,\quad N_{B,i} \geq 0.$$
The Lagrangian is the same as for the unconstrained problem. Using the zero gradient
condition, we have
$$N_{B,i} = \frac{\log_2\!\left[\frac{\lambda\delta_i^2}{2\ln(2)}\right]}{-2} = \nu - \log_2(\delta_i),$$
where $\nu$ is a constant to be chosen. Note that the positivity constraint imposes the following
form for $N_{B,i}$:
$$N_{B,i} = \left[\nu - \log_2(\delta_i)\right]_+, \qquad (4.46)$$
with $[x]_+ = \max(x,0)$. The sum constraint gives
$$\sum_{i=1}^{N_s} N_{B,i} = \sum_{i=1}^{N_s}\left[\nu - \log_2(\delta_i)\right]_+ = N_B. \qquad (4.47)$$
Thus, the constant $\nu$ is chosen so that (4.47) is satisfied and then the numbers of bits can be
chosen according to (4.46). Again here, appropriate rounding might be used to obtain integer
numbers of bits.
Observe that this approximate bit allocation is equivalent to water-filling, a common solution to allocate power to carriers in multicarrier modulation. The main difference is that in
this case the channel noise is replaced by log2 (δi ) and the "water depths" are the number of
bits instead of the power levels.
In Fig. 4.3, both water-filling solutions are shown, for power allocation in multicarrier
systems and for approximate bit allocation in constrained rate sensing systems.
When the δi are also unknown, we can mix the two extensions of the adaptive algorithm
with decreasing gain presented in Ch. 3 (fusion center + joint estimation of the scale) to obtain
estimates of the scale parameters. Then, we can use the estimates to obtain the approximate
allocation. In practice, the value of ν can be evaluated at the fusion center and broadcast to
the sensors together with the location parameter. The sensors can use the broadcast ν with a local
estimate of the scale parameter to obtain the optimal NB,i, as sketched below. The critical point with this
approach will be the final rounding step, which will require an agreement (and consequently
communication) between the sensors to respect the total bandwidth constraint.
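The water-filling rule (4.46)-(4.47) is easy to evaluate numerically, for instance by bisection on ν; the following sketch is our own illustration (the scale factors, the total rate and the iteration count are arbitrary assumptions).

```python
import numpy as np

def waterfill_allocation(deltas, total_bits, iters=100):
    """Bit allocation (4.46): nu chosen by bisection so that (4.47) holds."""
    levels = np.log2(np.asarray(deltas, dtype=float))   # the "noise levels" log2(delta_i)
    lo, hi = levels.min(), levels.max() + total_bits     # nu certainly lies in this bracket
    for _ in range(iters):
        nu = 0.5 * (lo + hi)
        allocated = np.maximum(nu - levels, 0.0).sum()   # total "water volume" for this nu
        lo, hi = (nu, hi) if allocated < total_bits else (lo, nu)
    return np.maximum(nu - levels, 0.0)

deltas = [0.5, 1.0, 1.0, 2.0, 50.0]                      # one very imprecise sensor
nb = waterfill_allocation(deltas, total_bits=12)
print(np.round(nb, 2), "sum =", round(nb.sum(), 2))      # the noisy sensor may get zero bits
```

The monotone dependence of the allocated sum on ν is what makes the bisection valid; rounding to integers is again left as a final step.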
Figure 4.3: Both water-filling solutions for multicarrier modulation power allocation (a) and
for rate constrained sensing system bit allocation (b).
This gives the following solution to problem (d) (p. 175):

Solution to (d) - Constrained approximate optimal bit allocation for location parameter estimation

(d2) For $i \in \{1, \cdots, N_s\}$, the approximate optimal bit allocation is obtained by choosing $\nu$ so that (4.47) is satisfied:
$$\sum_{i=1}^{N_s} N_{B,i} = \sum_{i=1}^{N_s}\left[\nu - \log_2(\delta_i)\right]_+ = N_B.$$
With the value of $\nu$ satisfying (4.47), the numbers of bits can be obtained using (4.46):
$$N_{B,i} = \left[\nu - \log_2(\delta_i)\right]_+.$$
Integer $N_{B,i}$ can be obtained with appropriate rounding. The corresponding FI can be approximated by substituting the optimal $N_{B,i}$ in (4.41).
4.3 Generalization with the f–divergence
In this section, we will discuss a generalization of the asymptotic results to different inference
problems. The generalization that we will study is based on the generalized f –divergence
(GFD), which is presented in [Poor 1988]. The objective of this section is to show the main
differences between the asymptotically optimal quantizers for different inference problems.
4.3.1 Definition of the generalized f–divergence
The GFD is a generalization of the f–divergence (also known as the Ali–Silvey distance)
studied in [Ali 1966] (cited in [Poor 1988]). For a continuous r.v. Y, the GFD $D_f$ is defined as
$$D_{f,c} = \mathbb{E}\left\{f\left[l(Y)\right]\right\}, \qquad (4.48)$$
where l is a measurable function and f is a continuous convex function. For a quantized
measurement i from Y, the GFD is defined as
$$D_{f,q} = \mathbb{E}\left\{f\left(\mathbb{E}_{Y|i}\left[l(Y)\right]\right)\right\} = \mathbb{E}\left[f(l_{q_i})\right]. \qquad (4.49)$$
Developing the conditional expectation and supposing that Y admits a PDF p(y), we can
rewrite (4.49) as
$$D_{f,q} = \sum_{i=1}^{N_I} f(l_{q_i})\, P(i), \qquad (4.50)$$
where
$$P(i) = \int_{q_i} p(y)\,dy, \qquad (4.51)$$
and
$$l_{q_i} = \frac{\int_{q_i} l(y)\,p(y)\,dy}{\int_{q_i} p(y)\,dy}. \qquad (4.52)$$
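Since f is convex, Jensen's inequality gives $D_{f,q} \leq D_{f,c}$, and the chapter's FI results are the special case $l(y) = S_c(y;x)$, $f(l) = l^2$. The following small numerical check is our own illustration (Gaussian example; all names and values are assumptions): it evaluates $D_{f,q}$ through (4.50) for that choice and verifies it approaches $D_{f,c} = I_c$ as $N_I$ grows.

```python
import numpy as np
from scipy import stats

x, sigma = 0.0, 1.0
Dfc = 1.0 / sigma**2                            # continuous GFD = I_c for l = S_c, f(l) = l^2

for NI in (4, 8, 16, 32):
    inner = np.linspace(-4.0, 4.0, NI - 1)      # N_I intervals, outermost ones unbounded
    tau = np.concatenate(([-np.inf], inner, [np.inf]))
    F = stats.norm.cdf(tau, loc=x, scale=sigma)
    pdf = stats.norm.pdf(tau, loc=x, scale=sigma)
    P = np.diff(F)
    l_q = (pdf[:-1] - pdf[1:]) / P              # E_{Y|i}[S_c] = dlog P(i;x)/dx here
    Dfq = np.sum(l_q**2 * P)                    # eq. (4.50)
    print(NI, round(Dfq, 5), "<=", Dfc)
```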
4.3.2 Generalized f–divergence in inference problems
The performance of some important inference problems can be written as a function of the
f –divergence. Three examples are given below.
Classical estimation
For classical estimation, we want to estimate a deterministic parameter x embedded in noisy
independent measurements Y1:N . The quantized version of this problem is the main problem
treated in this thesis.
Under some regularity conditions, we know that the asymptotic MSE of the optimal unbiased estimator of x attains the CRB which is given by the inverse of the FI. The FI for N
independent measurements is given by N times the FI for one measurement.
If we look at the forms of Ic and Iq we can see that the FI for one measurement is a GFD
with l (y) = Sc (y; x) and f (l) = l2 . Therefore, the GFD is directly linked to the asymptotic
performance of classical estimation.
Bayesian estimation
Consider that instead of estimating a deterministic parameter, we want to estimate a random
parameter X based on a noisy measurement Y .
From Ch. 2 (2.7) (p. 78), we know that $\mathrm{MSE} = \mathbb{E}_Y\left[\mathrm{Var}_{X|Y}(X)\right]$, which can be rewritten
as $\mathbb{E}_Y\left[\mathbb{E}_{X|Y}\left(X^2\right) - \mathbb{E}^2_{X|Y}(X)\right]$. This gives
$$\mathrm{MSE} = \mathbb{E}\left[X^2\right] - \mathbb{E}_Y\left[\mathbb{E}^2_{X|Y}(X)\right].$$
This function is decreasing w.r.t. the second term, which is a GFD. Proceeding similarly for
the quantized measurement version of the problem, we can conclude that the performance
depends on a GFD with l (y) = EX|Y =y (X) and f (l) = l2 .
For N identically distributed measurements, the MSE for Bayesian estimation can also
be rewritten as a GFD, but in this case a generalization to the non-scalar case is needed. In
[Marano 2007], we can find details for this case with a variable rate approach for quantization.
We can also approach Bayesian estimation for N measurements as a sequential single measurement problem, where at each new observation we can use the last posterior as the new
prior. Using this approach for each measurement, the MSE will be given by the scalar version
of the GFD explained above.
Neyman–Pearson detection
We consider now the detection problem. We have N i.i.d. measurements Y1:N . The measurements are all obtained from one of two distributions with PDF p0 (y) or p1 (y). Based on the
N measurements we want to decide from which of the two distributions the measurements are
obtained. The index of the true measurements distribution will be denoted H ∈ {0, 1} and
the decision that we make based on the N measurements will be denoted Ĥ.
For specifying the performance of the decision procedure we can consider a Neyman–
Pearson strategy [Van Trees 1968, p. 33]. In the Neyman–Pearson strategy, we set an upper
bound on the probability of deciding Ĥ = 1 when H = 0 to a fixed constant α and the
performance of the decision procedure is given by the minimum probability β of deciding
Ĥ = 0 when H = 1. When N is asymptotically large, the limit of β can be characterized
using Stein’s lemma [Blahut 1987] (cited in [Gupta 2003])
1
lim β N = exp {−DKL [p0 (y) ||p1 (y)]},
N →+∞
where DKL [p0 (y) ||p1 (y)] is the Kullback–Leibler divergence (KLD)
Z
p0 (y)
dy.
DKL [p0 (y) ||p1 (y)] = p0 (y) log
p1 (y)
4.3. Generalization with the f –divergence
209
For quantized measurements, a similar theorem can be stated by replacing p0 (y) and p1 (y)
by the corresponding probabilities of the quantizer outputs. Therefore, for this problem, the
KLD is the criterion to be maximized to increase detection performance. If we take the
(y)
and
opposite of the KLD, we can see that it is a GFD with l the likelihood ratio l (y) = pp10 (y)
f (l) = − log (l). The expectation in the GFD in this case is evaluated w.r.t. the probability
measure for H = 0.
Detection of weak signals
We can also consider the detection of a low amplitude signal. We follow a similar presentation
as in [Poor 1988]. For this problem, the Yk for k ∈ {1, · · · , N } are i.i.d. and distributed
according to p (yi ) or p (yi − θxi ), where p is the noise marginal PDF and xi is a known signal
N
P
with finite average power x¯2 = N1
x2k . If we consider a large number of measurements
k=1
N → ∞ and small signal amplitude θ → 0, then the performance of the optimal detector in
terms of β in the Neyman–Pearson strategy is related to the efficacy
ρ = x¯2
Z


dp(y)
dy
p (y)
2
 p (y) dy = x¯2
Z d log [p (y)]
dy
2
p (y) dy.
When this quantity is maximized, we maximize asymptotic detection performance. Note that
the integral factor is exactly the FI for estimating a location parameter of the PDF p. Thus in
this case, the inference performance can also be written as a GFD with l (y) =
and f (l) = l2 .
4.3.3
dp(y)
dy
p(y)
=
d log[p(y)]
dy
Asymptotic results
Similarly to the asymptotic development for the FI, we will write asymptotic approximations
for the loss of GFD incurred by quantization. After obtaining the asymptotic loss for the GFD,
we will obtain the optimal interval densities for the fixed rate and variable rate encoding cases.
Asymptotic GFD loss
The loss of GFD due to quantization can be defined as
Lf = Df,c − Df,q =
NI
X
Lf,i ,
(4.53)
i=1
where Lf,i is the loss for each quantization interval
Lf,i =
Z
qi
{f [l (y)] − f (lqi )} p (y) dy.
(4.54)
210
Chapter 4. High-rate approximations of the FI
For obtaining the asymptotic approximation, we write the Taylor series expansions of l
and p around the central point yi and of f around a point li
l (y) = li +
(y)
li (y
(yy)
− yi ) +
li
(y − yi )2 + ◦ (y − yi )2 ,
2
(4.55)
(yy)
pi
(y − yi )2 + ◦ (y − yi )2 ,
2
(ll)
f
(l)
f (l) = fi + fi (l − li ) + i (l − li )2 + ◦ (l − li )2 .
2
(y)
p (y) = pi + pi (y − yi ) +
(4.56)
(4.57)
Using (4.57) and (4.55), the function f [l (y)] on the interval qi can be written as
"
#
(yy)
(ll) li
fi
(y) 2
(l)
(y)
2
2
(y − yi ) +
li
(y − yi ) + ◦ (y − yi )2 .
f [l (y)] = fi + fi li (y − yi ) +
2
2
(4.58)
We use (4.55) and (4.56) in (4.52) to evaluate lqi
l qi =
R
qi
(y)
li + li
(yy)
pi
(y)
2
2
2
2
p
+
p
(y
−
y
)
+
dy
(y
−
y
)
+
◦
(y
−
y
)
(y
−
y
)
+
◦
(y
−
y
)
i
i
i
i
i
i
i
2
2
(yy)
R
p
(y)
pi + pi (y − yi ) + i 2 (y − yi )2 + ◦ (y − yi )2 dy
(yy)
(y − yi ) +
li
qi
(yy)
=
l i pi ∆ i + l i
pi
2
∆2i
12
(yy)
li
∆3i
(y) (y) ∆3i
12 + 2 pi 12
(yy)
∆3i
pi
3
2 12 + ◦ ∆i
+ l i pi
pi ∆ i +
+ ◦ ∆3i
.
(4.59)
To evaluate f [lqi ] we will replace (4.59) in (4.58). We proceed first by evaluating lqi − li
l qi − l i =
(yy)
∆3
pi 12i + ◦ ∆3i
.
(yy)
∆3
p
pi ∆i + i2 12i + ◦ ∆3i
(y) (y) ∆3i
12
l i pi
li
+
2
Note that lqi − li has a factor ∆2i , thus (lqi − li )2 = ◦ ∆3i . This leads to

(l)
f [lqi ] = fi + fi 
(yy)

∆3i
3
p
+
◦
∆
i 12
i 
3
2
.
+
◦
∆
i
(yy)
∆3i
pi
3
pi ∆i + 2 12 + ◦ ∆i
(y) (y) ∆3i
12
l i pi
+
li
(4.60)
Now, we will evaluate the two terms in Lf,i (4.54). Multiplying the expansion of f [l (y)]
(4.58) by the expansion of p (y) (4.56) and integrating, we obtain
#
"
Z
(yy)
(yy)
3
li
pi ∆3i
∆3i
(l)
(y) (y) ∆i
+ f i l i pi
+
pi
f [l (y)] p (y) dy = fi pi ∆i + fi
2 12
12
2
12
qi
+
(ll)
fi (y) 2 ∆3i
li
pi + ◦ ∆3i .
2
12
(4.61)
4.3. Generalization with the f –divergence
Using (4.60) and integrating the expansion for p (y), we get
#
"
Z
(yy)
(yy)
3
3
pi ∆3i
l
∆
∆
(l)
(y) (y) i
f [lqi ] p (y) dy = fi pi ∆i + fi
+ f i l i pi
+ i pi i + ◦ ∆3i .
2 12
12
2
12
211
(4.62)
qi
Subtracting (4.62) from (4.61), we get the loss in the interval qi
Lf,i =
Therefore, the total loss is
(ll)
fi (y) 2 ∆3i
pi + ◦ ∆3i .
li
2
12
#
" (ll)
NI
X
fi (y) 2 ∆3i
3
pi + ◦ ∆ i .
li
Lf =
2
12
(4.63)
i=1
Similarly to the asymptotic development for the FI, we have
lim
NI →∞
NI2 Lf
1
=
24
Z
2
f (ll) [l (y)] l(y) (y) p (y)
dy.
λ2 (y)
(4.64)
The optimal interval density for fixed rate encoding is then given by
2 1
1
f (ll) 3 [l (y)] l(y) (y) 3 p 3 (y)
.
λ (y) = R
2 1
1
f (ll) 3 [l (y)] l(y) (y) 3 p 3 (y) dy
⋆
(4.65)
If the PDF of the measurements is completely known and given by p (y), then a similar
development as it was done for the FI leads to the following optimal variable rate encoding
interval density:
p
f (ll) [l (y)] l(y) (y)
⋆
λvr (y) = R p
.
f (ll) [l (y)] l(y) (y) dy
4.3.4
Interval densities for inference problems
We will compare now the different interval densities for the inference problems described
above. In Tab. 2 we give the different functions defining the GFD for each problem and the
corresponding optimal interval density. We also give the optimal interval density for variable
rate encoding, whenever variable rate encoding is possible.
212
Chapter 4. High-rate approximations of the FI
Inference problem
l (y)
f (l)
Classical estimation
Sc (y; x)
l2
Bayesian estimation
EX|Y =y (X)
l2
N–P detection
p0 (y)
p1 (y)
− log (l)
Weak signal detection
dp(y)
dy
p(y)
l2
λ⋆ (y) ∝
∂Sc (y;x)
∂y
2
3
λ⋆vr (y) ∝
1
p 3 (y; x)
2
dE
1
3
X|Y =y (X)
p 3 (y)
dy
2
i
h
p (y) 3
1
d log p0 (y)
1
p03 (y)
dy
n
d2 log[p(y)]
dy 2
o2
3
1
p 3 (y)
–
dEX|Y =y (X) dy
–
2
d log[p(y)] dy2 Table 4.2: Functions characterizing the GFD for different inference problems and interval densities maximizing the inference performance based on quantized measurements. The interval
density λ⋆ (y) is the density optimizing the performance when encoding is done with fixed
rate, λ⋆vr (y) is the density for variable rate encoding.
Notice that for Bayesian estimation and weak signal detection, we give expressions for
the variable rate optimal density. In Bayesian estimation, as we have a prior on the true
parameter, we know the probabilities of the quantizer outputs, thus we can define correct
lengths for the outputs. In weak signal detection, as the amplitude of the signal is small, we
can consider that the encoding can be done approximately by using the noise distribution.
While in classical estimation, we have the effect of the score function derivative, in Bayesian
estimation the optimal interval density is affected by the optimal estimator x̂ = EX|Y =y (X)
function. Note also that, differently from the classical estimation case, where the interval
density is affected directly by the true parameter value, in Bayesian estimation the influence
of the parameter appears only through its prior. Thus even if x is unknown in the Bayesian
case, an optimal quantizer can be implemented in practice3 .
Observe that classical estimation for a location parameter with value x = 0 and weak
signal detection have exactly the same interval density. Actually, the performance of weak
signal detection can be seen equivalently as the performance of estimating a small constant
with i.i.d noise and marginal PDF p (y). Thus it is not surprising that the optimal interval
densities are the same.
The optimal density for Neyman–Pearson detection that we have obtained here is exactly
the same as the one obtained in [Gupta 2003] in the scalar case. Note that similarly to Bayesian
estimation, where the sensibility of the key element for inference, the optimal estimator, has
a direct impact on the interval density, in detection, the sensibility of the logarithm of the
continuous measurement
ilikelihood ratio plays an important role. Note also that the logh
p0 (y)
likelihood ratio log p1 (y) = log [p0 (y)] − log [p1 (y)] for two distributions parametrized by x
and x + ε with small ε can be rewritten using an expansion around x
log [p0 (y)] − log [p1 (y)] = log [p (y; x)] − log [p (y; x + ε)] = ε
∂ log [p (y; x)]
+ ◦ (ε) .
∂x
The optimal interval density is then approximately given by the optimal density for classical
estimation. This makes explicit the link between the density for weak signal detection and for
3
Optimal in this case for a given prior, if the prior does not represent well the reality, then the Bayesian
setting is not useful and optimality is meaningless.
4.4. Chapter summary and directions
213
classical estimation.
4.4
Chapter summary and directions
We summarize the main points from this chapter and possible directions for future work:
• We developed an asymptotic high-rate approximation for the FI for quantized measurements. The approximation shows that the FI for quantized measurements tends to the
FI for continuous measurements when the number of quantization intervals tends to
infinity. When the quantizer outputs are all coded with binary words with the same
length (fixed rate encoding), the approximation of the FI tends exponentially to the FI
for continuous measurements as a function of the number of quantization bits.
• The asymptotic performance approximation obtained depends on the specific choice of
the quantizer intervals through the quantizer interval density. For fixed rate encoding,
1
the optimal interval density is shown to depend not only on the PDF through f 3 , as it
is common in standard quantization, but also on the derivative of the score function.
In practice for finite number of bits, the optimal interval density can be approximated by
setting the quantization thresholds using the inverse of the CDF related to the interval
density. As the CDF depends on the parameter that we want to estimate, a recursive
procedure for joint estimating and resetting the thresholds is necessary for obtaining
asymptotically optimal performance (asymptotic both in N and NI ). For example, we
can use the adaptive algorithms presented in Ch. 3 when we want to solve a location
estimation problem. In general we can use the adaptive MLE approach.
When the length of the binary words are chosen to minimize the mean length of the
quantizer output, the optimal density is shown to depend directly on the derivative
of the score function. The problem with this approach is that not only setting the
quantizer thresholds depends on the measurement distribution, but also the encoding
method depends on it. Even if we can attain the best asymptotic performance by using
an adaptive technique for setting the thresholds, we will not respect the rate constraint
during all the initial time of the estimation procedure, when the parameter estimate is
far from the true parameter value.
• The practical approximation of the asymptotically optimal quantization thresholds was
obtained for the estimation of location and scale parameters of the GGD. For the STD,
we obtained the practical approximation in the Cauchy case for location and in general
for scale.
The asymptotic results were tested in the location problem with the Gaussian and
Cauchy distributions. We compared the asymptotic approximation of the FI with the FI
for optimal uniform quantization and with the FI for the practical approximation of the
optimal thresholds. We observed that, with only 4 bits, the FI obtained with the practical approximation is very close to the asymptotic approximation. We also observed that,
in the Gaussian case, the gain of performance obtained with nonuniform quantization is
214
Chapter 4. High-rate approximations of the FI
negligible, while in the Cauchy case it is small. This indicates that in practice uniform
quantization might be a better solution, as it requires a lower complexity.
By using the adaptive algorithm, we have shown that the asymptotically optimal results
can be obtained in practice. During the simulation of the adaptive algorithm, it was
observed that uniform quantization leads to faster convergence when compared with
nonuniform quantization. An interesting point for future research is then to study
adaptive algorithms that start with a threshold set optimized for faster convergence and
then change the threshold set, so that asymptotically the performance is also optimal.
• By using the asymptotic results, we have obtained approximations for the optimal bit
allocation when we estimate a location parameter using multiple sensors and the total
number of quantization bits is constrained.
The first approximate solution was given by considering unconstrained numbers of bits
(positive and negative reals), the approximate optimal bit allocation is shown to be
a correction on the uniform bit allocation (equal number of bits for each sensor) that
depends on the weight of the noise scale parameter on the geometric mean of all the scale
parameters. The FI given by this approximate optimal bit allocation is shown to depend
on the harmonic and geometric means of the noise scale parameters. An interesting
point for future work is the analysis of the approximate FI for the optimal allocation
when the number of sensors is very large and the sensors scale parameters are random
with a given known distribution. As the approximate FI depends on the geometric and
harmonic means of the scale parameters, by using the law of large numbers, we expect to
obtain an approximation of the FI depending on the parameters of the scale distribution.
The second approximation was given by considering a more realistic scenario, with the
numbers of bits constrained to be positive. The approximate optimal bit allocation is
given by a water-filling solution, which is a well known solution for the problem of power
allocation in multicarrier modulations. For the bit allocation problem, the logarithm of
the scale parameter plays the equivalent role of the noise power in multicarrier power
allocation and the number of bits plays the role of the power to be allocated.
The water-filling solution depends on the scale parameters, in a fusion center approach,
the fusion center can use the estimates of the scale parameters to obtain an approximate
solution. The solution of the problem is mainly determined by only one parameter,
the "water level", after obtaining the approximate "water level", the fusion center can
broadcast it to the sensors so that they can set their quantizer resolution. A problem
that still need to be solved in this case is how the sensors will coordinate their final
choice on the numbers of bits (which are constrained to be integers) so that the total
rate constraint is respected.
• As a final point of this chapter, we revisited the asymptotic approximation for the f –
divergence loss due to quantization presented in [Poor 1988]. The objective of this part
was to show that the asymptotic approximation of the FI presented in this chapter can be
seen as a special case of the asymptotic approximation of a general performance measure
for inference problems and to show the links between the asymptotic characterization of
the quantizers for different inference problems.
4.4. Chapter summary and directions
215
We saw that there is a close link between quantization for weak signal detection and
for classical estimation of a location parameter. In practice, as we will use an adaptive
algorithm with static quantizer centered at zero for classical estimation, the quantizer
thresholds for these two problems are exactly the same. The link between Neyman–
Pearson detection and Bayesian estimation is that, for both, the quantizer depends on
the sensibility of their key quantities: the Neyman–Pearson detection optimal interval
density depends on the sensibility of the log-likelihood ratio and the Bayesian estimation
optimal density depends on the sensibility of the estimator.
• Additional to the points for further study presented above many other points can also
be investigated:
– The vector quantizer extension of the asymptotic approximation of the FI can
be considered: vector quantization is the most natural extension of the results
presented here.
– Further study of the Bayesian case: for one sample asymptotic characterization,
we saw that a recursive approach can be used. In practice, this solution may be
too complex to be implemented as we need to evaluate completely the continuous
measurement estimator and consequently the posterior for obtaining the optimal
quantizer at each sample. For obtaining a simple solution, we can consider high
resolution quantizers that are designed to optimize the asymptotic (large number
of samples) performance of Bayesian estimation.
– Dealing with the overload region: a main point that was neglected in the analysis is
that, in practice, most noise PDF that are used for modeling have infinite support.
In this chapter we considered explicitly that the noise PDF have bounded support,
so that it would not be necessary to deal with the overload region. In future work,
we can try to deal with the overload region.
– Asymptotic approximation of the optimal uniform quantizer for estimation: during
all this thesis we considered the explicit optimization through a grid search of the
optimal quantization step in uniform quantization. We can try to obtain an analytic
characterization of the optimal step by considering an asymptotic approach.
Conclusions of Part II
In Part II, we studied the asymptotic performance of estimation of a scalar deterministic
parameter based on quantized measurements. Asymptotically in this case means:
• that the number of samples tends to infinity N → ∞, so that we can use the FI to
characterize the estimation performance.
• That the number of quantization bits tends to infinity NB → ∞, so that we can use
high-rate approximations of the FI to determine analytically the loss of performance
induced by quantization.
We obtained the following conclusions:
• The asymptotic loss of performance due to quantization decreases exponentially as a function of the number of bits. The loss of FI due to quantization is
shown to decrease exponentially with increasing numbers of bits. Even if the results are
asymptotic, they indicate that it is probably not useful to increase the sensor quantizer
resolution when a target performance is not met. Probably, it is more reasonable, as we
saw in Part I, to increase the number of sensors, or if it is possible, to increase sampling
frequency and to use sensors with smaller noise amplitude (smaller noise scale factor).
• Asymptotic may be low to medium resolution in practice. Using a practical
approximation of the asymptotically optimal thresholds for finite number of quantization
intervals in the location estimation problem (Gaussian and Cauchy cases), we have
shown that the corresponding FI is very close to the asymptotic approximation of the
FI for numbers of bits as low as 4. For 1,2 and 3 bits the optimal threshold variations
can be found easily by grid search and the central threshold can be adjusted in all
cases with an adaptive algorithm. This means that in practice, for all numbers of bits,
we can set, at least approximately, the quantizer thresholds to have asymptotically
optimal quantization for location parameter estimation under Gaussian and Cauchy
distributions.
A question that still remains unanswered is if this is true in general, for different types
of measurement distribution and for the estimation of other types of parameters.
• Uniform is not bad at all. Although we can use, in practice, nonuniform quantization
of the measurements to have asymptotically optimal performance. The gap between
the performance for optimal uniform quantization and the performance for nonuniform
quantization in location parameter estimation is small. As uniform quantization is easier
to be implemented, it seems that, in practice, uniform quantization may be a better
solution.
217
Conclusions
Main conclusions
In this thesis, we have studied the problem of estimation based on quantized measurements,
a problem that has attracted increasing attention of the signal processing community due to
the emergence of sensor networks. More specifically, we treated the problem of estimating a
scalar parameter, either constant or varying with a Wiener model, based on quantized noisy
measurements of the parameter.
We observed that for most commonly used noise models, the estimation performance
degrades when the quantizer dynamic range is far from the true parameter value, indicating
that a good solution can be obtained by adaptively setting the quantizer range using the most
recent estimate of the parameter.
Using the adaptive scheme, the loss of estimation performance due to quantization seems to
be small. For all the tested cases (different noise PDF, constant or slowly varying parameter),
a small loss is observed when we use 1–3 quantization bits and a negligible loss is observed
for 4 or 5 quantization bits. This indicates that the solution of the remote sensing problem
under constrained communication rate is linked to low resolution sensor networks:
• If we consider that the problem is constrained to be solved with a sensor network approach, then from the results above, we can see that quantization with low resolution is
a solution to this problem.
• If we constrain the problem to be a remote sensing problem based on quantized measurements, then a low resolution sensor network approach seems to be an appropriate
solution.
As the standard estimation algorithms for attaining the small loss of performance have high
complexity, we proposed a low complexity adaptive algorithm that achieves asymptotically the
same performance. Extensions of the algorithm were proposed for the cases when the noise
scale factor is unknown and when multiple sensors are available.
We also studied the problem of how to set the quantization thresholds for obtaining optimal
estimation performance when a large number of quantization intervals is available. We used the
asymptotic approach (the quantizer intervals tend to zero) to obtain an approximation of the
optimal thresholds, this approach also allowed to obtain an approximate analytical expression
for the estimation performance (the FI) as a function of the number of quantization bits. The
approximation of the FI for quantized measurements is shown to converge exponentially to the
FI for continuous measurements. The approximate analytic expression was shown to be valid
in the location estimation problem even for small numbers of bits (4 in this case), indicating
that the result, which is expected to be exact only when the number of bits tends to infinity,
can be useful in practice, if we consider non uniform quantization.
219
220
Conclusions
From the asymptotic approach, we show that the optimal thresholds may depend on the
parameter, which is unknown. This reinforces the importance of the adaptive approach,
which allows to set the thresholds asymptotically according to their optimal values, leading
to asymptotically optimal estimation performance.
We also want to point out that the difference between using the optimal general threshold
scheme (non uniform) and the optimal uniform scheme for the location problem is small. In
practice, if low complexity is needed, then uniform quantization may be a better solution.
Perspectives
We finish this "conversation" between quantization and estimation, highlighting some subjects
for future discussion. Some details of these subjects were already discussed at the end of the
chapters, therefore here we give only the main lines.
• Vector parameter and vector quantization: this is the direct extension of the problem,
while the vector quantization extension might be straightforward to study, both in terms
of proposing algorithms and studying their asymptotic (in terms of numbers of samples
and quantization intervals) behavior, the vector parameter extension seems to be less
straightforward, specially because it would require a redefinition of the estimation performance and it would require a full extension of the algorithms to vectors, for exploiting
correctly the correlation between the components.
• Noisy channels: in the "DSP party", most of the time, communication is not invited,
we can propose to invite it to the next party by adding the communication channel
in the problem. A noisy communication channel can be considered in multiple ways.
The simplest way for introducing it in the problem is by indexing the quantized measurements with binary words and then considering the channel as an extension of the
binary symmetric channel. While for a fixed indexing the extensions of the algorithms,
specially the low complexity one proposed here, might be simple, the problem of optimal
estimation/indexing can be difficult.
Different extensions can be considered by introducing a continuous channel, for example
additive and fading channels. In this case we might consider the problem of indexing,
by assigning real values to the quantized measurements, this will generate again a joint
problem of estimation/codebook design.
• Estimation under unknown noise distribution: we supposed that the noise distribution
is known, at least up to a scale factor, however, in practice, this assumption cannot be
always satisfied and we will need to look for different approaches to estimate the location
parameter based on quantized measurements.
There are other topics that were not discussed explicitly in this thesis, but they are interesting subjects for future research. They are the following:
Conclusions
221
• Fast variations: to develop some parts of this thesis, we considered that the parameter to
be estimated was a slowly varying Wiener process, under this hypothesis we have shown
that the loss of performance due to quantization is small. The unanswered question here
is whether this conclusion is true or false for the estimation of fast varying processes.
• Distributed problem: in this thesis, we treated the simplified remote sensing problem,
where we have only one sensor. In the only case where a multiple sensor approach
was treated, we used the fusion center approach. Thus, we still need to generalize the
concepts and algorithms developed here to a partially or completely distributed setting,
where a cluster head or each sensor wants to obtain estimates based on the information
from all the sensors.
• Continuous time: for a varying parameter, we considered that the parameter model was
inherently discrete and we did not discuss sampling issues. Thus a subject to be studied
is the estimation of a continuous process based on sampled and quantized measurements.
A
Appendices
A.1
A.1.1
Why? - Proofs
Proof that E [Sc Sq ] = E Sq2
We will consider a general parameter estimation problem in the proof. The density of the measurement will be f (y; x) instead of f (y − x). Adding the dependence of Sq on the quantizer
output index i, y and x, the expectation of the product is
Z
∂ log f (y; x)
Sq (i (y) ; x) f (y; x) dy.
(A.1)
E [Sc Sq ] =
∂x
R
Separating the integral in (A.1) in a sum of integrals on the different quantization intervals
qi :
X Z ∂ log f (y; x)
Sq (i (y) ; x) f (y; x) dy.
E [Sc Sq ] =
∂x
i∈I qi
Sq is a constant function inside an interval qi , thus, in an interval, it does not depend on y
and it can leave the integral
Z
X
∂ log f (y; x)
f (y; x) dy.
E [Sc Sq ] =
Sq (i; x)
∂x
i∈I
qi
Rewriting the continuous measurement score function in ratio form gives
Z ∂f (y;x)
X
∂x
Sq (i; x)
E [Sc Sq ] =
f (y; x) dy,
f (y; x)
i∈I
qi
supposing that we can change the order of integral and the partial derivative leads to
X
∂P (i; x)
.
E [Sc Sq ] =
Sq (i; x)
∂x
i∈I
Multiplying and dividing each term of the sum by its corresponding P (i; x), we have
E [Sc Sq ] =
X
Sq (i; x)
i∈I
∂P(i;x)
∂x
P (i; x)
P (i; x) .
We can identify the score function as the second factor, leading to
X
E [Sc Sq ] =
Sq2 (i; x) P (i; x) = E Sq2 .
i∈I
223
(A.2)
224
A.1.2
A. Appendices
Proof of the upper bound on F (ε) [1 − F (ε)] for the Gaussian distribution
We can write F (ε) [1 − F (ε)] as the probability of two i.i.d. Gaussian r.v. X1 and X2 to be
in the respective intervals [−∞, x] and [x, ∞]. Thus, this probability can be written as the
integral of their joint PDF (see (1.24) for the marginal Gaussian PDF form)
"
#
x21 + x22
1
f1,2 (x1 , x2 ) = f (x1 ) f (x2 ) = 2 exp −
πδ
δ2
on the area A0 + A1 of Fig. A.1. From the i.i.d. assumption, the integral on the area A1 is
equal to the integral on the area A′1 . Therefore, F (ε) [1 − F (ε)] is equal to the integral of
f1,2 (x1 , x2 ) on A0 + A′1 . It is easy to see that the area outside the quarter circle C1 in the
fourth quadrant is not smaller than the area of A0 +A′1 . Denoting the area outside the quarter
¯
circle in the fourth quadrant by C¯1 , we can say that P (X1 , X2 ∈ A0 + A1 ) ≤ P Xp
1 , X2 ∈ C 1 .
2
2
Changing the coordinates from
rectangular (x1 , x2 ) to polar (r, θ), where r = x1 + x2 is
x1
x2
the radius and θ = arctan
P X1 , X2 ∈ C¯1 =
is the angle, we have that
Z0 Z∞
− π2 x
"
"
#
#
Z∞
r2
r2
1
1
r exp − 2
drdθ = 2 r exp − 2
dr.
πδ 2
δ
2δ
δ
x
Changing variables one more time r′ = rδ , we obtain
P X1 , X2 ∈ C¯1
1
=
2
Z∞
x
δ
′
r exp −r
′2
1
dr = −
4
′
Z∞
x
δ
x 2
1
.
−2r′ exp −r′2 dr′ = exp −
4
δ
Consequently,
1
x 2
¯
F (ε) [1 − F (ε)] = P (X1 , X2 ∈ A0 + A1 ) ≤ P X1 , X2 ∈ C1 = exp −
.
4
δ
A.1.3
Proof that the FI for estimating a Laplacian location parameter with
noise scale δ is δ12 .
The score function (1.15) for the location parameter of the Laplacian distribution is (PDF
given by (1.27)):
∂ log 2δ12 − y−x
1
∂ log f (y − x)
δ
Sc =
=
= sign (y − x) ,
∂x
∂x
δ
where we used the fact that the derivative of the absolute value function is the sign function.
The FI is then given by
Ic = E
Sc2
=
+∞
Z
−∞
y − x
1 1
dy.
exp − δ 2 2δ
δ A.1. Why? - Proofs
225
x1
(0, x)
A1
C1
x2
(x, 0)
(0, − x)
A′1
A0
Figure A.1: Geometric scheme to show that the probability of the interval A0 + A1 is less than
the probability of the exterior region of the left quarter circle C1 .
Changing variables y ′ =
y−x
δ
and using the symmetry of exp (− |y ′ |), we get
+∞
Z
1
exp −y ′ dy ′ = 2 .
δ
1
Ic = 2
δ
0
A.1.4
Proof that the FI for estimating a Cauchy location parameter with
noise scale δ is 2δ12 .
The score function (1.15) for the location parameter of a Cauchy distribution (PDF given by
(1.33)) is the following:
∂ log f (y − x)
Sc =
=
∂x
h
h
∂ − log (πδ) − log 1 +
∂x
y−x 2
δ
ii
=h
The FI can be evaluated then with the following integral
Ic = E
=
Sc2
8
πδ 3
4
= 2
πδ
+∞
Z
x
+∞
Z
−∞
h
1+
h
y−x 2
δ
1+
y−x 2
δ
y−x 2
δ
y−x 2
δ
2
δ
1+
y−x δ
i.
y−x 2
δ
i3 dy
i3 dy,
where the second equality comes from the symmetry of the integrand. Changing variables
2
tan (θ) = y−x
δ . We must change dy = δ sec (θ) dθ and the integration limits also change to 0
226
and
A. Appendices
π
2.
Using the trigonometric identity 1 + tan2 (θ) = sec2 (θ), we have
π
Ic =
8
πδ 2
Z2
π
tan2 (θ)
sec6 (θ)
sec2 (θ) dθ =
8
πδ 2
0
Z2
sin2 (θ) cos2 (θ) dθ.
0
2
Using trigonometric identities, we have that
(θ) cos2 (θ) = 81 [1 − cos (4x)]. The integral
sin
of the term cos (4x) is zero on the interval 0, π2 . Therefore, we finally obtain
π
Ic =
1
πδ 2
Z2
dθ =
1
.
2δ 2
0
A.1.5
Proof that the FI for N measurements quantized adaptively with NI
N
P
quantization intervals is IqNI =
E [Iq (εk )].
k=1
Making more explicit the dependence of P (i1:N ; x) on the adaptive central thresholds τ0,0:N −1
by the conditional probability P (i1:N |τ0,0:N −1 ; x) and exploiting the independence between
the measurements conditioned on the central threshold used to obtain them, we can write
that the joint probability used in the score function evaluation factorizes as follows:
P (i1:N |τ0,0:N −1 ; x) =
N
Y
k=1
P (ik |τ0,k−1 ; x) .
Thus, the log-likelihood is given by
log L (x; i1:N ) =
N
X
k=1
log P (ik |τ0,k−1 ; x) .
The FI is then given by
IqNI
= E
(
∂ log L (x; i1:N )
∂x
2 )
=E
 2 
N
P

 






log P (ik |τ0,k−1 ; x) 
∂


 







"
#2 
N
 X
∂ log P (ik |τ0,k−1 ; x) 
= E
,


∂x
k=1
∂x






 
k=1
where the expectation is evaluated w.r.t. the joint probability measure of the r.v. i1:N and
τ0,0:N −1 . We can decompose the joint expectation in a composition of two expectations using
conditioning. For 2 r.v. X and Y and a function h, this is
EX,Y [h (Y, X)] = EX EY |X [h (X, Y )] .
A.1. Why? - Proofs
227
The subscripts indicate the corresponding probability measure used for the evaluation. For
example, X|Y corresponds to the conditional probability measure of X given Y . Using this
decomposition on the FI above:
"

#2  
N
 X

∂ log P (ik |τ0,k−1 ; x) 
IqNI = Eτ0,0:k−1 Ei1:k |τ0,0:k−1
.



∂x
k=1
By expanding the square of the inner sum, we have that the inner expectation is a sum of
2
∂ log P(ik |τ0,k−1 ;x)
expectations of squared score functions
and products of score functions for
∂x
∂ log P(ik |τ0,k−1 ;x) ∂ log P(ij |τ0,j−1 ;x)
with j 6= k. As the samples are conditionally
different samples
∂x
∂x
independent given their central thresholds, the conditional expectations of the squared scores
are equal to the sum of conditional expectations, each conditional expectation will be evaluated
with the probability measure of its corresponding ik |τ0,k−1 . For the crossed terms the same
happens, but now each conditional expectation will be evaluated with respect to ik,j |τ0,k,j , as
the pairs of measurements are conditionally independent, the conditional expectation of the
product of scores is the product of conditional expectations. Finally, as the expectation of
each score function is zero [Kay 1993, pp. 67], the expectation of the sum of cross products is
zero. Therefore, we have
(N
(
2 ))
X
∂
log
P
(i
|τ
;
x)
k 0,k−1
IqNI = Eτ0,0:k−1
Eik |τ0,k−1
.
∂x
k=1
The terms in the inner sum depends each on a different τ0,k−1 , thus by marginalization (integration w.r.t. others τ0,k−1 ), we get
IqNI
=
N
X
k=1
Eτ0,k−1
(
Eik |τ0,k−1
(
∂ log P (ik |τ0,k−1 ; x)
∂x
2 ))
.
Observe that the inner expectation is the FI for each observation ik parametrized by τ0,k−1
and x. We can re-parametrize it by the difference εk = τ0,k−1 −x, writing it using the notation
of (1.13). Therefore, we obtain
IqNI =
N
X
k=1
A.1.6
Eεk {Iq (εk )} .
Proof that the posterior PDF can be written in recursive form using
prediction and update expressions
For obtaining a relation between the PDF p (xk |i1:k−1 ), that we will call prediction PDF,
and the posterior for instant k − 1, p (xk−1 |i1:k−1 ), we will use conditioning on the joint
density/distribution (PDF for X and probability for i) of the variables Xk , Xk−1 and i1:k−1
p (xk , xk−1 , i1:k−1 ) = p (xk |xk−1 , i1:k−1 ) p (xk−1 |i1:k−1 ) P (i1:k−1 ) .
228
A. Appendices
Exploiting the fact that conditioned on Xk−1 the r.v. Xk is independent of all the past
measurements, we have
p (xk , xk−1 , i1:k−1 ) = p (xk |xk−1 ) p (xk−1 |i1:k−1 ) P (i1:k−1 ) .
On the other hand, conditioning only on the measurements, we obtain
p (xk , xk−1 , i1:k−1 ) = p (xk , xk−1 |i1:k−1 ) P (i1:k−1 ) .
Equating the last expressions gives
p (xk , xk−1 |i1:k−1 ) = p (xk |xk−1 ) p (xk−1 |i1:k−1 ) .
Marginalization of Xk−1 gives the prediction expression
Z
p (xk |i1:k−1 ) = p (xk |xk−1 ) p (xk−1 |i1:k−1 ) dxk−1 .
R
As it was stated before, we can notice that for obtaining the prediction PDF we must use the
last posterior and the transition PDF p (xk |xk−1 ) that characterizes the dynamical model.
For obtaining the update expression, we will start by conditioning the joint density/distribution
function p (xk , ik , i1:k−1 )
p (xk , ik , i1:k−1 ) = P (ik |xk , i1:k−1 ) p (xk |i1:k−1 ) .
As ik given xk is independent from all the other r.v., we have
p (xk , ik , i1:k−1 ) = P (ik |xk ) p (xk |i1:k−1 ) P (i1:k−1 ) .
Now, conditioning on the entire set of measurements
p (xk , ik , i1:k−1 ) = p (xk |i1:k ) P (i1:k ) .
Using both last expressions, we get
p (xk |i1:k ) =
P (ik |xk ) p (xk |i1:k−1 ) P (i1:k−1 )
,
P (i1:k )
this result can be simplified by applying conditioning on the denominator. Absorbing the
factor P (i1:k−1 ), we have
p (xk |i1:k ) =
P (ik |xk ) p (xk |i1:k−1 )
.
P (ik |i1:k−1 )
The conditional probability on the denominator can be expressed using marginalization of
p (ik , xk−1 |i1:k−1 ) = P (ik |xk ) p (xk |i1:k−1 ) ,
which finally gives the update expression
p (xk |i1:k ) = R
R
P (ik |xk ) p (xk |i1:k−1 )
.
P ik |x′k p x′k |i1:k−1 dx′k
Note that for updating the prediction to the posterior distribution, we introduced the information from the measurement through P (ik |xk ).
A.1. Why? - Proofs
A.1.7
229
Proof that Ic for the GGD is
1
1 β(β−1)Γ(1− β )
.
2
1
δ
Γ( β )
The continuous measurement FI for the GGD distribution is obtained using the PDF expression (1.39) in the integral (3.58)
Ic,GGD =
h
Z
R
(1)
fGGD (x)
i2
β3
dx =
fGGD (x)
δ 3 Γ β1
+∞
Z
x 2β−2
x β
dx,
exp −
δ
δ
0
where we used the fact that the function to be integrated is an even functionfor obtaining an
β
integral on [0, +∞). We can now change the integration variable to z = xδ , this produces
1
dx = βδ z β
−1
dz, leading to the following integral
Ic,GGD
β2
=
δ 2 Γ β1
+∞
Z
1− 1
z β exp (−z) dz.
0
The integral is equal to Γ 2 − β1 , thus using the property of the gamma function Γ (1 + z) =
zΓ (z), we have finally
1
β
(β
−
1)
Γ
1
−
β
1
Ic,GGD = 2
.
δ
Γ 1
β
A.1.8
Proof that Ic for the STD is
1 β+1
.
δ 2 β+3
For the STD, the continuous measurement FI is obtained using its PDF expression (3.72) in
(3.58). As the function to be integrated is an even function, we can integrate it only in the
positive real semi-axis. This gives
Z
h
(1)
fST D (x)
i2
dx
fST D (x)
R
+∞
2 "
2 #− β+5
Z
2
Γ β+1
2
1
(β + 1)2
x
x
√
√
√
dx.
1
+
=
2
√
3
β
δ β
δ β
Γ β δ β π
Ic,ST D =
2
0
For evaluating this integral, we can change the integration variable to θ using tan (θ) = δ√x β ,
2
√
this produces dx = βδ cos12 (θ) , 1+ δ√x β = cos12 (θ) and an integration interval 0, π2 , leading
to
π
Z2
Γ β+1
2
1 (β + 1)2
2√
2 sin2 (θ) cos (θ)β+1 dx.
Ic,ST D =
β
β
δ
π
Γ 2
0
230
A. Appendices
The integral factor multiplied by 2 can be identified to the beta function B
beta function can be written using the gamma function
Γ 3 Γ β + 1
2
2
3 β
,
B
, +1 =
β+1
2 2
+2
Γ
3 β
2, 2
+ 1 . The
2
which can be rewritten using the fact that Γ
Γ (1 + z) = zΓ (z). This gives
B
3 β
, +1
2 2
leading finally to
3
2
√
π
2
and the property of the gamma function
β
Γ
√
2
β
,
= π
(β + 3) (β + 1) Γ β+1
2
Ic,ST D =
A.1.9
=
1 β+1
.
δ2 β + 3
Minimization of the asymptotic variance w.r.t. η under the asymptotic zero mean constraint.
For simplifying the notation we will use η and fd , suppressing the subscripts and superscripts.
The problem we want to solve is
η ⊤ Fd η
,
η ⊤ fd fd⊤ η
minimize
w.r.t. η
2
σ∞
=
subject to
Fvec,⊤
η = 0,
d
(A.3)
η 6= 0,
where Fvec
d is the diagonal of Fd in vector form. This problem can be also cast as a maximization problem
maximize
w.r.t. η
subject to
η ⊤ fd fd⊤ η
1
=
,
2
σ∞
η ⊤ Fd η
(A.4)
Fvec,⊤
η = 0,
d
η 6= 0.
As Fd is a diagonal matrix it can be decomposed as the product of diagonal matrices
formed with the square roots of the diagonal terms
1
1
Fd = Fd2 Fd2 .
Thus using the change of variables
−1
η = Fd 2 η ′ ,
A.1. Why? - Proofs
231
the problem (A.4) becomes
−1
−1
maximize
w.r.t. η ′
η ′ ⊤ Fd 2 fd fd⊤ Fd 2 η ′
subject to
Fvec,⊤
Fd 2 η ′ = 0,
d
η′⊤η′
,
−1
η ′ 6= 0.
This problem can be solved by constraining η ′ ⊤ η ′ to be equal to one and then maximizing
the numerator
maximize
w.r.t. η ′
η ′ Fd 2 fd fd⊤ Fd 2 η ′ ,
subject to
η ′ η ′ = 1,
⊤
−1
−1
(A.5)
⊤
−1
Fvec,⊤
Fd 2 η ′ = 0,
d
η ′ 6= 0.
−1
Note that Fvec,⊤
F 2 is a transposed vector with the square roots of Fvec
d . This term will be
d1 ⊤d
,vec
from now on. This problem has been treated in [Golub 1973] and we will
denoted Fd2
apply here the same development.
The Lagrangian of the maximization problem (A.5) is given by
1
1
−1
,vec
⊤ −
⊤
L = η ′ Fd 2 fd fd⊤ Fd 2 η ′ − λ η ′ η ′ − 1 + 2µη ′ Fd2 ,
where λ and µ are Lagrange multipliers. The zero derivative point of the Lagrangian w.r.t.
η ′ is given as the solution of the following equation:
1
−1
−1
Fd 2 fd fd⊤ Fd 2 η ′ − λη ′ + µFd2
Multiplying by
1
Fd2
1
,vec
2
Fd
,vec
⊤
⊤
,vec
(A.6)
= 0.
gives
−1
−1
Fd 2 fd fd⊤ Fd 2 η ′
1
,vec
2
− λ Fd
⊤
′
1
,vec
2
η + µ Fd
⊤
1
Fd2
,vec
= 0.
As Fd are quantizer output probabilities, we have
1
Fd2
,vec
⊤
1
Fd2
,vec
= 1.
Now, using the expression above and the second equality constraint (that the asymptotic mean
is zero) on the factor that multiplies λ, we obtain
1
,vec
2
µ = − Fd
⊤
−1
−1
Fd 2 fd fd⊤ Fd 2 η ′ .
232
A. Appendices
Substituting this expression for µ in (A.6), we get
"
1 ⊤ # 1 ⊤
− 12
−1
,vec
,vec
−1
I − Fd
Fd2
Fd2
Fd 2 fd fd⊤ Fd 2 η ′ = λη ′ ,
where I is the identity matrix. Clearly,
and the optimal
η′
is the eigenvector of
"
I−
P′
−1
Fd 2
1
,vec
2
Fd
1
,vec
2
Fd
⊤
−1
⊤ #
= P′ is a projection matrix
− 21
Fd 2 fd fd⊤ Fd
that gives the maximum
λ. For a squared matrix A and projection matrix P′ , we know that the maximum eigenvalue
function λ () respects the following equality:
2
λ P′ A = λ P′ A = λ P′ AP′ .
This means that the optimal η ′ can also be found as the eigenvector of
−1
−1
P′ Fd 2 fd fd⊤ Fd 2 P′
−1
−1
related to the maximum eigenvalue λ. As the only non zero eigenvector of P′ Fd 2 fd fd⊤ Fd 2 P′
−1
is P′ Fd 2 fd , this is the optimal η ′ . Changing back to the initial vector η, we have
"
1 ⊤ #
,vec
−1
− 21
− 12
Fd2
F d 2 fd .
I − Fd
η ∝ Fd
The proportional ∝ comes from the fact that the solution of (A.3) is defined up to a proportional factor. Expanding the expression gives
1 ⊤
− 12 12 ,vec
,vec
−1
η ∝ F−1
f
−
F
F
Fd2
F d 2 fd
d d
d
d
∝ F−1
d fd − 1fd ,
where 1 is a squared matrix filled with ones.
A.1.10
Proof that
1
fdT F−1
d fd
=
1
(j) 2 (j)
f˜
i
d
(j) (j)
i
j=1 i(j) ∈I (j) F̃d
N
Ps
P
[ ]
[ ]
in the fusion center approach.
h i
h ′i
(l′ ) (l′ )
(l′ )
For simplifying notation the sensor superscript in F̃d
i
and f˜d i(l ) will not be written,
the dependence
h ′ ion the sensor
h ′ i number will be done implicitly through the argument of the
(l
)
function F̃d i
and f˜d i(l ) . Using the fact that that Fd is diagonal, we can write
2












P
h
i
N
s
Q
′
Ns ˜
(j
)
(j)
F̃
i
i
f
d
j=1 d






j′ = 1






′
X
j
=
6
j
T −1
fd F d fd =
,
N
Qs
(j)
i∈I ⊗Ns
F̃d i
j=1
A.1. Why? - Proofs
233
where I ⊗Ns is the set of all possible i. Developing the quadratic term, the sum above is equal
to the sum of two terms
fdT F−1
d fd = I 1 + I 2 ,
with
I1 =


Ns n
(j) o2 N
Qs n h (j ′ ) io2 
P



˜


F̃d i
fd i






j=1
′


j
=
1






′

X 
j =
6 j
i∈I ⊗Ns
and
I2 =











N
Qs
j=1
F̃d i(j)

















































h
i
h
i
N
N
N
N


s
s
s
s
Q
P
P
Q


′
′
(m)
(l)
(l
)
(m
)


˜
˜


i
F̃
i
f
i
f
F̃
i
d
d
d
d










l=1




′
′






l
=
1
m
=
1
m
=
1


















′
′


X
l =
6 l
m=
6 l
m =
6 m
i∈I ⊗Ns

















N
Qs
j=1
F̃d
i(j)
.

















Dividing the common factors in I2 and rewriting the sum, we obtain





















N
N
N
s
s
s


h i h
h i
i Y
X X X
(l)
(p)
(m)
˜
˜
I2 =
f i
F̃d i
fd i

 d



i∈I ⊗Ns l=1 m = 1 


p
=
1








m 6= l 
p
=
6
l





p 6= m











































N
N
N
s
s
s




h
i
h
i
h
i
X X
X
X
Y
X
(l)
(m)
(p)
,
=
f˜d i
f˜d i
F̃d i








⊗N
⋆
(l)
(l)
(m)
(m)
s
l=1



i∈I



m = 1 i ∈I i ∈I 
p=1
















m 6= l
p
=
6
l











p 6= m
(l)
(m)
where I ⊗Ns ⋆ is the set of all combinations of i, without considering
(p) i and i . The interior
is a probability. Thus I2
sum in the RHS of the last equality equals to one because F̃d i
234
A. Appendices
is given by
I2 =
=
Ns
X
l=1
Ns
X
m=1
m 6= l
i(l) ∈I (l) i(m) ∈I (m)
n
Ns
X
Ns
X

 X

i 
l=1
m=1
m 6= l
X
 (l)
i
X
h
f˜d i(l)
∈I (l)
h i h
io
f˜d i(l) f˜d i(m)
X
  (m)
i
∈I (m)
h
f˜d i(m)

i
.

From the symmetry assumptions f˜d i(j) is an odd function of i(j) , therefore I2 = 0.
The term I1 can be rewritten by dividing common factors from the numerator and denominator. This gives














N
N
s
s

h ′ i
Y
X X f˜2 i(j)
d
(j )
I1 =
F̃d i
.
 F̃d i(j)




′
i∈I ⊗Ns j=1 


j =1






′
j 6= j
Changing the order of summation and separating the sum for the sensor index j from the
others, we obtain



























N
N
s
s




h
i
2
(j)
X
Y
X X
f˜d i
′
(j )
F̃d i
I1 =
,
(j)




i
F̃




d
⊗N
⋆
(j)
(j)
s
j=1 i ∈I




i∈I




j′ = 1







 j ′ 6= j



where now I ⊗Ns ⋆ is the set of all possible i without considering i(j) . As the inner sum is equal
to one, we finally have
Ns
X
X f˜2 i(j)
d
T −1
fd F d fd =
i(j)
F̃
d
j=1 (j) (j)
i
and consequently
1
fdT F−1
d fd
=
Ns
P
∈I
1
P
j=1 i(j) ∈I (j)
f˜d2 [i(j) ]
F̃d [i(j) ]
.
A.2. More? - Further details
A.2
A.2.1
235
More? - Further details
Discussion on the issues of finding the MLE
Binary quantization. In Subsection 1.3.6, we give an analytic expression for the MLE in
the binary quantization case. In this case the MLE depends on the noise distribution mainly
through the inverse of the CDF, thus existence and unicity of the MLE are guaranteed by the
monotonicity of noise CDF (implicitly stated in the assumption AN2).
Multibit and dynamic quantization: log-concave distributions. In the multibit case,
or even in the binary case when the threshold is not static, we cannot write a closed-form
expression for the the MLE. In this case, we have to use a numerical method for the evaluation
of the maximum.
In the case of log-concave distributions (the Gaussian distribution is an example), we can
show that, as it is explained in Subsection 1.4.4, the log-likelihood with quantized measurements is concave. Thus, in this case the log-likelihood has only one maximum which can be
found very efficiently using the Newton’s algorithm.
Multibit and dynamic quantization: general distributions. If the distribution is not
log-concave, then the Newton’s algorithm does not necessarily converge. If it converges, it can
converge very slowly when compared to the log-concave distribution case. It can also happen
that the likelihood has multiple maxima, in this case, any technique based on the gradient
may fail to find the global maxima and other types of maximization techniques must be used.
As a simple example of non log-concave noise distribution, we can consider the Cauchy
distribution with PDF and CDF given by (1.33) and (1.34) respectively. The log-likelihood
for estimating x with δ = 1, τ = [−3 − 2 − 1 0 1 2 3]⊤ and ik = {−3, −3, −4, 3, 3, 3} is
shown in Fig. A.2.
log (L)
−16
−18
−20
−22
−4
−2
0
x
2
4
Figure A.2: Log-likelihood function for estimating x based on the quantized measurements
ik = {−3, −3, −4, 3, 3, 3}. The quantizer has NI = 8 and τ = [−3 − 2 − 1 0 1 2 3]⊤ . The
distribution of the noise is Cauchy with δ = 1.
We can clearly note the multimodality of the log-likelihood function.
236
A.2.2
A. Appendices
MLE for estimation of a constant based on binary quantized measurements: uniform/Gaussian noise case.
The MLE for binary quantized measurements is given by (1.45)
!#
"
N
1 X
−1 1
1−
ik
.
X̂M L = τ0 − F
2
N
k=1
The function F −1 () is the inverse of the noise CDF. For the uniform/Gaussian case, the CDF
is given by (1.37)
α

ε+ 2
1

Φ
,
for ε < − α2 ,

C
σ






 h
i
1
F (ε) = C1 12 + √2πσ
ε + α2 , for − α2 ≤ ε ≤ α2 ,






α i
h


 1 √ α + Φ ε+ 2
, for ε > α ,
C
where C = 1 +
2
σ
2πσ
√α .
2πσ
As the CDF is decomposed in three parts, for inverting the CDF we
N
1 P
1
ik , the
might distinguish three possible cases. Using the notation 1 − P̂M L = 2 1 − N
k=1
cases are the following:
• 1 − P̂M L <
•
1
2C
1
2C ,
≤ 1 − P̂M L ≤
• 1 − P̂M L >
1
C
1
2
1
C
+
1
2
+
√α
2πσ
√α
2πσ
,
.
Using the inverse of F () for each case in the expression of the estimator above, we get

i
h α
1
−1 C 1 − P̂

τ
+
−
σΦ
,
for 1 − P̂M L < 2C
,

0
M
L
2


h
i
√
α
1
,
X̂M L = τ0 + α2 − 2πσ C 1 − P̂M L − 21 ,
for 2C
≤ 1 − P̂M L ≤ C1 12 + √2πσ

i
h 

τ0 − α − σΦ−1 C 1 − P̂M L − √ α , for 1 − P̂M L > 1 1 + √ α
.
2
C 2
2πσ
2πσ
The function Φ−1 [] is the inverse of the standard Gaussian CDF.
A.2. More? - Further details
A.2.3
237
MLE for estimation of a constant based on binary quantized measurements: generalized Gaussian noise case.
For binary quantized measurements with a fixed threshold, the MLE is given by (1.45)
X̂M L = τ0 − F −1
"
N
1 X
1−
ik
N
1
2
k=1
!#
,
where F −1 () is the inverse of the noise CDF. In the GGD case the CDF is the following
(1.40):


1 ε β
γ
β, δ
1
.
F (ε) =
1 + sign (ε)
2
Γ 1
β
Therefore, denoting the average of the binary observations by ī =
1
N
following MLE:
X̂M L = τ0 + δsign (ī) γ
−1
1
, |ī| Γ
β
1
β
1
,
β
N
P
ik , we have the
k=1
where γ −1 [, ] is the inverse of the incomplete gamma function.
A.2.4
Adaptive binary threshold asymptotic probabilities when the threshold is defined in a grid.
We consider here that the parameter lies in an interval [−A, A], where A is a positive real.
For assimilating this information, we are going to change the update of the binary threshold.
The following is assumed:
• The step size γ is chosen so that
A
= N,
γ
with N a positive integer.
• The initial threshold τ0,0 is chosen to be an integer multiple of γ, τ0,0 = jγ, so that
τ0,0 ∈ [−A, A].
• The threshold cannot leave the interval [−A, A]. This means that when τ0,k−1 = A and
ik = 1, we will set τ0,k = A. When τ0,k−1 = −A and we have ik = −1, we will set
τ0,k = A. This changes the adaptive update of the threshold (1.49) to
τ0,k =



−A,
if τ0,k−1 = −A and ik = −1,
τ0,k = τ0,k−1 + γik ,


A,
if τ0,k−1 = A and ik = 1.
(A.7)
238
A. Appendices
The threshold is now defined in a finite grid
A
A
(N − 1)
(N − 1)
, · · · , − , 0,
, ··· , A
, A .
τ0,k ∈ −A, −A
N
N
N
N
An iteration of the threshold update is depicted in Fig. A.3.
τ0,k−1
Time k − 1
yk (measure)
···
−A
···
A Ni
−A (NN−1)
A (i+1)
N
A
−A (NN−1)
A
τ0,k
Time k
···
−A
−A (NN−1)
−A (NN−1)
···
A Ni
A (i+1)
N
Figure A.3: An iteration of the binary threshold update in the grid where it is defined. The
values of the finite grid where the threshold is defined are indicated by the black squares.
Asymptotic probability distribution
In a similar way as for the infinite grid, we will define a transition matrix for the finite grid
Tf g . In this case the matrix will have size (2N + 1) × (2N + 1). Using the following notation
for the CDF elements
j
j
,
aj = F A − x = 1 − F x − A
N
N
the transition matrix is given by

a−N
a−(N −1)

 1 − a−N
0



0
1 − a−(N −1)


..

.
0


..

.


Tf g = 











..

.
..
.
0
a0
0
..
.
1 − a0
0
..
.
..
.
0
..
.
aN −1
0
0
aN
1 − aN −1 1 − aN












.












The Markov chain formed by the sequence τ0,k is an ergodic chain, as all threshold values
can be reached from all other threshold values and the borders −A and A make the chain to
A.2. More? - Further details
239
be aperiodic1 . Thus, the sequence of thresholds admit a unique asymptotic distribution p∞
[Gallager 1996, Ch. 4]. The asymptotic distribution is then the solution of
p∞ = Tf g p∞ ,
or equivalently
(Tf g − I) p∞ = Rp∞ = 0,
(A.8)
where I is a (2N + 1) × (2N + 1) identity matrix and 0 is the zero vector. The problem is
then to find a vector from the null space of R


a−N − 1
a−(N −1)


..
 1 − a−N

.
−1




.
..
..


.
0
1
−
a
−(N −1)




..
.
.


.
0
.
0




..


.
a0


,

R=
−1



.
..


1
−
a
0




.
.
.
.

.
. 
0
0




..
.
.

.
aN −1
0 
.




..

.
−1
aN 
1 − aN −1 −aN
0
0
under the constraint that the vector is a probability vector: it sums to one
1⊤ p∞ = 1,
where 1 is a vector with all elements equal to one, and all its elements are nonnegative
p∞ 0.
For solving (A.8), we start by solving its last line (the line at the bottom). We have
(1 − aN −1 ) pN −1,∞ − aN pN,∞ = 0,
which gives
pN −1,∞ =
For the next line (above), we obtain
aN
pN,∞ .
(1 − aN −1 )
aN pN,∞ − pN −1,∞ + (1 − aN −2 ) pN −2,∞ = 0
and solving it, we have
pN −2,∞ =
pN −1,∞ − aN pN,∞
.
(1 − aN −2 )
1
This is not the case for the thresholds defined in an infinite grid. In this case the thresholds must be
separated in two periodic classes [Fine 1968].
240
A. Appendices
Using the expression for pN −1,∞ above, we get
pN −2,∞ =
aN −1 aN
.
(1 − aN −2 ) (1 − aN −1 )
Clearly, from the similarity of the equations for the other lines, we can proceed in the same
way to obtain

 i−1
Q
aN −j 


 j=0
(A.9)
pN −i,∞ =  i
 pN,∞ , for i ∈ {−N, · · · , N − 1} .

Q
(1 − aN −j )
j=1
If we denote
i−1
Q


aN −j 

 j=0

ci =  i
,
Q

(1 − aN −j )
j=1
then, pN,∞ can be found by using the constraint that the vector must sum to one
!
2N
X
pN −i,∞ + pN,∞ = 1.
i=1
Separating the factor pN,∞ which appears in all terms (see (A.9)), we get
pN,∞ =
1+
1
2N
P
(A.10)
.
ci
i=1
Using this and (A.9), we can obtain a general expression for the probabilities
pN −i,∞ =
1+
c′i
2N
P
,
ci
for i ∈ {0, · · · , 2N } ,
(A.11)
i=1
where
c′i =
(
1,
if k = 0,
ci ,
otherwise.
By substituting the c′i in (A.11), we get the following expressions for the asymptotic probabilities
2N
pN,∞ =
p−N,∞ =
pN −i,∞ =
1 Y
(1 − aN −i ) ,
P (x)
1
P (x)
i=1
2N
−1
Y
i=0

1 
P (x)
i−1
Y
j=0
(A.12)
aN −i ,

aN −j  
2N
Y
j=i+1

(1 − aN −j ) ,
A.2. More? - Further details
241
where the normalization factor P (x) is


 


!  2N
2N
2N
−1
2N
−1  j−1

Y
Y
Y
X
Y
P (x) = 
(1 − aN −j ) + 
aN −j  +
aN −k 
(1 − aN −l ) .


j=1
j=0
j=1
l=j+1
k=0
(A.13)
With the expressions above for the asymptotic probabilities of the thresholds, it is possible
to obtain exact values for the asymptotic FI using (1.62).
Maximum of the probability distribution
We are going to verify that the asymptotic threshold is indeed around the true parameter.
We will analyze the position of the maximum probability threshold and the increasing and
decreasing patterns of the asymptotic probabilities. For doing so, we will obtain the expressions for the signs of the differences between neighboring (in threshold position) asymptotic
probabilities.
Starting at the negative extremum of the interval, the difference is the following:



! 2N −2
2N −1
Y
1  Y
aN −j  (1 − a−N ) .
aN −i − 
p−N,∞ − p−(N −1),∞ =
P (x)
j=0
i=0
Making explicit the common factor, we have
2N −1
Q
p−N,∞ − p−(N −1),∞ =
The factor
2N
Q−1
aN −i
i=0
P (x)
!
is positive because
i=0
aN −i
P (x)
2N
Q−1
i=0
a−(N −1) − (1 − a−N ) .
aN −i is positive, it is a product of probabilities,
and P (x) is also positive, it is a sum of products of probabilities. Therefore, to obtain the
sign of p−N,∞ − p−(N −1),∞ as a function of x, we need only to analyze the sign of the difference
a−(N −1) − (1 − a−N ). Using the expressions for the ai terms, we have
sign p−N,∞ − p−(N −1),∞ = sign
1−F
(N − 1)
x+A
N
− F (x + A) .
The difference in the sign on the RHS is the difference
between a complementary CDF
(N −1)
parametrized by x and centered on −A N , 1 − F x + A (NN−1) , and a CDF centered
on −A, F (x + A). Using the facts that the complementary CDF is a decreasing function
(from one to zero), the CDF is an increasing function (from zero to one) and that CDF is
simply a reversed and shifted version of the complementary CDF, we obtain the following
conclusions:
(N − 12 )
.
• the sign of the probability difference is positive for x ∈ −A, −A N
242
A. Appendices
• The sign is negative for x > −A
(N − 21 )
N
.
• The probability difference is zero when x = −A
(N − 12 )
N
.
The difference between probabilities for i ∈ {1, · · · , 2N − 1} is



i−1
2N

Y
Y
1

aN −j  
(1 − aN −j ) −
p(N −i),∞ − p(N −i+1),∞ =
P (x) 
j=0
j=i+1



i−2
2N

Y
Y
aN −j   (1 − aN −j )
−

j=0
j=i
and after factorization
p(N −i),∞ − p(N −i+1),∞ =
i−2
Q
j=0
aN −j
!"
2N
Q
j=i+1
(1 − aN −j )
P (x)
#
[aN −i+1 − (1 − aN −i )] .
As the first factor on the RHS is positive, the sign of the difference is determined by
(N − i + 1)
(N − i)
sign p(N −i),∞ − p(N −i+1),∞ = sign 1 − F x − A
−F x−A
.
N
N
The analysis of the sign above is similar to the negative extremum case. Thus we have the
following conclusions:
• the sign of the difference is positive for $x < A\frac{N-i+\frac{1}{2}}{N}$;

• the sign is negative for $x > A\frac{N-i+\frac{1}{2}}{N}$;

• the difference is zero when $x = A\frac{N-i+\frac{1}{2}}{N}$.
Using a similar procedure for the positive extremum, we have that the sign of the difference is given by
\[
\operatorname{sign}\left[p_{(N-1),\infty} - p_{N,\infty}\right] = \operatorname{sign}\left\{\left[1 - F\left(x - A\right)\right] - F\left(x - A\tfrac{N-1}{N}\right)\right\},
\]
which leads to the same conclusions as above, except that $i = 0$ in this case.
Joining all the results, we can see that the maximum of the asymptotic probability vector
always occurs at the point of the grid that is closest to x. Moreover, the distribution always
decreases when we consider thresholds with increasing distance to the maximum probability
threshold. This means that the distribution is unimodal with its maximum close to the
parameter, thus justifying the statement that the thresholds will be placed asymptotically
around the parameter.
Small noise approximation
The analytical asymptotic probability expressions (A.12)-(A.13) are quite cumbersome to evaluate when $N$ is large. As the CDFs are almost step functions (zero/one functions) for large arguments and as the asymptotic probabilities are products of CDFs, in the case when the noise level is small compared with $\gamma$, we can obtain very simple approximate expressions for the asymptotic probabilities that involve only a few CDF terms.
The small noise approximations for the complementary CDF and CDF are the following:
\[
a_{N-i} = 1 - F\left(x - A\tfrac{N-i}{N}\right) \approx
\begin{cases}
1, & x < A\frac{N-i-1}{N},\\
0, & x > A\frac{N-i+1}{N},\\
1 - F\left(x - A\frac{N-i}{N}\right), & A\frac{N-i-1}{N} < x < A\frac{N-i+1}{N},
\end{cases}
\]
\[
1 - a_{N-i} = F\left(x - A\tfrac{N-i}{N}\right) \approx
\begin{cases}
0, & x < A\frac{N-i-1}{N},\\
1, & x > A\frac{N-i+1}{N},\\
F\left(x - A\frac{N-i}{N}\right), & A\frac{N-i-1}{N} < x < A\frac{N-i+1}{N}.
\end{cases}
\tag{A.14}
\]
Independently of the value of $x$, we can get the following approximations of the CDF products using (A.14):
\[
\prod_{i=1}^{2N}\left(1 - a_{N-i}\right) \approx 1 - a_{N-1},
\qquad
\prod_{i=0}^{2N-1} a_{N-i} \approx a_{-N+1},
\]
\[
\left(\prod_{i=0}^{j-1} a_{N-i}\right)\left(\prod_{i=j+1}^{2N}\left(1 - a_{N-i}\right)\right) \approx a_{N-j+1}\left(1 - a_{N-j-1}\right).
\]
We can now apply these approximations to the asymptotic probabilities (A.12)-(A.13). Note that the approximations will depend on the value of $x$. For $x \in \left[-A, -A\frac{N-1}{N}\right]$ we have
\[
p_{-N,\infty} \approx \frac{a_{-N+1}}{2 - a_{-N}} = \frac{1 - F\left(x + A\frac{N-1}{N}\right)}{1 + F\left(x + A\right)},
\qquad
p_{-(N-1),\infty} \approx \frac{1 - a_{-N}}{2 - a_{-N}} = \frac{F\left(x + A\right)}{1 + F\left(x + A\right)},
\]
\[
p_{-(N-2),\infty} \approx \frac{1 - a_{-N+1}}{2 - a_{-N}} = \frac{F\left(x + A\frac{N-1}{N}\right)}{1 + F\left(x + A\right)},
\qquad
p_{(N-i),\infty} \approx 0, \quad \text{for } i \in \{0, \cdots, 2N-3\}.
\]
For $x \in \left[A\frac{N-i-1}{N}, A\frac{N-i}{N}\right]$, we obtain 4 nonzero terms; all the others are approximately zero. The nonzero terms are
\[
p_{(N-i-2),\infty} \approx \frac{a_{N-i-1}}{2} = \frac{1 - F\left(x - A\frac{N-i-1}{N}\right)}{2},
\qquad
p_{(N-i-1),\infty} \approx \frac{a_{N-i}}{2} = \frac{1 - F\left(x - A\frac{N-i}{N}\right)}{2},
\]
\[
p_{(N-i),\infty} \approx \frac{1 - a_{N-i-1}}{2} = \frac{F\left(x - A\frac{N-i-1}{N}\right)}{2},
\qquad
p_{(N-i+1),\infty} \approx \frac{1 - a_{N-i}}{2} = \frac{F\left(x - A\frac{N-i}{N}\right)}{2}.
\]
Finally, for the positive extremum, $x \in \left[A\frac{N-1}{N}, A\right]$, the approximations give the following:
\[
p_{(N-2),\infty} \approx \frac{a_{N-1}}{1 + a_N} = \frac{1 - F\left(x - A\frac{N-1}{N}\right)}{2 - F\left(x - A\right)},
\qquad
p_{(N-1),\infty} \approx \frac{a_N}{1 + a_N} = \frac{1 - F\left(x - A\right)}{2 - F\left(x - A\right)},
\]
\[
p_{N,\infty} \approx \frac{1 - a_{N-1}}{1 + a_N} = \frac{F\left(x - A\frac{N-1}{N}\right)}{2 - F\left(x - A\right)},
\qquad
p_{(N-i),\infty} \approx 0, \quad \text{for } i \in \{3, \cdots, 2N\}.
\]
Under the small noise assumption, these approximations are not only useful for evaluating the FI, but they can also be used for estimating the parameter when the number of measurements is very large. Suppose that after a number of samples $M$ the threshold probabilities reach approximately the asymptotic distribution; from this point on, we start to store the measurements, forming a histogram of the threshold values that were used. After a large number of measurements, the histogram will be very close to the asymptotic threshold probabilities. We can then search for the two largest values of the histogram and, using one of the corresponding empirical frequencies in place of the true probability, invert the corresponding approximate expression for the probability to obtain $x$.
For example, suppose we have obtained the largest empirical frequencies at the points $N-i-1$ and $N-i$. The empirical frequency at $N-i-1$ is $\hat{p}_{(N-i-1),\infty}$; inverting the corresponding approximate expression for $p_{(N-i-1),\infty}$ we get the estimate $\hat{x}$
\[
\hat{x} = A\frac{N-i}{N} + F^{-1}\left(1 - 2\hat{p}_{(N-i-1),\infty}\right).
\]
A.2.5 Particle filter using rejection sampling for tracking a scalar Wiener process
The optimal sampling distribution p (xk |xk−1 , ik ) can be rewritten as
\[
p\left(x_k|x_{k-1}, i_k\right) = \frac{p\left(x_k, x_{k-1}, i_k\right)}{p\left(x_{k-1}, i_k\right)} = \frac{P\left(i_k|x_k\right) p\left(x_k|x_{k-1}\right)}{P\left(i_k|x_{k-1}\right)} \propto P\left(i_k|x_k\right) p\left(x_k|x_{k-1}\right),
\tag{A.15}
\]
where the proportional relation comes from the fact that, for a given $i_k$, the probability $P\left(i_k|x_{k-1}\right)$ is a constant independent of $x_k$. Note that, as $P\left(i_k|x_k\right)$ is a probability, it is bounded above by one; as a consequence, $P\left(i_k|x_k\right) p\left(x_k|x_{k-1}\right)$ is bounded above by $p\left(x_k|x_{k-1}\right)$, which is a Gaussian PDF. Therefore, for each previous $x_{k-1}^{(j)}$, a standard rejection sampling method [Robert 1999, pp. 50] can be applied to generate a sample from $p\left(x_k|x_{k-1}^{(j)}, i_k\right)$. This can be done by sampling independently from the Gaussian distribution $p\left(x_k|x_{k-1}\right)$ and from the uniform distribution $\mathcal{U}[0,1]$. The rejection sampling method that gives the optimal samples $x_k^{(j)}$ is the following:
Rejection sampling for the optimal sampling distribution

(app1) For $j = 1$ to $N_S$

• Set $u_k^{(j)} = 1$ and $l_k^{(j)} = 0$.

• While $l_k^{(j)} < u_k^{(j)}$, do

– sample $x_k^{(j)}$ from the Gaussian distribution (How? - App. A.3.3)
\[
p\left(x_k\middle|x_{k-1}^{(j)}\right) = \frac{1}{\sqrt{2\pi}\sigma_w}\exp\left[-\frac{1}{2}\left(\frac{x_k - x_{k-1}^{(j)} - u_k}{\sigma_w}\right)^2\right].
\]
– Evaluate $l_k^{(j)} = P\left(i_k\middle|x_k^{(j)}\right)$.

– Sample $u_k^{(j)}$, independently from $x_k^{(j)}$, from the uniform distribution $\mathcal{U}[0,1]$.
Note that we accept a sample $x_k^{(j)}$ only when its likelihood $P\left(i_k|x_k^{(j)}\right)$ is larger than the uniform sample.
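As an illustration only, here is a minimal Python sketch of (app1); the binary-threshold likelihood at the end is a purely hypothetical example (Gaussian noise, threshold `tau`), not part of the thesis:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)

def sample_optimal_proposal(x_prev, u_k, sigma_w, likelihood, i_k):
    """Rejection sampling (app1): draw x_k ~ p(x_k | x_prev, i_k).
    `likelihood(i_k, x)` must return P(i_k | x_k = x) in [0, 1]; since this
    probability is bounded by one, p(x_k | x_prev) is a valid envelope."""
    l, u = 0.0, 1.0
    while l < u:
        # candidate from the Gaussian increment p(x_k | x_prev)
        x_cand = sigma_w * rng.standard_normal() + x_prev + u_k
        l = likelihood(i_k, x_cand)   # l_k^(j)
        u = rng.uniform()             # u_k^(j), drawn independently of x_cand
    return x_cand

# Hypothetical binary-quantizer likelihood with standard Gaussian noise:
F = lambda z: 0.5 * (1 + erf(z / sqrt(2)))
likelihood = lambda i, x, tau=0.0: (1 - F(tau - x)) if i > 0 else F(tau - x)
x_new = sample_optimal_proposal(x_prev=0.0, u_k=0.0, sigma_w=0.1,
                                likelihood=likelihood, i_k=1)
```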
By replacing (A.15) in the place of $q\left(x_k|x_{0:k-1}, i_{1:k}\right)$ in the recursive expression for the weights (2.25), we have the following update equation for the weights
\[
w\left(x_{1:k}^{(j)}\right) = P\left(i_k\middle|x_{k-1}^{(j)}\right)\tilde{w}\left(x_{1:k-1}^{(j)}\right).
\]
Observe that we need to evaluate $P\left(i_k|x_{k-1}^{(j)}\right)$, which can be obtained similarly to $P\left(i_k|x_k^{(j)}\right)$ with
\[
P\left(i_k\middle|x_{k-1}^{(j)}\right) =
\begin{cases}
F'\left(\tau_{i_k,k} - x_{k-1}^{(j)} - u_k\right) - F'\left(\tau_{i_k-1,k} - x_{k-1}^{(j)} - u_k\right), & \text{if } i_k > 0,\\[4pt]
F'\left(\tau_{i_k+1,k} - x_{k-1}^{(j)} - u_k\right) - F'\left(\tau_{i_k,k} - x_{k-1}^{(j)} - u_k\right), & \text{if } i_k < 0,
\end{cases}
\tag{A.16}
\]
where $F'$ is the CDF of the r.v. that is the sum of the noise r.v. $V_k$ and the centered $X_k$ increment $W_k - u_k$.
The procedure for tracking the Wiener process starts by sampling independently $N_S$ times the prior distribution $p\left(x_0\right)$ and setting the initial weights all to $\frac{1}{N_S}$. After obtaining the first measurement $i_1$, both the sampling with $p\left(x_1|x_0, i_1\right)$ and the update of the weights can be done. Then, after normalizing the weights, the estimate $\hat{x}_1$ can be obtained with the weighted mean. The procedure is then repeated for each time $k$ in a sequential way.
This procedure may also suffer from the degeneracy problem explained in Sec. 2.3.4 (p.
85), thus a resampling step (How? - App. A.3.4) (app4) must be carried out each time the
number of effective samples is too low.
The performance of this sequential importance sampling algorithm can be obtained through
a lower bound, as it is discussed in Sec. 2.4.
Remark: to reduce the complexity of this algorithm, we could use a technique based on
local linearizations of the optimal proposal distribution [Doucet 2000]. The problem with this
approach is that it requires the logarithm of the optimal proposal to have a positive second
derivative and this cannot be guaranteed for all noise distributions considered here.
The sequential procedure with the resampling step (particle filter) for solving (b) (p. 29) is the following:

Solution to (b) - Particle filter with rejection for a fixed threshold set sequence $\tau_{1:k}$

(b1.2) 1) Estimator

• Set uniform normalized weights $\tilde{w}\left(x_0^{(j)}\right) = \frac{1}{N_S}$ and initialize $N_S$ particles $\left\{x_0^{(1)}, \cdots, x_0^{(N_S)}\right\}$ by sampling the prior
\[
p\left(x_0\right) = \frac{1}{\sqrt{2\pi}\sigma_0}\exp\left[-\frac{1}{2}\left(\frac{x_0 - x'_0}{\sigma_0}\right)^2\right].
\]
For each time $k$,

• for $j$ from 1 to $N_S$, sample the r.v. $X_k^{(j)}$ with rejection sampling (app1).

• for $j$ from 1 to $N_S$, evaluate and normalize the weights
\[
w\left(x_{1:k}^{(j)}\right) = P\left(i_k\middle|x_{k-1}^{(j)}\right)\tilde{w}\left(x_{1:k-1}^{(j)}\right),
\qquad
\tilde{w}\left(x_{1:k}^{(j)}\right) = \frac{w\left(x_{1:k}^{(j)}\right)}{\sum\limits_{j=1}^{N_S} w\left(x_{1:k}^{(j)}\right)},
\]
where $P\left(i_k|x_{k-1}^{(j)}\right)$ is given by (A.16).

• Obtain the estimate with the weighted mean
\[
\hat{x}_k \approx \sum_{j=1}^{N_S} x_k^{(j)}\,\tilde{w}\left(x_{1:k}^{(j)}\right).
\]
• Evaluate the number of effective particles
\[
N_{\text{eff}} = \frac{1}{\sum\limits_{j=1}^{N_S} \tilde{w}^2\left(x_{1:k}^{(j)}\right)};
\]
if $N_{\text{eff}} < N_{\text{thresh}}$, then resample using multinomial resampling (How? - App. A.3.4) (app4).

2) Performance (lower bound)

The MSE can be lower bounded as follows
\[
\text{MSE}_k \ge \frac{1}{J'_k},
\]
with $J'_k$ given recursively by
\[
J'_k = \frac{1}{\sigma_w^2} + I_q(0) - \frac{1}{\sigma_w^4}\,\frac{1}{\frac{1}{\sigma_w^2} + J'_{k-1}}.
\]
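For illustration, a compact Python sketch of one time step of the estimator part of (b1.2) is given below. It is only a schematic of the loop under stated assumptions: `sample_proposal(x_prev)` and `lik_prev(x_prev)` are user-supplied closures implementing the rejection sampler (app1) and (A.16) for the current measurement, and the names are illustrative.

```python
import numpy as np

def particle_filter_step(particles, weights, sample_proposal, lik_prev,
                         n_thresh, rng):
    """One step of the (b1.2) loop. `particles` and `weights` are 1-D NumPy
    arrays holding the previous particles and normalized weights."""
    new_particles = np.array([sample_proposal(x) for x in particles])
    # weight update: w = P(i_k | x_{k-1}) * previous normalized weight
    weights = weights * np.array([lik_prev(x) for x in particles])
    weights /= weights.sum()
    x_hat = float(np.dot(new_particles, weights))   # weighted-mean estimate
    n_eff = 1.0 / np.sum(weights ** 2)              # effective particle number
    if n_eff < n_thresh:                            # multinomial resampling (app4)
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        new_particles = new_particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return new_particles, weights, x_hat
```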
A.3
How? - Algorithms and implementation issues
A.3.1
How to sample from a uniform/Gaussian distribution.
We are going to consider that we can generate easily and independently uniform and Gaussian
variates. For generating uniform variates, one can use linear congruential generators (see
[Knuth 1997, Sec. 3.2] for details), while for generating Gaussian variates one can use the
Box-Muller transform which requires a pair of independent uniform variates [Box 1958].
By looking at the specific form of the PDF (1.36)
\[
f(\varepsilon) =
\begin{cases}
f_{GL}(\varepsilon) = \frac{1}{C\sqrt{2\pi}\sigma}\exp\left[-\frac{1}{2}\left(\frac{\varepsilon + \frac{\alpha}{2}}{\sigma}\right)^2\right], & \text{for } \varepsilon < -\frac{\alpha}{2},\\[4pt]
f_{U}(\varepsilon) = \frac{1}{C\sqrt{2\pi}\sigma}, & \text{for } -\frac{\alpha}{2} \le \varepsilon \le \frac{\alpha}{2},\\[4pt]
f_{GR}(\varepsilon) = \frac{1}{C\sqrt{2\pi}\sigma}\exp\left[-\frac{1}{2}\left(\frac{\varepsilon - \frac{\alpha}{2}}{\sigma}\right)^2\right], & \text{for } \varepsilon > \frac{\alpha}{2},
\end{cases}
\]
where $C = 1 + \frac{\alpha}{\sqrt{2\pi}\sigma}$, we can see that we can generate samples from it by generating samples independently from the half Gaussian distributions
\[
f'_{GL}(\varepsilon) =
\begin{cases}
\frac{2}{\sqrt{2\pi}\sigma}\exp\left[-\frac{1}{2}\left(\frac{\varepsilon + \frac{\alpha}{2}}{\sigma}\right)^2\right], & \text{for } \varepsilon < -\frac{\alpha}{2},\\
0, & \text{otherwise},
\end{cases}
\qquad
f'_{GR}(\varepsilon) =
\begin{cases}
\frac{2}{\sqrt{2\pi}\sigma}\exp\left[-\frac{1}{2}\left(\frac{\varepsilon - \frac{\alpha}{2}}{\sigma}\right)^2\right], & \text{for } \varepsilon > \frac{\alpha}{2},\\
0, & \text{otherwise},
\end{cases}
\]
and from the central uniform distribution
\[
f'_U(\varepsilon) =
\begin{cases}
\frac{1}{\alpha}, & \text{for } -\frac{\alpha}{2} \le \varepsilon \le \frac{\alpha}{2},\\
0, & \text{otherwise},
\end{cases}
\]
and then choosing one of the samples randomly. For the samples to be distributed correctly, we will choose the sample from the left half Gaussian r.v. with probability $\frac{1}{2C}$, the sample from the uniform distribution with probability $\frac{\alpha}{\sqrt{2\pi}\sigma C}$, or the sample from the right half Gaussian, also with probability $\frac{1}{2C}$.
This gives the following algorithm for generating a sample from the uniform/Gaussian distribution with parameters $\alpha$ and $\sigma$:

Uniform/Gaussian sample generator

(app2) To generate a sample $v$ do the following

• evaluate
\[
C = 1 + \frac{\alpha}{\sqrt{2\pi}\sigma},
\qquad
p_1 = \frac{1}{2C},
\qquad
p_2 = \frac{1}{C}\left(\frac{1}{2} + \frac{\alpha}{\sqrt{2\pi}\sigma}\right).
\]
• Generate 2 independent uniform variates (from $\mathcal{U}[0,1]$) $u_0$ and $u_1$, and 2 standard (zero mean and $\sigma = 1$) Gaussian variates $g_1$ and $g_2$.

• If $u_0 < p_1$, then
\[
v = -\left(\sigma|g_1| + \frac{\alpha}{2}\right),
\]
else if $p_1 \le u_0 \le p_2$, then
\[
v = \alpha\left(u_1 - \frac{1}{2}\right),
\]
else
\[
v = \sigma|g_2| + \frac{\alpha}{2}.
\]
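A minimal Python sketch of (app2), assuming only standard uniform and Gaussian generators are available (function and variable names are illustrative):

```python
import numpy as np
from math import sqrt, pi

def sample_uniform_gaussian(alpha, sigma, rng=np.random.default_rng()):
    """(app2): draw one sample from the uniform/Gaussian mixture (1.36).
    The central plateau has width alpha; the tails are half Gaussians of
    scale sigma, each selected with probability 1/(2C)."""
    C = 1.0 + alpha / (sqrt(2.0 * pi) * sigma)
    p1 = 1.0 / (2.0 * C)                                  # left half Gaussian
    p2 = (0.5 + alpha / (sqrt(2.0 * pi) * sigma)) / C
    u0, u1 = rng.uniform(), rng.uniform()
    g1, g2 = rng.standard_normal(), rng.standard_normal()
    if u0 < p1:
        return -(sigma * abs(g1) + alpha / 2.0)           # left tail
    elif u0 <= p2:
        return alpha * (u1 - 0.5)                         # central uniform part
    else:
        return sigma * abs(g2) + alpha / 2.0              # right tail
```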
A.3.2 How to sample from a GGD.
We consider that an easy method for generating independent binary samples (samples with
values −1 or 1 that have equal probability) and gamma samples is available. For obtaining
binary samples one can simply take the sign of a sample from a uniform U [−0.5, 0.5] distribution and for obtaining gamma variates one can use a rejection method [Marsaglia 2000].
It can be shown that a generalized Gaussian r.v. V ′ with shape parameter β and unit
scale parameter can be obtained with the following transformation of two independent r.v.
[Nardon 2009]:
\[
V' = B\,\Gamma_{\frac{1}{\beta}}^{\frac{1}{\beta}},
\]
where $B$ is a binary r.v. and $\Gamma_{\frac{1}{\beta}}$ is a gamma r.v. with shape parameter $\frac{1}{\beta}$. If we want a generalized Gaussian r.v. $V$ with scale parameter $\delta$, we need only to multiply $V'$ by $\delta$.
This gives the following algorithm for generating a sample from a GGD with parameters
β and δ:
Generalized Gaussian sample generator

(app3) To generate a sample $v$ do the following

• generate independently a uniform sample $u$ from $\mathcal{U}[0,1]$ and a gamma sample $\gamma_{\frac{1}{\beta}}$ from $\Gamma_{\frac{1}{\beta}}$ with unitary scale parameter.

• Transform the uniform sample $u$ into a binary sample $b$ with
\[
b = \operatorname{sign}\left(u - \frac{1}{2}\right).
\]
• Apply the transformation
\[
v = \delta\, b\, \gamma_{\frac{1}{\beta}}^{\frac{1}{\beta}}.
\]
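A minimal Python sketch of (app3), assuming a gamma generator is available (here NumPy's, as an illustration):

```python
import numpy as np

def sample_ggd(beta, delta, rng=np.random.default_rng()):
    """(app3): generalized Gaussian sample v = delta * b * gamma^(1/beta),
    with b a random sign and gamma a Gamma(1/beta) variate of unit scale."""
    b = np.sign(rng.uniform() - 0.5)            # binary sample in {-1, +1}
    g = rng.gamma(shape=1.0 / beta, scale=1.0)  # gamma variate, shape 1/beta
    return delta * b * g ** (1.0 / beta)
```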
A.3.3 How to sample from the distribution $p\left(x_k|x_{k-1}^{(j)}\right)$ using a Gaussian standard variate.
Suppose we can generate a Gaussian standard variate Wn ∼ N (0, 1), for example using the
Box-Muller transform on a pair of independent uniform variates [Box 1958]. We want to
generate a Gaussian variate with PDF

\[
p\left(x_k\middle|x_{k-1}^{(j)}\right) = \frac{1}{\sqrt{2\pi}\sigma_w}\exp\left[-\frac{1}{2}\left(\frac{x_k - x_{k-1}^{(j)} - u_k}{\sigma_w}\right)^2\right],
\]
where $x_{k-1}^{(j)}$, $u_k$ and $\sigma_w$ are known. Using the following properties of Gaussian r.v.:
• the product of a Gaussian r.v. by a constant gives a Gaussian r.v. with variance given
by the initial variance multiplied by the square of the constant;
• the sum of a Gaussian r.v. and a constant gives a Gaussian r.v. with mean shifted by
the value of the constant.
We have that the r.v. $X_k^{(j)}$ distributed according to $p\left(x_k|x_{k-1}^{(j)}\right)$ can be generated as follows
\[
X_k^{(j)} = \sigma_w W_n + x_{k-1}^{(j)} + u_k.
\]
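As a quick illustration (a minimal sketch; the parameter values are placeholders), the shift-and-scale property reads in code:

```python
import numpy as np

rng = np.random.default_rng()
sigma_w, x_prev, u_k = 0.1, 0.5, 0.0   # illustrative values
w_n = rng.standard_normal()            # W_n ~ N(0, 1)
x_k = sigma_w * w_n + x_prev + u_k     # X_k ~ p(x_k | x_prev)
```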
A.3.4
Multinomial resampling algorithm.
In order to sample from
\[
P\left(x_k\right) =
\begin{cases}
\tilde{w}\left(x_{1:k}^{(j)}\right), & \text{if } x_k = x_k^{(j)},\\
0, & \text{otherwise},
\end{cases}
\]
we can create an increasing sequence of cumulative weights
\[
w_+^{(j)} = \sum_{i=1}^{j} \tilde{w}\left(x_{1:k}^{(i)}\right),
\]
where we define $\tilde{w}\left(x_{1:k}^{(0)}\right) = 0$. Thus, the intervals defined by the neighboring pairs of the sequence, $\left(w_+^{(j-1)}, w_+^{(j)}\right]$, form a partition of the interval $[0,1]$ and their lengths equal the corresponding $\tilde{w}\left(x_{1:k}^{(j)}\right)$. If we sample from the uniform distribution defined on $[0,1]$, $\mathcal{U}[0,1]$, and choose $x_k^{(j)}$ with $j$ corresponding to the interval of the sequence $w_+$ in which the uniform sample is contained, then the chosen $x_k^{(j)}$ are distributed according to the probability distribution $P\left(x_k\right)$ above.
Resetting equal sample weights at the end of the procedure, we have the multinomial
resampling algorithm:
Multinomial resampling

(app4) For $j = 1$ to $N_S$

• store the particle values in a sequence of auxiliary variables $\tilde{x}_k^{(j)}$
\[
\tilde{x}_k^{(j)} = x_k^{(j)},
\]
• create the sequence of cumulative weights
\[
w_+^{(j)} = \sum_{i=1}^{j} \tilde{w}\left(x_{1:k}^{(i)}\right), \quad \text{with } \tilde{w}\left(x_{1:k}^{(0)}\right) = 0.
\]
• Create a sequence $\left\{u'_1, \cdots, u'_{N_S}\right\}$ by sampling independently $N_S$ times from the distribution $\mathcal{U}[0,1]$.

For $j = 1$ to $N_S$,

• set $x_k^{(j)} = \tilde{x}_k^{(l_j)}$, where $l_j$ is chosen so that
\[
u'_j \in \left(w_+^{(l_j-1)}, w_+^{(l_j)}\right],
\]
• reset the normalized weights to a uniform distribution
\[
\tilde{w}\left(x_{1:k}^{(j)}\right) = \frac{1}{N_S}.
\]
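A minimal Python sketch of (app4), using the cumulative-weight partition described above (names are illustrative; `particles` is assumed to be a 1-D NumPy array):

```python
import numpy as np

def multinomial_resampling(particles, weights, rng=np.random.default_rng()):
    """(app4): resample particles with probabilities given by the normalized
    weights, then reset the weights to 1/N_S."""
    w_cum = np.cumsum(weights)              # w_+^(j), increasing towards 1
    u = rng.uniform(size=len(weights))      # u'_1, ..., u'_{N_S}
    idx = np.searchsorted(w_cum, u)         # interval containing each u'_j
    idx = np.minimum(idx, len(weights) - 1) # guard against rounding of the sum
    return particles[idx], np.full(len(weights), 1.0 / len(weights))
```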
A.3.5 How to sample from a STD.
For generating samples from the STD, we consider that a simple method for generating uniform
U [0, 1] samples is available.
It is possible to show that a Student’s-t r.v. V ′ with shape parameter β and unit
scale parameter can be obtained with the following transformation of two independent r.v.
[Bailey 1994]:
\[
V' = \left[\beta\left(U_1^{-\frac{2}{\beta}} - 1\right)\right]^{\frac{1}{2}}\cos\left(2\pi U_2\right),
\]
where U1 and U2 are independent r.v. with uniform U [0, 1] distribution. If we want a
Student’s-t r.v. V with scale parameter δ, we need only to multiply V ′ by δ.
Thus we have the following algorithm for generating a sample from a STD with parameters
β and δ:
Student’s-t sample generator
(app5) To generate a sample v do the following
• generate independently two uniform samples u1 and u2
from U [0, 1].
• Apply the transformation
\[
v = \delta\left[\beta\left(u_1^{-\frac{2}{\beta}} - 1\right)\right]^{\frac{1}{2}}\cos\left(2\pi u_2\right).
\]
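A minimal Python sketch of (app5) (names are illustrative):

```python
import numpy as np

def sample_student_t(beta, delta, rng=np.random.default_rng()):
    """(app5): Student's-t sample from two independent uniforms; beta is the
    shape parameter and delta the scale parameter."""
    u1, u2 = rng.uniform(), rng.uniform()
    return delta * np.sqrt(beta * (u1 ** (-2.0 / beta) - 1.0)) * np.cos(2.0 * np.pi * u2)
```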
B
Résumé détaillé en français (extended
abstract in French)
Ceci est un résumé détaillé en français des travaux réalisés dans cette thèse. L’introduction
et les conclusions des travaux sont traduites directement du manuscrit en anglais pour une
meilleure compréhension du contexte, les chapitres concernant les développements et résultats
théoriques seront présentés sous forme synthétique, avec seulement les principaux développements et résultats.
Contents
B.1 Introduction
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
B.2 Estimation et quantification : algorithmes et performances . . . . . . 261
B.2.1 Estimation d’un paramètre constant . . . . . . . . . . . . . . . . . . . . . 261
B.2.2 Estimation d’un paramètre variable . . . . . . . . . . . . . . . . . . . . . 270
B.2.3 Quantifieurs adaptatifs pour l’estimation . . . . . . . . . . . . . . . . . . . 274
B.3 Estimation et quantification : approximations à haute résolution
. . 286
B.3.1 Approximation à haute résolution de l’information de Fisher . . . . . . . 286
B.4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
B.4.1 Conclusions principales . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
B.4.2 Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 294
253
254
B.1
B. Résumé détaillé en français (extended abstract in French)
Introduction
Quantification : une inconnue dans la salle
Ouvrez un livre, un livre quelconque sur les fondements du traitement numérique du signal,
et comptez le nombre de pages dédiées au théorème de l’échantillonnage et au traitement du
signal à temps discret : la transformée de Fourier rapide, la transformée en Z, le filtrage à
réponse impulsionnelle finie et infinie. Maintenant, comptez le nombre de pages dédiées à la
quantification. Même si la moitié du « monde numérique » est un résultat de la quantification,
si on lit quelques livres fondamentaux en traitement numérique du signal, on a l’impression
qu’elle est un sujet sans importance.
Toutefois, une personne curieuse peut se demander : la quantification est-elle un sujet
vraiment dépourvu d’importance ? Peut-être qu’elle est si difficile à étudier et à expliquer de
façon simple, que la plupart des références de base en traitement numérique du signal préfèrent
omettre une explication plus détaillée. Nous croyons que cette explication est à l’origine de
l’omniprésence de la quantification fine des signaux dans la plupart des livres sur le traitement
numérique des signaux. En la considérant fine, les auteurs de ces livres peuvent reléguer la
quantification à une note de bas de page. On constate que la quantification semble être
l’étrange participant de la « fête du traitement numérique des signaux » et que personne ne
veut discuter avec elle (même si elle est un des organisateurs de la fête). Quelques domaines
du traitement du signal trouvent utile (et dans certaines circonstances ils n’ont pas tort)
de refuser tout contact avec la quantification. Chaque fois qu’ils ont besoin de traiter des
problèmes induits par la quantification, ils l’appellent de façon dépréciative – « le bruit de
quantification ».
Dans cette thèse, nous espérons faire « discuter » de façon respectueuse, sans termes
amoindrissants, un des participants de la fête du traitement du signal avec la quantification.
Le sujet que nous avons choisi est l’estimation.
Dans la suite, on expliquera la motivation et les points principaux de cette « discussion ».
Quantification et réseaux de capteurs : l’invitée d’honneur
Bien que nous ne traitions pas explicitement de la conception d’algorithmes d’estimation avec
une architecture du type réseau de capteurs, avec cette thèse nous espérons contribuer au
développement de techniques qui peuvent être utilisées ou étendues aux réseaux de capteurs.
L’essor des réseaux de capteurs. Avec la réduction des coûts et de la taille des dispositifs
électroniques, tels que les capteurs et les émetteurs-récepteurs, un nouveau domaine a émergé
sous le nom de « Réseaux de capteurs ». Ce terme, en général, désigne un groupe de capteurs
capables de communiquer et de traiter des données pour réaliser une tâche donnée, e.g. : faire
de l’estimation, de la détection, du suivi d’un signal, de la classification, etc.
Les réseaux de capteurs sont intéressants en pratique pour plusieurs raisons, parmi les
plus mentionnées dans la littérature on peut trouver [Akyildiz 2002], [Intanagonwiwat 2000],
[Zhao 2004, pp. 7–8] :
• tolérance aux défaillances et flexibilité.
• Déploiement facile.
• Possibilité d’utilisation en environnement dangereux.
• Possibilité d’utilisation sans maintenance.
• Utilisation de la communication pour réduire la quantité d’énergie utilisée.
• Rapport signal à bruit amélioré pour le suivi et détection d’événements dans une zone
donnée.
Applications des réseaux de capteurs. Les avantages cités plus haut ouvrent la voie pour
l’utilisation des réseaux de capteurs dans un très large spectre de domaines [Arampatzis 2005],
[Chong 2003], [Durisic 2012], [Puccinelli 2005]: surveillance de l’environnement, surveillance
pour l’agriculture, génie civil, surveillance urbaine, applications en santé, applications commerciales et applications militaires.
Le besoin de quantifier. Même si le progrès des technologies de conception des capteurs
et des dispositifs de communication nous amène à l’utilisation de réseaux à grand nombre de
capteurs, des considérations pratiques tels que l’utilisation de batteries et des contraintes sur
la taille maximale des capteurs imposent trois contraintes majeures pour la conception d’un
réseau de capteur : la contrainte énergétique, la contrainte sur le débit de communication et
la contrainte sur la complexité.
Pour respecter ces contraintes, on peut quantifier les mesures au niveau des capteurs. Ceci
permet de :
• réduire la complexité des opérations grâce à des recherches dans des tableaux pré-stockés
et limiter la quantité de mémoire utilisée.
• réduire directement le débit binaire en sortie des capteurs par le réglage du nombre
d’intervalles de quantification.
• réduire la quantité d’énergie utilisée, comme conséquence de la réduction de la complexité
et du débit.
Voilà les principales raisons pour lesquelles nous avons choisi d’étudier la quantification
dans cette thèse.
Différents objectifs et précisions sur le sujet de la thèse
Dans un réseau de capteurs, on s’intéresse principalement à l’inférence d’une certaine information enfouie dans les mesures. Les deux classes principales de problèmes d’inférence étudiées
en traitement du signal sont la détection et l’estimation. Si on regarde la littérature sur les
problèmes conjoints détection/quantification et estimation/quantification, on constate que, en
comparaison avec la littérature pour les problèmes isolés (seulement détection ou seulement
quantification), sa taille n’est pas importante, en revanche, comme conséquence de l’essor des
réseaux de capteurs, elle ne cesse pas de grandir.
Quelques références sur ces problèmes conjoints sont :
• Détection: [Benitz 1989], [Gupta 2003], [Kassam 1977], [Longo 1990], [Picinbono 1988],
[Poor 1977], [Poor 1988], [Tsitsiklis 1993], [Villard 2010], [Villard 2011].
• Estimation: [Aysal 2008], [Fang 2008], [Gubner 1993], [Luo 2005], [Marano 2007],
[Papadopoulos 2001], [Poor 1988], [Ribeiro 2006a], [Ribeiro 2006b], [Ribeiro 2006c],
[Wang 2010].
Estimation à partir de mesures quantifiées. Dans cette thèse on s’intéresse au second
problème, l’estimation à partir de mesures quantifiées. On commence par la définition générale
du problème d’estimation dans un réseau de capteurs pour, après une suite de simplifications,
arriver au sujet précis de la thèse.
Dans le schéma général, chaque capteur : mesure une quantité à amplitude continue X (i) ,
puis la mesure est traitée et transmise au point où l’estimation sera faite. Ce point peut être
un centre de fusion, un des capteurs ou tous les capteurs. Dans le dernier cas, tous les capteurs
diffuseront leurs mesures après traitement. Ce schéma est montré en Fig. B.1. La quantité
mesurée peut être une suite de vecteurs, une suite de scalaires, un vecteur constant ou un
scalaire constant.
Comme première hypothèse de travail, on considère que seulement un des terminaux (capteurs) est utilisé dans le réseau de capteurs, éventuellement on peut considérer plusieurs terminaux, mais dans ce cas la quantité à estimer sera la même pour tous les capteurs. On
considère aussi que la quantité à estimer est une séquence de scalaires ou un seul scalaire, on
utilise la notation Xk pour cette quantité dans les deux cas, l’indice k désigne l’échantillon en
question ou le temps discret. Dans le cas où Xk est une constante scalaire, on a Xk = x. Le
problème simplifié, qui peut être appelé problème d’estimation scalaire à distance, est montré
en Fig. B.2.
Le paramètre Xk est mesuré avec du bruit additif Vk . La mesure à amplitude continue est
notée Yk = Xk + Vk . Le problème que nous traitons dans cette thèse est donc un problème
d’estimation d’un paramètre de centrage.
En raison des contraintes de conception discutées plus haut, le bloc de traitement est
remplacé par un quantifieur scalaire. Par conséquent, chaque mesure continue Yk génère
une mesure quantifiée ik au travers d’une fonction de quantification Q (). Chaque mesure
quantifiée est définie dans un ensemble fini de valeurs, ceci permet de fixer le débit binaire en
Figure B.1: Estimation avec un réseau de capteurs. Plusieurs capteurs transmettent des
informations pré-traitées à l’estimateur final qui doit récupérer les quantités d’intérêt.
Figure B.2: Problème d’estimation scalaire à distance. Simplification scalaire et à un seul
capteur du problème montré en Fig. B.1.
sortie du capteur. On suppose que le débit en bits par unité de temps est choisi de façon à ne
pas dépasser la capacité du canal de transmission, de cette manière on peut considérer qu’un
code suffisamment performant peut être mis en œuvre pour rendre le canal parfait.
A chaque instant k, on est intéressé par l’estimation de Xk à partir d’un bloc de mesures
passées i1 , i2 , · · · , ik . Ce problème est illustré en Fig. B.3.
Figure B.3: Estimation à partir de mesures quantifiées. Un paramètre est mesuré avec du bruit
additif, les mesures sont alors quantifiées et transmises à travers un canal de communication
parfait. A partir de mesures passées, l’objectif est d’estimer Xk à chaque instant k avec la
suite de fonctions g ().
En Fig. B.3, on voit que la structure du quantifieur peut dépendre aussi de mesures
quantifiées passées.
Ce que l’on veut étudier. On veut proposer des algorithmes pour l’estimation de Xk à
partir des ik . Le paramètre Xk , qui sera défini de façon plus précise dans la suite, peut être
déterministe et constant ou aléatoire et lentement variable.
Après avoir proposé des algorithmes, on veut étudier leurs performances. Etant données
les performances des algorithmes, on veut aussi étudier les effets de différents paramètres du
quantifieur : seuils de quantification et résolution du quantifieur.
Pour évaluer l’impact de la quantification sur la performance d’estimation, on comparera
la performance des algorithmes proposés avec leurs pendants à mesures continues.
L’objectif ici est d’estimer Xk seulement à partir des informations sur les intervalles où ses
versions bruitées se trouvent.
Ce que l’on ne veut pas étudier. On ne veut pas reconstruire la mesure Yk à partir de la
mesure quantifiée pour ensuite estimer Xk à partir des mesures reconstruites comme si elles
étaient continues. En faisant cela, on se ramènerait au groupement des solutions optimales des
deux problèmes séparés, ces solutions ont déjà été abondamment étudiées dans la littérature.
On ne veut pas non plus considérer la quantification comme du bruit additif. On veut
étudier le problème dans sa forme originale, c’est-à-dire, le problème d’estimation à partir des
informations contenues dans des intervalles et pas dans des valeurs continues.
Ce que l’on veut étudier mais que l’on n’étudiera pas. Pour spécifier de façon plus
précise le problème traité dans cette thèse, on doit aussi mentionner les problèmes que l’on a
négligé sciemment pour rendre le sujet plus simple à traiter. Ces problèmes sont les suivants :
paramètres vectoriels et quantification vectorielle, canaux de communication bruités et codage
canal, signaux à variations rapides, estimation de signaux à temps continu et estimation
Bayésienne d’une constante aléatoire.
Plan du résumé
Le plan de ce résumé est le suivant :
• Estimation à partir de données quantifiées : algorithmes et performances
On détaille le problème à traiter (présentation des modèles de signaux à estimer, du bruit
et du quantifieur), puis on étudie les algorithmes d’estimation et leurs performances.
– Estimation d’un paramètre constant.
D’abord, on se concentrera sur l’estimation d’un signal constant. On présentera
un estimateur du maximum de vraisemblance pour deux types de quantification :
binaire et multibit. Par l’analyse de sa performance asymptotique, donnée par
la borne de Cramér–Rao (BCR) ou de façon équivalente par l’information de
Fisher, on regardera l’impact du réglage de la dynamique de quantification. Comme
conséquence de cette analyse, on montrera l’importance d’une approche adaptative
pour le réglage du quantifieur. Finalement, on présentera des algorithmes adaptatifs de haute complexité qui, conjointement, estiment la constante et règlent le
quantifieur. On montrera qu’asymptotiquement une de ces méthodes est équivalente à un algorithme récursif de basse complexité.
– Estimation d’un paramètre variable.
On passera ensuite au cas du paramètre variable. Après la présentation du modèle
de variation utilisé, on définira le critère de performance d’estimation et l’estimateur
optimal. Pour réaliser l’estimateur optimal, on utilisera une méthode numérique
d’intégration, dans ce contexte (estimation Bayésienne) cette méthode est connue sous le nom de filtrage particulaire. On étudiera ses performances avec la
borne de Cramér–Rao Bayésienne (BCRB) et on montrera encore une fois
l’importance de l’adaptativité du réglage du quantifieur. Avec l’approche adaptative, on montrera qu’asymptotiquement l’estimateur optimal ainsi obtenu pour un
signal lentement variable peut être mis, lui aussi, sous une forme récursive simple.
– Quantifieurs adaptatifs pour l’estimation.
En se basant sur l’optimalité asymptotique des estimateurs vus précédemment, on
proposera des algorithmes adaptatifs de basse complexité pour l’estimation et le
réglage conjoint du quantifieur. On étudiera la performance de ces algorithmes pour
deux modèles d’évolution de la quantité à estimer (constant ou lentement variable)
et on les optimisera par rapport à ses paramètres libres. Pour la performance optimale, on étudiera la perte de performance d’estimation par rapport à des schémas
équivalents pour des mesures continues.
On proposera deux extensions de l’algorithme adaptatif : une extension où l’on
estime le paramètre x sans connaître l’échelle du bruit (équivalent de l’écart type)
et une autre où plusieurs capteurs obtiennent des mesures quantifiées en parallèle
et les transmettent à un centre de fusion qui applique un algorithme adaptatif pour
l’estimation et diffuse son estimateur aux capteurs pour le réglage des quantifieurs.
• Estimation à partir de données quantifiées : approximations à haute résolution. Contrairement aux développements précédents où le réglage du quantifieur n’est fait qu’en
fonction du seuil central, on se concentrera ici sur le placement de tous les seuils de
quantification pour maximiser la performance d’estimation d’un paramètre arbitraire
(pas seulement de centrage). Vu que ce problème est difficile à résoudre directement,
on utilisera une approche asymptotique, i.e. on trouvera des approximations pour le
quantifieur optimal quand le nombre d’intervalles de quantification est très grand.
– Approximation à haute résolution de l’information de Fisher. Après avoir montré
l’importance de l’information de Fisher dans la performance d’estimation des algorithmes proposés, on appliquera cette approche asymptotique pour la maximiser
en fonction des caractéristiques du quantifieur. Cette approche asymptotique permettra de trouver une caractérisation optimale du quantifieur et une expression
analytique de l’information de Fisher optimale. On testera les résultats sur le
problème d’estimation d’un paramètre de centrage. Pour avoir une approximation pratique des seuils de quantification optimaux, on proposera l’utilisation de
l’algorithme adaptatif présenté précédemment.
Avec les expressions analytiques de l’information de Fisher, on pourra aussi étudier
de façon approchée le problème d’allocation optimale de bits dans un réseau de
capteurs, i.e. le nombre total de bits que les capteurs peuvent envoyer à un centre
de fusion étant fixé, combien de bits faut-il allouer à chaque capteur ?
• Conclusions
On présentera les principaux points qui découlent des résultats de la thèse et on regardera
les travaux qui peuvent être développés dans le futur : des extensions de problèmes traités
ici ou des problèmes qui n’ont pas été traités pour avoir une première approche la plus
simple possible.
B.2 Estimation et quantification : algorithmes et performances

B.2.1 Estimation d’un paramètre constant
Pour commencer cette section, on présente les modèles de mesure et de bruit utilisés.
Modèle de mesure
Le paramètre inconnu, constant et scalaire est
x ∈ R,
il est mesuré N fois, N ∈ N⋆ , avec du bruit indépendant et identiquement distribué
(i.i.d.) Vk . Pour k ∈ {1, · · · , N }, les mesures continues sont données par
Y k = x + Vk .
(B.1)
Modèle de bruit, hypothèses sur la distribution du bruit
Pour simplifier la suite, on considérera les hypothèses suivantes sur la distribution du bruit :
AN1 La fonction de répartition marginale du bruit, notée F , admet une densité de probabilité (d.d.p.) f par rapport à la mesure de Lebesgue standard en (R, B (R)).
AN2 La d.d.p. f (v) est une fonction paire, strictement positive et elle décroît strictement
avec |v|.
Modèle du quantifieur
La sortie du quantifieur est donnée par
ik = Q (Yk ) ,
où ik est choisi dans un ensemble fini de valeurs I de R, cet ensemble possède NI éléments.
Le nombre d’intervalles de quantification est par conséquent noté NI . Un exemple simple de
quantifieur Q avec seuils uniformes est donné en Fig. B.4.
A l’exception de la quantification uniforme, que l’on n’imposera pas, cet exemple illustre
les principales hypothèses de travail sur la structure du quantifieur :
Hypothèses (sur le quantifieur) :
AQ1 NI est un nombre naturel pair et l’ensemble I, auquel ik appartient, est
\[
I = \left\{-\frac{N_I}{2}, \cdots, -1, 1, \cdots, \frac{N_I}{2}\right\}.
\]
Figure B.4: Fonction de quantification Q (Yk ) avec NI intervalles de quantification uniformes
de taille ∆. Le nombre d’intervalles de quantification NI est pair, le quantifieur est symétrique
autour d’un seuil central τ0 et ses indices de sortie sont des entiers non nuls.
AQ2 Le quantifieur est symétrique autour d’un seuil central. Par conséquent le vecteur de seuils $\tau$ peut être écrit sous la forme suivante ($\top$ est l’opérateur transposé)
\[
\tau = \left[\tau_{-\frac{N_I}{2}} = \tau_0 - \tau'_{\frac{N_I}{2}} \;\; \cdots \;\; \tau_{-1} = \tau_0 - \tau'_1 \;\;\; \tau_0 \;\;\; \tau_1 = \tau_0 + \tau'_1 \;\; \cdots \;\; \tau_{\frac{N_I}{2}} = \tau_0 + \tau'_{\frac{N_I}{2}}\right]^{\top}.
\]
Les éléments de ce vecteur forment une séquence strictement positive et le vecteur de variations de seuil par rapport au seuil central est donné par
\[
\tau' = \left[0 \;\;\; \tau'_1 \;\; \cdots \;\; \tau'_{\frac{N_I}{2}} = +\infty\right]^{\top}.
\]
Avec les variations de seuils $\tau'_i$, on peut écrire la relation entrée–sortie du quantifieur sous une forme plus compacte :
\[
i_k = i\,\operatorname{sign}\left(Y_k - \tau_0\right), \quad \text{pour } |Y_k - \tau_0| \in \left[\tau'_{i-1}, \tau'_i\right).
\tag{B.2}
\]
Maximum de vraisemblance, borne de Cramér–Rao et information de Fisher
On veut estimer x à partir de i1:N = {i1 , · · · , iN }, on cherche donc un estimateur
X̂ (i1:N ) - qui est aléatoire, vu que les i1:N sont aléatoires aussi,
le plus proche de x. Proche dans ce cas peut être traduit de façon quantitative par un
critère de performance. Dans notre cas on considère comme critère de performance l’erreur
quadratique moyenne (EQM)
\[
\text{EQM} = \mathbb{E}\left[\left(\hat{X} - x\right)^2\right].
\]
Si l’on impose que l’estimateur soit non biaisé i.e.
\[
\mathbb{E}\left[\hat{X}\right] = x,
\]
au moins quand N → ∞, on sait que l’estimateur qui minimise l’EQM asymptotiquement
(et donc qui maximise la performance asymptotiquement) est l’estimateur du maximum
de vraisemblance (MV) [Kay 1993, p. 160]. Le MV consiste à maximiser la fonction
de vraisemblance par rapport au paramètre inconnu. La vraisemblance est la distribution
conjointe des mesures (celles-ci étant figées après observation) et elle est une fonction du
paramètre inconnu (celui-ci considéré comme une variable). Pour le problème que l’on traite
ici, la vraisemblance pour un bloc de mesures indépendantes i1:N est
\[
L\left(x; i_{1:N}\right) = \prod_{k=1}^{N} P\left(i_k; x\right),
\]
où $P\left(i_k; x\right)$ est la probabilité d’avoir une valeur quantifiée $i_k$ à l’instant $k$ pour un paramètre $x$. On peut réécrire cette probabilité en fonction des seuils et de la fonction de répartition :
\[
P\left(i_k; x\right) =
\begin{cases}
P\left(\tau_{i_k-1} \le Y_k < \tau_{i_k}\right), & \text{si } i_k > 0,\\
P\left(\tau_{i_k} \le Y_k < \tau_{i_k+1}\right), & \text{si } i_k < 0,
\end{cases}
\]
avec la définition $Y_k = x + V_k$ donnée par (B.1)
\[
P\left(i_k; x\right) =
\begin{cases}
P\left(\tau_{i_k-1} \le x + V_k < \tau_{i_k}\right), & \text{si } i_k > 0,\\
P\left(\tau_{i_k} \le x + V_k < \tau_{i_k+1}\right), & \text{si } i_k < 0,
\end{cases}
=
\begin{cases}
F\left(\tau_{i_k} - x\right) - F\left(\tau_{i_k-1} - x\right), & \text{si } i_k > 0,\\
F\left(\tau_{i_k+1} - x\right) - F\left(\tau_{i_k} - x\right), & \text{si } i_k < 0.
\end{cases}
\]
L’estimateur du MV est donné par
\[
\hat{X}_{MV,q} = \hat{X}_{MV}\left(i_{1:N}\right) = \operatorname*{argmax}_x L\left(x; i_{1:N}\right),
\]
ou de façon équivalente par
\[
\hat{X}_{MV,q} = \operatorname*{argmax}_x \log L\left(x; i_{1:N}\right).
\]
On se concentre maintenant sur les performances de cet estimateur, qui , à cause du
manque de résultats à taille d’échantillon finie, ne sont connues qu’en régime asymptotique.
L’EQM du MV peut être écrit, en général, sous la forme suivante
\[
\mathbb{E}\left[\left(\hat{X}_{MV,q} - x\right)^2\right] = \left(\mathbb{E}\left[\hat{X}_{MV,q}\right] - x\right)^2 + \operatorname{Var}\left(\hat{X}_{MV,q}\right) = \text{biais}^2 + \text{variance}.
\]
Comme mentionné auparavant, le MV est asymptotiquement non biaisé :
\[
\mathbb{E}\left[\hat{X}_{MV,q}\right] \underset{N\to\infty}{=} x.
\]
Par conséquent, son EQM n’est caractérisée que par sa variance.
La variance asymptotique du MV atteint la BCR [Kay 1993, p. 160] (qui est aussi une
borne inférieure sur la variance des estimateurs non biaisés dans un contexte non asymptotique
[Kay 1993, p. 30]) :
\[
\operatorname{Var}\left(\hat{X}_{MV,q}\right) \underset{N\to\infty}{\sim} \text{BCR}_q,
\]
où le symbole $\underset{N\to\infty}{\sim}$ est utilisé pour représenter une équivalence.

La BCR est l’inverse de l’information de Fisher [Kay 1993, p. 30] $I_q$, qui est la variance de la fonction score $S_q$. En partant de la fonction score pour $N$ mesures quantifiées, on a les expressions suivantes
\[
S_{q,1:N} = \frac{\partial \log L\left(x; i_{1:N}\right)}{\partial x} \quad \text{- fonction score,}
\]
\[
I_{q,1:N} = \mathbb{E}\left[S_{q,1:N}^2\right] = \mathbb{E}\left[\left(\frac{\partial \log L\left(x; i_{1:N}\right)}{\partial x}\right)^2\right] \quad \text{- Fisher,}
\]
\[
\operatorname{Var}\left(\hat{X}_{MV,q}\right) \underset{N\to\infty}{\sim} \text{BCR}_q = \frac{1}{I_{q,1:N}} = \frac{1}{\mathbb{E}\left[\left(\frac{\partial \log L\left(x; i_{1:N}\right)}{\partial x}\right)^2\right]} \quad \text{- variance et BCR.}
\]
L’indice $1:N$ est utilisé pour indiquer que ces quantités sont relatives à $N$ mesures. Pour simplifier, on utilisera la notation $S_q$ et $I_q$ dans le contexte d’une mesure quantifiée arbitraire. Sous l’hypothèse de mesures indépendantes on a
\[
\operatorname{Var}\left(\hat{X}_{MV,q}\right) \underset{N\to\infty}{\sim} \text{BCR}_q = \frac{1}{N I_q}.
\]
La fonction score pour une mesure Sq est
\[
S_q = \frac{\partial \log L\left(x; i_k\right)}{\partial x} = \frac{\frac{\partial P\left(i_k; x\right)}{\partial x}}{P\left(i_k; x\right)}
\]
et l’information de Fisher correspondante est
\[
I_q = \mathbb{E}\left[\left(\frac{\partial \log L\left(x; i_k\right)}{\partial x}\right)^2\right]
= \sum_{i_k \in I} \left[\frac{\frac{\partial P\left(i_k; x\right)}{\partial x}}{P\left(i_k; x\right)}\right]^2 P\left(i_k; x\right)
= \sum_{i_k \in I} \frac{\left[\frac{\partial P\left(i_k; x\right)}{\partial x}\right]^2}{P\left(i_k; x\right)}.
\]
Si on note ε = τ0 − x la différence entre le seuil central et le paramètre, on peut réécrire Iq
sous la forme suivante :
\[
I_q = \sum_{i_k=1}^{\frac{N_I}{2}}\left\{\frac{\left[f\left(\varepsilon + \tau'_{i_k-1}\right) - f\left(\varepsilon + \tau'_{i_k}\right)\right]^2}{F\left(\varepsilon + \tau'_{i_k}\right) - F\left(\varepsilon + \tau'_{i_k-1}\right)} + \frac{\left[f\left(\varepsilon - \tau'_{i_k}\right) - f\left(\varepsilon - \tau'_{i_k-1}\right)\right]^2}{F\left(\varepsilon - \tau'_{i_k-1}\right) - F\left(\varepsilon - \tau'_{i_k}\right)}\right\}.
\tag{B.3}
\]
Influence du quantifieur sur la performance
La performance de l’estimateur est donc caractérisée par BCRq ou de façon équivalente par
Iq , par conséquent, pour étudier l’influence du quantifieur sur la performance d’estimation on
peut, de façon quantitative, étudier comment BCRq ou Iq se comportent en fonction de NI et
τ . On commence par quelques propriétés générales de Iq :
Perte induite par la quantification : si on note Sc et Ic la fonction score et l’information
de Fisher du problème d’estimation équivalent avec des mesures continues, on peut montrer
que
\[
I_c - I_q = \mathbb{E}\left[\left(S_c - S_q\right)^2\right] > 0.
\]
Ce qui veut dire que $I_q$ est majorée par $I_c$ et qu’il existe une perte de performance inhérente à la quantification, donnée de manière quantitative par $\mathbb{E}\left[\left(S_c - S_q\right)^2\right]$.
Monotonicité de Iq : on peut montrer aussi que si l’on ajoute un seuil à un vecteur de
seuils τ , alors l’information de Fisher correspondant au nouveau vecteur de seuils est toujours
plus grande ou égale à l’information de Fisher précédente. Cela veut dire que l’information
de Fisher croît de façon monotone en fonction de NI (pour une séquence de seuils construite
en ajoutant des seuils).
Une question qui se pose pour la suite est : comme on peut construire une séquence de
seuils telle que Iq croît de façon monotone en NI et comme on sait que Iq est majorée par Ic ,
est-ce que Iq converge vers Ic ? On répondra à cette question plus loin dans ce résumé.
Maintenant, on passe à l’étude de la performance d’estimation en fonction de la position
des seuils. On commence par le cas binaire.
Cas binaire : dans le cas binaire on peut utiliser l’expression de l’information de Fisher
(B.3) pour obtenir la BCR suivante
\[
\text{BCR}_q^B = \frac{F(\varepsilon)\left[1 - F(\varepsilon)\right]}{N f^2(\varepsilon)}.
\]
L’analyse de la performance se réduit alors à l’analyse de la fonction
\[
B(\varepsilon) = N\,\text{BCR}_q^B = \frac{F(\varepsilon)\left[1 - F(\varepsilon)\right]}{f^2(\varepsilon)}.
\]
L’étude de cette fonction dans le cas Gaussien ($f(\varepsilon) = \frac{1}{\sqrt{\pi}\delta}\exp\left[-\left(\frac{\varepsilon}{\delta}\right)^2\right]$) a été réalisée par [Papadopoulos 2001] et [Ribeiro 2006a], son comportement est illustré en Fig. B.5.

On peut noter que la valeur minimale de $B$ est atteinte lorsque $\varepsilon = 0$ et que $B(\varepsilon)$ augmente lorsque $|\varepsilon|$ augmente. Par conséquent la valeur optimale du seuil $\tau_0^\star$ est égale à $x$ et la valeur minimale de $B$ est $B^\star = \frac{1}{4 f^2(0)} = \frac{\pi\delta^2}{4}$. Si on compare cette valeur avec la BCR pour les mesures continues, $\text{BCR}_c \times N = \frac{\delta^2}{2}$, on peut constater que la perte produite par la quantification binaire est d’environ 2 dB, ce qui est, de façon surprenante, très peu.
Figure B.5: BCR normalisée B en fonction de la différence normalisée δε entre le seuil et le
paramètre. La distribution du bruit est Gaussienne et le facteur de normalisation δ est le
paramètre d’échelle de la Gaussienne. Des normalisations sont réalisées sur les deux axes pour
que la courbe affichée soit indépendante de δ.
Notez que pour avoir cette petite perte, il faut que τ0 = x, ce qui est impossible en pratique,
puisque x est le paramètre inconnu à estimer. Notez aussi que B est une fonction assez sensible
par rapport à la position du seuil, si l’on place τ0 loin du paramètre la performance d’estimation
est très dégradée.
On peut montrer que pour d’autres distributions couramment utilisées comme modèle de
bruit, tels que la distribution de Laplace et la distribution de Cauchy, des conclusions similaires
peuvent être obtenues :
• La valeur optimale du seuil de quantification est τ0⋆ = x.
• La perte due à la quantification est petite, si on utilise τ0⋆ .
• La performance se dégrade lorsque τ0 s’éloigne de x.
Cas asymétriques : même si pour plusieurs distributions de bruit couramment utilisées la
fonction B a un comportement symétrique en forme de « u », ce comportement ne se généralise
pas à toutes les distributions symétriques, comme on s’y attend intuitivement. Il suffit que la
condition suivante ne soit pas satisfaite
\[
-f^{(2)}(0) > 4 f^3(0),
\]
pour que la fonction B ait ε = 0 comme maximum local. Ceci veut dire que pour des
densités ne respectant pas cette condition, le point de quantification optimal n’est pas x et la
quantification optimale doit être faite de manière asymétrique par rapport à la distribution
des mesures.
Un cas simple de distribution symétrique qui ne respecte pas cette condition est la distribution ad hoc suivante

\[
f(\varepsilon) =
\begin{cases}
f_{GL}(\varepsilon) = \frac{1}{C\sqrt{2\pi}\sigma}\exp\left[-\frac{1}{2}\left(\frac{\varepsilon + \frac{\alpha}{2}}{\sigma}\right)^2\right], & \text{pour } \varepsilon < -\frac{\alpha}{2},\\[4pt]
f_{U}(\varepsilon) = \frac{1}{C\sqrt{2\pi}\sigma}, & \text{pour } -\frac{\alpha}{2} \le \varepsilon \le \frac{\alpha}{2},\\[4pt]
f_{GR}(\varepsilon) = \frac{1}{C\sqrt{2\pi}\sigma}\exp\left[-\frac{1}{2}\left(\frac{\varepsilon - \frac{\alpha}{2}}{\sigma}\right)^2\right], & \text{pour } \varepsilon > \frac{\alpha}{2},
\end{cases}
\]
où $C = 1 + \frac{\alpha}{\sqrt{2\pi}\sigma}$.
Un exemple de BCR obtenue avec cette distribution (et de la performance pratique du
MV) est donné en Fig. B.6
Figure B.6: BCRB
q et EQM simulée du MV pour un bruit distribué selon la loi ad hoc. La
borne et l’EQM simulée ont été évaluées pour N = 500 et ε dans l’intervalle [−2, 2]. L’EQM
du MV a été évaluée par une simulation Monte Carlo avec 105 réalisations de blocs de 500
échantillons. On a utilisé aussi : α = 1 et σ = 1.
Cas multibit : pour l’estimation avec MV et une quantification multibit, la performance
en fonction du quantifieur peut être étudiée au travers de l’analyse de l’information de Fisher
(B.3). Comme résultat de cette analyse on trouve que :
• la dynamique de quantification doit être proche du paramètre pour maximiser la performance d’estimation.
• Pour des variations de seuils symétriques bien choisies, le choix τ0 = x est optimal pour
plusieurs types de bruit (pour une classe plus large que dans le cas binaire).
• La performance se dégrade rapidement quand la dynamique de quantification est placée
loin du paramètre à estimer.
• Le problème d’optimisation de Iq en fonction de τ ′ est difficile à résoudre pour NB =
log2 (NI ) > 3.
Quantification adaptative : l’approche à haute complexité
La conclusion directe des résultats précédents est la suivante : on doit placer le seuil central
le plus proche possible du paramètre x. Or comme x est inconnu, on peut se baser sur les
dernières mesures quantifiées pour estimer x, et comme on s’attend à ce que l’estimateur soit,
au moins après un certain moment, proche de x, on placera le seuil central de quantification
exactement sur cette dernière estimation. Ceci équivaut donc à une approche d’estimation où
le processus de mesure, le quantifieur, est à tout instant adapté pour améliorer la performance
d’estimation.
Dans la littérature cette approche adaptative a été proposée en [Li 2007] et [Fang 2008],
dans le cas binaire et Gaussien.
La première méthode, proposée en [Li 2007], consiste à générer des estimations simples du
paramètre au niveau du capteur avec la mise à jour du seuil central donnée par
τ0,k = τ0,k−1 + γik ,
où γ est un pas d’adaptation. Les mesures quantifiées sont donc transmises à l’estimateur
distant qui possède suffisamment de puissance de calcul pour générer des estimations plus
précises en utilisant le MV. Dans ce cas, le MV consiste à maximiser la vraisemblance suivante
\[
L\left(x; i_{1:N}\right) = P\left(i_{1:N}; x\right) = \prod_{k=1}^{N} P\left(i_k|i_{k-1}, \cdots, i_1; x\right) = \prod_{k=1}^{N} P\left(i_k|\tau_{0,k-1}; x\right)
= \prod_{k=1}^{N} \left[1 - F\left(\tau_{0,k-1} - x\right)\right]^{\frac{1+i_k}{2}} \left[F\left(\tau_{0,k-1} - x\right)\right]^{\frac{1-i_k}{2}},
\tag{B.4}
\]
\[
\log L\left(x; i_{1:N}\right) = \sum_{k=1}^{N} \left\{\frac{1+i_k}{2}\log\left[1 - F\left(\tau_{0,k-1} - x\right)\right] + \frac{1-i_k}{2}\log F\left(\tau_{0,k-1} - x\right)\right\}.
\]
Du fait de la symétrie du problème, on espère que le seuil τ0,k va tendre en moyenne vers le
point x, de cette façon le seuil central va fluctuer autour du vrai paramètre et donnera une
performance d’estimation proche de l’optimum.
La performance asymptotique de l’algorithme a été étudiée plus en détail dans [Fang 2008].
Elle est obtenue à partir de l’inverse de l’information de Fisher
\[
I_{q,1:N} = \sum_{k=1}^{N} \mathbb{E}\left[\frac{f^2\left(\tau_{0,k-1} - x\right)}{F\left(\tau_{0,k-1} - x\right)\left[1 - F\left(\tau_{0,k-1} - x\right)\right]}\right],
\]
où l’espérance est évaluée par rapport à la distribution de τ0,k−1 , qui maintenant n’est plus fixé,
ni déterministe. Sachant que la distribution des seuils tend vers une distribution asymptotique,
quand N → ∞, cette information de Fisher peut être approchée par l’information de Fisher
avec la distribution asymptotique des seuils p∞ :
\[
\lim_{N\to\infty} \frac{I_{q,1:N}}{N} = \tilde{I}'^{\top}_q\, \tilde{p}_\infty,
\]
où
\[
I'_q = \left[\cdots,\; \frac{f^2(-\gamma - x)}{F(-\gamma - x)\left[1 - F(-\gamma - x)\right]},\; \frac{f^2(0 - x)}{F(0 - x)\left[1 - F(0 - x)\right]},\; \frac{f^2(\gamma - x)}{F(\gamma - x)\left[1 - F(\gamma - x)\right]},\; \cdots\right]^{\top}.
\]
p̃∞ étant de taille infinie, pour avoir la performance asymptotique en pratique, [Fang 2008]
propose de tronquer le vecteur p̃∞ .
Un problème avec l’approche adaptative qui vient d’être présentée réside dans la présence
d’une fluctuation asymptotique sur l’emplacement de la dynamique de quantification, ceci
entraîne une sous optimalité asymptotique vu que l’on devrait avoir τ0,k = x pour une performance optimale. Ce problème peut être résolu en utilisant un algorithme qui converge vers x
quand N → ∞. En ajoutant de la complexité au niveau du capteur ou un lien de retour de
l’estimateur vers le capteur, une idée assez directe consiste à utiliser la dernière estimation du
MV comme nouveau seuil central :
τ0,k = X̂M V,k .
Cette idée a été proposée initialement pour le cas binaire Gaussien dans [Fang 2008] où les
auteurs prétendent qu’asymptotiquement la performance en terme de variance est équivalente
à $\frac{1}{N I_q(0)}$. Ce qui équivaut à dire que l’algorithme est asymptotiquement optimal.
On peut étendre de façon assez naturelle cette approche adaptative au cas multibit et
non Gaussien. Moyennant certaines contraintes sur la fonction Iq (ε), on peut montrer que la
performance asymptotique est aussi optimale
\[
\text{BCR}_q \underset{N\to\infty}{\sim} \frac{1}{N I_q(0)}.
\]
Pour réduire la complexité de l’algorithme, qui résout un problème d’optimisation à chaque
nouvelle mesure, dans [Papadopoulos 2001] une étude heuristique de la forme asymptotique du
MV dans le cas Gaussien binaire avec le seuil adaptatif a été réalisée, elle montre qu’asymptotiquement
le MV adaptatif a une forme récursive de très basse complexité :
\[
\hat{X}_k = \tau_{0,k} = \hat{X}_{k-1} + \frac{\delta\sqrt{\pi}}{2k}\, i_k.
\]
En se basant sur les propriétés asymptotiques du MV et en considérant que l’estimateur
est suffisamment proche du vrai paramètre, on peut montrer que, même dans un contexte non
Gaussien, le MV adaptatif est équivalent à une forme asymptotique simple
\[
\hat{X}_k \approx \hat{X}_{k-1} + \frac{i_k}{2 k f(0)}.
\]
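À titre d’illustration seulement, voici une esquisse minimale en Python de cette forme récursive (hypothèses : quantification binaire avec seuil placé sur la dernière estimation, bruit Gaussien standard pour l’exemple ; les noms de variables sont purement illustratifs) :

```python
import numpy as np

def adaptive_binary_estimation(y, f0, x0=0.0):
    """Estimateur récursif adaptatif : le seuil central est placé sur la
    dernière estimation et mis à jour avec un gain décroissant en 1/k.
    `y` contient les mesures continues, `f0` = f(0) la densité du bruit en 0."""
    x_hat = x0
    estimates = []
    for k, yk in enumerate(y, start=1):
        i_k = 1 if yk >= x_hat else -1          # mesure quantifiée binaire
        x_hat = x_hat + i_k / (2.0 * k * f0)    # mise à jour récursive
        estimates.append(x_hat)
    return np.array(estimates)

# Exemple : bruit Gaussien standard, paramètre x = 0.7
rng = np.random.default_rng(0)
y = 0.7 + rng.standard_normal(5000)
f0 = 1.0 / np.sqrt(2.0 * np.pi)                 # f(0) pour ce bruit
print(adaptive_binary_estimation(y, f0)[-1])
```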
On peut se poser quelques questions sur la forme équivalente simple donnée ci-dessus :
• peut-elle converger quand l’erreur initiale |ε| = |τ0 − x| est arbitraire (pas nécessairement
petite) ?
• Peut-on étendre cet algorithme à basse complexité au cas NI > 2 ?
On donnera les réponses à ces questions en Sous-section B.2.3.
B.2.2 Estimation d’un paramètre variable
On passe maintenant à l’estimation d’un paramètre variable.
Modèle du paramètre
Le paramètre à estimer est défini comme un processus stochastique, il n’est donc pas seulement
variable mais aussi aléatoire. A chaque instant k ∈ N⋆ , la variable aléatoire (v.a.) Xk est
donnée par le modèle de Wiener suivant :
Xk = Xk−1 + Wk ,
k > 0,
où Wk est le k-ème élément d’une séquence indépendante de v.a. Gaussiennes. Sa moyenne
est donnée par $u_k$ et sa variance est une constante connue $\sigma_w^2$. Si $u_k = 0$, alors $X_k$ forme un processus de Wiener à temps discret classique, sinon, on l’appelle processus de Wiener avec dérive. La distribution initiale de $X_0$ est supposée Gaussienne de moyenne $x'_0$ et de variance connue $\sigma_0^2$.
Modèle du quantifieur
Pour poursuivre le paramètre, on suppose que le quantifieur peut être dynamique avec un
vecteur de seuils donné par τ k :
\[
\tau_k = \left[\tau_{-\frac{N_I}{2},k} \;\; \cdots \;\; \tau_{-1,k} \;\;\; \tau_{0,k} \;\;\; \tau_{1,k} \;\; \cdots \;\; \tau_{\frac{N_I}{2},k}\right]^{\top}.
\]
Les mesures quantifiées sont encore données par Q () définie en (B.2)
ik = Q (Yk ) ,
mais dans ce cas, cette fonction peut varier dans le temps.
Estimateur optimal
De façon analogue au cas constant, on veut un estimateur
\[
\hat{X}\left(i_{1:k}\right)
\]
qui minimise l’EQM
\[
\text{EQM}_k = \mathbb{E}\left[\left(\hat{X}_k - X_k\right)^2\right].
\]
Il est connu que l’estimateur optimal minimisant l’EQM est donné par la moyenne a posteriori
\[
\hat{X}_k = \mathbb{E}_{X_k|i_{1:k}}\left(X_k\right) = \int_{\mathbb{R}} x_k\, p\left(x_k|i_{1:k}\right) \mathrm{d}x_k,
\tag{B.5}
\]
où p (xk |i1:k ) est la densité a posteriori. Cet estimateur est non biaisé et sa performance est
donnée par
\[
\text{EQM}_k = \mathbb{E}_{i_{1:k}}\left[\operatorname{Var}_{X_k|i_{1:k}}\left(X_k\right)\right]
= \sum_{i_{1:k} \in I^{\otimes k}} \left\{\int_{\mathbb{R}} \left(x_k - \mathbb{E}_{X_k|i_{1:k}}\left(X_k\right)\right)^2 p\left(x_k|i_{1:k}\right) \mathrm{d}x_k\right\} P\left(i_{1:k}\right).
\tag{B.6}
\]
Filtrage particulaire
La solution donnée par (B.5) est difficile à mettre en œuvre de façon analytique car, dans la
plupart des cas, l’évaluation directe de la densité p (xk |i1:k ) et de l’intégrale n’est pas possible.
On doit donc utiliser une méthode numérique pour l’évaluation de la densité et de l’intégrale.
Dans notre cas, on peut utiliser la méthode de Monte Carlo, qui est une méthode d’intégration
numérique basée sur la simulation.
Dans le contexte du problème étudié, l’application de la méthode de Monte Carlo est
connue sous le nom d’échantillonnage d’importance avec rééchantillonnage, populairement,
elle est aussi connue sous le nom filtrage particulaire (FP), que l’on utilise pour la suite.
Un algorithme particulaire pour l’estimation du paramètre variable à partir de données
quantifiées est donné ci-dessous :
0. Définition des poids $\tilde{w}\left(x_0^{(j)}\right) = \frac{1}{N_S}$ et initialisation de $N_S$ particules (échantillons) $\left\{x_0^{(1)}, \cdots, x_0^{(N_S)}\right\}$ par l’échantillonnage de la densité
\[
p\left(x_0\right) = \frac{1}{\sqrt{2\pi}\sigma_0}\exp\left[-\frac{1}{2}\left(\frac{x_0 - x'_0}{\sigma_0}\right)^2\right].
\]
A chaque instant k,
1. pour $j$ de 1 à $N_S$, l’échantillonnage de $X_k^{(j)}$ est réalisé avec la d.d.p.
\[
p\left(x_k\middle|x_{k-1}^{(j)}\right) = \frac{1}{\sqrt{2\pi}\sigma_w}\exp\left[-\frac{1}{2}\left(\frac{x_k - x_{k-1}^{(j)} - u_k}{\sigma_w}\right)^2\right],
\]
2. pour j de 1 à NS , on évalue et on normalise les poids
\[
w\left(x_{1:k}^{(j)}\right) = P\left(i_k\middle|x_k^{(j)}\right)\tilde{w}\left(x_{1:k-1}^{(j)}\right),
\qquad
\tilde{w}\left(x_{1:k}^{(j)}\right) = \frac{w\left(x_{1:k}^{(j)}\right)}{\sum\limits_{j=1}^{N_S} w\left(x_{1:k}^{(j)}\right)},
\]
où $P\left(i_k|x_k\right)$ est donnée par
\[
P\left(i_k|x_k\right) =
\begin{cases}
F\left(\tau_{i_k,k} - x_k\right) - F\left(\tau_{i_k-1,k} - x_k\right), & \text{si } i_k > 0,\\
F\left(\tau_{i_k+1,k} - x_k\right) - F\left(\tau_{i_k,k} - x_k\right), & \text{si } i_k < 0.
\end{cases}
\]
3. L’estimation est donc donnée par
\[
\hat{x}_k \approx \sum_{j=1}^{N_S} x_k^{(j)}\, \tilde{w}\left(x_{1:k}^{(j)}\right).
\]
4. Finalement, on évalue le nombre effectif de particules utilisées
\[
N_{\text{eff}} = \frac{1}{\sum\limits_{j=1}^{N_S} \tilde{w}^2\left(x_{1:k}^{(j)}\right)},
\]
si $N_{\text{eff}} < N_{\text{seuil}}$, alors une procédure de rééchantillonnage multinomial doit être réalisée (rééchantillonnage des valeurs $x_{1:k}^{(j)}$ avec les poids $\tilde{w}\left(x_{1:k}^{(j)}\right)$ comme probabilités de tirage).
Evaluation de la performance
Quand NS tend vers l’infini, il est connu que le filtrage particulaire converge vers l’estimateur
optimal. Dans ce cas, si l’on considère qu’un NS suffisamment grand est utilisé, on peut utiliser
(B.6) pour obtenir la performance du filtrage particulaire. Cependant, l’expression (B.6)
souffre du même problème que (B.5), l’impossibilité d’être évaluée analytiquement dans la
plupart des cas. Comme solution, on pourrait donc avoir recours à une procédure d’intégration
numérique similaire à celle utilisée pour obtenir l’estimateur. Le problème avec cette dernière
approche réside dans le fait que si l’on voulait utiliser la performance de simulation pour
la conception d’un système de mesure (choix de la qualité du capteur, choix de NI , etc)
on serait obligé de réaliser des simulations pour plusieurs valeurs possibles des paramètres.
Ceci demanderait un temps de simulation très long. Comme alternative, on utilise une borne
inférieure sur l’EQM, qui peut être obtenue de façon analytique. Cette borne est la version
Bayésienne de la BCR, la BCRB.
\[
\text{EQM}_k \ge \text{BCRB}_k = \frac{1}{J_k},
\tag{B.7}
\]
où $J_k$ est l’information Bayésienne donnée sous forme récursive par
\[
J_k = \frac{1}{\sigma_w^2} + \mathbb{E}\left[I_q\left(\varepsilon_k\right)\right] - \frac{1}{\sigma_w^4}\,\frac{1}{\frac{1}{\sigma_w^2} + J_{k-1}}.
\tag{B.8}
\]
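À titre d’illustration, voici une esquisse numérique de la récursion (B.8) et de la borne (B.7), en remplaçant $\mathbb{E}\left[I_q(\varepsilon_k)\right]$ par la constante $I_q(0)$ (hypothèse de signal lent discutée ci-dessous ; les valeurs numériques et les noms sont illustratifs) :

```python
import numpy as np

def bcrb_sequence(iq0, sigma_w, j0, n_steps):
    """Esquisse : calcule J_k par la récursion (B.8) en remplaçant E[Iq(eps_k)]
    par la constante iq0, puis la borne BCRB_k = 1/J_k de (B.7)."""
    j = j0
    bounds = []
    for _ in range(n_steps):
        j = 1.0 / sigma_w**2 + iq0 - (1.0 / sigma_w**4) / (1.0 / sigma_w**2 + j)
        bounds.append(1.0 / j)
    return np.array(bounds)

# Le point fixe se raccorde à l'approximation asymptotique sigma_w / sqrt(Iq(0))
b = bcrb_sequence(iq0=0.5, sigma_w=0.05, j0=1.0, n_steps=200)
print(b[-1], 0.05 / np.sqrt(0.5))
```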
L’innovation quantifiée
Pour les modèles symétriques de bruit couramment utilisés (Gaussien, Laplacien et Cauchy),
on sait que Iq (ε) est maximisée quand ε = 0 et que Iq (ε) décroît quand |ε| augmente. Or,
d’après (B.8), on voit que plus τ0,k est proche de la réalisation du paramètre xk , plus grande
est l’information Bayésienne et, par conséquent, plus petite est la BCRB. Si l’on suppose
que la BCRB est une borne suffisamment serrée, de façon à ce que l’on puisse accepter son
comportement comme une approximation de l’EQM, alors plus proche τ0,k est de xk , plus
petite est l’EQM. Ceci indique que pour avoir une performance d’estimation améliorée, la
dynamique de quantification doit se déplacer dans le temps de façon à suivre le paramètre.
L’approche –τ0,k = xk – est, encore une fois, impossible à mettre en œuvre car on ne connaît
pas xk . On doit alors accepter une perte de performance et utiliser la valeur la plus proche de
xk disponible, dans notre cas, la prédiction de xk . Ceci consiste donc à quantifier l’innovation
apportée par la nouvelle mesure. Avec le modèle d’évolution de Xk utilisé ici, on peut montrer
que cette prédiction est
X̂k|k−1 = τ0,k = X̂k−1 + uk .
(B.9)
On peut donc modifier l’algorithme particulaire, pour inclure la mise à jour adaptative du
centre du quantifieur (B.9). La performance du nouvel algorithme peut être approchée par la
BCRB (B.7).
Si l’on suppose que le signal est lent, i.e. que σw est petit devant l’écart type du bruit
de mesure, alors on peut s’attendre à ce que l’erreur d’estimation soit petite après un certain
temps, vu que l’estimateur a le temps de « moyenner » les mesures avant un changement
important d’amplitude du signal. On peut donc remplacer E [Iq (εk )] par sa borne supérieure
Iq (0) dans l’expression récursive pour l’information Bayésienne. Si on calcule le point fixe de
l’expression résultante pour σw petit, on trouve une expression asymptotique simple pour la
borne sur l’EQM :
σw
+ ◦ (σw ) ,
(B.10)
EQM∞ ≥ p
Iq (0)
où la notation ◦ (σw ) est utilisée pour représenter un terme qui est négligeable devant σw
quand σw → 0, c’est-à-dire, quand le signal est lent.
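À titre d'illustration numérique, l'esquisse Python suivante itère la récursion (B.8) avec $\mathbb{E}[I_q(\varepsilon_k)]$ remplacé par sa borne supérieure $I_q(0)$, puis compare la borne au point fixe à l'approximation asymptotique $\sigma_w/\sqrt{I_q(0)}$. La valeur $I_q(0) = 2/(\pi\delta^2)$ utilisée ici est une hypothèse de travail : elle correspond au cas particulier d'une quantification binaire avec seuil centré et bruit gaussien d'écart type $\delta$.

```python
import numpy as np

def point_fixe_bcrb(iq0, sigma_w, n_iter=10000):
    """Itère la récursion de l'information Bayésienne (B.8), avec E[Iq(eps_k)]
    remplacé par Iq(0), et renvoie la BCRB au point fixe (1/J_infini)."""
    j = iq0  # initialisation arbitraire
    for _ in range(n_iter):
        j = 1.0 / sigma_w**2 + iq0 - (1.0 / sigma_w**4) / (1.0 / sigma_w**2 + j)
    return 1.0 / j

if __name__ == "__main__":
    delta = 1.0                      # écart type du bruit gaussien (hypothèse)
    iq0 = 2.0 / (np.pi * delta**2)   # info de Fisher : quantification binaire, bruit gaussien
    for sigma_w in (1e-1, 1e-2, 1e-3):
        bcrb = point_fixe_bcrb(iq0, sigma_w)
        approx = sigma_w / np.sqrt(iq0)   # borne asymptotique (B.10)
        print(sigma_w, bcrb, approx)
```

Pour $\sigma_w$ de plus en plus petit, les deux dernières colonnes se rapprochent, ce qui illustre le comportement asymptotique annoncé en (B.10).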
Estimateur asymptotique optimal d’un signal lent
On peut se demander si, de la même façon que pour le MV, il existe une forme asymptotique
simple pour l’estimateur optimal d’un paramètre variable quand on utilise τ0,k = X̂k−1 . En
effet, si l’on suppose encore une fois que σw est petit devant l’écart type du bruit, on peut
montrer que l’estimateur asymptotique optimal est donné par la forme récursive suivante :
$$\hat{X}_k \approx \hat{X}_{k-1} + u_k - \frac{\sigma_w}{\sqrt{I_q(0)}}\, \frac{f_d\!\left(i_k, \hat{X}_{k|k-1}, X_k\right)\Big|_{\hat{X}_{k|k-1}=X_k}}{P\left(i_k \,|\, X_k\right)\Big|_{\hat{X}_{k|k-1}=X_k}}, \qquad (B.11)$$
où $P\left(i_k \,|\, X_k\right)\big|_{\hat{X}_{k|k-1}=X_k}$ est la probabilité d'avoir la sortie $i_k$ quand $\hat{X}_{k|k-1} = X_k$. Sa dérivée par rapport à l'erreur $\varepsilon_k$ évaluée au point $\varepsilon_k = 0$ est $f_d\!\left(i_k, \hat{X}_{k|k-1}, X_k\right)\big|_{\hat{X}_{k|k-1}=X_k}$. Notez que
cette forme a une complexité encore plus basse que celle du MV adaptatif, car maintenant le
gain qui corrige l’estimateur à chaque instant est constant.
On peut montrer qu’une approximation de l’EQM asymptotique de cet algorithme est
σw
.
EQM ≈ p
Iq (0)
Cette performance pour σw petit se raccorde bien avec la BCRB asymptotique donnée en
(B.10). On constate aussi que, de la même façon que pour le MV adaptatif, la performance
de l’estimateur optimal avec seuil central adaptatif dépend des caractéristiques de la mesure
(bruit et vecteur de seuils τ ′ ) au travers de l’information de Fisher Iq (0). Par conséquent,
pour caractériser complètement le quantifieur optimal pour l’estimation on doit, encore une
fois, maximiser Iq (0) par rapport à τ ′ . Comme on l’a mentionné précédemment, ce problème
est difficile à résoudre de façon directe pour NB > 3, on doit donc essayer de trouver une
approximation de la solution, cette approximation sera le sujet de la Section B.3.
Comme dans le cas du paramètre constant, on peut se poser la question suivante :
• est-ce que la forme récursive (B.11) peut converger quand l'erreur initiale sur $\hat{X}_k$, $|\varepsilon_0| = \left|\hat{X}_0 - X_0\right|$, est arbitraire (pas nécessairement petite) ?
On répondra à cette question dans la suite.
B.2.3 Quantifieurs adaptatifs pour l'estimation
Dans cette sous-section on traite des questions posées précédemment au sujet de l’application
des algorithmes asymptotiques de basse complexité. Pour cela, on impose d’abord la structure
de quantification adaptative, puis on définit un algorithme d’estimation général qui a comme
cas spécifiques les formes asymptotiquement optimales vues précédemment. Après la définition de l’algorithme, on s’intéresse à l’analyse de sa performance : son biais et sa variance
asymptotique. Suite à l’optimisation de sa performance par rapport aux paramètres libres
de l’algorithme, on analyse la perte de performance d’estimation par rapport au cas continu
(mesures continues). A la fin, on présente aussi des extensions de l’algorithme à d’autres problèmes : estimation conjointe de x et de l’échelle du bruit δ et estimation à partir de mesures
obtenues par plusieurs capteurs.
Modèle du signal
Dans la suite, le paramètre à estimer est considéré soit constant $X_k = x$, soit lentement variable $X_k = X_{k-1} + u_k + W_k$ (avec $\sigma_w$ petit et une dérive $u_k = u$ petite ou nulle).
Modèle du quantifieur
On a vu que le seuil central du quantifieur doit être mis à jour dynamiquement pour améliorer
la performance d’estimation. Pour rendre explicite cette caractéristique du quantifieur, on
imposera un biais réglable $b_k$ à l'entrée du quantifieur ; pour régler l'amplitude de l'entrée, on appliquera aussi un gain $\frac{1}{\Delta}$. La fonction de quantification est donc donnée par
$$i_k = Q\left(\frac{Y_k - b_k}{\Delta}\right).$$
Avec un biais réglable et un gain d’entrée, on peut fixer la structure du quantifieur avec un
seuil central statique à zéro et d’autres seuils qui seront égaux aux décalages τ ′ .
La sortie du quantifieur réglable est donnée par
$$i_k = Q\left(\frac{Y_k - b_k}{\Delta}\right) = i\, \mathrm{sign}\,(Y_k - b_k), \quad \text{pour } \frac{|Y_k - b_k|}{\Delta} \in \left[\tau'_{i-1}, \tau'_i\right).$$
A partir des mesures quantifiées, l’objectif est d’estimer le paramètre Xk , un objectif
secondaire est de régler les paramètres bk et ∆ pour avoir une performance d’estimation
améliorée. Comme l’estimateur X̂k de Xk peut être utilisé dans des applications temps réel,
il serait intéressant de l’estimer en ligne.
Dans les sous-sections précédentes on a vu que :
• dans le cas où Xk = x, si on place le centre du quantifieur sur la dernière estimation, on
peut avoir un algorithme asymptotiquement optimal.
• dans le cas où Xk est variable, la performance peut être améliorée si on place le seuil
central sur la prédiction du signal. Quand le signal a pour modèle un processus de
Wiener, la prédiction et donc le seuil central sont donnés par X̂k−1 et quand le modèle
a une dérive uk , le seuil est donné par X̂k−1 + uk .
Etant données ces observations et pour simplifier le problème (avoir une seule forme d’algorithme
pour tous les signaux), on posera bk = X̂k−1 .
Notez que ce choix entraîne une possible perte de performance quand le modèle a une
dérive. En réalité, si l’on utilise la prédiction X̂k−1 + uk au lieu de la dernière estimation,
les deux cas, sans et avec dérive, peuvent être traités de façon conjointe (ici on les traitera
sans perte de généralité comme étant le cas sans dérive). Le choix bk = X̂k−1 nous permet
d’étudier le comportement d’une approche sous-optimale.
Le schéma général d’estimation est donné en Fig. B.7. L’objectif maintenant est de définir
l’algorithme qui sera placé dans le bloc Mise à jour.
Algorithme d’estimation
On utilise comme estimateur l’algorithme adaptatif suivant :
"
!#
Yk − X̂k−1
,
X̂k = X̂k−1 + γk η Q
∆
(B.12)
où γk est une séquence de gains réels positifs et η[·] est une application de I vers R
η: I → R
j → ηj
n
o
caractérisée par NI coefficients η− NI , . . . , η−1 , η1 , . . . , η NI . Les coefficients η[·] peuvent être
2
2
vus comme des équivalents pour l’estimation des niveaux de sortie des quantifieurs dans un
contexte de quantification classique (quantification pour la reconstruction des mesures).
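Pour fixer les idées, voici une esquisse Python (numpy) de la récursion (B.12) pour l'estimation d'une constante avec un bruit gaussien, $N_B = 2$ bits, des variations de seuil uniformes et un gain $\gamma_k = 1/k$. Les coefficients $\eta$ et les valeurs numériques sont purement illustratifs (ce ne sont pas les coefficients optimaux dérivés plus loin).

```python
import numpy as np

def quantifieur(y, x_hat, delta, tau_prime):
    """Quantifieur réglable : biais x_hat, gain d'entrée 1/delta ; la sortie vaut
    i*sign(y - x_hat), avec |y - x_hat|/delta dans [tau'_{i-1}, tau'_i)."""
    u = (y - x_hat) / delta
    i = int(np.searchsorted(tau_prime, abs(u), side="right")) + 1  # magnitude 1..NI/2
    return i if u >= 0 else -i

def algorithme_adaptatif(mesures, eta, delta, tau_prime, x0=0.0):
    """Récursion (B.12) pour un paramètre constant : gain gamma_k = 1/k et
    coefficients eta à symétrie impaire (eta_{-i} = -eta_i)."""
    x_hat = x0
    for k, y in enumerate(mesures, start=1):
        i = quantifieur(y, x_hat, delta, tau_prime)
        correction = eta[abs(i)] if i > 0 else -eta[abs(i)]
        x_hat += correction / k
    return x_hat

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_vrai, sigma = 0.7, 1.0
    y = x_vrai + sigma * rng.normal(size=5000)
    tau_prime = np.array([1.0, np.inf])   # variations de seuil uniformes, NB = 2 (NI = 4)
    eta = {1: 0.5, 2: 1.5}                # coefficients illustratifs (eta_i > 0 pour i > 0)
    print(algorithme_adaptatif(y, eta, delta=sigma, tau_prime=tau_prime))
```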
[Figure B.7 : Schéma général d'estimation. Le quantifieur réglable (gain d'entrée 1/∆, biais X̂k−1, seuils 0, ±τ′1, ±τ′2, ...) produit les mesures quantifiées ik, utilisées par le bloc Mise à jour de l'estimateur. L'algorithme d'estimation est réalisé dans le bloc Mise à jour.]
Cet algorithme a les avantages d’être un algorithme en ligne, d’avoir une basse complexité
et d’inclure comme des cas spéciaux les formes récursives optimales des estimateurs avec
quantification adaptative.
A cause de la symétrie du problème et pour simplifier les développements présentés dans
la suite, on impose ce qui suit :
Hypothèse (sur les niveaux de sortie du quantifieur) :
AQ3 Les niveaux ont une symétrie impaire en i:
ηi = −η−i ,
avec ηi > 0 pour i > 0.
La non linéarité non différentiable en (B.12) rend difficile l’analyse directe de l’algorithme.
Pour s’en sortir, on peut utiliser les techniques présentées en [Benveniste 1990]. Ces techniques d’analyse sont basées sur des approximations de la moyenne de l’algorithme et sont
valables pour une classe assez générale d’algorithmes adaptatifs. Dans le contexte des algorithmes étudiés en [Benveniste 1990], la fonction η peut être une fonction non linéaire et non
différentiable et il est montré que les gains γk qui optimisent l’estimation de Xk ont les formes
suivantes :
• $\gamma_k \propto \frac{1}{k}$ pour $X_k$ constant.
• $\gamma_k$ est constant pour $X_k$ avec un modèle de Wiener.
• $\gamma_k$ est une constante proportionnelle à $u^{\frac{2}{3}}$ pour $X_k$ avec un modèle de Wiener qui contient une dérive.
Dans ce qui suit, on utilise en (B.12) les séquences de gains données ci-dessus et on obtient la
performance de l’algorithme avec les techniques présentées en [Benveniste 1990].
Performance d’estimation
L’analyse de la performance de l’algorithme adaptatif est séparée en deux parties : l’analyse de
la trajectoire moyenne de l’algorithme et, par conséquent, de son biais et l’analyse de l’EQM
ou de la variance asymptotique.
Analyse de la moyenne : cas constant et Wiener. Une approximation de la moyenne de l'algorithme $\mathbb{E}\left[\hat{X}_k\right]$ dans le cas constant et Wiener est donnée par $\hat{x}(t_k)$, où $\hat{x}(t)$ est la solution de l'équation différentielle ordinaire (EDO) suivante :
$$\frac{d\hat{x}}{dt} = h(\hat{x}).$$
La correspondance entre temps discret et continu est donnée par la relation $t_k = \sum_{j=1}^{k} \gamma_j$ et $h(\hat{x})$ est
$$h(\hat{x}) = \mathbb{E}\left[\eta\!\left[Q\left(\frac{x - \hat{x} + V}{\Delta}\right)\right]\right],$$
où l’espérance est évaluée par rapport à la distribution marginale du bruit V .
Cette approximation est valable lorsque les gains γk sont petits, ce qui veut dire que
l’approximation est valable après un certain temps dans le cas constant (vu que les gains
décroissent en k) et pour tout k dans le cas Wiener, si on choisit un petit gain γk = γ (ce qui
doit être le cas, car pour poursuivre avec peu d’erreur un signal lentement variable, il faut que
les variations de l’estimateur soient petites).
On peut utiliser cette approximation de la moyenne pour obtenir une approximation du biais $\varepsilon(t)$ :
$$\frac{d\varepsilon}{dt} = \tilde{h}(\varepsilon),$$
où $\tilde{h}(\varepsilon) = h(\varepsilon + x)$ est une fonction qui ne dépend pas du paramètre $x$.
En utilisant les hypothèses de symétrie AN2, AQ2 et AQ3, on peut démontrer que l’EDO
est globalement asymptotiquement stable, i.e., pour tout ε (0), on a ε (t) → 0 quand t → ∞.
Comme ε (t) approche le biais, l’algorithme est donc asymptotiquement non biaisé.
Analyse de la moyenne : cas Wiener avec dérive. Pour une dérive u petite, on s’attend
à ce que le gain γk = γ soit petit aussi (pour suivre le signal qui est lentement variable), dans
ce cas on peut aussi utiliser une approximation par EDO. Par contre, contrairement aux cas
précédents, on doit prendre en compte la variabilité de la moyenne de Xk . Ce qui fait que
maintenant, la moyenne de l’algorithme est obtenue en échantillonnant la solution de la paire
d’EDO suivante :
dx
dt
dx̂
dt
=
u
,
γ
= h̃ (x̂ − x) .
Si l’on soustrait les deux expressions, on a une EDO pour le biais
u
dε
= h̃ (ε) − .
dt
γ
(B.13)
La principale différence dans ce cas est le second terme à droite, qui fait que le biais n’est
pas asymptotiquement nul. Si l’EDO sans le second terme est globalement asymptotiquement
stable, on s’attend à une convergence du biais vers une valeur petite. Pour des petites valeurs
de biais, on peut linéariser (B.13) autour de zéro et obtenir une approximation du biais
asymptotique. Cette approximation est
$$\mathbb{E}\left[\hat{X}_k - X_k\right] \underset{k \to \infty}{\approx} \frac{u}{\gamma\, h_\varepsilon},$$
où $h_\varepsilon$ est la dérivée de $\tilde{h}(\varepsilon)$ évaluée en zéro.
EQM et variance normalisée. Les résultats de [Benveniste 1990] peuvent être utilisés pour la caractérisation des fluctuations asymptotiques de l'algorithme adaptatif. Les fluctuations asymptotiques dans ce cas sont la variance asymptotique normalisée de l'erreur d'estimation de la constante $\sigma_\infty^2 = \lim_{k \to \infty} \mathrm{Var}\left[\sqrt{k}\left(\hat{X}_k - x\right)\right]$ et l'EQM asymptotique pour l'estimation de $X_k$ variable $\mathrm{EQM}_{q,\infty} = \lim_{k \to \infty} \mathbb{E}\left[\left(\hat{X}_k - X_k\right)^2\right]$.
Les expressions asymptotiques des fluctuations étant dépendantes de γ, on peut les minimiser par rapport aux gains. Les expressions des paires (gain optimal γ ⋆ , performance optimale) sont données en Tab B.1.
Signal | Gain optimal | Performance
Constant | $\gamma^\star = -\frac{1}{h_\varepsilon}$ | $\sigma_\infty^2 = \frac{R}{h_\varepsilon^2}$
Wiener | $\gamma^\star = \frac{\sigma_w}{\sqrt{R}}$ | $\mathrm{EQM}_{q,\infty} = \frac{\sigma_w \sqrt{R}}{-h_\varepsilon} + o(\gamma^\star) = \sigma_w\, \sigma_\infty + o(\gamma^\star)$
Wiener avec dérive | $\gamma^\star = \left(\frac{4u^2}{-h_\varepsilon R}\right)^{\frac{1}{3}}$ | $\mathrm{EQM}_{q,\infty} \approx 3\left(\frac{u R}{4 h_\varepsilon^2}\right)^{\frac{2}{3}} + o(\gamma^\star) = 3\left(\frac{u\, \sigma_\infty^2}{4}\right)^{\frac{2}{3}} + o(\gamma^\star)$

Table B.1: Gains optimaux, EQM asymptotique et variance normalisée asymptotique de l'algorithme adaptatif.
La quantité R dans ce tableau est la variance asymptotique normalisée des corrections de
l’algorithme quand X̂k = Xk
R =
x − x̂ + V
Var η Q
∆
x̂=x
NI
= 2
2
X
ηi2 F̃d (i, 0) ,
i=1
F̃d (i, 0) est la probabilité d’avoir la sortie i du quantifieur aussi quand X̂k = Xk .
Algorithme optimal et performance
Les performances asymptotiques présentées ci-dessus indiquent que la performance de l'algorithme dépend, dans les trois cas, de la quantité $\sigma_\infty^2$, qui est une fonction du vecteur de coefficients $\eta = \left[\eta_1 \cdots \eta_{\frac{N_I}{2}}\right]^\top$. Par conséquent, pour maximiser la performance asymptotique, on doit résoudre le problème de minimisation suivant
$$\underset{\eta}{\mathrm{argmin}}\ \frac{R}{h_\varepsilon^2} = \underset{\eta}{\mathrm{argmin}}\ \frac{\eta^\top F_d\, \eta}{2\left(\eta^\top f_d\right)^2},$$
où $F_d$ est une matrice diagonale donnée par
$$F_d = \mathrm{diag}\left(\tilde{F}_d(1, 0), \cdots, \tilde{F}_d\!\left(\tfrac{N_I}{2}, 0\right)\right),$$
et $f_d$ est le vecteur des dérivées des probabilités de sortie du quantifieur par rapport à $\hat{X}_k$ quand $\hat{X}_k = X_k$
$$f_d = \left[\tilde{f}_d(1, 0) \cdots \tilde{f}_d\!\left(\tfrac{N_I}{2}, 0\right)\right]^\top.$$
Ce dernier peut être vu comme le vecteur des différences entre les valeurs de la d.d.p. du bruit pour des variations de seuil consécutives $\tau'_{i-1}$ et $\tau'_i$.
Ce problème peut être résolu facilement à l'aide de l'inégalité de Cauchy-Schwarz. En tenant compte de la contrainte de positivité sur les coefficients, on trouve
$$\eta^\star = -F_d^{-1} f_d.$$
Le $\sigma_\infty^2$ minimum est donc
$$\sigma_\infty^2 = \frac{1}{2\, f_d^\top F_d^{-1} f_d} = \left[2 \sum_{i=1}^{\frac{N_I}{2}} \frac{\tilde{f}_d^2(i, 0)}{\tilde{F}_d(i, 0)}\right]^{-1} = \frac{1}{I_q(0)}.$$
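Une esquisse Python (numpy/scipy) du calcul des coefficients optimaux $\eta^\star = -F_d^{-1} f_d$ et de $\sigma_\infty^2 = 1/I_q(0)$ est donnée ci-dessous. Les hypothèses sont les suivantes : bruit gaussien standard d'écart type $\delta$ (donc $I_c = 1/\delta^2$, paramétrisation différente de celle des simulations GG présentées plus loin), variations de seuil uniformes et gain d'entrée $\Delta = c_\Delta \delta$ ; $\tilde{f}_d(i,0)$ et $\tilde{F}_d(i,0)$ sont évalués directement à partir de la d.d.p. et de la fonction de répartition du bruit, conformément à leurs définitions ci-dessus. Les valeurs de $\delta$, $c_\Delta$, $N_B$ sont arbitraires.

```python
import numpy as np
from scipy.stats import norm

def coefficients_optimaux(delta, c_delta, n_b):
    """Coefficients eta* = -Fd^{-1} fd et sigma_inf^2 = 1/Iq(0) pour un bruit
    gaussien d'écart type delta, variations de seuil uniformes (le dernier seuil
    est +inf) et gain d'entrée Delta = c_delta * delta."""
    n_i = 2 ** n_b                       # nombre d'intervalles de quantification
    Delta = c_delta * delta
    # seuils positifs en unités de mesure : 0, Delta, 2*Delta, ..., +inf
    seuils = np.concatenate(([0.0], Delta * np.arange(1, n_i // 2), [np.inf]))
    F = np.append(norm.cdf(seuils[:-1], scale=delta), 1.0)
    f = np.append(norm.pdf(seuils[:-1], scale=delta), 0.0)  # densité nulle à l'infini
    # f_d(i,0) : différences de d.d.p. entre seuils consécutifs ;
    # F_d(i,0) : probabilités des sorties positives i = 1..NI/2
    f_d = np.diff(f)
    F_d = np.diff(F)
    eta = -f_d / F_d                     # eta* = -Fd^{-1} fd (Fd diagonale)
    iq0 = 2.0 * np.sum(f_d ** 2 / F_d)   # information de Fisher Iq(0)
    return eta, 1.0 / iq0                # (coefficients, sigma_infini^2)

if __name__ == "__main__":
    for n_b in (1, 2, 3):
        eta, var_norm = coefficients_optimaux(delta=1.0, c_delta=1.0, n_b=n_b)
        print(n_b, np.round(eta, 3), round(var_norm, 4))
```

Pour $N_B = 1$, ce croquis retrouve bien $I_q(0) = 4 f^2(0) = 2/(\pi\delta^2)$, la valeur classique de la quantification binaire avec seuil centré.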
Dans le tableau suivant, on donne les gains et les performances asymptotiques optimales.
Notez que dans le cas de l’estimation d’une constante et d’un processus de Wiener lent,
l’algorithme a des performances asymptotiques optimales. Il est donc une alternative à basse
complexité aux algorithmes vus précédemment (le MV adaptatif et l’estimateur optimal adaptatif).
Signal | Gain optimal | Performance
Constant | $\gamma^\star = \frac{1}{I_q(0)}$ | $\sigma_\infty^2 = \frac{1}{I_q(0)}$
Wiener | $\gamma^\star = \frac{\sigma_w}{\sqrt{I_q(0)}}$ | $\mathrm{EQM}_{q,\infty} = \frac{\sigma_w}{\sqrt{I_q(0)}} + o(\sigma_w)$
Wiener avec dérive | $\gamma^\star = \left(\frac{4u^2}{I_q^2(0)}\right)^{\frac{1}{3}}$ | $\mathrm{EQM}_{q,\infty} \approx 3\left(\frac{u}{4 I_q(0)}\right)^{\frac{2}{3}} + o(\gamma^\star)$

Table B.2: Gains optimaux, EQM et variance normalisée asymptotique de l'algorithme adaptatif pour $\eta$ optimal.
Choix du gain d'entrée : pour simplifier le choix de la constante $\Delta$, on peut considérer que la fonction de répartition du bruit est caractérisée par un paramètre d'échelle $\delta$ :
$$F(x) = F_n\left(\frac{x}{\delta}\right),$$
où $F_n$ est la fonction de répartition pour $\delta = 1$. Dans ce cas, $\frac{\Delta}{\delta}$ est un facteur clé pour l'évaluation des coefficients $\eta$. Par conséquent, l'évaluation des coefficients peut être simplifiée si on choisit
$$\Delta = c_\Delta\, \delta.$$
La constante $c_\Delta$ peut être utilisée pour régler le gain d'entrée du quantifieur ou pour régler le pas de quantification quand les seuils $\tau'$ sont uniformes et fixés à des valeurs qui ne peuvent pas être modifiées.
Seuils optimaux. On voit que, dans les expressions pour les performances de l'algorithme (Tab. B.2), l'influence des variations de seuil $\tau'$ se fait à travers la quantité $I_q(0)$ ; donc, pour optimiser les performances par rapport aux seuils, on doit résoudre le problème d'optimisation suivant :
$$I_q^\star = \underset{\tau'}{\mathrm{argmax}}\ I_q(0).$$
Or, comme on l'a mentionné précédemment, ce problème est difficile à résoudre en général (pour $N_B > 3$) et une approximation de la solution optimale sera présentée dans la Section B.3. Pour les simulations qui seront présentées dans la suite, on imposera que les variations de seuil soient uniformes :
$$\tau' = \left[-\tau'_{\frac{N_I}{2}} = -\infty, \; \cdots, \; -\tau'_1 = -1, \; 0, \; +\tau'_1 = +1, \; \cdots, \; +\tau'_{\frac{N_I}{2}} = +\infty\right]^\top.$$
2
Sous cette contrainte, l’optimisation de la performance est faite par rapport à c∆ . La valeur
optimale de c∆ peut être obtenue de façon simple par recherche exhaustive.
Perte induite par la quantification. On peut comparer les performances asymptotiques
de l’algorithme adaptatif avec les performances asymptotiques de son équivalent qui utilise
des mesures continues :
• Cas constant : dans ce cas, on compare la performance asymptotique de l'algorithme adaptatif à pas décroissant avec la performance du MV avec des mesures continues – $\mathrm{Var}\left[\hat{X}_k\right] \underset{k\to\infty}{\sim} \frac{1}{k I_c}$.
• Cas Wiener lent : la performance est comparée avec celle de l'estimateur optimal du processus de Wiener lent avec des mesures continues – $\mathrm{EQM}_{c,\infty} = \frac{\sigma_w}{\sqrt{I_c}} + o(\sigma_w)$.
• Cas Wiener lent avec dérive : dans ce cas, on compare la performance de l'algorithme adaptatif avec des mesures quantifiées avec celle de l'algorithme adaptatif avec des mesures continues – $\mathrm{EQM}_{c,\infty} \approx 3\left(\frac{u}{4 I_c}\right)^{\frac{2}{3}} + o\!\left(u^{\frac{2}{3}}\right)$.
Les pertes de performance relative en dB sont données en Tab. B.3.
Signal | Perte
Constant | $L_q = -10 \log_{10} \frac{I_q(0)}{I_c}$
Wiener | $L_q^{W} \approx \frac{1}{2} L_q$ ($\sigma_w$ petit)
Wiener avec dérive | $L_q^{WD} \approx \frac{2}{3} L_q$ ($\sigma_w$ et $u$ petits)

Table B.3: Pertes de performance asymptotique induites par la quantification.
Ce qui est surprenant dans ces résultats est le fait que la perte de performance est plus
petite dans le cas variable que dans le cas constant, ceci indique une certaine ressemblance
avec le phénomène de « dithering », connu en quantification classique. Dans la quantification
classique, un ajout de variabilité à l’entrée du quantifieur (ajout de bruit) peut améliorer les
performances de reconstruction après quantification. Dans les résultats présentés ci-dessus,
on voit que la variabilité intrinsèque au signal induit une perte de performance d’estimation
plus petite que dans le cas constant.
Simulations
Modèle de bruit : pour les résultats de simulation présentés dans la suite deux modèles
de bruit (respectant les hypothèses de travail) ont été utilisés. Ces modèles sont caractérisés
par les distributions suivantes :
• Gaussienne généralisée (GG). Cette distribution a pour d.d.p.
$$f_{GGD}(x) = \frac{\beta}{2\delta\, \Gamma\!\left(\frac{1}{\beta}\right)} \exp\left(-\left|\frac{x}{\delta}\right|^\beta\right),$$
où $\beta$ est un paramètre de forme (réel et positif).
• Student-t (ST). Sa d.d.p. est
$$f_{STD}(x) = \frac{\Gamma\!\left(\frac{\beta+1}{2}\right)}{\delta \sqrt{\beta \pi}\, \Gamma\!\left(\frac{\beta}{2}\right)} \left[1 + \frac{1}{\beta}\left(\frac{x}{\delta}\right)^2\right]^{-\frac{\beta+1}{2}}.$$
Perte théorique : les résultats de simulation de l’algorithme seront comparés aux pertes
théoriques qui dépendent toutes de Lq , l’évolution de cette quantité en fonction de NB est
donnée en Fig. B.8.
[Figure B.8 : Perte $L_q$ induite par la quantification dans le cas constant pour différents nombres de bits ($N_B$ de 1 à 5) et différents types de bruit : GG avec $\beta \in \{1.5,\ 2\ \text{(Gaussien)},\ 2.5,\ 3\}$ et ST avec $\beta \in \{1\ \text{(Cauchy)},\ 2,\ 3\}$. Abscisse : nombre de bits $N_B$ ; ordonnée : perte en dB.]
Notez que dans tous les cas, la perte est faible pour la quantification binaire (de 1 à 4
dB) et qu’elle décroît très rapidement avec NB pour devenir négligeable pour 4 ou 5 bits de
quantification.
Simulation pour le cas constant : on vérifie la convergence des pertes simulées pour NB
de 2 à 5 et plusieurs distributions de bruit dans la Fig. B.9.
Simulation pour le cas Wiener : dans la Fig. B.10, on vérifie que si le signal est lent (σw =
0.001), alors les résultats asymptotiques simulés sont très proches des résultats théoriques. Par
contre, dès que l’on s’éloigne de l’hypothèse de signal lent (σw = 0.1), les résultats théoriques
et simulés ont un certain écart.
Simulation pour le cas Wiener avec dérive : la Fig. B.11 montre les performances
asymptotiques simulées de l’algorithme adaptatif pour la poursuite du processus de Wiener
avec dérive. Le petit écart entre les résultats théoriques et simulés vient du fait que le gain
optimal γ ⋆ est calculé avec une estimation en ligne de u (qui est inconnue en pratique).
Comparaison avec les algorithmes à haute complexité : avant de passer aux extensions de l’algorithme adaptatif on discutera rapidement des différences entre l’algorithme
adaptatif proposé ici et les algorithmes vus dans les sous sections précédentes.
Etant donnée l’équivalence en termes de performance asymptotique de l’algorithme adaptatif et des solutions à haute complexité (le MV adaptatif et le FP adaptatif), des simulations
des transitoires des algorithmes ont été réalisées pour les différencier de façon plus précise.
[Figure B.9 : Perte induite par la quantification pour des distributions GG et ST et pour $N_B \in \{2, 3, 4, 5\}$ quand $X_k$ est constant. Pour chaque type de bruit il y a 4 courbes : les constantes sont les résultats théoriques et les courbes décroissantes sont les résultats simulés avec l'algorithme adaptatif. Pour chaque paire de courbes, les résultats plus hauts correspondent à moins de bits de quantification. En (a) on a les résultats pour un bruit GG avec $N_B = 2$ et 3 et en (b) avec $N_B = 4$ et 5. En (c) on présente les résultats pour un bruit ST avec $N_B = 2$ et 3 et en (d) avec $N_B = 4$ et 5.]
Les
résultats de simulation indiquent que l’algorithme adaptatif peut atteindre des performances
similaires voire meilleures que le MV adaptatif pour l’estimation d’une constante et que, dans
le cas de la poursuite d’un signal variable, le FP semble être dans la plupart des cas plus performant. En conclusion, pour l’estimation d’une constante, l’algorithme adaptatif semble être
la meilleure solution, car il a une complexité très basse en comparaison avec le MV. Toutefois,
dans le cas de l’estimation du processus de Wiener, l’algorithme adaptatif ne sera la meilleure
solution que si les contraintes de complexité empêchent d’utiliser le FP (aussi très complexe).
Extensions de l’algorithme adaptatif
Paramètre d’échelle inconnu : une extension possible du problème d’estimation d’une
constante consiste à considérer que le paramètre d’échelle δ est inconnu et donc que l’on doit
estimer conjointement la paire (x, δ) à partir de mesures quantifiées.
Pour améliorer la performance d’estimation on peut envisager non seulement l’utilisation
de X̂k−1 comme biais du quantifieur, mais aussi de δ̂k−1 , pour régler le gain d’entrée du
quantifieur. Ceci est montré en Fig. B.12.
[Figure B.10 : Perte induite par la quantification dans le cas Wiener pour différents nombres de bits, écarts types des incréments du signal ($\sigma_w = 0.1$ et $\sigma_w = 0.001$) et types de bruits (Gaussien – GG avec $\beta = 2$ et Cauchy – ST avec $\beta = 1$) ; résultats simulés et théoriques.]
[Figure B.11 : Perte induite par la quantification dans le cas Wiener avec dérive pour différents nombres de bits et types de bruit (Gaussien et Cauchy) ; résultats simulés et théoriques.]
[Figure B.12 : Schéma d'estimation/quantification pour retrouver conjointement le paramètre de centrage $x$ et le paramètre d'échelle $\delta$ : le quantifieur réglable utilise le biais $\hat{X}_{k-1}$ et le gain d'entrée $\frac{1}{c_\Delta \hat{\delta}_{k-1}}$, et le bloc Mise à jour produit $\hat{X}_k$ et $\hat{\delta}_k$.]
Pour la mise à jour, on peut encore une fois utiliser un algorithme adaptatif de basse complexité :
$$\begin{bmatrix} \hat{X}_k \\ \hat{\delta}_k \end{bmatrix} = \begin{bmatrix} \hat{X}_{k-1} \\ \hat{\delta}_{k-1} \end{bmatrix} + \hat{\delta}_{k-1}\, \frac{\Gamma}{k} \begin{bmatrix} \eta_x(i_k) \\ \eta_\delta(i_k) \end{bmatrix},$$
où Γ est une matrice 2 × 2 de gains.
Sous certaines hypothèses de convergence en moyenne de l'algorithme, on peut montrer que les coefficients optimaux $\eta_x$ et $\eta_\delta$ sont donnés par
$$\eta_x = -F_d^{-1} f_d^{(x)}, \qquad \eta_\delta = -F_d^{-1} f_d^{(\delta)},$$
où $F_d$ a déjà été détaillée plus haut et où $f_d^{(x)}$ et $f_d^{(\delta)}$ sont des vecteurs de dérivées des probabilités des sorties du quantifieur par rapport à $\hat{X}_k$ et $\hat{\delta}_k$ respectivement. Ces dérivées sont évaluées au point $\left(\hat{X}_k = x,\ \hat{\delta}_k = \delta\right)$.
Avec les coefficients optimaux on trouve les valeurs asymptotiques optimales de la covariance normalisée d'estimation $P$ et du gain $\Gamma^\star$ :
$$P = \delta^2\, \Gamma^\star = \delta^2 \begin{bmatrix} \dfrac{1}{f_d^{(x)\top} F_d^{-1} f_d^{(x)}} & 0 \\[2mm] 0 & \dfrac{1}{f_d^{(\delta)\top} F_d^{-1} f_d^{(\delta)}} \end{bmatrix}.$$
Les éléments de la diagonale de $P$ sont les inverses des informations de Fisher pour l'estimation de $x$ et $\delta$ à partir des données quantifiées ; l'algorithme est donc optimal asymptotiquement et on voit que le fait de ne pas connaître $\delta$ ne dégrade pas les performances asymptotiques de l'estimateur de $x$.
Approche multicapteur : une autre extension consiste à utiliser des mesures obtenues de façon simultanée par plusieurs capteurs pour estimer un paramètre constant $x$.
Dans cette approche, chaque capteur quantifie une mesure continue
$$Y_k^{(j)} = x + V_k^{(j)}, \qquad \text{pour } j \in \{1, \cdots, N_s\},$$
où Ns est le nombre de capteurs, et transmet la mesure quantifiée à un centre de fusion. Le
centre de fusion utilise toutes les mesures des capteurs pour générer une estimation X̂k de x
qui est diffusée à tous les capteurs pour être utilisée comme seuil central des quantifieurs. Le
schéma qui représente cette approche est montré en Fig. B.13.
[Figure B.13 : Schéma d'estimation/quantification multicapteur avec un centre de fusion : chaque capteur $j$ quantifie $Y_k^{(j)} = x + V_k^{(j)}$ et transmet $i_k^{(j)}$ au centre de fusion, dont le bloc Mise à jour produit $\hat{X}_k$.]
Pour la mise à jour, on peut utiliser l'extension suivante de l'algorithme adaptatif :
$$\hat{X}_k = \hat{X}_{k-1} + \frac{\gamma}{k}\, \eta(i_k),$$
où $i_k$ est le vecteur des mesures quantifiées $\left[i_k^{(1)} \cdots i_k^{(N_s)}\right]^\top$.
Les coefficients $\eta$ optimaux sont donnés cette fois par
$$\eta(i) = -\sum_{j=1}^{N_s} \frac{\tilde{f}_d^{(j)}\!\left[i^{(j)}\right]}{\tilde{F}_d^{(j)}\!\left[i^{(j)}\right]}. \qquad (B.14)$$
Pour ces coefficients, on trouve la variance asymptotique d'estimation normalisée et le gain optimal suivants
$$\sigma_\infty^2 = \gamma^\star = \frac{1}{\displaystyle\sum_{j=1}^{N_s} \sum_{i^{(j)} \in \mathcal{I}^{(j)}} \frac{\left(\tilde{f}_d^{(j)}\!\left[i^{(j)}\right]\right)^2}{\tilde{F}_d^{(j)}\!\left[i^{(j)}\right]}}. \qquad (B.15)$$
Les expressions (B.14) et (B.15) nous montrent que l’algorithme adaptatif pour l’approche
à un seul capteur s’étend de façon très naturelle à l’approche multicapteur : le coefficient
de l’algorithme η en multicapteur est la somme des coefficients en monocapteur et la performance ainsi que le gain sont donnés par l’inverse de la somme des informations de Fisher en
monocapteur.
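Une esquisse Python (hypothétique) de cette fusion est donnée ci-dessous pour le cas simple de capteurs binaires ($N_B = 1$ par capteur) avec des bruits gaussiens d'écarts types différents : dans ce cas, les coefficients monocapteur $-\tilde{f}_d^{(j)}/\tilde{F}_d^{(j)}$ valent $\pm 2 f_j(0)$ et l'information de Fisher monocapteur vaut $4 f_j^2(0)$, de sorte que (B.14) se réduit à une somme de signes pondérés et (B.15) à l'inverse de la somme de ces informations. Les écarts types de l'exemple sont arbitraires.

```python
import numpy as np

def fusion_binaire(mesures, deltas, x0=0.0):
    """Centre de fusion : chaque capteur j envoie le signe de (Y_k^{(j)} - X_hat_{k-1}) ;
    le centre applique la somme des coefficients monocapteur (B.14) avec le gain
    optimal gamma* = 1 / (somme des informations de Fisher monocapteur) (B.15)."""
    deltas = np.asarray(deltas, dtype=float)
    f0 = 1.0 / (deltas * np.sqrt(2.0 * np.pi))   # d.d.p. gaussienne en 0, par capteur
    iq_par_capteur = 4.0 * f0 ** 2               # info de Fisher binaire = 2/(pi delta_j^2)
    gamma_opt = 1.0 / np.sum(iq_par_capteur)
    x_hat = x0
    for k, y in enumerate(mesures, start=1):     # y : vecteur des Ns mesures continues
        signes = np.where(y >= x_hat, 1.0, -1.0) # sorties binaires i_k^{(j)}
        eta = np.sum(signes * 2.0 * f0)          # coefficient somme (B.14), cas binaire
        x_hat += (gamma_opt / k) * eta
    return x_hat

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x_vrai = 1.2
    deltas = [0.5, 1.0, 2.0]                     # écarts types (hypothétiques) des capteurs
    Y = x_vrai + rng.normal(size=(20000, len(deltas))) * np.array(deltas)
    print(round(fusion_binaire(Y, deltas), 3))
```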
B.3 Estimation et quantification : approximations à haute résolution
On présente maintenant des résultats concernant la caractérisation asymptotique des quantifieurs optimaux pour l’estimation. Le mot « asymptotique » dans ce cas vient du fait que l’on
suppose que le nombre d’intervalles de quantification NI est très grand. Comme on impose
aussi que les tailles ∆i des intervalles de quantification tendent vers zéro, on les appelle aussi
approximations à haute résolution.
B.3.1 Approximation à haute résolution de l'information de Fisher
Pour trouver la caractérisation asymptotique des quantifieurs optimaux et la performance
correspondante en termes d’estimation, on s’intéresse aux questions suivantes :
• comment décrire l’information de Fisher pour l’estimation d’un paramètre x en fonction
du quantifieur quand NI est grand ?
• Comment maximiser l’information de Fisher par rapport à la caractérisation du quantifieur ?
• Quelle est la performance optimale correspondante ?
Remarque : notez que dans la suite on n’impose pas que le problème soit un problème
d’estimation de paramètre de centrage.
Approximation asymptotique
Pour répondre à la première question, on va commencer par une réécriture de l’information
de Fisher :
$$I_q = I_c - \mathbb{E}\left[\left(S_c - S_q\right)^2\right]. \qquad (B.16)$$
Le terme en espérance dans le membre de droite de (B.16) peut être vu comme la perte $L$ induite par la quantification. L'espérance en $L$ peut être écrite comme une somme d'intégrales sur les différents intervalles de quantification $q_i$ :
$$L = \sum_{i=1}^{N_I} \int_{q_i} \left(\frac{\partial \log f(y; x)}{\partial x} - \frac{\partial \log P(i; x)}{\partial x}\right)^2 f(y; x)\, dy.$$
Des développements en série de Taylor nous donnent
$$L = \sum_{i=1}^{N_I} \left[\left(S_{c,i}^{(y)}\right)^2 f_i\, \frac{\Delta_i^3}{12} + o\!\left(\Delta_i^3\right)\right], \qquad (B.17)$$
où $S_{c,i}^{(y)}$ est la dérivée du score par rapport à $y$ évaluée au centre de l'intervalle de quantification $q_i$ et $f_i$ est la d.d.p. des mesures continues aussi évaluée au centre de l'intervalle.
Pour obtenir la caractérisation de la perte en fonction du quantifieur, on définit la densité d'intervalles $\lambda$ :
$$\lambda(y) = \lambda_i = \frac{1}{N_I \Delta_i}, \quad \text{pour } y \in q_i. \qquad (B.18)$$
La densité d’intervalles est une fonction qui, si on l’intègre dans un intervalle donné, donne la
fraction d’intervalles de quantification dans cet intervalle.
Si l’on utilise (B.18) dans (B.17), si l’on fait NI → ∞ et si les ∆i convergent uniformément
vers zéro, on obtient le résultat suivant
2
Z ∂Sc (y;x) f (y; x)
∂y
1
dy.
lim NI2 L =
NI →∞
12
λ2 (y)
Si on revient à (B.16), le résultat asymptotique ci-dessus nous amène à l’approximation asymptotique de l’information de Fisher :
$$I_q \approx I_c - \frac{1}{12 N_I^2} \int \frac{\left(\frac{\partial S_c(y; x)}{\partial y}\right)^2 f(y; x)}{\lambda^2(y)}\, dy. \qquad (B.19)$$
Celle-ci est la réponse à la première question. On constate que si l'intégrale du membre de droite converge, alors $I_q$ converge vers $I_c$ quand $N_I \to \infty$ et, de cette façon, on répond aussi à une question qui avait été posée précédemment (p. 265).
Si toutes les valeurs possibles en sortie du quantifieur sont codées avec des mots binaires de même taille $N_B = \log_2(N_I)$, alors (B.19) peut être réécrite de la façon suivante :
$$I_q \approx I_c - \frac{2^{-2 N_B}}{12} \int \frac{\left(\frac{\partial S_c(y; x)}{\partial y}\right)^2 f(y; x)}{\lambda^2(y)}\, dy.$$
On voit de façon explicite la convergence exponentielle en NB de Iq vers Ic .
Densité d’intervalles optimale : maintenant on répond à la deuxième question.
L’expression (B.19) nous montre directement que, pour maximiser la performance par
rapport au quantifieur, on doit minimiser l’intégrale de droite par rapport à λ. Ce problème
de minimisation peut être facilement résolu avec l’inégalité de Hölder, ce qui donne
$$\lambda^\star(y) = \frac{\left|\frac{\partial S_c(y; x)}{\partial y}\right|^{\frac{2}{3}} f^{\frac{1}{3}}(y; x)}{\int \left|\frac{\partial S_c(y; x)}{\partial y}\right|^{\frac{2}{3}} f^{\frac{1}{3}}(y; x)\, dy} \propto \left|\frac{\partial S_c(y; x)}{\partial y}\right|^{\frac{2}{3}} f^{\frac{1}{3}}(y; x). \qquad (B.20)$$
Notez que, contrairement aux résultats asymptotiques pour la reconstruction des mesures où $\lambda^\star(y) \propto f^{\frac{1}{3}}(y; x)$, en quantification optimale pour l'estimation le score du problème d'estimation intervient sur la densité d'intervalles.
Si l’on remplace (B.20) en (B.19), on peut donner une réponse à la troisième question.
L’expression analytique de l’approximation asymptotique de l’information de Fisher optimale
est
"Z #3
2
3
1
∂S
(y;
x)
1
c
(B.21)
f 3 (y; x) dy .
Iq⋆ ≈ Ic −
∂y
12NI2
Approximation pratique des seuils optimaux : la définition de la densité d'intervalles nous dit que le pourcentage d'intervalles jusqu'à l'intervalle $q_i$, $\frac{i}{N_I}$, doit être égal à l'intégrale de la densité d'intervalles jusqu'à $\tau_i$. Par conséquent, une approximation pratique des seuils optimaux est donnée par
$$\tau_i^\star = F_\lambda^{-1}\!\left(\frac{i}{N_I}\right), \quad \text{pour } i \in \{1, \cdots, N_I - 1\}, \qquad (B.22)$$
où $F_\lambda^{-1}$ est l'inverse de la fonction de répartition obtenue par intégration de la densité d'intervalles $\lambda$.
Remarque sur la solution à débit variable : on pourrait aussi considérer que les sorties du quantifieur sont encodées avec des mots de taille égale à l'opposé du logarithme de leur probabilité ; ceci entraînerait une possible réduction de la taille moyenne des mots en sortie du quantifieur. Cette solution est connue sous le nom d'encodage à débit variable.
La taille moyenne des mots en sortie du quantifieur avec l’encodage à débit variable est
donnée par l’entropie des mots de sortie. De la même manière que précédemment, où en
imposant un NB on a trouvé la densité d’intervalles optimale pour des mots de sortie de
taille égale, on peut s’intéresser au problème de quantification optimale avec encodage à débit
variable. Si l’on impose un débit moyen R, on peut montrer que la densité optimale est
$$\lambda^\star(y) = \frac{\left|\frac{\partial S_c(y; x)}{\partial y}\right|}{\int \left|\frac{\partial S_c(y; x)}{\partial y}\right| dy}$$
et l'information de Fisher maximale
$$I_q \approx I_c - \frac{1}{12}\, 2^{-2\left\{R - h_y - \int \log_2\left|\frac{\partial S_c(y; x)}{\partial y}\right| f(y; x)\, dy\right\}},$$
où $h_y = -\int f(y; x) \log_2\left[f(y; x)\right] dy$ est l'entropie différentielle des mesures.
Le problème avec cette solution est que l’encodage des sorties du quantifieur dépend du
paramètre qui est inconnu. Même si on utilise une approche adaptative pour la quantification
avec une convergence vers l’encodage optimal, on ne respectera pas les contraintes de débit
moyen pendant toute la phase de convergence de l’algorithme adaptatif.
Application à l’estimation d’un paramètre de centrage : pour la distribution Gaussienne, la densité d’intervalles optimale et l’approximation de l’information de Fisher maximale
sont données par
" #
√ −(2N −1) i
2 h
x
B
1
y−x 2
32
.
(B.23)
I
≈
1
−
π
x
q,G
λG (y) = √ exp − √
,
δ2
δ 3π
3δ
Pour la distribution de Cauchy on a
λxC (y) =
1
δB
1 5
2; 6
h
h
1−
1+
y−x 2
δ
y−x 2
δ
i2
3
i5 ,
3
x
Iq,C
#
"
3
B 12 ; 65
1
≈ 2 1−
2−2NB +1 .
2δ
3π
(B.24)
Afin de valider les résultats théoriques, l’information de Fisher (B.3) a été évaluée avec
δ = 1 pour les deux distributions et pour
• les seuils optimaux pour NB ∈ {1, 2, 3}. Les seuils optimaux ont été obtenus par
recherche exhaustive. Pour NB ∈ {4, 5, 6, 7, 8} les résultats théoriques (B.23) et (B.24)
sont utilisés comme une approximation.
• la quantification uniforme pour NB ∈ {1, · · · , 8}. En plaçant le seuil central sur x,
l’intervalle de quantification optimal ∆⋆ est obtenu par maximisation de l’information
de Fisher. Dans ce cas aussi, le maximum est trouvé par recherche exhaustive.
• l’approximation pratique des seuils optimaux donnée par (B.22), pour NB ∈ {1, · · · , 8}.
Les résultats sont montrés en Tab. B.4.
NB | Gaussien ($I_{c,n}^x = 2$) Optimal | Gaussien Uniforme | Gaussien Approx. pratique | Cauchy ($I_{c,n}^x = 0.5$) Optimal | Cauchy Uniforme | Cauchy Approx. pratique
1 | 1.27323954† | – | 1.27323954 | 0.40528473† | – | 0.40528473
2 | 1.76503630† | 1.76503630 | 1.75128300 | 0.43433896† | 0.43433896 | 0.40528473
3 | 1.93090199† | 1.92837814 | 1.92740111 | 0.48474865† | 0.45600797 | 0.47893785
4 | 1.97874454⋆ | 1.97841622 | 1.98038526 | 0.49533850⋆ | 0.48136612 | 0.49504170
5 | 1.99468613⋆ | 1.99353005 | 1.99489906 | 0.49883463⋆ | 0.49204506 | 0.49879785
6 | 1.99867153⋆ | 1.99807736 | 1.99869886 | 0.49970866⋆ | 0.49656712 | 0.49970408
7 | 1.99966788⋆ | 1.99943563 | 1.99967136 | 0.49992716⋆ | 0.49851056 | 0.49992659
8 | 1.99991697⋆ | 1.99983649 | 1.99991741 | 0.49998179⋆ | 0.49935225 | 0.49998172

Table B.4: Information de Fisher $I_q$ pour l'estimation d'un paramètre de centrage des distributions Gaussienne et Cauchy. En Optimal† se trouve l'information de Fisher maximale obtenue par recherche exhaustive des seuils optimaux. Optimal⋆ est l'approximation asymptotique de l'information de Fisher maximale. En Uniforme, les valeurs de l'information de Fisher pour la quantification uniforme optimale sont montrées. Les colonnes Approx. pratique correspondent à l'information de Fisher obtenue avec l'approximation pratique des seuils asymptotiquement optimaux.
On constate que, dans tous les cas, Iq converge rapidement vers Ic quand NB augmente.
Ici encore, on voit que 4 ou 5 bits sont suffisants pour obtenir une performance d’estimation
proche de celle obtenue avec des mesures continues. La différence de performance entre la
quantification uniforme et non uniforme semble être plus importante pour la distribution de
Cauchy, mais pratiquement négligeable dans le cas Gaussien, ceci indique que la quantification uniforme est probablement une meilleure solution en pratique (étant donnée sa simplicité d’implantation). Finalement, on observe aussi que l’approximation asymptotique de
l’information de Fisher et sa valeur obtenue avec l’approximation pratique des seuils optimaux
sont très proches, même pour des valeurs petites de NB (NB = 4).
Utilisation de l’algorithme adaptatif : pour la réalisation pratique du quantifieur optimal dans l’estimation d’un paramètre de centrage, un problème important est la dépendance
explicite au paramètre x de l’approximation pratique des seuils optimaux τi⋆ . Une solution
pour résoudre ce problème et atteindre une performance asymptotique optimale, même en ne
connaissant pas x, consiste à utiliser l’algorithme adaptatif proposé en Sous-section B.2.3 :
X̂k = X̂k−1 +
1
η (ik ) ,
kIq
avec le vecteur de variations de seuil τ ′ donné par τ ⋆ avec x en (B.22) considéré comme étant
f (τ ⋆ ;x)−f (τi⋆ ;x)
égal à zéro et η (ik ) donnés par η (i) = − F τi−1
⋆ ;x . Si NB ≥ 4, pour k grand, on
( i⋆ ;x)−F (τi−1
)
s’attend à une performance d’estimation proche de l’optimale et, par conséquent, proche de
l’approximation suivante :
h i
1
Var X̂k ≈ BCRq ≈
,
kIq
où Iq est l’approximation asymptotique (B.21).
Les résultats de simulation pour des distributions de Gauss et de Cauchy avec $N_B = 4$ et 5 indiquent la validité de cette approche. Ils sont montrés en Fig. B.14.
Allocation de bits pour l’estimation d’un paramètre de centrage scalaire
On suppose maintenant que Ns capteurs mesurent, avec du bruit additif et indépendant d’un
capteur à l’autre, une constante x. En raison des contraintes de communication, la somme
des nombres de bits par mesure alloués à chaque capteur NB,i est contrainte à être égale à
une valeur $N_B$. La question que l'on se pose est la suivante : quelle est l'allocation de bits qui
maximise la performance d’estimation sous la contrainte de communication ?
Ceci équivaut de façon quantitative à résoudre le problème de maximisation suivant :
$$\begin{aligned}
\underset{N_{B,i}}{\text{maximiser}} \quad & I_q = \sum_{i=1}^{N_s} I_{q,i}(N_{B,i}),\\
\text{sujet à} \quad & \sum_{i=1}^{N_s} N_{B,i} = N_B, \qquad N_{B,i} \in \mathbb{N},
\end{aligned}$$
où Iq,i (NB,i ) est l’information de Fisher maximale pour NB,i .
[Figure B.14 : EQM simulée (multipliée par $k$) pour l'algorithme adaptatif avec les seuils non uniformes asymptotiquement optimaux, comparée à la $\mathrm{BCR}_q$. Les mesures continues sont distribuées selon la loi de Gauss (a) et de Cauchy (b). Les nombres de bits de quantification utilisés sont $N_B = 4$ et 5 ; les courbes qui ont des valeurs asymptotiques plus hautes correspondent à $N_B = 4$.]
On peut trouver la solution analytique de ce problème par la comparaison de toutes les
combinaisons possibles des NB,i . Cependant, l’aspect combinatoire de la solution rend impossible en pratique l’application de cette solution, même pour quelques dizaines de capteurs.
Une autre solution possible consiste à utiliser les expressions asymptotiques analytiques
des informations de Fisher en levant la contrainte NB,i ∈ N. La solution du problème
d’optimisation sous ces nouvelles conditions peut être trouvée sous forme analytique et, si
on arrondit les $N_{B,i}$ trouvés de cette manière, on a une approximation pratique de la solution.
Solution à NB,i réels : si l’on considère que les distributions de bruit ont la même forme
mais des paramètres d’échelle δi différents, alors, en utilisant les approximations asymptotiques
on trouve


$$N_{B,i} = \frac{N_B}{N_s} - \log_2\left(\frac{\delta_i}{\sqrt[N_s]{\prod_{j=1}^{N_s} \delta_j}}\right).$$
On voit que les NB,i optimaux ne dépendent que des paramètres d’échelle des bruits.
B.4. Conclusions
293
L’approximation de l’information de Fisher dans ce cas est
#
"
x
Ic,n
2−2N̄B
κ′ (fn )
−
,
Iq ≈ Ns
2
2
12 GM δ12 , · · · , δN
HM δ12 , · · · , δN
s
s
x est l’information de Fisher pour un paramètre d’échelle unitaire, κ′ (f ) est une
où Ic,n
n
2, · · · , δ2
B
et
HM
δ
fonctionnelle de la d.d.p. du bruit aussi pour δ = 1, N̄B = N
1
Ns et
Ns
2
GM δ12 , · · · , δN
sont les moyennes harmoniques et géométriques des paramètres d’échelle.
s
On peut démontrer que l’approximation de Iq optimale ainsi obtenue est toujours plus
grande que celle donnée par une allocation de bits uniforme.
Solution à $N_{B,i}$ réels et positifs : si l'on contraint les $N_{B,i}$ à être positifs, on peut démontrer que la solution optimale est obtenue en deux étapes. D'abord on choisit un $\nu$ qui satisfait
$$\sum_{i=1}^{N_s} \left[\nu - \log_2(\delta_i)\right]_+ = N_B,$$
où $[x]_+ = \max(x, 0)$, puis on obtient les $N_{B,i}$ avec
$$N_{B,i} = \left[\nu - \log_2(\delta_i)\right]_+.$$
On peut facilement vérifier que cette solution est équivalente à la procédure de « waterfilling » qui est utilisée pour l'allocation de puissance aux sous-porteuses dans les modulations multiporteuses.
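Une esquisse Python (hypothétique) de cette procédure en deux étapes est donnée ci-dessous : recherche du niveau $\nu$ par dichotomie (la somme des bits alloués est croissante en $\nu$), puis allocation $N_{B,i} = [\nu - \log_2(\delta_i)]_+$. Les paramètres d'échelle de l'exemple sont arbitraires.

```python
import numpy as np

def allocation_waterfilling(deltas, n_b_total, n_iter=100):
    """Allocation de bits de type « waterfilling » : on cherche nu tel que
    sum_i [nu - log2(delta_i)]_+ = NB, puis NB,i = [nu - log2(delta_i)]_+."""
    log_deltas = np.log2(np.asarray(deltas, dtype=float))

    def bits(nu):
        return np.maximum(nu - log_deltas, 0.0)

    # dichotomie sur nu : la somme des bits alloués est croissante en nu
    nu_bas, nu_haut = log_deltas.min(), log_deltas.max() + n_b_total
    for _ in range(n_iter):
        nu = 0.5 * (nu_bas + nu_haut)
        if bits(nu).sum() < n_b_total:
            nu_bas = nu
        else:
            nu_haut = nu
    return bits(nu)

if __name__ == "__main__":
    deltas = [0.5, 1.0, 2.0, 8.0]      # paramètres d'échelle des bruits (hypothétiques)
    nb_i = allocation_waterfilling(deltas, n_b_total=12)
    print(np.round(nb_i, 3), round(nb_i.sum(), 3))
```

Sur cet exemple, les capteurs les moins bruités reçoivent le plus de bits et la somme des allocations retrouve bien le budget $N_B$ total ; un arrondi final fournit des $N_{B,i}$ entiers.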
B.4 Conclusions
Dans cette thèse, nous avons traité le problème d’estimation à partir de mesures quantifiées,
un problème qui attire depuis quelque temps l’attention de la communauté de traitement du
signal, en raison de l’essor des réseaux de capteurs. Nous avons traité, plus spécifiquement,
le problème d’estimation d’un paramètre de centrage scalaire, soit constant, soit lentement
variable avec un modèle de Wiener.
B.4.1 Conclusions principales
Nous avons observé que, pour la plupart des modèles de bruit considérés en pratique, la
performance d’estimation se dégrade lorsque la dynamique de quantification est loin de la
vraie valeur du paramètre. Ceci indique qu’une bonne performance d’estimation peut être
obtenue par une approche adaptative, où l’on place la dynamique de quantification grâce à
l’information donnée par l’estimation la plus récente du paramètre.
Avec le schéma adaptatif, nous avons vu que la perte de performance d’estimation induite
par la quantification est petite. Pour tous les cas testés, nous avons observé une perte de
performance petite pour 1–3 bits de quantification et une perte négligeable pour 4 ou 5 bits.
Ceci indique que dans un contexte d’estimation à distance, où le nombre de bits total est
contraint, il est possible qu’une solution multicapteur/basse résolution soit préférable à la
solution classique monocapteur/haute résolution.
Nous avons proposé des alternatives à basse complexité pour les algorithmes trouvés dans
la littérature et leurs extensions. Nous avons démontré que les algorithmes à basse complexité
proposés atteignent les mêmes performances asymptotiques que leurs pendants à haute complexité. En utilisant les approches à basse complexité, nous avons présenté des solutions assez
naturelles pour traiter des extensions du problème de base : l’extension au cas d’un paramètre
d’échelle inconnu et l’extension à plusieurs capteurs.
Pour traiter le problème de placement des seuils optimaux pour l’estimation quand un
grand nombre d’intervalles de quantification est utilisé, nous avons étudié une approche asymptotique. Cette approche asymptotique nous a permis d’obtenir une approximation pratique
des seuils optimaux ainsi qu’une expression analytique de la performance d’estimation optimale, dans ce cas l’information de Fisher optimale. Nous avons vu aussi avec cette approche
que la performance d’estimation avec des mesures quantifiées converge exponentiellement vite
vers la performance avec des mesures continues quand le nombre de bits de quantification augmente. En appliquant les résultats sur un problème d’estimation de paramètre de centrage,
nous avons constaté que l’approximation proposée, censée être valable seulement asymptotiquement, est valable pour un nombre petit de bits (4 dans ce cas), ceci indique que les
résultats asymptotiques peuvent être utilisés en pratique.
Nous avons montré, avec l’approche asymptotique, la dépendance des seuils asymptotiquement optimaux par rapport au paramètre inconnu. Ceci indique, encore une fois, l’importance
de l’approche adaptative qui permet de placer asymptotiquement les seuils sur leurs valeurs
optimales et donc d’obtenir une performance asymptotiquement optimale.
Nous voudrions aussi attirer l’attention sur le fait que la différence de performance entre
un schéma de quantification uniforme et un schéma non uniforme semble être petite pour
l’estimation d’un paramètre de centrage. Par conséquent, en pratique, si une forte contrainte
sur la complexité est présente, la quantification uniforme peut être préférable.
B.4.2 Perspectives
Cette « discussion » entre quantification et estimation sera terminée par la présentation des
possibles sujets de travaux futurs.
• Paramètre vectoriel et quantification vectorielle : ce sujet est une extension naturelle
du problème. Tandis que l’extension à la quantification vectorielle est assez directe (en
termes d’algorithmes d’estimation et de leurs performances asymptotiques), l’extension
aux paramètres vectoriels est moins directe car elle nécessitera une nouvelle définition
pour la performance d’estimation et un changement de la structure des algorithmes pour
prendre en compte les corrélations possibles entre les composantes vectorielles.
• Canaux bruités : un canal de communication bruité peut être intégré au problème de
différentes façons. La plus simple consistant à introduire un indice binaire pour chaque
mesure quantifiée et un modèle de canal binaire symétrique. Avec un étiquetage fixe des
mesures quantifiées, des extensions de l’algorithme de basse complexité proposé peuvent
être directement conçues. Cependant, si les indices ne sont pas fixés, le problème qui en
résulte, avec l’étiquetage des sorties du quantifieur, peut être très difficile à traiter.
D’autres extensions peuvent être envisagées par l’introduction d’un canal à amplitude
continue, par exemple des canaux à bruit additif et à évanouissements. Dans ce cas,
on sera obligé, encore une fois, d’ajouter le problème d’étiquetage et donc de traiter un
problème conjoint de conception d’encodeur/estimation.
• Estimation avec distribution de bruit inconnue : on a supposé depuis le début que la
distribution du bruit est connue, en pratique ceci ne sera pas toujours le cas et on sera
obligé de trouver d’autres approches d’estimation.
• Variations rapides : dans certaines parties de cette thèse nous avons supposé que le
signal est lentement variable. Sous cette hypothèse, nous avons vu que la perte de
performance induite par la quantification est petite. On peut se poser la question de
savoir si cette conclusion reste vraie lorsque le signal varie rapidement.
• Problème distribué : pour arriver aux applications classiques des réseaux de capteurs, on
doit généraliser les algorithmes et résultats obtenus ici pour un capteur à un contexte
partiellement distribué, où certains capteurs recueillent l’information des capteurs qui
sont autour, ou complètement distribué, où tous les capteurs recueillent l’information.
• Temps continu : dans le cas d’un paramètre variable, nous avons considéré, depuis le
début, que le temps est discret et nous n’avons pas traité de l’échantillonnage. Un
sujet qui reste ouvert est donc l’estimation d’un signal à temps continu, échantillonné
et quantifié.
Bibliography
[Akyildiz 2002] I.F. Akyildiz, W. Su, Y. Sankarasubramaniam and E. Cayirci. A survey on
sensor networks. Communications magazine, IEEE, vol. 40, no. 8, pages 102–114, 2002.
(Cited in page(s) 17, 19 and 254.)
[Ali 1966] S.M. Ali and S.D. Silvey. A general class of coefficients of divergence of one distribution from another. Journal of the Royal Statistical Society. Series B (Methodological),
pages 131–142, 1966. (Cited in page(s) 207.)
[Arampatzis 2005] T. Arampatzis, J. Lygeros and S. Manesis. A survey of applications of
wireless sensors and wireless sensor networks. In Intelligent Control, 2005. Proceedings
of the 2005 IEEE International Symposium on, Mediterrean Conference on Control and
Automation, pages 719–724. Ieee, 2005. (Cited in page(s) 18 and 255.)
[Aysal 2008] T.C. Aysal and K.E. Barner. Constrained decentralized estimation over noisy
channels for sensor networks. Signal Processing, IEEE Transactions on, vol. 56, no. 4,
pages 1398–1410, 2008. (Cited in page(s) 20 and 256.)
[Bailey 1994] R.W. Bailey. Polar Generation of Random Variates with the t-Distribution.
Mathematics of Computation, pages 779–781, 1994. (Cited in page(s) 252.)
[Baker 2004] E.T. Baker and C.R. German. On the global distribution of hydrothermal vent
fields. Mid-ocean ridges: hydrothermal interactions between the lithosphere and
oceans, vol. 148, pages 245–266, 2004. (Cited in page(s) 173.)
[Benitz 1989] G.R. Benitz and J.A. Bucklew. Asymptotically optimal quantizers for detection
of iid data. Information Theory, IEEE Transactions on, vol. 35, no. 2, pages 316–325,
1989. (Cited in page(s) 19 and 256.)
[Benveniste 1990] A. Benveniste, M. Métivier and P. Priouret. Adaptive algorithms and
stochastic approximations. Springer-Verlag New York, Inc., 1990. (Cited in page(s) 113,
114, 115, 121, 122, 123, 136, 150, 151, 152, 158, 159, 166, 276 and 278.)
[Berzuini 1997] C. Berzuini, N.G. Best, W.R. Gilks and C. Larizza. Dynamic conditional
independence models and Markov chain Monte Carlo methods. Journal of the American
Statistical Association, vol. 92, no. 440, pages 1403–1412, 1997. (Cited in page(s) 86.)
[Blahut 1987] R.E. Blahut. Principles and practice of information theory. Addison-Wesley
Longman Publishing Co., Inc., 1987. (Cited in page(s) 208.)
[Borkar 1995] V.S. Borkar, S.K. Mitter et al. LQG control with communication constraints.
Technical report, Massachusetts Institute of Technology, Laboratory for Information
and Decision Systems, 1995. (Cited in page(s) 95.)
[Box 1958] G.E.P. Box and M.E. Muller. A note on the generation of random normal deviates.
The Annals of Mathematical Statistics, vol. 29, no. 2, pages 610–611, 1958. (Cited in
page(s) 248 and 250.)
[Boyd 2004] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University
Press, New York, NY, USA, 2004. (Cited in page(s) 57, 58 and 59.)
[Chong 2003] C.Y. Chong and S.P. Kumar. Sensor networks: evolution, opportunities, and
challenges. Proceedings of the IEEE, vol. 91, no. 8, pages 1247–1256, 2003. (Cited in
page(s) 18 and 255.)
[Costa 2003] J. Costa, A. Hero and C. Vignat. On Solutions to Multivariate Maximum α-Entropy Problems. In Anand Rangarajan, Mário Figueiredo and Josiane Zerubia,
editors, Energy Minimization Methods in Computer Vision and Pattern Recognition, volume 2683 of Lecture Notes in Computer Science, pages 211–226. Springer
Berlin/Heidelberg, 2003. (Cited in page(s) 136.)
[Cover 2006] T.M. Cover and J.A. Thomas. Elements of information theory 2nd edition.
Wiley-Interscience, 2006. (Cited in page(s) 136, 187 and 188.)
[Crisan 2000] D. Crisan and A. Doucet. Convergence of sequential Monte Carlo methods.
Technical report, Signal Processing Group, Department of Engineering, University of
Cambridge, 2000. (Cited in page(s) 91.)
[Crowder 1976] M.J. Crowder. Maximum likelihood estimation for dependent observations.
Journal of the Royal Statistical Society. Series B (Methodological), pages 45–53, 1976.
(Cited in page(s) 62, 67 and 70.)
[Curry 1970] R. Curry, W.V. Velde and J. Potter. Nonlinear estimation with quantized
measurements–PCM, predictive quantization, and data compression. Information Theory, IEEE Transactions on, vol. 16, no. 2, pages 152–161, March 1970. (Cited in
page(s) 95.)
[Doucet 1998] A. Doucet et al. On sequential simulation-based methods for Bayesian filtering.
Technical report, 1998. (Cited in page(s) 83, 84, 85 and 86.)
[Doucet 2000] A. Doucet, S. Godsill and C. Andrieu. On sequential Monte Carlo sampling
methods for Bayesian filtering. Statistics and computing, vol. 10, no. 3, pages 197–208,
2000. (Cited in page(s) 246.)
[Durisic 2012] M.P. Durisic, Z. Tafa, G. Dimic and V. Milutinovic. A survey of military applications of wireless sensor networks. In Embedded Computing (MECO), 2012 Mediterranean Conference on, pages 196–199. IEEE, 2012. (Cited in page(s) 18 and 255.)
[Fang 2008] J. Fang and H. Li. Distributed adaptive quantization for wireless sensor networks:
from delta modulation to maximum likelihood. Signal Processing, IEEE Transactions
on, vol. 56, no. 10, pages 5246–5257, 2008. (Cited in page(s) 20, 32, 62, 63, 64, 66,
106, 112, 118, 256, 268 and 269.)
[Fine 1968] T. Fine. The response of a particular nonlinear system with feedback to each of
two random processes. Information Theory, IEEE Transactions on, vol. 14, no. 2, pages
255–264, 1968. (Cited in page(s) 63 and 239.)
[Gallager 1996] R.G. Gallager. Discrete stochastic processes, volume 101. Kluwer Academic
Publishers, 1996. (Cited in page(s) 239.)
[Gastpar 2008] M. Gastpar. Uncoded transmission is exactly optimal for a simple Gaussian
“sensor” network. Information Theory, IEEE Transactions on, vol. 54, no. 11, pages
5247–5251, 2008. (Cited in page(s) 19.)
[Gersho 1992] A. Gersho and R.M. Gray. Vector quantization and signal compression.
Springer, 1992. (Cited in page(s) 91, 109, 148, 174, 184 and 197.)
[Golub 1973] G.H. Golub. Some modified matrix eigenvalue problems. SIAM Review, pages
318–334, 1973. (Cited in page(s) 153 and 231.)
[Golub 1991] G.H. Golub and J.M. Ortega. Scientific computing and differential equations: An introduction to numerical methods. Academic Press, Inc., 1991. (Cited
in page(s) 116.)
[Gordon 1993] N.J. Gordon, D.J. Salmond and A.F.M. Smith.
Novel approach to
nonlinear/non-Gaussian Bayesian state estimation. In Radar and Signal Processing,
IEE Proceedings F, volume 140, pages 107–113. IET, 1993. (Cited in page(s) 86.)
[Gubner 1993] J.A. Gubner. Distributed estimation and quantization. Information Theory,
IEEE Transactions on, vol. 39, no. 4, pages 1456–1459, 1993. (Cited in page(s) 20
and 256.)
[Gupta 2003] R. Gupta and A.O. Hero III. High-rate vector quantization for detection. Information Theory, IEEE Transactions on, vol. 49, no. 8, pages 1951–1969, 2003. (Cited
in page(s) 19, 208, 212 and 256.)
[Hardy 1988] G.H. Hardy, J.E. Littlewood and G. Polya. Inequalities. Cambridge University
Press, 1988. (Cited in page(s) 43 and 184.)
[Herzig 2002] P. Herzig, M.D. Hannington and S. Petersen. Polymetallic massive sulphide
deposits at the modern seafloor and their resources potential. Technical report, 2002.
(Cited in page(s) 173.)
[Hoagland 2010] P. Hoagland, S. Beaulieu, M.A. Tivey, R.G. Eggert, C. German, L. Glowka
and J. Lin. Deep-sea mining of seafloor massive sulfides. Marine Policy, vol. 34, no. 3,
pages 728–732, 2010. (Cited in page(s) 173.)
[Hol 2006] J.D. Hol, T.B. Schon and F. Gustafsson. On resampling algorithms for particle
filters. In Nonlinear Statistical Signal Processing Workshop, 2006 IEEE, pages 79–82.
IEEE, 2006. (Cited in page(s) 86.)
[Intanagonwiwat 2000] C. Intanagonwiwat, R. Govindan and D. Estrin. Directed diffusion: a
scalable and robust communication paradigm for sensor networks. In Proceedings of
the 6th annual international conference on Mobile computing and networking, pages
56–67. ACM, 2000. (Cited in page(s) 17 and 254.)
[Jazwinski 1970] A.H. Jazwinski. Stochastic processes and filtering theory. Academic Press,
New York, 1970. (Cited in page(s) 29, 78, 91 and 97.)
[Karlsson 2005] G.R. Karlsson and F. Gustafsson. Particle Filtering for Quantized Sensor
Information. In 13th European Signal Processing Conference, EUSIPCO. EURASIP,
2005. (Cited in page(s) 87.)
[Kassam 1977] S. Kassam. Optimum quantization for signal detection. Communications, IEEE
Transactions on, vol. 25, no. 5, pages 479–484, 1977. (Cited in page(s) 19, 53 and 256.)
[Kay 1993] S.M. Kay. Fundamentals of statistical signal processing, volume 1: Estimation
theory. PTR Prentice Hall, 1993. (Cited in page(s) 38, 39, 40, 53, 62, 70, 88, 91, 95,
227, 263 and 264.)
[Khalil 1992] H.K. Khalil and J.W. Grizzle. Nonlinear systems. Macmillan Publishing Company New York, 1992. (Cited in page(s) 116.)
[Knuth 1997] D.E. Knuth. The art of computer programming, volume 2: Seminumerical
algorithms. Addison-Wesley, 1997. (Cited in page(s) 248.)
[Kong 1994] A. Kong, J.S. Liu and W.H. Wong. Sequential imputations and Bayesian missing
data problems. Journal of the American Statistical Association, vol. 89, no. 425, pages
278–288, 1994. (Cited in page(s) 85.)
[Lange 1989] K.L. Lange, R.J.A. Little and J.M.G. Taylor. Robust statistical modeling using
the t distribution. Journal of the American Statistical Association, pages 881–896,
1989. (Cited in page(s) 136.)
[Li 1999] J. Li, N. Chaddha and R.M. Gray. Asymptotic performance of vector quantizers with
a perceptual distortion measure. Information Theory, IEEE Transactions on, vol. 45,
no. 4, pages 1082–1091, 1999. (Cited in page(s) 188.)
[Li 2007] H. Li and J. Fang. Distributed adaptive quantization and estimation for wireless
sensor networks. Signal Processing Letters, IEEE, vol. 14, no. 10, pages 669–672,
2007. (Cited in page(s) 60, 62, 106, 118 and 268.)
[Longo 1990] M. Longo, T.D. Lookabaugh and R.M. Gray. Quantization for decentralized
hypothesis testing under communication constraints. Information Theory, IEEE Transactions on, vol. 36, no. 2, pages 241–255, 1990. (Cited in page(s) 19 and 256.)
[Luo 2005] Z.Q. Luo. Universal decentralized estimation in a bandwidth constrained sensor
network. Information Theory, IEEE Transactions on, vol. 51, no. 6, pages 2210–2219,
2005. (Cited in page(s) 20 and 256.)
[Marano 2007] S. Marano, V. Matta and P. Willett. Asymptotic design of quantizers for
decentralized MMSE estimation. Signal Processing, IEEE Transactions on, vol. 55,
no. 11, pages 5485–5496, 2007. (Cited in page(s) 20, 42, 208 and 256.)
[Marsaglia 2000] G. Marsaglia and W.W. Tsang. A simple method for generating gamma
variables. ACM Transactions on Mathematical Software (TOMS), vol. 26, no. 3, pages
363–372, 2000. (Cited in page(s) 249.)
[Molden 2007] D. Molden. Water for food, water for life: a comprehensive assessment of water
management in agriculture. Earthscan/James & James, 2007. (Cited in page(s) 27.)
[Nardon 2009] M. Nardon and P. Pianca. Simulation techniques for generalized Gaussian
densities. Journal of Statistical Computation and Simulation, vol. 79, no. 11, pages
1317–1329, 2009. (Cited in page(s) 249.)
[Papadopoulos 2001] H.C. Papadopoulos, G.W. Wornell and A.V. Oppenheim. Sequential
signal encoding from noisy measurements using quantizers with dynamic bias control.
Information Theory, IEEE Transactions on, vol. 47, no. 3, pages 978–1002, 2001. (Cited
in page(s) 20, 44, 53, 60, 66, 70, 72, 105, 106, 256, 265 and 269.)
[Picinbono 1988] B. Picinbono and P. Duvaut. Optimum quantization for detection. Communications, IEEE Transactions on, vol. 36, no. 11, pages 1254–1258, 1988. (Cited in
page(s) 19 and 256.)
[Poor 1977] H.V. Poor and J. Thomas. Applications of Ali-Silvey distance measures in the
design of generalized quantizers for binary decision systems. Communications, IEEE
Transactions on, vol. 25, no. 9, pages 893–900, 1977. (Cited in page(s) 19 and 256.)
[Poor 1988] H.V. Poor. Fine quantization in signal detection and estimation. Information
Theory, IEEE Transactions on, vol. 34, no. 5, pages 960–972, 1988. (Cited in page(s) 19,
20, 178, 207, 209, 214 and 256.)
[Puccinelli 2005] D. Puccinelli and M. Haenggi. Wireless sensor networks: applications and
challenges of ubiquitous sensing. Circuits and Systems Magazine, IEEE, vol. 5, no. 3,
pages 19–31, 2005. (Cited in page(s) 18 and 255.)
[Rhodes 1971] I. Rhodes. A tutorial introduction to estimation and filtering. Automatic Control, IEEE Transactions on, vol. 16, no. 6, pages 688–706, 1971. (Cited in page(s) 98.)
[Ribeiro 2006a] A. Ribeiro and G.B. Giannakis. Bandwidth-constrained distributed estimation
for wireless sensor networks-Part I: Gaussian case. Signal Processing, IEEE Transactions on, vol. 54, no. 3, pages 1131–1143, 2006. (Cited in page(s) 20, 44, 53, 58, 60,
63, 106, 256 and 265.)
[Ribeiro 2006b] A. Ribeiro and G.B. Giannakis. Bandwidth-constrained distributed estimation for wireless sensor networks-Part II: Unknown probability density function. Signal
Processing, IEEE Transactions on, vol. 54, no. 7, pages 2784–2796, 2006. (Cited in
page(s) 20, 106 and 256.)
[Ribeiro 2006c] A. Ribeiro, G.B. Giannakis and S.I. Roumeliotis. SOI-KF: Distributed Kalman
filtering with low-cost communications using the sign of innovations. Signal Processing,
IEEE Transactions on, vol. 54, no. 12, pages 4782–4795, 2006. (Cited in page(s) 20,
75, 94, 95, 106 and 256.)
[Robert 1999] C.P. Robert and G. Casella. Monte Carlo statistical methods. Springer New
York, 1999. (Cited in page(s) 81, 82 and 245.)
[Ruan 2004] Y. Ruan, P. Willett, A. Marrs, S. Marano and F. Palmieri. Practical fusion of
quantized measurements via particle filtering. In Target Tracking 2004: Algorithms
and Applications, IEE, pages 13–18. IET, 2004. (Cited in page(s) 87.)
[Rubin 1988] D.B. Rubin et al. Using the SIR algorithm to simulate posterior distributions. Bayesian Statistics, vol. 3, pages 395–402, 1988. (Cited in page(s) 86.)
[Samorodnitsky 1994] G. Samorodnitsky and M.S. Taqqu. Stable non-Gaussian random processes: stochastic models with infinite variance. Chapman and Hall/CRC, 1994. (Cited
in page(s) 34.)
[Sigman 1999] K. Sigman. Appendix: A primer on heavy-tailed distributions. Queueing Systems, vol. 33, no. 1, pages 261–275, 1999. (Cited in page(s) 47.)
[Sukhavasi 2009a] R.T. Sukhavasi and B. Hassibi. The Kalman like particle filter : Optimal estimation with quantized innovations/measurements. arXiv:0909.0996, September 2009.
(Cited in page(s) 95.)
[Sukhavasi 2009b] R.T. Sukhavasi and B. Hassibi. Particle filtering for Quantized Innovations.
In Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International
Conference on, pages 2229–2232, April 2009. (Cited in page(s) 91.)
[Tichavsky 1998] P. Tichavsky, C.H. Muravchik and A. Nehorai. Posterior Cramér-Rao
bounds for discrete-time nonlinear filtering. Signal Processing, IEEE Transactions on,
vol. 46, no. 5, pages 1386–1396, May 1998. (Cited in page(s) 88.)
[Tsitsiklis 1993] J.N. Tsitsiklis. Extremal properties of likelihood-ratio quantizers. Communications, IEEE Transactions on, vol. 41, no. 4, pages 550–558, 1993. (Cited in page(s) 19
and 256.)
[Van Trees 1968] H. L. Van Trees. Detection, estimation, and modulation theory. Part 1. New
York: John Wiley and Sons, Inc., 1968. (Cited in page(s) 37, 78, 88 and 208.)
[Varanasi 1989] M.K. Varanasi and B. Aazhang. Parametric generalized Gaussian density
estimation. The Journal of the Acoustical Society of America, vol. 86, pages 1404–
1415, 1989. (Cited in page(s) 136.)
[Villard 2010] J. Villard, P. Bianchi, E. Moulines and P. Piantanida. High-rate quantization
for the Neyman-Pearson detection of hidden Markov processes. In Information Theory
Workshop (ITW), 2010 IEEE, pages 1–5. IEEE, 2010. (Cited in page(s) 19 and 256.)
[Villard 2011] J. Villard and P. Bianchi. High-rate vector quantization for the Neyman–
Pearson detection of correlated processes. Information Theory, IEEE Transactions on,
vol. 57, no. 8, pages 5387–5409, 2011. (Cited in page(s) 19 and 256.)
[Wang 2010] L.Y. Wang, G. Yin, J.F. Zhang and Y. Zhao. System identification with quantized observations. Birkhauser, 2010. (Cited in page(s) 20, 32 and 256.)
[Wasserman 2003] L. Wasserman. All of statistics: a concise course in statistical inference.
Springer, 2003. (Cited in page(s) 53, 68 and 72.)
[You 2008] K. You, L. Xie, S. Sun and W. Xiao. Multiple-level quantized innovation Kalman
filter. In IFAC World Congress, volume 17, pages 1420–1425, 2008. (Cited in page(s) 75,
95 and 106.)
[Zhao 2004] F. Zhao and L. Guibas. Wireless sensor networks: an information processing
approach. Morgan Kaufmann, 2004. (Cited in page(s) 17 and 254.)
Abstract: With recent advances in sensing and communication technology, sensor networks have emerged as a new field in signal processing. One application of this field is remote estimation, where sensors gather information and send it to a distant point where estimation is carried out. To overcome the new design constraints brought by this approach (limited energy, bandwidth and complexity), quantization of the measurements can be considered. In this context, we study the problem of estimation based on quantized measurements. We focus mainly on the scalar location parameter estimation problem; the parameter is considered to be either constant or varying according to a slow Wiener process model. We present estimation algorithms to solve this problem and, based on performance analysis, we show the importance of quantizer range adaptiveness for obtaining optimal performance. We propose a low complexity adaptive scheme that jointly estimates the parameter and updates the quantizer thresholds, thereby achieving asymptotically optimal performance. With only 4 or 5 bits of resolution, the asymptotically optimal performance for uniform quantization is shown to be very close to the continuous measurement estimation performance. Finally, we propose a high resolution approach to obtain an approximation of the optimal nonuniform quantization thresholds for parameter estimation, as well as an analytical approximation of the estimation performance based on quantized measurements.
Keywords: estimation, quantization, compression, adaptive algorithms.
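To make the adaptive scheme summarized above concrete, the following minimal Python sketch (not taken from the thesis; the parameter value, noise level, number of samples and gain choice are assumptions made for illustration only) estimates a constant location parameter from one-bit messages whose quantizer threshold tracks the running estimate. For Gaussian noise, the gain sigma*sqrt(pi/2) used below is the scaling for which such a decreasing-gain sign recursion is known to reach the one-bit Cramér–Rao bound asymptotically.

    import numpy as np

    def adaptive_one_bit_estimator(x_true=1.3, sigma=1.0, n=5000, seed=0):
        """Illustrative sketch: estimate a constant location parameter from
        one-bit measurements; the quantizer threshold follows the estimate."""
        rng = np.random.default_rng(seed)
        # Gain assumed optimal for Gaussian noise (sketch assumption):
        # with this scaling the recursion attains the one-bit CRB asymptotically.
        gain = sigma * np.sqrt(np.pi / 2.0)
        est = 0.0                                        # current estimate, also the quantizer threshold
        for k in range(1, n + 1):
            y = x_true + sigma * rng.standard_normal()   # continuous measurement at the sensor
            bit = 1.0 if y > est else -1.0               # single bit sent to the fusion center
            est += (gain / k) * bit                      # decreasing-gain stochastic approximation update
        return est

    if __name__ == "__main__":
        print(adaptive_one_bit_estimator())              # converges to a value close to x_true = 1.3

The decreasing gain plays the same role as the sample-mean weighting in the continuous-measurement case: early bits move the threshold quickly toward the parameter, while later bits only refine the estimate.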
Summary: The rise of new telecommunication and sensor design technologies has given birth to a new field of signal processing: sensor networks. A key application of this new field is remote estimation: the sensors acquire information and transmit it to a distant point where the estimation is performed. To meet the new challenges raised by this approach (energy, bandwidth and complexity constraints), quantization of the measurements is one solution. This context leads us to study estimation from quantized measurements. We focus mainly on the problem of estimating a scalar location parameter. The parameter is considered either constant or time-varying and modelled by a slow Wiener process. We present estimation algorithms to solve this problem and, based on performance analysis, we show the importance of an adaptive quantization range for obtaining optimal performance. We propose a low complexity adaptive scheme that jointly estimates the parameter and updates the quantizer thresholds; in this way the estimator attains the asymptotically optimal performance. With 4 or 5 bits of resolution, we show that the optimal performance for uniform quantization is very close to the estimation performance obtained from continuous measurements. Finally, we propose a high resolution approach to obtain the optimal nonuniform quantization thresholds as well as an analytical approximation of the estimation performance.
Keywords: estimation, quantization, compression, adaptive algorithms.
GIPSA-lab, 11 rue des Mathématiques, Grenoble Campus BP 46,
F-38402 Saint Martin d’Hères CEDEX